Extracting Dates from Text Strings

Worksheet calculations involving date and time data generate a lot of questions to our Tech Support department. Origin can detect some basic date format during import but in many cases data are imported as text instead of date. If data is left-aligned in a column, it’s treated as text, such as the 1st column in worksheet below. But Origin provides a collection of built-in string and date functions to convert text strings to date and do calculations for further analysis and graphing.

scv_end

In this blog post we will:

  • parse a date string
  • convert string to date and display it as date
  • perform a simple calculation with date data

If you would like to work through this with me, download this zip file

Please note: this blog is written before Origin 2017 is released so it used our traditional col(A) syntax to refer to column A in formula. Origin 2017 introduced new spreadsheet cell notation to directly using column short name A in formula. Please see Origin2017_new_notation.ogw in the zip file to see how easy it is in Origin 2017.

scv_2017

 

If you have Origin 2016 or earlier, please open scv_date_range_str.ogw in Origin to follow the steps below.

This is what you should see:

 Origin worksheet with date string

The values in column A show a range of dates separated by the hyphen (“-“). Though they look like date, they are actually being treated as text string. So you can’t direcly get the differences of days in that data range. We need to first do some manipulating of the string in column A to extract the two dates separated by the hyphen (“-“).

  1. From the menu, choose Column: Add New Columns and add 2 more columns. You should now have 4 columns.
  2. Next, Enter “From Date” in column B’s long name. Enter formula in F(x) cell of column B to get the From dates of data range.
Left(col(A), Find(col(A),"-"))

Looking closely at this simple line of code, you will see that we have called two string functions, Find and Left:

  • The Find function searches a string for a second string and returns the position from the first character of the found string. We search the string in column A for the hyphen (“-“) character and the function returns an integer value N equaling the number of characters up-to-and-including the hyphen.
  • The Left function takes the N value returned by Find() and returns the left-most N characters from our date string. In other words, what is returned by Find() is treated as an argument of Left(), which in turn, returns its own value.

You can find more string functions from Origin’s extensive library of built-in functions

At this point, we notice we have a slight problem in that our expression returns the “-” character at the end of first half of our date string. We can fix that by modifying the Find() portion of our expression to subtract 1 from the value returned by Find(). Thus, our expression becomes …

Left(col(A), Find(col(A),"-")-1)

… and we now find that the first portion of our date string has been extracted cleanly.

2329_SCV_parse_string1-425x270

To extract the second portion of our date string, add column C and do something similar:

  1.  Click in the F(x)= cell of column C and enter the following:
Mid(col(A), Find(col(A),"-")+1)

This time we have replaced the Left function with the Mid function:

  • The Mid function takes a string and returns M characters starting from position N in the string. The argument M is optional and if M is not specified, Mid() returns everything to the end of the string. This is precisely what we would like to do, so we do not specify M.
  • The Find function, as before, is used to find the position of the hyphen (“-“) but this time, we need to start our count not from the hyphen, but from the next character (the actual starting point of the second date in our date string), so we add 1 to the N returned by Find() and pass this value to Mid().

Our worksheet should now look like this:

2329_SCV_parse_string2-620x281

Column B and C are still being treated as text strings (left-aligned). To rectify this, enter col(C)-col(B) in F(x) cell of column D. It failed to calculated differences of days between dates in column C and B.

scv_1

We can use Date function in Origin to convert a date-time string and returns a Julian-date value, used to perform mathematical operations on date and time data. See the updated formula and calculated differences of days in column D. Also note values in column B and C are right aligned to indicate they are not strings.

scv_2

To show the correct display of Dates in column B and C, select column B and C header to select both. Choose Format: Column… to open the Column Properties dialog. On Properties tab, change Format to Date. Default Display is MM/dd/yyyy (this actually displays as today’s date in your interface). You can choose other display if needed. Click OK, you will see

scv_end

 

We could end our discussion here but it is worth pointing out that there is a more elegant solution to this problem. Since our goal was simply to return the difference in days between two dates, we could have combined all of our parsing and math operations into a single column formula and saved ourselves a couple of steps. Let’s see how this is done:

  1. Go to the File menu and choose Recent Books and the file scv_date_range_str.ogw should there at the top of the list.
  2. Click the file name and a new instance of that window is opened.
  3. In column B, in the F(x)= cell, enter the following formula:
date(Mid(col(a),find(col(a),"-")+1)) -
date(Left(col(a),find(col(a),"-")-1))

Exit editing mode by clicking outside the cell and you will see that the same operation that we labored over earlier has been performed on the date string in column A, using just a single formula. How is this possible?

scv_date

If you look at the formula that you entered into column B, you should recognize elements of the formulas that we constructed earlier to parse and subtract our dates, but with a couple of differences:

  • We have nested the formulas that we arrived at earlier, using Find(), Left(), and Mid(), inside another function — the Date function.
  • If your string isn’t in format of “M/d/yyyy”, then you will need to specify the format in date() function. E.g.
    date(Thu Sep 24 09,"ddd MMM dd yy")

So there you have it: Using Origin’s Set Values feature and its extensive library of built-in functions, we were able to extract dates from a column of text strings, convert those extracted strings to Julian-date values and subtract the Julian dates to give us the number of days that had elapsed between two dates.

That’s about all we have time for. If you are interested in learning more about the primary features discussed in this post, see the following topics in our online documentation center:

Thanks for your time. I hope this has been helpful!

Leave a Reply

Your email address will not be published. Required fields are marked *