Accessing Metadata using Python

Introduction:

Data files generated by scientific instruments usually contain information such as the instrument model, operator name, measurement date, and measurement parameters. This information is often saved in the data file as header lines (metadata), and it can be useful later either as a dataset identifier or as parameters for data analysis. In this blog, I am going to show an example of using Python code to handle these header lines during data import: save the header lines into the user tree of the worksheet and use them for analysis (see figure below). Make sure you have installed Origin 2021b before proceeding.

Details:

Please first download the zip file and decompress it to a location of your choosing. It contains three data files. The subfolder contains an import filter file (.oif) and a Python file (.py), which you will create yourself by following this blog.

Note: If you open one of the data files in any text editor, you should see that the main header lines (see the figure above) contain measurement information such as operator name, experiment data, etc. We will write Python code to parse these header lines and store them in the worksheet user-tree node.
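
Before wiring anything into Origin, it can help to try the header parsing on its own. The following is a minimal sketch that assumes each header line has the form "Key: Value"; the function name parse_header and the sample lines are made up for illustration and are not taken from the downloaded files. It splits on the first colon only, so a value that itself contains a colon (a time, for example) is kept whole, and it replaces spaces in the key with underscores in the same way the import filter builds its tree node names.

```python
def parse_header(lines):
    """Map 'Key: Value' header lines to {node_name: value} (hypothetical helper)."""
    meta = {}
    for line in lines:
        # Split on the first colon only, so values such as "10:30" stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: Value' line, skip it
        # Spaces are not convenient in tree node names, so use underscores.
        meta[key.strip().replace(" ", "_")] = value.strip()
    return meta


# Made-up header lines, for illustration only.
sample = [
    "Operator: Alice\n",
    "Temperature: 25\n",
    "Start Time: 10:30\n",
]
header = parse_header(sample)
print(header)
```

Each key in the resulting dictionary corresponds to a node name under tree.data in the worksheet, which is where the import filter in Step 1 stores the values.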

Step1:

First I am going to create a Python-based import filter. Launch Origin and then open the Import Wizard (menu Data:Import From File:Import Wizard…) and follow these steps.

  1. Select F1.dat in the downloaded folder to import.
  2. Set Data Type to: User-Defined.
  3. Set Target Window to: Worksheet.
  4. Set Import Mode as Start New Sheets.
  5. Click Next button.

Do these things on the User Defined Filters page (second page) of the Import Wizard:

  1. Specify Python Code as the type of code used.
  2. Copy and paste the Python code below into the box. It reads the metadata from the first four lines and saves it to the worksheet tree.
  3. Click Next button.

import originpro as op
import pandas as pd
import os
 
def read_file(file): 
    # Get the active worksheet. 
    wks = op.find_sheet()
    
    with open(file, 'r') as f:        
        # Read the meta data from the first four lines and save to worksheet tree.
        for i in range(4):
            line = f.readline()
            # Split on the first colon only, so values containing ':' stay intact.
            ll = line.split(":", 1)
            node = "tree.data."+ll[0].strip().replace(' ', '_')
            wks.set_str(node, ll[1].strip())
        
        # Skip one line
        line = f.readline() 
        
        # Read the Longname line
        line = f.readline()
        cols = line.split() 
        
        # Read the Unit line (one unit per column)
        units = f.readline().split()
        
        # Read in the data
        rows = []
        for line in f.readlines():
            rows.append(line.split())
 
    df = pd.DataFrame(data=rows, columns=cols)
    
    # Save data to worksheet
    wks.from_df(df)
    
    # Set the units once the columns exist
    wks.set_labels(units, 'U')
    
    # Set column designation
    wks.cols_axis('XYE') 
    
    # Set sheet name to file name
    wks.name=os.path.basename(file)
 
 
# The file chosen by Origin import filter is placed into the fname$
# LabTalk variable. Must bring it into Python.
fname = op.get_lt_str('fname')
read_file(fname)

Do these things on the Save Filters page (third page) of the Import Wizard:

  1. Check Save Filter.
  2. Save the Filter in the data file folder.
  3. Give the filter a file name, in this case “MyFilter”.
  4. Click Finish button. 

The data is imported and the worksheet is renamed with the file name. To check the metadata, left-click on the edge of the grey area of the worksheet and choose Show Organizer, then navigate to the F1.dat node in the left panel.
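
The filter relies on a fixed file layout: four metadata lines, one line that is skipped, a line of column long names, a line of units, and then the data rows. If an import does not look right, checking a file against that layout outside Origin can narrow things down. Here is a rough sketch of such a check using only the standard library. The function name parse_table and the sample content are hypothetical, and I am assuming that names, units and values are separated by whitespace and that every data value is numeric.

```python
import io


def parse_table(stream, header_lines=4):
    """Return (long_names, units, columns) from a file-like object (hypothetical helper)."""
    # Skip the metadata lines plus the one separator line that follows them.
    for _ in range(header_lines + 1):
        stream.readline()
    long_names = stream.readline().split()
    units = stream.readline().split()
    columns = {name: [] for name in long_names}
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        for name, field in zip(long_names, fields):
            columns[name].append(float(field))  # raises ValueError on non-numeric data
    return long_names, units, columns


# Made-up file content, for illustration only.
sample = io.StringIO(
    "Operator: Alice\n"
    "Temperature: 25\n"
    "Date: 2021-05-01\n"
    "Instrument: Model X\n"
    "\n"
    "Dose Response Error\n"
    "mg mV mV\n"
    "1 0.5 0.1\n"
    "2 1.5 0.2\n"
)
names, units, columns = parse_table(sample)
print(names, units, columns)
```

To run it on a real file, pass an open file object instead of the io.StringIO sample. A ValueError from float() points to a line that does not fit the assumed layout.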

Step2:

Now go to the data folder and make sure the filter file has been created with the name MyFilter.oif. Next I am going to import all three data files in the folder with the Multiple Files Connector tool.

  1. Create a new workbook in Origin, then select from menu Data: Connect Multiple Files…
  2. Set Data Connector to Import Filter. Origin will automatically locate the filter in the data folder. Set Source to Files in Specified Folder. Set as the files folder. Check the Same Book box. Set Column Designations to XYE. Click OK.

Three worksheets with imported data are created, each with a data connector icon linking it to its individual data file. Turn on the Worksheet Organizer of each sheet and make sure the metadata of each data file is properly imported.
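
With more than a handful of files, opening the Organizer sheet by sheet gets tedious. One option is to check the headers on disk before importing. The sketch below is only that, a sketch: the function name missing_keys is made up, it assumes the data files end in .dat (like F1.dat) and carry "Key: Value" lines in their first four lines, and it looks for the two keys that the analysis in Step 3 reads, Operator and Temperature. The demo at the end writes two throwaway files to a temporary folder, so it does not touch your data.

```python
import tempfile
from pathlib import Path


def missing_keys(folder, required=("Operator", "Temperature"),
                 header_lines=4, pattern="*.dat"):
    """Return {file_name: [missing keys]} for files lacking a required header key."""
    problems = {}
    for path in sorted(Path(folder).glob(pattern)):
        keys = set()
        with open(path, "r") as f:
            for _ in range(header_lines):
                key, sep, _value = f.readline().partition(":")
                if sep:
                    keys.add(key.strip().replace(" ", "_"))
        missing = [k for k in required if k not in keys]
        if missing:
            problems[path.name] = missing
    return problems


# Demo with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.dat").write_text("Operator: Alice\nTemperature: 25\nA: 1\nB: 2\n")
    Path(tmp, "bad.dat").write_text("Operator: Bob\nA: 1\nB: 2\nC: 3\n")
    report = missing_keys(tmp)
print(report)
```

An empty dictionary means every file has the required keys. Otherwise each entry names a file and the keys it lacks.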

Step3:

In this step, I am going to run a piece of Python code to fit the built-in Logistic function to all three datasets and create a result sheet. Each row of the result sheet contains the data file name, metadata and the fitted parameters.

  1. Select menu Connectivity:Open Untitled.py…
  2. Copy and paste the code below into the code editor window and save it as a new file, for example CurveFit.py.
  3. Press F5 to execute.

import originpro as op
import pandas as pd
 
col1 = "Operator"
col2 = "Temperature"
cols = ["File", col1, col2, "A1", "A2", "x0", "p"]  
result = []
 
wb = op.find_book()
for wks in wb: 
    # extract the worksheet information
    oprter = wks.get_str("tree.data."+col1)
    tmpt = wks.get_float("tree.data."+col2)
 
    # perform curve fitting
    model = op.NLFit('Logistic')
    model.set_data(wks, 0, 1)
    model.fit()
    ret = model.result()    
    
    # save fitted result into dataframe rows. 
    result.append([wks.name, oprter, tmpt, ret['A1'], ret['A2'], ret['x0'], ret['p']])
    
df = pd.DataFrame(data=result, columns=cols)
 
wks = op.new_sheet(lname='result')
wks.from_df(df)
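
For readers wondering what the four fitted parameters mean: as far as I understand Origin's built-in Logistic function, it has the form y = A2 + (A1 - A2) / (1 + (x/x0)^p), where A1 is the initial value, A2 the final value, x0 the center and p the power. Please confirm this against the function's documentation in Origin before relying on it. A plain-Python sketch of that formula, with made-up parameter values, can be used to recompute a fitted curve from the numbers in the result sheet:

```python
import math


def logistic(x, A1, A2, x0, p):
    """Assumed form of the Logistic function: A2 + (A1 - A2) / (1 + (x/x0)**p)."""
    return A2 + (A1 - A2) / (1.0 + (x / x0) ** p)


# Made-up parameter values, for illustration only.
params = {"A1": 0.0, "A2": 10.0, "x0": 2.0, "p": 3.0}

y_start = logistic(0.0, **params)    # at x = 0 the curve sits at A1
y_center = logistic(2.0, **params)   # at x = x0 it is halfway between A1 and A2
print(y_start, y_center)
```

Under this form, the value at x = x0 is always the midpoint of A1 and A2, which gives a quick sanity check on a fitted x0.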

The data workbook and the result workbook are shown below:

 
