Performing Principal Component Analysis for Chemometric Spectra Using Origin’s App

Posted in: Apps, Data Analysis

Introduction

A Principal Component Analysis for Spectroscopy App is available for Origin 2017. The App is designed specifically for principal component analysis of spectra (IR, fluorescence, UV-Vis, Raman, etc.). In chemometric analysis, researchers want to know which variables (frequency, wavelength or time) are important in distinguishing samples by their spectra, which samples can be assigned to a group, and which samples are outliers. This App can help answer these questions.

Before using the Principal Component Analysis for Spectroscopy App, the spectral data must be arranged in a worksheet, with each column representing one sample spectrum. The frequency, wavelength or time values for the spectra can go in the X column. Sample names and group info can be stored in the column headers of the samples, e.g. the Long Name or Comments of each Y column.
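
The layout described above can be pictured outside Origin as a small table. The sketch below is only an illustration with made-up numbers and hypothetical names (`x`, `spectra`, `long_names`, `comments`); it is not how the App stores data internally. It shows one column per sample, with sample names and group labels kept parallel to the columns.

```python
import numpy as np

# X column: one value per measured point (frequency, wavelength or time)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Y columns: one column per sample spectrum (made-up intensities)
spectra = np.array([
    [0.10, 0.12, 0.30],
    [0.40, 0.38, 0.90],
    [0.20, 0.22, 0.50],
    [0.05, 0.06, 0.10],
])

# Column headers: Long Name holds the sample name, Comments the group label
long_names = ["Sample 1", "Sample 2", "Sample 3"]
comments = ["olive", "olive", "non-olive"]

# PCA treats each sample as one observation, so transpose to samples x variables
observations = spectra.T

# Collect the column indices that belong to each group label
groups = {}
for column, label in enumerate(comments):
    groups.setdefault(label, []).append(column)
```

Each row of `spectra` is one variable (a single time or wavelength point), which is why the matrix is transposed before any PCA-style calculation.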

Example

This App provides a built-in sample. Once you have installed the App, right-click the Principal Component Analysis for Spectroscopy icon in the Apps Gallery window and choose Show Samples Folder from the shortcut menu. This opens the folder that holds the sample project file. Open the project file PCASpecEx.opj in that folder. It includes a Workbook and a Notes window: Book1, Sheet1 contains the input data, and the Notes window gives the source of the input data.

Input Data

The input data consists of 20 samples taken from the 120 samples in the original source data: 10 olive oil samples, 5 non-olive vegetable oil samples and 5 samples of non-olive vegetable oil mixed with olive oil. The first column in the sheet, A(X), holds the time data for the spectra. The other columns, B(Y) through U(Y), hold the spectral data, and the group info for the 20 samples is saved in the Comments of each Y column. When plotted as a line plot, the spectra for the 20 samples look like the following:

[Figure: line plot of the spectra for the 20 samples]

Steps

  1. Open the sample project file PCASpecEx.opj. Click the Principal Component Analysis for Spectroscopy icon in the Apps Gallery window to open the dialog.
  2. In the dialog’s Input tab, select the first (X) column in Sheet1 as Frequency/Wavelength. Select the other (Y) columns as Spectra Data. Set Long Name for Spectra Names, and use Comments as Group Info.
  3. In the Settings tab, choose Covariance Matrix for the Analyze option. If Correlation Matrix were chosen instead, each row (one variable measured across the 20 samples) would be normalized, i.e. scaled to unit variance.
  4. In the Plots tab, choose Sample 6 as the Reference Spectrum under Loading with Reference Spectrum Plot, and check the Loading Plot and Score Plot options.
  5. Click the OK button and a report sheet, a result sheet and a plot data sheet are created.
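
To make the Covariance Matrix versus Correlation Matrix choice in step 3 concrete, here is a minimal NumPy sketch. It is not the App's implementation: the synthetic data, the function name `pca_spectra` and the `standardize` flag are assumptions made for illustration. The spectra are stored as in the worksheet, one column per sample, and each row (one time or wavelength point) is treated as a variable.

```python
import numpy as np

def pca_spectra(spectra, standardize=False):
    """PCA of spectra stored one column per sample (rows = variables).

    standardize=False analyzes the covariance matrix;
    standardize=True analyzes the correlation matrix.
    Assumes no variable is constant when standardize=True.
    """
    X = np.asarray(spectra, dtype=float).T       # samples x variables
    X = X - X.mean(axis=0)                       # center each variable
    if standardize:
        X = X / X.std(axis=0, ddof=1)            # unit variance per variable
    # SVD of the centered data avoids building the variables x variables matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)        # variance along each component
    loadings = Vt.T                              # variables x components
    scores = U * s                               # samples x components
    return eigenvalues, loadings, scores

# Synthetic stand-in for the worksheet: 50 points, 20 samples (made-up data)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
peak = np.exp(-(x - 8.0) ** 2)                   # hypothetical band
amounts = rng.uniform(0.5, 1.5, size=20)
spectra = np.outer(peak, amounts) + 0.01 * rng.normal(size=(50, 20))

eigvals_cov, loadings_cov, scores_cov = pca_spectra(spectra)
eigvals_cor, loadings_cor, scores_cor = pca_spectra(spectra, standardize=True)
```

With the covariance matrix, variables with large intensity changes dominate the components. With the correlation matrix, every variable contributes unit variance, so the eigenvalues sum to the number of variables and low-intensity (possibly noisy) regions count as much as strong bands.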

Results

  1. In the Report Sheet, the Eigenvalues table shows that the first four principal components explain 96% of the total variance.
  2. In the Loading with Reference Spectrum Plot (double-click the plot to pop up the embedded graph), the first layer shows the spectrum of the sixth (reference) sample, the second layer shows the loadings for the first component, and the third layer shows the loadings for the second component. The graph shows that times 7.95 and 8.47 are important variables in PC1, while times 3.96 and 5.92 have more influence in PC2. The vertical annotation lines in the graph were added using the Vertical Cursor gadget in Origin (Gadgets: Vertical Cursor).
  3. The Loading Plot shows the coefficients of each variable (time) in PC1 and PC2. You can use the Data Reader (Tools toolbar) in Origin to find the variables with larger coefficients (important times) in PC1 and PC2. Note that the sign of a principal component’s loadings is arbitrary: multiplying all of a component’s loadings by -1 gives an equivalent result.
  4. The Score Plot shows the scores of the 20 samples in PC1 and PC2. The samples are divided into three groups as specified in Group Info. The graph makes it clear that olive oils and non-olive oils separate easily in principal component space, while the mixed oils overlap with the other two groups. Confidence ellipses of the scores for the three groups are also shown, and some extreme points are labelled. If the number of samples is large, you can turn off the labels by unchecking the Enable option on the Label tab of the Plot Details dialog (to open Plot Details, double-click the pop-up plot or choose Format: Plot Properties).
  5. This App can also create 3D component plots. Click the green lock in the upper-left corner of the graph and choose Change Parameters. In the Plots tab, choose 3 for Number of Components to Plot (you may first have to go to the Settings tab and increase Number of Components to Extract to 3 or more). You can also change the Reference Spectrum on the Plots tab to see other samples.
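
The quantities read from the report in the results above can be reproduced in outline with NumPy. The sketch below runs on synthetic data, not the olive oil data set; the band positions near 8.0 and 4.0 and all variable names are hypothetical. It computes the cumulative percentage of variance (as in the Eigenvalues table), finds the variable with the largest absolute loading on PC1 and PC2, and checks that flipping the sign of a component changes nothing of substance.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.linspace(0.0, 10.0, 101)               # hypothetical time axis
band_a = np.exp(-((times - 8.0) / 0.3) ** 2)      # hypothetical band near 8.0
band_b = np.exp(-((times - 4.0) / 0.3) ** 2)      # hypothetical band near 4.0

# 20 samples x 101 variables: band_a varies strongly, band_b weakly
X = (np.outer(rng.normal(0.0, 1.0, 20), band_a)
     + np.outer(rng.normal(0.0, 0.3, 20), band_b)
     + 0.01 * rng.normal(size=(20, 101)))

Xc = X - X.mean(axis=0)                           # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (Xc.shape[0] - 1)

# Cumulative percentage of variance explained by the leading components
cumulative = 100.0 * np.cumsum(eigenvalues) / eigenvalues.sum()

# Variable (time) with the largest absolute loading on PC1 and on PC2
top_pc1 = times[np.argmax(np.abs(Vt[0]))]
top_pc2 = times[np.argmax(np.abs(Vt[1]))]

# Sign indeterminacy: flipping loadings and scores together leaves the
# component's contribution to the data unchanged
scores = U * s
contribution = np.outer(scores[:, 0], Vt[0])
contribution_flipped = np.outer(-scores[:, 0], -Vt[0])
```

Taking the absolute value of the loadings before ranking them is what makes the search for important variables independent of the arbitrary sign.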

Conclusion

Principal component analysis is an effective method for finding important frequency or wavelength regions in a group of samples, and it helps classify samples in principal component space. It can also be used to determine the number of compounds from the spectra of mixtures, and it can be combined with the partial least squares method to solve quantitative problems. The results can also feed further classification methods, e.g. hierarchical cluster analysis and discriminant analysis.

2 Responses

  1. Alvaro Nunez Vargas

    Hello,

    Do you know how to calculate the derivative and Savitzky-Golay for all columns at once, not one by one?

    • Hello, do you mean to calculate derivatives using Savitzky-Golay method for all columns?

      Suppose all your data is in a worksheet, with each column set as X or Y. Select a Y column and choose Analysis: Mathematics: Differentiate from the Origin menu. Then click the green lock icon on the output column and select Repeat this for All Y Columns from the menu. This calculates derivatives for all Y columns.

      Thanks.
      Sam
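
      For readers working outside Origin, the same idea (one Savitzky-Golay derivative applied to every Y column in a single pass) can be sketched in NumPy. This is an illustration under stated assumptions, not what Origin's Differentiate tool runs: the function name `savgol_derivative` and its parameters are hypothetical, X is assumed to be evenly spaced, and edge rows without a full window are left as NaN. SciPy's `scipy.signal.savgol_filter` offers the same kind of filter with `deriv` and `axis` arguments.

```python
from math import factorial

import numpy as np

def savgol_derivative(Y, dx, window=7, polyorder=2, deriv=1):
    """Savitzky-Golay derivative of every column of Y.

    Rows of Y are evenly spaced X positions (spacing dx); window must be odd.
    Edge rows without a full window are returned as NaN.
    """
    Y = np.asarray(Y, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window: row `deriv` of the
    # pseudo-inverse gives the weights for that polynomial coefficient.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(A)[deriv] * factorial(deriv) / dx**deriv
    out = np.full_like(Y, np.nan)
    for i in range(half, Y.shape[0] - half):
        out[i] = weights @ Y[i - half:i + half + 1]   # all columns at once
    return out

# Two made-up Y columns sharing one X column
x = np.linspace(0.0, 1.0, 21)
Y = np.column_stack([x**2, 3.0 * x + 1.0])
dY = savgol_derivative(Y, dx=x[1] - x[0])
```

      Because the filter fits a polynomial inside each window, the derivative is exact for columns that are themselves polynomials of degree up to `polyorder`, which gives a quick sanity check.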
