HTML Connector to Import Web Data

Introduction

Origin introduced HTML Connector since Origin 2019b to import HTML table from an HTML file or website. But it is greatly improved in Origin 2023b to support multiple tables, partial import, post import script, and so on.

We will use Boston weather info. from https://www.extremeweatherwatch.com as examples.

Import a table

  1. Choose Data: Connect to Web… to open the dialog and enter the following web link and click OK.
  2. https://www.extremeweatherwatch.com/cities/boston/year-2022
  3. In Data Connector Browser dialog, double click on the _1 table so the table will be added to bottom panel.
  4. Click OK.
  • Navigator panel on left listing all tables available to import
  • Table name shows on left panel
  • Table name is used as sheet name.
  • web link shows in Book title.
  • the last part of link e.g. year-2022 shows on top of the left navigator panel
  • Source row with last part of link as well.
  • Info. in workbook organizer

Customize the import

User can customize the import such as only import certain columns, or add some Origin’s LabTalk script to run after import to automate part of the work.

Partial Import

Import certain columns only.

  1. Click the connector icon and choose Options.
  2. Enter 1 3:5 to import 1st column, skip 2nd column, then import 3rd to fifth columns.

Post Import Script

Run script to add a Mean label row that shows mean value of each imported column

  1. Click the connector icon and choose Post Import Script…
  2. Enter the following script
wks.userparam(++Mean);
loop(i,1,wks.dc.c2) { wcol(i)[Mean]$="=Mean(this)";}

Import Other Tables with same Customization

  1. Select e.g. _2 to _12 tables (Feb to Dec) on left panel
  2. Choose Connect by Duplicating Active Sheet
  3. It will create new sheet for each table, with same partial import and post import script executed.

Rename Sheet Names

  1. Click downarrow on left side of all sheets to expand the Worksheet Organizer.
  2. Choose Book -> Data Sheet node.
  3. Click on Name box to modify _1, _2, …, 12 to Jan, Feb, … Dec.

Change Data Source

  1. Click the connector icon and choose Data Source…
  2. Change the link to another year, e.g. https://www.extremeweatherwatch.com/cities/boston/year-2003
  3. Click OK.
  4. All sheets will be updated to 2003’s weather data with post import script run.

Import Other Table to Same Sheet

Let’s see I want to bring in only highest daily temperatures (column 2) in each month in same sheet and plot them. You may have noticed 1st column shows both month and day info. as January 1, January 2, which I don’t want.

First customize one table import:

  1. Start a new workbook.
  2. Choose Data: Connect to Web… and enter https://www.extremeweatherwatch.com/cities/boston/year-2022
  3. Double click _1 table and click OK to import it.
  4. Click Options and enter 2, also set the Use Partial Table Name to 2:0 so that table name will be 1 instead of _1. This will be useful since later in post Import script, wlabel command can change such number into 3 letter month, e.g. Jan.
  5. Click OK so it only imports High column.
  6. Click Post Import Script… and enter the script to generate Mean row. set column A as Y, and set table row to show 3 letter month name. Note: U will not see table row until more tables are imported into same sheet.
wks.userparam(++Mean);
loop(i,1,wks.dc.c2) { wcol(i)[Mean]$="=Mean(this)";}
wks.col1.type = 1; // Set as Y
wlabel ty:="table" fmt:="CM0"; //change table row to show 3 letter month name

Then Import High column from table _2 to _12 into same sheet:

  1. Select _2 to _12 table on left panel and choose Add to Active Sheet.
  2. Only High column is added with mean value calculated.
  3. When multiple tables import into same sheet, a Table row will show so user can tell which table the column is from. The above wlabel script changed table name 1, 2, 3 into Jan, Feb, …
  4. Highlight all columns and choose Plot: Basic 2D: Line.
  5. Since no X column, it will just use row index as X to plot them.
  6. Click on Legend and choose Data Plot Legend Translation Mode button on the mini toolbar (2nd button on 2nd row) to change to Table.
  7. Choose Attach to Plots button on legend mini toolbar to make legend show after each plot.

LabTalk Script for HTML Connection

There is LabTalk access to HTML Connection. See the following examples.

Import high column of each table (1 to 12) into same sheet, with mean value calculated and table row showing 3 letter month name.

newbook; //start a new book
wbook.dc.add("HTML"); //specify the connector
wbook.dc.source$="https://www.extremeweatherwatch.com/cities/boston/year-2022";
wks.dc.sel$=" Tables/_1|Tables/_2|Tables/_3|Tables/_4|Tables/_5|Tables/_6|Tables/_7|Tables/_8|Tables/_9|Tables/_10|Tables/_11|Tables/_12"; //specify tables to import
wks.dc.script$="wks.userparam(++Mean);loop(i,1,wks.dc.c2) { wcol(i)[Mean]$="=Mean(this)";}";//specify the post import script
Tree tr=wks.dc.optn$; 
tr.partial_col$="2";//specify only import 2nd column
tr.nameoptn$="2:0";//partial table name from 2nd char on
tr.ToString(wks.dc.optn$); 
wks.dc.import(); //import;
wks.col1.type=1; //set 1st column as Y
wlabel ty:="table" fmt:="CM0"; //show table name in 3 letter month format

Import Table 12 and then append Table 1 as new rows below Table12 columns in same sheet

newbook;
wbook.dc.add("HTML");
wbook.dc.source$="https://www.extremeweatherwatch.com/cities/boston/year-2022";
wks.dc.sel$="Tables/_12";
wks.dc.import();
wks.DC.flags=256; //0: append new columns, 256: append new rows.
wks.dc.add("Tables/_1");
wks.labels(-D2); //hide the table row, 2nd user-defined parameter row
wks.userparam(++Mean);
loop(i,1,wks.dc.c2) { wcol(i)[Mean]$="=Mean(this)";}

One Comment on “HTML Connector to Import Web Data”

Leave a Reply

Your email address will not be published. Required fields are marked *