Exporting Data (Legacy/Compatibility Mode)

While performing data extraction, an agent saves data to an internal data store and then exports that data to a specific export target when the agent completes or if the user manually stops the agent.

Sequentum Enterprise supports many different types of data stores such as databases, CSV files, XML files and spreadsheets. You can use a data export script to gain complete control over the entire export process - including the export of data to data stores that Sequentum Enterprise doesn't support.

Some sections in this article apply only to Sequentum Enterprise versions older than 2.70.0. For Sequentum Enterprise versions 2.70.0 and later, refer to the article Exporting Data.

Configuring Export Targets

Note: This section applies only to Sequentum Enterprise versions older than 2.70.0.

Sequentum Enterprise can export data to many different data formats. By default, data is exported to an Excel spreadsheet, but you can configure the agent's export target(s) by clicking the ribbon menu button Data > Export Target Settings.

ExportingTarget.gif

Configuring Export Targets

 

An agent can be configured to export to multiple targets. To activate an export target, simply select the export target tab, then tick the "Enable <target> Export" checkbox as shown in the image above. Enabled export targets have a check mark icon next to the export target tab heading.

Supported Export Targets

Note: This section applies only to Sequentum Enterprise versions older than 2.70.0.

Sequentum Enterprise supports data export to files, database, and via scripting. Here is a list of all supported export targets:

  • Files - listed below are the supported file-type export targets:
    • Microsoft Excel - 2003 and 2007+ editions of Excel are supported.
      • Microsoft Excel 2003 - data will export to a single Excel spreadsheet, but it may save the data into more than one worksheet. Any images will be saved to the disk.
      • Microsoft Excel 2007 and later - data will export to a single Excel spreadsheet. You can save multiple data tables to a single worksheet using grouping and outlining. Any images will either be saved to the disk or embedded into the worksheet depending on the configuration.
    • CSV - data will export to one or more CSV files. The default character encoding is ASCII, but you can specify another type of encoding.
    • XML - data will export to a single XML file.
    • PDF - data will export to a PDF document. The data will be displayed the same way as for Excel 2007. Only content that is visible in Excel will be written to PDF, and since Excel has a limit to the maximum cell height, long text content may not be exported correctly to PDF.
    • JSON - data will export to a single JSON file.
    • Parquet - data will export to Apache Parquet files. Nested data is not supported, so one Parquet file is generated for each data table.
  • Database - the following database types are supported:
    • SQL Server - data will export to one or more SQL Server database tables.
    • MySQL - data will export to one or more MySQL database tables.
    • Oracle - data will export to one or more Oracle database tables.
    • OleDB - te data will export to one or more database tables. You can use any database that has an OleDB provider.
    • MongoDB - data will export to one or more MongoDB database collection.
    • Azure Cosmos DB - data will export to one or more collection in an Azure Cosmos DB instance. Only the MongoDB API is currently supported.
    • PostgreSQL - data will export to one or more PostgreSQL database tables.
  • Custom Export Scripts - export script gives user the flexibility of controlling the entire export process.
    • Script - Access to internal export data is available.
    • Raw Data Script - Access to raw internal data is available.

Sequentum Enterprise always stores data in the internal database before exporting the data to the chosen output format. This means you can always export data to a new format without having to extract the data again.

Exporting Data to File-Type Targets

Each file-type export target has its own set of configurable properties, which can be configured in its own tab on the Data Export Configuration window. One common property among file-type export targets is the Filename Transformation, which is basically one of the supported script languages that gets executed during export to give users the flexibility of dynamically generating export filenames.

Exporting Data to a Database

When exporting the data to a database, Sequentum Enterprise automatically creates new database tables if they don't exist. The option Allow drop schema can be set to specify when an agent is allowed to automatically delete and re-create existing database tables if they do not have the correct number of columns.

selectingAnExportTarget.png
SQL Server export target.

Use the button Export Initial Dataset to write any existing extracted data to your database tables. This is mostly useful when using change tracking and exporting changed data only, so you get any existing data in your database tables before starting to export changes only.

Use the button Delete Existing Data to remove all database tables associated with this agent. This will delete all existing data including any historical data.

Keeping Historical Data

An agent can keep historical data when exporting to a database. The historical data will include the most recent data set and all previously extracted data sets. An agent can save the most recent data set in one set of database tables, and the historical data in a separate set of database tables, or the agent can save only historical data in a single set of database tables.

Exporting Data to Oracle

Sequentum Enterprise supports Oracle databases but only through the Oracle OleDB provider. You must first install the Oracle client software before the OleDB provider will work. The OleDB provider given by Microsoft supports Oracle only up to version 8i, so you should use the OleDB provider from Oracle.

Use an Oracle OleDB connection string such as this one:

Provider=OraOLEDB.Oracle;Data Source=//localhost:1521/OracleTest;
User Id=test;Password=test; 

The Data Source parameter in the connection string can be set to TNS, but only if you configure the TNS configuration file properly.

Exporting Data to MS Access

Sequentum Enterprise is a 32-bit application and will only be able to communicate with a 32-bit OleDB provider. If you have installed the 64-bit version of Microsoft Office, then you won't be able to use the 32-bit OleDB provider, and Sequentum Enterprise won't be able to export data to Microsoft Access. If you are using the 32-bit version of Office, you can use a connection string such as this:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=
C:\Data\MyDatabase.accdb;Persist Security Info=False;

You must also change the Parameters option from "@" to [@].

Exporting Downloaded Images and Files

An agent will often download images and files during data extraction. By default, the agent saves these files in the internal database. However, you can configure the agent to save all files to disk by unchecking the Embed files in database box on the Internal Database configuration screen. When writing files to disk, the agent will store the full path to each file in the internal database.

If you are exporting data and there are corresponding files in the internal database, then these files will also export to the external database. If you uncheck the Embed files in database box, then only the file paths will export to the external database. If you are exporting data to a file format that doesn't support embedded files, such as Excel 2003, then only the file paths will export to the external database and the files will be written to disk.

Typically, Sequentum Enterprise will use the name of a file when saving it to disk. However, since a website could potentially use the same name for two different files, Sequentum Enterprise will set the file name with a unique identifier (GUID) if the file already exists on disk.

You can also use a file name transformation script to name the files, and configure the script to use some of the other extracted data elements for the file name. For example, you could name a product image with the product name or product ID.

Character Encoding

Although the user interface text is English-only, Sequentum Enterprise can extract data from websites in any language or encoding. If you have problems with the encoding of your data extraction, then the problem is most likely with your export target. For example, if you are exporting data to a MySQL database that uses Latin1 encoding, then you'll find a question mark in the place of each Unicode character.

In accordance with CSV specifications, CSV output files have ASCII encoding and do not support Unicode. However, if you're using a CSV import tool that can read CSV files with other character encodings, then you can specify the encoding when choosing the CSV export target.

MySQL Character Encoding

Many users install their MySQL databases with Latin1 character encoding and then encounter some characters appearing as question marks. If you want to export Unicode characters to a MySQL database, you'll need to use Unicode character encoding such as UTF-8. If you configure your MySQL database to use Unicode, you must also specify the encoding when configuring the Export Target for the agent.

Changing the Default Data Structures

All container commands have a set of properties that you can use to specify how to export data extracted by commands in the containers:

Property Description

Export Method

Specifies how to export data extracted by commands in the container. The following options are available:

Separate - Extracted data is written to a separate child table. The data in the child table will be linked to data in its parent table by a key.

Add columns and Rows - Extracted data will be merged with data extracted by the parent container command. Each data entry extracted by this container command will add a new row in the data table, and new data fields will add new columns in the data table.

Add columns Only - Extracted data will be merged with data extracted by the parent container command. Data entries extracted by this container will not add new rows to the data table, but new data fields will add new columns to the data table. If this container extracts more than one data entry, only the last data entry will be used.

Add Columns and Merge Rows - Extracted data will be merged with data extracted by the parent container command. If this container extracts more than one row, the values for each row will be combined to generate a single row, which will then be merged with data from the parent container command. The Capture command option Merge Rows Method specifies how the row values are combined.

Convert Rows Into Columns - Extracted data will be merged with data extracted by the parent container command. Each row extracted by this container adds new columns to the data table. The option Export Rows to Columns Name Command can be used to specify which Capture command extracts the data that will be used as the new column name, and Export Rows to Columns Value Command can be used to specify which Capture command extracts the data that will be stored on the column.

Export Rows to Columns Name Command

A capture command that will provide the data for column names. This property is optional but must be set together with a command name for the Value Command property.

Export Rows to Columns Value Command

A capture command that will provide the data for column values. This property is optional but must be set together with a command name for the Name Command property.

Export ID Name

Specifies the name of the primary key column in the exported table (database table, spreadsheet, CSV file, XML Node), if this container generates a new table. The export name post-fixed with ID is used if this property is empty. If multiple agents are exporting data to the same table, then you must set this option to the same value for all those agents.

Export Name

Specifies the name of the exported table (database table, spreadsheet, CSV file, XML Node),  if this container generates a new table. The command name is used if this property is empty. If multiple agents are exporting data to the same table, then you must set this option to the same value for all those agents.

Plural Export Name

Specifies the plural name of the exported table if this container generates a new table. The Export Table Name with the added character s is used if this property is empty. This property can be used to control the name of XML nodes when exporting to XML.

Export Empty Row If No Data

Exports a single empty data row if this container extracts no data. Parent and sibling data will be lost if merged with an empty data set, so this option ensures that parent and sibling data is exported when this container extracts no data.

 

Separate Export Method

When using this method, data extracted by commands in the container will save separately. If, for example, the data exports to a database, then the data extracted by the container will go into separate database tables.

The figures below show a simple agent having one list container, along with the data export it generates. By default, the Movie List container will merge its data with the data from the imdb container, and the result will be a single data table containing the entire data extraction.

imdb.png

A simple agent with one list command

You can separate the data from the Movie List container from the imdb container data by setting the property Export Method to Separate in the Movie List command.

Add Columns Only Export Method

When using this export method, data from the container will combine horizontally with data from its parent container as shown in the example below. Only the last data row will be used of the container extracts more than one data row.

Consider an example in which the parent container and the current container generate the following data sets. Here is the data for the parent container:

Column1

Column2

Data1

Data2

Data3

Data4

Data5

Data6

Data7

Data8

Data9

Data10

Here is the data from the current container:

Column3

Column4

Column5

Data11

Data12

Data13

This would be the result of a merge of these two data sets:

Column1

Column2

Column3

Column4

Column5

Data1

Data2

Data11

Data12

Data13

Data3

Data4

Data11

Data12

Data13

Data5

Data6

Data11

Data12

Data13

Data7

Data8

Data11

Data12

Data13

Data9

Data10

Data11

Data12

Data13

 

Add Columns and Merge Rows Export Method

When using this export method, data from the container will be combined into a single row, which is then merged with data from its parent container as shown in the example below.

Consider an example in which the parent container and the current container generate the following data sets. Here is the data for the parent container:

Column 1

Column 2

Column 3

Data7

Data1

Data2

Data8

Data3

Data4

Data9

Data5

Data6

Here is the data from the sub-container:

Column 4

Column 5

Data10

Data12

Data11

Data13

The result will look like the table below if the option Merge Rows Method is set to Concatenate for both Column 4 and Column 5:

Column1

Column2

Column3

Column4

Column5

Data7

Data1

Data2

Data10,Data11

Data12,Data13

Data8

Data3

Data4

Data10,Data11

Data12,Data13

Data9

Data5

Data6

Data10,Data11

Data12,Data13

 

Convert Rows Into Columns

When using this export method, each data entry extracted by the container adds new data fields to the data extracted by the parent container. By default, the names of the new data fields will be the same as the names of the Capture commands extracting the data plus an index number. For example, if the Capture command name is image, the field names will be as follows:

image_1, image_2, image_3, ...

If the container extracts data such as name/value properties, then you may want the new data fields to have names that correspond to the extracted data. For example, consider the following data extracted by a container command:

Name

Value

Width

100

Height

50

Depth

25

You may want this data to generate the following data fields:

Width

Height

Depth

100

50

25

You can achieve this by setting the property Export Rows to Columns Name Command to the name of the command that will extract the values you want to use as field names. Also, set the property Export Rows to Columns Value Command to the name of the command that will extract the values you wish to use as field values.

Limitations

Using the export method Convert Rows Into Columns can lead to a different number of columns in the exported data table every time you run an agent. This is usually only desirable when exporting data to Excel spreadsheets, and may sometimes be useful when exporting data to CSV files.

Exporting Data with Scripts

With data export scripts, you can customize the export process for extraction data sets. An export script is ideal for exporting to special data structures in a database or exporting data to a third-party or custom-data store.

An export script gives much more control than the standard export process because the script can control when to insert, update, or delete data in the export target.  See the topic Data Export Scripts for more information.

Export From Multiple Agents

Sequentum Enterprise can export data from multiple agents into the same external database tables without having to use a custom export script.

All agents must export data in the exact same format, and export names must be set for all container commands exporting to separate tables, and the export names must be the same for all agents.

The Database Export option Shared Database Tables must be set to True.

A table column Agent ID will be added to the external database tables, so Sequentum Enterprise can keep track of what data belongs to which agents.

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.