Exporting Data

The new data export pipeline was introduced into Sequentum Enterprise 2.70.0. Agents that were created from older versions will retain their original export settings and will remain configurable via the old Export Configuration GUI.

Note: For Sequentum Enterprise versions older than 2.70.0, refer to Exporting Data (Legacy/Compatibility Mode) article.

The main changes in the new Data Export implementation are:

  • Introduction of data export pipeline model
  • The old Data Distribution is now merged into Data Export as a form of export target (export to remote storage or Cloud storage)
  • Selective execution of each export target in the pipeline via run-time environment setting
  • New Manage Export Targets GUI for easy and intuitive editing of the data export pipeline.

manage-export-targets-gui.png
Preview of the Data Export Pipeline management GUI (with sample configuration) 

The image above show an example of an agent's data export configuration. The export configuration shown consists of two main export pipelines, separated by a grey horizontal line:

  • The first pipeline exports to a Microsoft Excel spreadsheet, then to a Parquet file, then to a JSON format, and finally, sends all three exported files to a Google Drive and AWS S3 bucket.
  • The second pipeline exports to data in a CSV formatted file, then to a Parquet file, and adds the two exported files to a Zip file.

Configuring Export Targets

As with older versions of Sequentum Enterprise, newly created agents are, by default, configured to export to a Microsoft Excel (2003) spreadsheet, but you can configure the agent's export target(s) by clicking the ribbon menu button Data > Export Target Configuration.

Additional export targets can be added to the export pipeline via the Add Export Target drop-down, or by right-clicking to go to the context menu.

NewExportTargets.gif
Adding export targets to the data export pipeline (Click to enlarge)

Export targets can be moved around the export pipeline on the screen using your mouse to drag and drop.

Supported Export Targets

Sequentum Enterprise supports data export to files, remote storage (cloud), database and custom scripting.

export-targets-dropdown.png
Supported Export Targets 

Here is a list of all supported export targets:

  • Files - listed below are the supported file-type export targets:
    • Microsoft Excel - 2003 and 2007+ editions of Excel are supported.
      • Microsoft Excel 2003 - data will export to a single Excel spreadsheet, but it may save the data into more than one worksheet. Any images will be saved to the disk.
      • Microsoft Excel 2007 and later - data will export to a single Excel spreadsheet. You can save multiple data tables to a single worksheet using grouping and outlining. Any images will either be saved to the disk or embedded into the worksheet depending on the configuration.
    • CSV - data will export to one or more CSV files. The default character encoding is ASCII, but you can specify another type of encoding.
    • XML - data will export to a single XML file.
    • PDF - data will export to a PDF document. The data will be displayed the same way as for Excel 2007. Note: Only content that is visible in Excel will be written to PDF, and since Excel has a limit to the maximum cell height, long text content may not be exported correctly to PDF.
    • JSON - data will export to a single JSON file.
    • Parquet - data will export to Apache Parquet files. Nested data is not supported, so one Parquet file is generated for each data table.
  • Folders & Compression
    • Folder - exported files can be stored in specific folder(s). The folder is created if it does not already exist.
    • Zip File - exported files can be compressed into a Zip file.
    • Backup - creates backup of all exported files into a specified folder.
  • Remote Storage (Cloud) - exported files can be delivered to the following:
    • AWS S3 Bucket
    • Azure Storage
    • Dropbox
    • Email
    • FTP
    • Google Drive
    • Google Storage
  • Database - the following database types are supported:
    • SQL Server - data will export to one or more SQL Server database tables.
    • MySQL - data will export to one or more MySQL database tables.
    • Oracle - data will export to one or more Oracle database tables.
    • OleDB - the data will export to one or more database tables. You can use any database that has an OleDB provider.
    • MongoDB - data will export to one or more MongoDB database collections.
    • Azure Cosmos DB - data will export to one or more collections in an Azure Cosmos DB instance. Note: Only the MongoDB API is currently supported.
    • PostgreSQL - data will export to one or more PostgreSQL database tables.
  • Custom Export Scripts - export script gives user the flexibility of controlling the entire export process.
    • Script - Access to internal export data is available.
    • Raw Data Script - Access to raw internal data is available.

Sequentum Enterprise always stores data in the internal database before exporting the data to the chosen output format. This means that you can always export data to a new format without having to extract the data again.

Please refer to the Exporting Data (Legacy/Compatibility Mode) article for additional information on the following topics:

  • Keeping Historical Data
  • Exporting Data to Oracle
  • Exporting Data to MS Access
  • Exporting Downloaded Images and Files
  • Character Encoding
  • MySQL Character Encoding
  • Changing the Default Data Structures
  • Separate Export Method
  • Add Columns Only Export Method
  • Add Columns and Merge Rows Export Method
  • Convert Rows Into Columns
  • Exporting Data with Scripts
  • Export From Multiple Agents.

Adding Data Export Pipeline Using Config File

Config Setting examples to add data exports targets in the pipeline, Remote Storage and Named Connections:

  • Use the below config setting to add multiple data export targets along with Robots.txt and Folder Export Targets:
[ExportTargets.GetOrAddTarget("Excel").GetOrAddTarget("Csv").GetOrAddTarget("Json").

GetOrAddTarget("Xml").GetOrAddTarget("Parquet").GetOrAddTarget("Pdf").

GetOrAddTarget("ErrorDataFiles").GetOrAddTarget("RobotsTextFiles").GetOrAddTarget("Folder").Set]

 AllExportTargets_config.jpg

  • To add CSV, Parquet, and Script in the pipeline
  • To add the multiple Remote Storages e.g. S3 Browser, Email, Google Drive and Azure Storage use the following config:
[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget("Parquet").GetOrAddTarget("Script").Set]
IsEnabled=true
ScriptLanguage=Python

PythonScript=[Content.ExportData]

[Content.ExportData]
import datetime
from Sequentum.ContentGrabber.Api import *
from Sequentum.ContentGrabber.Commands import *
from datetime import datetime
from System.Collections.Generic import *
import re
import os
import shutil

def ExportData(args):
dtFormatted = datetime.now().strftime("%Y%m%d%H%M%S")
newExportedFiles = List[str]()
Env_1 =args.GlobalData.GetString("env_")
args.WriteDebug("Environment -"+Env_1)
for i in range(len(args.ExportFiles.DataFiles)):
file = args.ExportFiles.DataFiles[i]
file_name = os.path.basename(file)
filename_withouext = os.path.splitext(file_name)[0]
filename_ext = os.path.splitext(file_name)[1]
newFile = file.replace(file_name, '{0}_{1}{2}'.format(dtFormatted, filename_withouext, filename_ext))
shutil.move(file, newFile)
newExportedFiles.Add(newFile)

args.ExportFiles.DataFiles = newExportedFiles
return True

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget("Parquet").GetOrAddTarget("Script").GetOrAddTarget(RemoteExport,DoNotUse).Set]
EmailRecipientAddresses =abc@sequentum.com
EmailBccAddresses = xyz@sequentum.com
EmailSubjectTemplate = ACC-TEST -Config
EmailBodyTemplate = ExportTarget Config
Environment =Prod

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget("Parquet").GetOrAddTarget("Script").GetOrAddTarget(RemoteExport,Named Connections-AWS-S3).Set]
RemoteDirectory = acc-test-aws s3
Environment =Always
IsDeliverDataFiles=True
IsDeliverExtractedFiles=True
IsCompressFiles=False

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget("Parquet").GetOrAddTarget("Script").GetOrAddTarget(RemoteExport,GoogleDriveCoomon).Set]
RemoteDirectory = acc-test- google drive
Environment =Dev

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget("Parquet").GetOrAddTarget("Script").GetOrAddTarget(RemoteExport,Azure_NewConnection_Test).Set]
RemoteDirectory = acc-test- google drive
Environment =QA

CSVParquetScript_multipleRemoteStorage.jpg

  • To add multiple Export Targets for different types of Databases e.g. Oracle, Postgre DB, SQL Server or MySQL.
[ExportTargets.GetOrAddTarget(Database,Oracle).Set]
ConnectionName =OracleMangaedconnectiontest

[ExportTargets.GetOrAddTarget(Database, Postgre).Set]
ConnectionName = PostgressSQLManagedConnectionTest

[ExportTargets.GetOrAddTarget(Database, SqlServer).Set]
ConnectionName = SQLServerNamedConnection-For Export

[ExportTargets.GetOrAddTarget(Database, MySql).Set]
ConnectionName = MYSQLManagaedConnectionTest

ExortDatabaseConfig.jpg

  • To add Export Target as CSV and upload the CSV file on multiple Remote Storages e.g. FTP and Google Storage.
[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget(RemoteExport,FTP Named Connection).Set]
RemoteDirectory = CGTesting
Environment =Always
IsDeliverDataFiles=True
IsDeliverExtractedFiles=True
IsCompressFiles=False

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget(RemoteExport,GoogleStorage Named Connection).Set]

CSV_FTP_GoogleStorage.jpg

  • To add Export Target as CSV and add Remote Storages e.g. S3 Browser and deliver the compressed file and rename the compressed file using FileNameTransformationScript.
[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget(RemoteExport,Named Connections-AWS-S3).Set]
IsCompressFiles=True
RemoteDirectory =bharti_abc

[ExportTargets.GetOrAddTarget("Csv").GetOrAddTarget(RemoteExport,Named Connections-AWS-S3).FilesTransformationScript.Set]
IsEnabled=true
ScriptLanguage=Csharp
CsharpScript=[Content.TransformFilename]

[Content.TransformFilename]

using System;
using Sequentum.ContentGrabber.Api;
public class Script
{
//See help for a definition of RemoteExportTransformationArguments.
public static RemoteExportFiles TransformFiles(RemoteExportTransformationArguments args)
{
//Test.
var exportFiles = args.ExportFiles.ToRemoteExportFiles();
var dtFormatted = DateTime.Now.ToString("yyyy.MM.dd_HH.mm.ss") + "Z";
exportFiles.CompressedFileName = dtFormatted +"_"+args.Agent._name+ "_S3ExportTarget.zip";
return exportFiles;
 }
}

RemoteStorageCompressedFileName.jpg

 

  • To add Export Target as CSV and rename the CSV file using FileNameTransformationScript.
[ExportTargets.GetOrAddTarget("Csv").Set]
IsTimeStampFile=false

[ExportTargets.GetOrAddTarget("Csv").FilenameTransformationScript.Set] 
IsEnabled=true
ScriptLanguage=Csharp

CsharpScript=[Content.TransformFilename] 
[Content.TransformFilename]
using System;
using Sequentum.ContentGrabber.Api;
public class Script
//See help for a definition of FilenameTransformationArguments.

public static string TransformFilename(FilenameTransformationArguments args)
{
//Place your transformation code here. //This example just returns the input data 

var dt = DateTime.UtcNow;
var dtFormatted = dt.ToString("yyyyMMdd");
var outputFileName=args.FileName+"-"+dtFormatted+".csv";
return outputFileName;
}
}

ExportTargetFileNameTransformationScript.jpg

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.