Transformation Scripts
Transformation scripts transform content after it has been extracted from a web page. They are often applied to HTML elements to extract information that does not sit in its own element and therefore cannot be selected in the web browser. For example, a transformation can extract part of an address, such as a postal code, from a single HTML element containing the full address.
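As a rough illustration of the address example, the Python sketch below pulls a US-style ZIP code out of a full address string. The `FakeArgs` class is a hypothetical stand-in for the arguments object that Sequentum Enterprise passes to the script, and the ZIP pattern and sample address are assumptions chosen for the example. It is written as standard Python so it can be run on its own.

```python
import re

# Hypothetical stand-in for the arguments object supplied by Sequentum
# Enterprise; only the Content property is modelled here.
class FakeArgs:
    def __init__(self, content):
        self.Content = content

# Assumed pattern: a 5-digit US ZIP code with an optional 4-digit extension.
ZIP_PATTERN = re.compile(r"\b(\d{5}(?:-\d{4})?)\b")

def TransformContent(args):
    # Use the last match, because a street number may also have five digits.
    matches = ZIP_PATTERN.findall(args.Content)
    return matches[-1] if matches else ""

print(TransformContent(FakeArgs("1600 Main Street, Springfield, IL 62701")))
```

Returning an empty string when no ZIP code is found is a design choice for this sketch; a real agent might prefer to keep the original content instead.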
Transformation scripts can be used in most Capture Commands to transform the content extracted by the commands, but can also be used in other types of commands, such as in a Navigate Link command to transform an extracted URL.
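To make the URL case concrete, here is a minimal sketch of a transformation that strips the query string and fragment from an extracted URL. `FakeArgs` is a hypothetical stand-in for the real arguments object, the sample URL is invented, and the code uses the standard Python 3 library, which may differ from the Python engine available inside Sequentum Enterprise.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the arguments object; only Content is modelled.
class FakeArgs:
    def __init__(self, content):
        self.Content = content

def TransformContent(args):
    parts = urlsplit(args.Content)
    # Keep scheme, host, and path; drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(TransformContent(FakeArgs("https://example.com/products/42?ref=home#reviews")))
```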
A Transformation can be defined as a regular expression or as a C# or VB.NET script. Regular expressions are often used when you wish to extract sub-text from a larger piece of extracted text.
The following regular expression example extracts all the text until the first '-' character:
(.*?)-
return $1
See the topic Script Languages for information about how to use regular expressions in Sequentum Enterprise.
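The same pattern can be tried outside Sequentum Enterprise. The sketch below uses Python's standard `re` module rather than the product's own regular expression engine, so it only illustrates how the lazy group `(.*?)` stops at the first '-' character. The function name and the fallback for text without a '-' are assumptions.

```python
import re

def extract_before_dash(text):
    # (.*?) is a lazy capture group, so the match ends at the first '-'.
    match = re.search(r"(.*?)-", text)
    # Assumption: return the text unchanged when it contains no '-'.
    return match.group(1) if match else text

print(extract_before_dash("SKU123-Blue-Large"))  # prints SKU123
```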
The following script also extracts all text until the first '-' character, but uses C# instead of regular expressions:
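C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            // Return the content unchanged if it contains no '-' character;
            // Remove would otherwise throw for an index of -1.
            int index = args.Content.IndexOf('-');
            return index < 0 ? args.Content : args.Content.Remove(index);
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        # Return the content unchanged if it contains no '-' character.
        index = args.Content.IndexOf('-')
        return args.Content if index < 0 else args.Content.Remove(index)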
The script must have a static method with the following signature:
public static string TransformContent(ContentTransformationArguments args)
The function will return the transformed content.
The following script is used to get the page response time or page load time for the web page:
C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            return args.HtmlDocument.ResponseTime.ToString();
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        return str(args.HtmlDocument.ResponseTime)
The ResponseTime property of HtmlDocument will return the page response time in milliseconds as the transformed content.
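A millisecond value is often easier to read once it is converted and labelled. The following sketch shows one way to do that. It assumes `ResponseTime` can be converted to a number, uses hypothetical `FakeDocument` and `FakeArgs` stand-ins for the real objects, and picks an arbitrary 2000 ms threshold for what counts as slow.

```python
# Hypothetical stand-ins for the objects supplied by Sequentum Enterprise.
class FakeDocument:
    def __init__(self, response_time_ms):
        self.ResponseTime = response_time_ms

class FakeArgs:
    def __init__(self, response_time_ms):
        self.HtmlDocument = FakeDocument(response_time_ms)

SLOW_THRESHOLD_MS = 2000  # assumed cut-off, not a product setting

def TransformContent(args):
    # Assumption: ResponseTime is numeric and expressed in milliseconds.
    elapsed_ms = float(args.HtmlDocument.ResponseTime)
    label = "slow" if elapsed_ms > SLOW_THRESHOLD_MS else "ok"
    return "%.2f s (%s)" % (elapsed_ms / 1000.0, label)

print(TransformContent(FakeArgs(3500)))  # prints 3.50 s (slow)
```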
The following script gets all the request headers, including User-Agent:
C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            return args.HtmlDocument.RequestHeaders;
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        return str(args.HtmlDocument.RequestHeaders)
The RequestHeaders property of HtmlDocument will return all headers, including User-Agent.
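If only one header is needed, the returned text can be searched for it. The sketch below assumes the headers arrive as one `Name: value` pair per line, which this topic does not confirm, so treat the format as hypothetical. The `find_header` helper and the sample header text are invented for the example.

```python
def find_header(raw_headers, name):
    # Assumption: one "Name: value" pair per line.
    for line in raw_headers.splitlines():
        key, sep, value = line.partition(":")
        # Header names are compared case-insensitively.
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return ""

sample = (
    "Accept: text/html\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    "Accept-Language: en-US"
)
print(find_header(sample, "User-Agent"))
```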
The following script is used to get the entire request in string format:
C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            return args.HtmlDocument.Request;
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        return str(args.HtmlDocument.Request)
The Request property of HtmlDocument will return the entire request in string format.
The following script is used to get the HttpResponseVersion:
C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            return args.HtmlDocument.HttpResponseVersion.ToString();
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        return str(args.HtmlDocument.HttpResponseVersion)
The HttpResponseVersion property of HtmlDocument returns the HTTP version of the response. The version depends on the request that was sent and on the negotiation between the client and the server, which selects the highest version that both support.
The following script gets the HTTP request version, which is based on the value of the Http Client property set in the Agent command properties:
C#
    using System;
    using Sequentum.ContentGrabber.Api;

    public class Script
    {
        public static string TransformContent(ContentTransformationArguments args)
        {
            return args.HtmlDocument.HttpRequestVersion.ToString();
        }
    }
Python
    import clr
    from Sequentum.ContentGrabber.Api import *

    def TransformContent(args):
        return str(args.HtmlDocument.HttpRequestVersion)
The HttpRequestVersion property of HtmlDocument returns the HTTP request version, which is based on the value of the Http Version property set in the Agent command properties.
An instance of the ContentTransformationArguments class is provided by Sequentum Enterprise and has the following functions and properties:
Property or Function | Description |
---|---|
Agent Agent | The current agent. |
ScriptUtils ScriptUtilities | A script utility class with helper methods. See Script Utilities for more information. |
Command Command | The current agent command being executed. |
IConnection DatabaseConnection | The current internal database connection used by the agent. This connection is already open and should not be closed by your script. |
string Content | The extracted content that should be transformed. |
IHtmlNode HtmlNode | The extracted HTML node. |
IHtmlDocument HtmlDocument | The current HTML document. |
IInternalDataRow DataRow | The current internal data row containing the data that has been extracted so far in the current container command. |
IRuntimeData RuntimeData | Provides access to the internal database. |
bool IsDebug | True if the agent is running in debug mode. |
bool IsParentCommandMissingSelectionOrData | This parameter is set to true if the parent container's selection was not found, or if the parent container is iterating through an empty data list. |
bool IsParentCommandActionError | This parameter is set to true if the parent action command encountered an error while executing the action command. |
IInputData InputDataCache | All input data available to the current command. |
bool HasDoneAction | This parameter is set to true if the parent action command triggered an action in the web browser. |
BrowserResponse LastErrorResponse | The last error response from the web browser. This will contain any error messages from loading a new web page. |
void WriteDebug(string debugMessage, DebugMessageType messageType = DebugMessageType.Information) | Writes log information to the agent log. This method has no effect if agent logging is disabled, or if called during design time. debugMessage - The log message. messageType - The log message type. The agent log level specifies what information is written to the log. If the log level is set to Low for example, only errors are written to the log. |
void WriteDebug(string debugMessage, bool showMessageInDesignMode, DebugMessageType messageType = DebugMessageType.Information) | Writes log information to the agent log. This method has no effect if agent logging is disabled, or if called during design time. debugMessage - The log message. showMessageInDesignMode - Set to True if you want to see debug messages in design mode. messageType - The log message type. The agent log level specifies what information is written to the log. If the log level is set to Low for example, only errors are written to the log. |
void WriteInfo(string debugMessage, params object[] pars) | Writes log information to the agent log. This method has no effect if agent logging is disabled, or if called during design time. debugMessage - The log message. pars - A variable number of parameters used in the debug message. |
void WriteError(string debugMessage, params object[] pars) | Writes an error to the agent log. This method has no effect if agent logging is disabled, or if called during design time. debugMessage - The log message. pars - A variable number of parameters used in the debug message. |
void WriteWarning(string debugMessage, params object[] pars) | Writes a warning to the agent log. This method has no effect if agent logging is disabled, or if called during design time. debugMessage - The log message. pars - A variable number of parameters used in the debug message. |
void Notify(bool alwaysNotify) | Triggers a notification at the end of an agent run. alwaysNotify - If alwaysNotify is set to false, this method only triggers a notification if the agent has been configured to send notifications on critical errors. |
void Notify(string message, bool alwaysNotify) | Triggers a notification at the end of an agent run, and adds the message to the notification email. message - The message to add to the notification email. alwaysNotify - If alwaysNotify is set to false, this method only triggers a notification if the agent has been configured to send notifications on critical errors. |
void AddNotificationMessage(string message) | Adds a message to the notification message, but does not trigger a notification. message - The message to add to the notification email. |
GlobalDataDictionary GlobalData | Global data dictionary that can be used to store data that needs to be available in all scripts and after agent restarts. Input Parameters are also stored in this dictionary. |
IConnection GetDatabaseConnection(string connectionName) | Returns the specified database connection. The database connection must have been previously defined for the agent or be a shared connection for all agents on the computer. Your script is responsible for opening and closing the connection by calling the OpenDatabase and CloseDatabase methods. |
IInputDataRow GetInputData() | If the current command is a data provider, the data for that command is returned. Otherwise, this function searches the command's parents and returns the first found input data. |
IInputDataRow GetInputData(Command command) | If the specified command is a data provider, the data for that command is returned. Otherwise, this function searches the command's parents and returns the first found input data. |
IInputDataRow GetInputData(string commandName) | If the specified command is a data provider, the data for that command is returned. Otherwise, this function searches the command's parents and returns the first found input data. |
IInputDataRow GetInputData(Guid commandId) | If the specified command is a data provider, the data for that command is returned. Otherwise, the function throws an error. |
void SetPageLoadError(string errorMessage) | Logs a page load error and marks the page for retry if the agent errors are later retried. errorMessage - The error message to log. |
GetOrAddDataStorage(string storageName, string key, CustomDataColumns columns) | Returns a custom database table storage that can be used to store persistent data while running an agent. A new storage is created if it does not already exist. The storage is created in the internal database. storageName - The name of the storage. key - A value identifying a sub-storage. columns - Data field definitions for the storage. |
ContainerDataDictionary ContainerData | Container data dictionary that can be used to store data that needs to be available in the current container command and all sub-commands of the current container. |
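
The sketch below shows how a few of the members listed above might be combined in one transformation. It uses `Content`, `IsDebug`, `IsParentCommandMissingSelectionOrData`, and the single-argument form of `WriteDebug`. The `FakeArgs` class is a hypothetical stand-in that records log messages in a list so the script can be tried outside Sequentum Enterprise, and the whitespace clean-up is only an example task.

```python
# Hypothetical stand-in for the arguments object; it models only the members
# used below and records log messages instead of writing to an agent log.
class FakeArgs:
    def __init__(self, content, is_debug=False, parent_missing=False):
        self.Content = content
        self.IsDebug = is_debug
        self.IsParentCommandMissingSelectionOrData = parent_missing
        self.messages = []

    def WriteDebug(self, debugMessage):
        self.messages.append(debugMessage)

def TransformContent(args):
    # Nothing useful to transform if the parent container found no data.
    if args.IsParentCommandMissingSelectionOrData:
        args.WriteDebug("Parent selection missing; returning empty content.")
        return ""
    # Example task: collapse runs of whitespace into single spaces.
    cleaned = " ".join(args.Content.split())
    if args.IsDebug:
        args.WriteDebug("Transformed content: " + cleaned)
    return cleaned

demo = FakeArgs("  Hello   world ", is_debug=True)
print(TransformContent(demo))
```

Logging only when `IsDebug` is true keeps the agent log short during normal runs while still giving detail when debugging.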