Download Document
The Download Document command extracts a document from a web page. The command will download a document, and then save it to the file system or send it to a database - depending on your chosen export target.
The web selection path for this command normally points to the document link itself, but it could also point to a web element that contains information from which the document URL can be derived by using Content Transformation.
The figure below shows the Command Properties panel after choosing Download Document from the New Command drop-down:
Data Fields
If the agent is saving the document to a database, then by default this command will generate two data fields: one for the document binary data and another for the name of the document. If the agent is saving the document to the file system, then the command will generate only one data field containing the full file path to the document. The command property Export URL can be used to also generate a data field that contains the document URL.
Command Configuration
The Common tab in the Configure Agent Command panel has three tabs:
File URL - contains the URL for the file.
File Name - contains the name of the downloaded file.
Convert to HTML - specifies if the downloaded document should be converted to HTML.
We explain the details of each below.
File URL
The entry in this tab determines the specific URL for the file, and the agent uses this URL to download the document at run time.
You can choose the HTML attribute that the command should extract to get the document URL. The default value is URL, which extracts the href HTML attribute (if the chosen web element is a link).
Click the Transformation Script button to enter a Regular Expression or write a .NET script that will transform the document URL to meet your requirements. See the Content Transformation Script topic for more information.
Use the Data Value option to specify that an agent data value will be used as a file URL. The agent data can come from a data provider, an input parameter, or captured data.
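The kind of URL transformation described above can be illustrated outside the product. The following is a minimal sketch in plain Python, not the Sequentum transformation script API: it assumes a hypothetical web element whose onclick attribute contains a relative document path, and uses a regular expression to derive an absolute document URL. The attribute value, the openDocument function name, and the base URL are all invented for illustration.

```python
import re

# Hypothetical attribute value captured from a web element (not from the product docs).
ONCLICK_VALUE = "openDocument('/files/report-2023.pdf'); return false;"
# Hypothetical site root used to turn the relative path into an absolute URL.
BASE_URL = "https://example.com"


def derive_document_url(attribute_value: str, base_url: str) -> str:
    """Extract the quoted path from the attribute value and prefix it with base_url."""
    match = re.search(r"openDocument\('([^']+)'\)", attribute_value)
    if match is None:
        raise ValueError("no document path found in attribute value")
    return base_url.rstrip("/") + match.group(1)


print(derive_document_url(ONCLICK_VALUE, BASE_URL))
```

In the product itself, the same idea would be expressed as a Regular Expression or .NET script entered through the Transformation Script button.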
File Name
The entry in this tab contains the file name. From the drop-down menu, you can choose the HTML attribute that you want to use as the name.
Click the Transformation Script button to enter a Regular Expression or write a .NET script that will transform the document name to meet your requirements. See the Content Transformation Script topic for more information.
Use the Data Value option to specify that an agent data value will be used as the file name. The agent data can come from a data provider, an input parameter, or captured data.
Use the Detect File Extension option to specify whether the agent should try to detect the file type of the downloaded document, or whether a transformation script or a data value will provide a file name that includes a file extension.
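How the product detects file types is not described here. As a rough illustration of the trade-off behind the Detect File Extension option, the sketch below (plain Python, with hypothetical helper names) keeps an extension that the file name already has and otherwise guesses one from the leading bytes of the downloaded content. The signature table is a small, well-known sample and is an assumption, not the product's detection logic.

```python
import posixpath
from urllib.parse import unquote, urlparse

# A few well-known file signatures (magic numbers); illustrative, not exhaustive.
SIGNATURES = {
    b"%PDF": ".pdf",
    b"\x89PNG": ".png",
    b"PK\x03\x04": ".zip",  # also the container format of .docx and .xlsx files
}


def name_from_url(url: str) -> str:
    """Use the last path segment of the URL as the file name."""
    path = unquote(urlparse(url).path)
    return posixpath.basename(path) or "download"


def ensure_extension(file_name: str, content: bytes) -> str:
    """Keep an existing extension; otherwise guess one from the content's first bytes."""
    _, extension = posixpath.splitext(file_name)
    if extension:
        return file_name
    for signature, detected in SIGNATURES.items():
        if content.startswith(signature):
            return file_name + detected
    return file_name  # unknown type: leave the name unchanged


name = name_from_url("https://example.com/docs/annual%20report")
print(ensure_extension(name, b"%PDF-1.7 sample bytes"))
```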
Convert To HTML
A downloaded document can be converted into an HTML page as it's being downloaded, and a URL command can later be used to open the HTML page. Capture commands can then be used to extract data from the HTML page.
Please see Extracting Data From Non-HTML Documents for more information.
Click To Download
Check this box when no direct URL is available for the document, but it is necessary to download the document by clicking on a web element - such as a button. When you enable this option:
The File URL tab becomes unavailable
Sequentum Enterprise will assign a unique identifier as the file name
You will have access to the Action configuration tab where you can fine-tune the behavior of the action.
See the topic Action Configuration for more information.
Command Properties
Action
Download Method: Specifies how the document is downloaded. The file can be downloaded using the extracted URL directly, using the extracted URL in a dynamic web browser, using an action on the selected web element, or as a result of an action from a parent field.
Download From URL: Downloads a document from a URL.
Use Web Browser: Downloads a document using the web browser.
Click To Download: Downloads a document by clicking a web element, such as a button. Use this method when no direct URL is available for the document.
Download Header Only: Downloads headers only.
File Download Action: Action configuration when the Download Method property is set to Click to Download.
Action Type: Specifies the action to perform when clicking on a web element.
Activities: Specifies how this action should wait for the browser activities to complete.
Wait for Content: Specifies whether the command waits for the web selections and URLs specified by this property. The possible values are Required, Optional, and No Wait. The default value is Optional.
Wait for External Sub-Page Load: The default value is set to No Wait or Parse. Specifies if the command should wait for one or more page loads to occur that originate from different domains than the main page.
Wait for External Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more async script loads to complete on subpages from different domains than the main page.
Wait for Internal Sub-Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on sub-pages from the same domain as the main page.
Wait for Internal Sub-Page Load: The default value is set to Optional. Specifies if the command should wait for one or more page loads to occur in browser frames. Will only wait for pages from the same domain as the main page.
Wait for Internal Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more async script loads to complete on subpages from the same domain as the main page.
Wait for Main Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on the main page in the browser.
Wait for Main Page Load: The default value is set to Required. Specifies if the command should wait for one or more full-page loads to occur in the browser.
Add Force Refresh Header: Adds an “If-Modified-Since” header to the web request to make sure the web page is not retrieved from cache. The default value is set to false.
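To make the force refresh header concrete, here is a small sketch using Python's standard library that builds (but does not send) a request carrying an If-Modified-Since header. The date value the product actually sends is not documented here; the Unix epoch used below is an assumption, chosen because any resource modified after that date should be returned in full by the server.

```python
from email.utils import formatdate
from urllib.request import Request


def build_force_refresh_request(url: str, since_epoch_seconds: int = 0) -> Request:
    """Build a request with an If-Modified-Since header; nothing is sent over the network."""
    request = Request(url)
    # formatdate(..., usegmt=True) produces the HTTP date format.
    request.add_header("If-Modified-Since", formatdate(since_epoch_seconds, usegmt=True))
    return request


request = build_force_refresh_request("https://example.com/report.pdf")
# urllib stores header names capitalized as "If-modified-since".
print(request.get_header("If-modified-since"))
```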
Block Known Ad Servers: The web page will not load content from known ad servers, such as “ad.doubleclick.net”. This speeds up processing slightly.
Block Popup: When set to True, blocks any popup opened by the action. The default value is False.
Blocked Events: A list of events that are blocked in design and debug mode. These events are only blocked if the document mode is greater than 9.
Break HTML Area: Specifies whether the action breaks an HTML area. Actions that load a new page always break HTML areas.
Browser Mode: Specifies the type of browser used to run an agent:
Dynamic Browser: The browser functions as a standard web browser, and it downloads images and executes JavaScript.
HTML Parser: The HTML Parser doesn’t execute JavaScript and does not load frames, so it is faster and more reliable than a Dynamic Browser. However, the parser doesn’t work on websites that rely on JavaScript, and the parser may also be unable to handle some web forms (even when they don’t rely on JavaScript).
JSON Parser: This mode does not start a new browser. Instead, it parses JSON content returned by a web server, and lets you easily extract content elements from the JSON content.
XML Parser: This mode does not start a new browser. Instead, it parses XML content returned by a web server and lets you easily extract content elements from the XML content.
Capture Requests(Regex): A regular expression matching URLs which requests should be captured and made available for scripting. Multiple regular expressions separated by line breaks can be specified.
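A property value holding several regular expressions separated by line breaks can be interpreted as sketched below. This is plain Python with hypothetical function names, and it assumes that a URL is captured when any pattern matches anywhere in it; whether the product matches the whole URL or a part of it is not stated here.

```python
import re

# Hypothetical property value: one regular expression per line.
PROPERTY_VALUE = "/api/documents/\\d+\n\\.pdf$"


def compile_patterns(property_value: str) -> list:
    """Compile one pattern per non-empty line of the property value."""
    return [re.compile(line.strip()) for line in property_value.splitlines() if line.strip()]


def should_capture(url: str, patterns: list) -> bool:
    """Return True if any pattern matches somewhere in the URL (an assumption)."""
    return any(pattern.search(url) for pattern in patterns)


patterns = compile_patterns(PROPERTY_VALUE)
for url in ("https://example.com/api/documents/42", "https://example.com/index.html"):
    print(url, should_capture(url, patterns))
```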
Clear Storage: This property allows users to clear the website’s cookies and local storage. The Default value indicates that no storage will be cleared. When set, this property applies to commands such as Navigate_URL and Navigate_Link, not to the overall agent.
Default: Deletes all cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
None: Does not delete cookies or storage.
All Cookies & Storage: Deletes all cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
Site Cookies & Storage: Deletes only website cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
All Cookies: It will delete all cookies only.
Site Cookies: It will delete website cookies only.
Site Session Cookies: It will delete site session cookies only.
Close Browser After Use: The default value is set to False. This property set to True allows users to close any browser after use. This option is only applicable if the action opens a new browser.
Command Returning Max Scrolls: Limit the number of scrolls to a number captured by the specified command. This option can be useful when a page can scroll indefinitely.
Custom Request Headers: Custom headers are sent with all requests made by a browser. Headers must be sent on each new browser.
Custom Error Handling: This enables the user to define a custom script for handling errors. This property is only applicable if "Error Handling" is set to "Custom Error Handling".
Default Events: When set to True, a default list of events is fired on the selected web element. The default value is True.
Detect Encoding: This property specifies how a new parser should detect the encoding of the content received from a web server. If the option is set to Default, this action uses the same detection method as the parent parser. If there’s no parent action, the default detection method is “Content & Server”. You must reopen the browser window or the agent for this change to take effect.
Discover Action: This property set to True configures action properties automatically when the command is first executed. The default value is False.
Editor Action: Specifies the web element or URL to use when performing the action in the design browser. The default value is “Default” which is indicated by the “Use Specific URL” value False. To use the Specific URL, we set the “Use Specific URL” value True which allows users to use a specific URL.
URL: Specifies the URL to load.
Use Specific URL: Specifies whether to load a direct URL. The default value is False.
Error Handling: Specifies how the agent should react when an error occurs while executing the command action. The default reaction is to exit the command. Use the option “No Error Handling” if you want the agent to continue executing sub-commands after an error. You can handle the error in sub-commands by using the script parameter “IsParentActionError”. The following options are available:
Custom Error Handling: The agent runs a custom script after an error occurs.
Exit Command: The agent command will exit the action command and continue executing the next command. The agent will skip all the sub-commands of the action command.
Retry Command: The agent command will retry the action command a specified number of times, and if the action command does not succeed, it will skip all sub-commands of the action command and continue executing the next command. Set the property Retry Count to specify the number of retries. If Retry Count is set to zero, the agent will keep retrying the command indefinitely.
Retry With No Error Handling: The agent command will retry the specified number of times, but then continue with no error handling.
Restart and Resume Agent: The agent command will restart and resume where it left off. This option is useful if an error puts the website into a state where the agent cannot continue.
Restart Agent and Retry Command: The agent command will retry the action command a specified number of times, and if the action command does not succeed, then it will Restart the Agent.
Stop Agent: The agent will stop.
Error Retry Clear On Success: This property clears the counter if the action succeeds. The default value is set to True.
Error Retry Count: Specifies the number of times the agent should retry the command when an error occurs while executing the command action. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is 5.
Error Retry Delay: Specifies the number of milliseconds the agent waits before a retry. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is 5000.
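The interaction between Retry Command, Error Retry Count, and Error Retry Delay can be sketched as a simple loop. This is an illustration in plain Python under stated assumptions, not the agent's implementation: the function name is hypothetical, any exception counts as an error, and a failed result is reported to the caller so that it can skip sub-commands. A retry count of zero retries indefinitely, as described for the Retry Command option.

```python
import time


def run_with_retries(action, retry_count=5, retry_delay_ms=5000, sleep=time.sleep):
    """Run action, retrying on errors. Returns (succeeded, result).

    retry_count is the number of retries after the first attempt;
    zero means keep retrying indefinitely.
    """
    retries_used = 0
    while True:
        try:
            return True, action()
        except Exception:
            if retry_count != 0 and retries_used >= retry_count:
                return False, None  # caller would skip sub-commands and move on
            retries_used += 1
            sleep(retry_delay_ms / 1000)  # the delay is given in milliseconds


# Usage: an action that fails twice and then succeeds, with a stubbed sleep.
attempts = []
delays = []


def flaky_download():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("simulated download error")
    return "document.pdf"


print(run_with_retries(flaky_download, retry_count=5, retry_delay_ms=500, sleep=delays.append))
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting.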
Error Retry Proxy Rotation: Specifies whether the proxy should rotate before retrying the action and whether the current proxy should be removed. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is “Keep and Rotate”.
None: The proxy is not rotated before or after retrying the action.
Remove and Rotate: The current proxy is removed and the proxy is rotated before retrying the action.
Remove and Rotate After Retry: The current proxy is removed and the proxy is rotated after retrying the action.
Keep and Rotate: The current proxy is kept and the proxy is rotated before retrying the action.
Events: A list of events to fire on the selected web element. The events can be mouseenter, mouseover, mousemove, focus, mousedown, mouseup, and click.
Fixed Encoding: This property specifies the encoding when “Detect Encoding” is set to “Fixed”. You must reopen the browser window or the agent for this change to take effect.
Ignore Error Codes: Some websites return incorrect status codes, so this property allows users to ignore status codes by default unless the web server is not returning any content.
JSON Transformation: Regular expression used to transform JSON content loaded with a static parser.
Limit Number Of Scrolls: This property set to true allows users to limit the number of scrolls to a specified number. This property can be useful when a page can scroll indefinitely. The default value is set to false.
Max AJAX Calls: Stops waiting for AJAX after the specified number of AJAX Calls. The default value is set to 10.
Maximum Number Of Scrolls: Limits the number of scrolls to the specified number. This property can be useful when a page can scroll indefinitely. It is only applicable if “Limit Number Of Scrolls” is set to “True”. The default value is “50”.
Never Open New Browser: When set to True, an action is never configured to open a web page in a new web browser when discovering an action. Some web pages may not function correctly when a link is opened in a new browser. The default value is False.
No Parse: The web pages are not parsed if existing parsed pages exist. The default value is set to “False”.
Page Not Found Handling: Specifies the action to take if a web page was not found. The default is to do nothing and let normal error handling deal with the error.
Continue: If a web page is not found, the agent continues to execute.
ExitCommand: If a web page is not found, it will exit from the agent.
Redirect First Request: This property allows users to redirect the first request to a new browser window when Target Browser is set to “New”, even if the first request is coming from a frame within the current browser window. If this property is set to “False”, requests from frames within the current browser window will not be redirected.
Rotate Proxies: The property set to “True” allows users to rotate the proxy before executing the action. The default value is set to False.
Scroll Steps: The number of pixels the page is scrolled in each step. The default value of “0” will scroll the page all the way to the bottom and all the way to the right in each step.
Scroll Until End of Page: When set to True, scrolls to the end of the web page after an action. The command scrolls repeatedly until it is unable to scroll any further, and waits for AJAX calls to complete if scrolling triggers AJAX calls. The default value is “False”.
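The way the scroll properties combine can be sketched with a simulated page. This is a simplified model in plain Python with hypothetical names, not the browser's behavior: it scrolls vertically only, treats a step of 0 as "scroll all the way to the bottom", and assumes a page of fixed height, whereas real pages may grow as AJAX content loads.

```python
def scroll_to_end(page_height, viewport_height, step=0, limit_scrolls=False, max_scrolls=50):
    """Simulate vertical scrolling. Returns (final_position, number_of_scrolls)."""
    bottom = max(page_height - viewport_height, 0)  # furthest scroll position
    position = 0
    scrolls = 0
    while position < bottom:
        if limit_scrolls and scrolls >= max_scrolls:
            break  # stop once the scroll limit is reached
        # A step of 0 jumps straight to the bottom; otherwise move by `step` pixels.
        position = bottom if step == 0 else min(position + step, bottom)
        scrolls += 1
    return position, scrolls


print(scroll_to_end(5000, 1000, step=500))
print(scroll_to_end(5000, 1000, step=500, limit_scrolls=True, max_scrolls=3))
```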
Separate Parser Proxies: Creates a separate parser proxy loop if the action opens in a parser. The default value is set to “False”.
Target Browser: Specifies the web browser where a new web page should be loaded. These are the different options that can be used by a user:
New: Loads the web page in a new browser window. This is the default value.
Current: Loads the web page in the same browser window.
Parent: Loads the web page in the parent browser window.
Popup: Loads the web page in a popup window.
Timeouts: Specifies timeout values for the action. Timeouts specified in activities override these values.
Ajax Completed: The default number of milliseconds to wait for an AJAX call to complete.
Ajax Content Render Delay: The default number of milliseconds to wait for ajax loaded content to render on a web page.
Ajax Content Render Delay After Scroll: The default number of milliseconds to wait for ajax loaded content to render on a web page after triggering a scroll. The command will stop waiting as soon as it can scroll the page further down. This property is different from “Ajax Content Render Delay” which always waits a fixed amount of time. A fixed timeout is slower but is required on some websites.
Asynchronous Completed: The default number of milliseconds to wait for an asynchronous action to complete.
Discover First Activity: The default number of milliseconds to wait for the first activity when discovering new activities.
Discover First URL Activity: The default number of milliseconds to wait for the first URL to start loading.
Discover Next Activity: The default number of milliseconds to wait for the next activity when discovering new activities.
File Download Completed: The default number of milliseconds to wait for a file download to complete.
File Download Started: The default number of milliseconds to wait for a previous file download to complete before starting to download the next file.
Frame Completed: The default number of milliseconds to wait for frame content to complete loading. This timeout applies to internal frames, and only to external frames if the property WaitForExternalFrames is set to True.
JavaScript Parser Timeout: The number of milliseconds to wait for JavaScript DOM operations to complete.
Main Page Redirect: The number of milliseconds to wait for the main page to redirect to another page.
Page Completed: The default number of milliseconds to wait for a page-load to complete.
Wait For Content Timeout: The default number of milliseconds to wait for web content to appear on a web page, or a URL to load that matches a specified Regex.
Wait Times: The default timeout values are multiplied by this value. This can be a quick way to test if issues with action are caused by timeout values being too short. Default timeouts are used when discovering activities, and when scrolling a page.
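The effect of the Wait Times multiplier is simple arithmetic, sketched below in plain Python. The timeout names and values are hypothetical placeholders, since the product's actual default timeouts are not listed on this page.

```python
# Hypothetical default timeouts in milliseconds (placeholders, not product defaults).
DEFAULT_TIMEOUTS_MS = {
    "page_completed": 30000,
    "ajax_completed": 5000,
    "file_download_completed": 60000,
}


def scale_timeouts(defaults: dict, wait_times: float) -> dict:
    """Multiply every default timeout by the Wait Times value."""
    return {name: int(value * wait_times) for name, value in defaults.items()}


# Doubling all timeouts is a quick way to check whether short timeouts cause an issue.
print(scale_timeouts(DEFAULT_TIMEOUTS_MS, 2))
```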
Capture
Act as System Value: Acts as a system value that is guaranteed to be present, and does not participate in an empty data row check. The default value is set to False.
Allow Empty Value: The default value is set to True which allows empty or missing values. Allow Empty set to False indicates that it will not allow null values.
Always Update Design Value: This property value set to True indicates that the design value is updated whenever possible, and not just when editing the command. The Default value is set to False.
Change Tracking: The default value, ‘Include’, specifies that the captured content will cause change tracking to record a change. Setting the value to ‘Exclude’ excludes the capture command from change tracking, so if the captured data changes, it will not cause the last change status for the data entry to change.
Create Index: Creates an index in the internal database for the column holding this content. This can improve performance when a duplicate check is performed on this content.
Data Consumer: Specifies the input data to use when processing this command.
Captured Data Command: Specifies the previously captured data column name which you want to use as input data.
Data Source: The source of the data consumed.
Data Transformation Script: The data transformation script. It is disabled by default, which is reflected by the "Enabled" property value False. To enable the data transformation script, set the "Enabled" property to True.
Input Parameter Name: Specifies the input parameter name to use.
Provider Column Name: Specifies a column from the data source that should provide the data to this command. Specifies a command that provides data to the agent. A command can provide data to itself.
Provider Container: Specifies a command that provides data to this command.
Data Format: This property specifies the data format of the captured content.
Data Format Style: This property specifies the style of data format for captured content. The default value is set to ‘None’.
Data Type: The data type of captured content.
Short Text: All content will be captured as Short Text by default. Short Text content can be up to 4000 characters long.
Long Text: Long Text content can be any length, but cannot always be used in comparisons, so you may not be able to include Long Text content in duplicate checks.
Integer: A whole number.
Float: A floating-point number.
Date/Time: A date and/or time value.
Boolean: A value that can be true or false. Boolean values are stored as 1 or 0 integer values.
Binary: A variable-length stream of binary data ranging between 1 and 8,000 bytes.
Big Integer: A 64-bit signed integer.
Decimal: Represents a decimal floating-point number. A fixed precision and scale numeric value between -10^38 + 1 and 10^38 - 1.
GUID: A globally unique identifier (or GUID). A GUID is a 128-bit integer (16 bytes) that can be used across all computers and networks wherever a unique identifier is required. Such an identifier has a very low probability of being duplicated.
Document: The captured data is a document in binary form. This can be used in capture commands that store a downloaded document from the web.
Image: The captured data is an image in binary form. Can be used in capture commands that store a downloaded image from the web.
Temporary: The captured data is not stored in the internal database, and also not exported. Can be used as temporary storage during agent run time.
Date/Time Conversion: Specifies the option to use for date/time conversion. For example, if this property is set to AssumeLocalTime, the value is explicitly assumed to be local time, whichever time the script defines in the field (UTC NOW/Universal or only NOW/Local). If it is set to Universal LocalTime, the value is explicitly assumed to be universal time, whichever time the script defines in the field (UTC NOW/Universal or only NOW/Local).
Decimal Precision: Specify decimal Precision. The default value is 19.
Decimal Scale: Specify decimal Scale. The default value is 4.
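As an illustration of what a precision of 19 and a scale of 4 mean, the sketch below uses Python's decimal module with the usual database convention: precision is the total number of digits and scale is the number of digits after the decimal point. The function name is hypothetical, and the rounding mode is an assumption, since the product's rounding behavior is not described here.

```python
from decimal import ROUND_HALF_UP, Decimal


def fit_decimal(value: str, precision: int = 19, scale: int = 4) -> Decimal:
    """Round value to `scale` decimal places and check it fits in `precision` digits."""
    quantum = Decimal(1).scaleb(-scale)  # 0.0001 for a scale of 4
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    total_digits = len(rounded.as_tuple().digits)
    if total_digits > precision:
        raise ValueError(f"{value} needs {total_digits} digits, more than {precision}")
    return rounded


print(fit_decimal("123.456789"))
```

With the defaults, a value can have at most 15 digits before the decimal point, because 4 of the 19 digits are reserved for the fractional part.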
Design Value: The value to use for this capture command in the agent editor. This value can be important when testing scripts in the editor if the scripts depend on captured data.
Key Column: The captured content is used to identify a data entry if this option is set to true.
Make Data Available to Parent Commands: Copies the extracted data to all parent data rows, making the data available to parent commands executed after this command.
Max. Data Length: The maximum length of the captured content when using the Short Text data type. The maximum possible length depends on the chosen database. The default value is set to 4000.
Raise Validation Error: When set to True, adds a page load error if value validation fails. The default value is True.
Transformation Script: A script used to transform the captured content.
Use Data Value: Captures a data value instead of a property of the selected web element. Web selection is ignored when this option is true.
User Defined Design Value: This property value set to True indicates the design value for this capture command is user-defined instead of set automatically when the command is saved.
Validation Time: Specifies when data validation takes place. The default value is Runtime. To validate data at export time instead of run time, set this property to “Export”.
Command
Command Description: A custom description for the command. The default value is Empty.
Command Transformation Script: A script used to change command properties at run time. The default value is disabled.
Disabled: This property set to True allows the user to disable the command. A disabled command will be ignored. The default value is set to False.
ID: The internal unique ID of the command. It is always auto-generated, e.g. 58c8e4ac-e4c0-48f7-a63d-77064945380b.
Increase Data Count: Increases the data count every time this command is processed. Set it to True if you want to count the number of times a specific command is executed to get data. The data count is increased during data extraction, so it is used to measure agent progress, and the agent uses the data count to decide whether its success criteria are met. The default value is False.
Name: This property specifies the name of the command.
Notify On Critical Error: A notification email is sent at the end of an agent run if the command encounters a critical error, and the agent has been configured to notify on critical errors. Critical errors include page load errors and missing required web selections. The default value is set to False.
Convert Document
Conversion Script: A script used to transform the downloaded document into an HTML page.
C# Script: Specifies C# script.
Enabled: Set this property to True to use the script. The default value is False, which indicates that the script is disabled.
Library Assembly File: The name of a custom assembly file when "Use Default Library" is set to false.
Library Method Name: The method to execute when using the default script library.
Library Method Parameter: A custom parameter passed to the script library method.
Python Script: Specifies Python script.
Regex Script: Specifies Regex script.
Script Language: Specifies the scripting language to use, e.g. C#, VB.NET, Python, Script Library, or Regular Expressions.
Template Name: The template name of the referenced template.
Template Reference: Loads this script from a template when the agent is loaded.
Use Default Library: Uses the default script library when Script Language is set to Script Library.
Use Selection: The script is provided with the selected web element. The script will not be provided with the selected web element if this value is False.
Use Shared Library: Uses a script library that is shared among all agents.
Convert to HTML: Uses a script to convert the downloaded document to an HTML page, which can be processed later using a URL command. The default value is False.
Export Converted Document: Exports the downloaded document in addition to generating and exporting the converted HTML page.
Debug
Debug BreakPoint: Debugging will break at this command if the breakpoint is set. The default value is set to False.
Debug Disabled: A disabled command will be ignored during debugging. The default value is set to False.
Debug Error Option: Specifies what action to take when an error occurs in the debugger. The default value is Notify, which means you are notified when an error occurs during debugging. To ignore errors during debugging, set this property to Ignore.
Export
Excel Column Width: Specifies the width of the data column holding the captured data when exporting to Excel or PDF.
Excel/PDF/CSV Column Format: Specifies the format of the data column holding the captured data when exporting to Excel, PDF, or CSV. For Excel and PDF, this format string is the same used in Excel under Custom format when formatting a cell. For CSV this is a standard .NET format string. This is useful in cases where one needs to apply a particular format like NUMBER, DATE, CURRENCY, etc.
When the export target is set to anything other than Excel, CSV, or PDF, this property has no effect.
Export Enabled: A command with Export Enabled set to False will not save any data to data output. The default value is True, which indicates that data will be output.
Merge Rows Method: When the parent list Container command option "Export Method" is set to "Add Columns And Merge Rows", this option specifies how to combine row values.
Merge Rows Value Separator: When "Merge Rows Method" is set to "Concatenate", this separator is used to separate the extracted values.
Sort Order: Specifies the order in which the column is listed when exporting to a file format.
File Capture
Auto Detect File Extension: Automatically detects the file extension of the downloaded file. Clear this option if you want a filename transformation script to set the file extension. The default value is True.
Download Timeout: The maximum time, in milliseconds, to wait for a file to download. The default value is 50000.
Export URL: When set to True, exports the URL along with the file. The default value is False, which does not export the URL.
File Name Attribute: The web element attribute to use as file name for the downloaded file.
File Name Column Name: Specifies the name of the export column containing the file name. A default column name will be used if this property is empty.
File Name Data Consumer: Specifies the input data to use when "Use Data Value as File Name" is set to true.
Captured Data Command: Specifies the previously captured data column name which you want to use as input data.
Data Source: The source of the data consumed.
Data Transformation Script: The data transformation script. It is disabled by default, which is reflected by the "Enabled" property value False. To enable the data transformation script, set the "Enabled" property to True.
Input Parameter Name: Specifies the input parameter name to use.
Provider Column Name: Specifies a column from the data source that should provide the data to this command. Specifies a command that provides data to the agent. A command can provide data to itself.
File Name Design Value: The value to use for the file name capture in the agent editor. This value can be important when testing scripts in the editor if the scripts depend on captured data.
File Name Transformation Script: A script used to transform the file name attribute used to name the download file.
Fixed File Extension: Adds a fixed file extension e.g. jpeg, jpg, or gif to the downloaded file.
Try Internet Cache: Retrieves the file from the Internet cache if it exists, instead of downloading it. This can be useful for some CAPTCHA images where the image is first downloaded by a web browser and the website does not allow a second download.
URL Column Name: If the URL is exported, this property specifies the name of the export column containing the URL. A default column name will be used if this property is empty.
Use Data Value as File Name: Uses a data value as file name instead of an attribute of the selected web element.
Use Original File Name: Uses the file name of the document when possible.
HTML Capture
Concatenate Content Separator: The separator such as comma, pipe etc. to use between content from multiple web elements. This property is only applicable when "Concatenate Multiple Web Elements" is set to True.
Concatenate Multiple Web Elements: Concatenates content if multiple web elements are selected. Only the first web element will be used if this value is set to False.
HTML Attribute: The web element attribute to capture.
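The two concatenation properties above can be summarized in a few lines of plain Python. The function and parameter names are hypothetical; the sketch only mirrors the documented behavior that the first web element is used unless concatenation is enabled, in which case the separator is placed between the captured values.

```python
def capture_content(element_values, concatenate=False, separator=","):
    """Combine values captured from multiple selected web elements."""
    if not element_values:
        return ""
    if not concatenate:
        return element_values[0]  # only the first web element is used
    return separator.join(element_values)


values = ["Alpha", "Beta", "Gamma"]
print(capture_content(values))
print(capture_content(values, concatenate=True, separator="|"))
```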
Web Selection
Selection: The selection XPaths of the web elements associated with this command.
Paths: List of selection XPaths.
Path: The selection XPath.
Select Hidden Elements: Also selects hidden and disabled web elements when True. Otherwise, selects only visible and enabled web elements.
Selection Missing Option: Specifies what happens if this selection does not exist on the current page.
Default: If this selection does not exist on the current page, an error is logged.
Ignore Command but Execute Sub-Commands: If this selection does not exist on the current page, the current command is ignored, but its sub-commands are executed.
Ignore Command: If this selection does not exist on the current page, the current command and its sub-commands are ignored.
Log Error and Ignore Command: If this selection does not exist on the current page, the current command and its sub-commands are ignored and an error message is logged.
Log Warning and Ignore Command: If this selection does not exist on the current page, the current command and its sub-commands are ignored and a warning message is logged. Note: The warning message is logged if the log level is set to either ‘Low’ or ‘High’.
Log PageLoad Error and Ignore Command: If this selection does not exist on the current page, the current command and its sub-commands are ignored and a Page Load error is logged.