Download Document

The Download Document command downloads a document from a web page and then saves it to the file system or to a database, depending on your chosen export target.

See the Download Document topic for details on configuring this command and its most important settings.

Example

The screenshot below shows the simplest example of where the Download Document command can be used.

DownloadDocumentExample.png

The web selection path for this command normally points to the document link itself, but it could also point to a web element that contains information from which the document URL can be derived by using Content Transformation. 

Data Fields

If the agent is saving the document to a database, then by default this command will generate two data fields: one for the document binary data and another for the name of the document. If the agent is saving the document to the file system, then the command will generate only one data field containing the full file path to the document. The command property Export URL can be used to also generate a data field that contains the document URL.
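
The rules above can be summarized in a short sketch. The Python below only illustrates which data fields to expect; the function and the field labels are hypothetical, and the actual column names are determined by Sequentum Enterprise.

```python
from typing import List


def expected_data_fields(command_name: str, save_to_database: bool,
                         export_url: bool = False) -> List[str]:
    """List the data fields a Download Document command would generate."""
    if save_to_database:
        # Database export: one field for the binary data, one for the document name.
        fields = [f"{command_name} (binary data)", f"{command_name} (name)"]
    else:
        # File system export: a single field holding the full file path.
        fields = [f"{command_name} (file path)"]
    if export_url:
        # The Export URL property adds a field containing the document URL.
        fields.append(f"{command_name} (URL)")
    return fields
```

For example, a file system export with Export URL enabled would yield two fields: the file path and the document URL.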

Command Configuration

The Common tab in the Configure Agent Command panel has three tabs:

  • File URL - contains the URL for the file.
  • File Name - contains the name of the downloaded file.
  • Convert to HTML - specifies if the downloaded document should be converted to HTML.

We explain the details of each below.

File URL

The entry in this tab determines the specific URL for the file, and the agent uses this URL to download the document at run time.

You can choose the HTML attribute that the command should extract to get the document URL. The default value is URL, which extracts the href HTML attribute (if the chosen web element is a link).

Click the Transformation Script button to enter a Regular Expression or write a .NET script that will transform the document URL to meet your requirements. See the Content Transformation Script topic for more information.
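
As a rough illustration of the kind of regular-expression transformation described above, the following Python sketch derives a document URL from an extracted attribute value. It is not Sequentum Enterprise script syntax; the function name, the `openDoc(...)` pattern and the example URLs are all hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin


def derive_document_url(extracted_value: str, base_url: str) -> Optional[str]:
    """Pull a document path out of an extracted value and make it absolute."""
    # Hypothetical case: the element stores the path inside a JavaScript call,
    # for example onclick="openDoc('/files/report.pdf')".
    match = re.search(r"openDoc\('([^']+)'\)", extracted_value)
    if match is None:
        return None  # Nothing that looks like a document path was found.
    # Resolve the captured path against the URL of the page it came from.
    return urljoin(base_url, match.group(1))
```

A value such as `openDoc('/files/report.pdf')` captured on `https://example.com/catalog/page.html` would resolve to `https://example.com/files/report.pdf`.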

Use the Data Value option to specify that an agent data value will be used as the file URL. The agent data can come from a data provider, an input parameter or captured data.

File Name

The entry in this tab contains the file name. From the drop-down menu, you can choose the HTML attribute that you want to use as the name.

Click the Transformation Script button to enter a Regular Expression or write a .NET script that will transform the document name to meet your requirements. See the Content Transformation Script topic for more information.

Use the Data Value option to specify that an agent data value will be used as the file name. The agent data can come from a data provider, an input parameter or captured data.

Use the Detect File Extension option to specify whether the agent should try to detect the file type of the downloaded document, or whether a transformation script or a data value will provide a file name that already includes a file extension.
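
The source does not describe how file type detection works internally. The Python sketch below shows one plausible approach, assuming the extension is looked up from the Content-Type response header; the mapping table and function name are hypothetical.

```python
import os

# Hypothetical mapping from Content-Type values to file extensions.
EXTENSIONS_BY_CONTENT_TYPE = {
    "application/pdf": ".pdf",
    "application/msword": ".doc",
    "text/csv": ".csv",
}


def final_file_name(name: str, content_type: str, detect_extension: bool) -> str:
    """Return the file name, appending a detected extension when requested."""
    if not detect_extension:
        # The transformation script or data value is expected to supply the extension.
        return name
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = content_type.split(";")[0].strip().lower()
    extension = EXTENSIONS_BY_CONTENT_TYPE.get(media_type, "")
    _, existing = os.path.splitext(name)
    # Avoid doubling an extension that is already present.
    return name if existing.lower() == extension else name + extension
```

With detection enabled, a name without an extension gains one from the content type; with detection disabled, the name is used exactly as provided.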

Convert To HTML

A downloaded document can be converted into an HTML page as it's being downloaded, and a URL command can later be used to open the HTML page. Capture commands can then be used to extract data from the HTML page.

ConvertToHTML.png

Please see Extracting Data From Non-HTML Documents for more information.

Click To Download

Check this box when no direct URL is available for the document and the document must be downloaded by clicking a web element, such as a button. When you enable this option:

  • The File URL tab becomes unavailable.
  • Sequentum Enterprise assigns a unique identifier as the file name.
  • You have access to the Action configuration tab, where you can fine-tune the behavior of the action.

See the topic Action Configuration for more information.

Command Properties

Action

Download Method: Specifies how the document is downloaded. The file can be downloaded using the extracted URL directly, using the extracted URL in a dynamic web browser, using an action on the selected web element, or as a result of an action from a parent field.

  • Download From URL: Downloads a document from a URL.
  • Use Web Browser: Downloads a document using a web browser.
  • Click To Download: Downloads a document by clicking a web element, such as a button. Use this method when no direct URL is available for the document.
  • Download Header Only: Downloads headers only.

File Download Action: Configures the action that is performed when the Download Method property is set to Click To Download.

    • Action Type: Specifies the action to perform when clicking on a web element.
    • Activities: Specifies how this action should wait for the browser activities to complete.
      • Wait for Content: The default value is set to Optional. Specifies whether the command waits for the web selections and URLs specified by this property. Possible values are Required, Optional and No Wait.
      • Wait for External Sub-Page Load: The default value is set to No Wait or Parse. Specifies if the command should wait for one or more page loads to occur that originate from different domains than the main page.
      • Wait for External Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more async script loads to complete on sub-pages from different domains than the main page.
      • Wait for Internal Sub-Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on sub-pages from the same domain as the main page.
      • Wait for Internal Sub-Page Load: The default value is set to Optional. Specifies if the command should wait for one or more page loads to occur in browser frames. Will only wait for pages from the same domain as the main page.
      • Wait for Internal Sub-Page Scripts: The default value is set to False. Specifies if the command should wait for one or more async script loads to complete on sub-pages from the same domain as the main page.
      • Wait for Main Page AJAX: The default value is set to True. Specifies if the command should wait for one or more AJAX requests to complete on the main page in the browser.
      • Wait for Main Page Load: The default value is set to Required. Specifies if the command should wait for one or more full page loads to occur in the browser.
    • Add Force Refresh Header: Adds an “If-Modified-Since” header to the web request to make sure the web page is not retrieved from cache. The default value is set to false.
    • Block Known Ad Servers: The web page will not load content from known ad servers, such as “ad.doubleclick.net”. This speeds up processing slightly.
    • Block Popup: When set to True, blocks any popup opened by the action. The default value is False.
    • Blocked Events: A list of events that are blocked in design and debug mode. These events are only blocked if the document mode is greater than 9.
    • Break HTML Area: Specifies whether the action breaks an HTML area. Actions that load a new page always break HTML areas.
    • Browser Mode: Specifies the type of browser used to run an agent.
      • Dynamic Browser: The browser functions as a standard web browser; it downloads images and executes JavaScript.
      • HTML Parser: The HTML Parser doesn’t execute JavaScript and does not load frames, so it is faster and more reliable than a Dynamic Browser. However, the parser doesn’t work on websites that rely on JavaScript, and it may also be unable to handle some web forms (even when they don’t rely on JavaScript).
      • JSON Parser: This mode does not start a new browser. Instead, it parses JSON content returned by a web server, and lets you easily extract content elements from the JSON content.
      • XML Parser: This mode does not start a new browser. Instead, it parses XML content returned by a web server, and lets you easily extract content elements from the XML content.
    • Capture Requests(Regex): A regular expression matching the URLs of requests that should be captured and made available for scripting. Multiple regular expressions can be specified, separated by line breaks.
    • Clear Storage: This property allows users to clear the website’s cookies and local storage. The Default value indicates that no storage will be cleared. When set, this property applies to individual commands such as Navigate_URL and Navigate_Link, not to the agent as a whole.
      • Default: Deletes all cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
      • None: Does not delete any cookies or storage.
      • All Cookies & Storage: Deletes all cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
      • Site Cookies & Storage: Deletes only website cookies and local storage if using an in-memory cache, but only site cookies and local storage if using a persistent cache.
      • All Cookies: It will delete all cookies only.
      • Site Cookies: It will delete website cookies only.
      • Site Session Cookies: It will delete site session cookies only.
    • Close Browser After Use: When set to True, closes the browser after use. This option is only applicable if the action opens a new browser. The default value is False.
    • Command Returning Max Scrolls: Limits the number of scrolls to a number captured by the specified command. This option can be useful when a page can scroll indefinitely.
    • Custom Request Headers: Custom headers are sent with all requests made by a browser. Headers must be sent on each new browser.
    • Default Events: A default list of events is fired on the selected web element. The default value is True.
    • Detect Encoding: This property specifies how a new parser should detect the encoding of content received from a web server. If the option is set to Default, this action uses the same detection method as the parent parser. If there’s no parent action, the default detection method is “Content & Server”. You must reopen the browser window or the agent for this change to take effect.
    • Discover Action: When set to True, action properties are configured automatically when the command is first executed. The default value is False.
    • Editor Action: Specifies the web element or URL to use when performing the action in the design browser. The default value is “Default”, which corresponds to “Use Specific URL” being set to False. Set “Use Specific URL” to True to use a specific URL instead.
      • URL: Specifies the URL to load.
      • Use Specific URL: Specifies whether to load a direct URL. The default value is False.
    • Error Handling: Specifies how the agent should react when an error occurs while executing the command action. The default reaction is to exit the command. Use the option “No Error Handling” if you want the agent to continue executing sub-commands after an error. You can handle the error in sub-commands by using the script parameter “IsParentActionError”. The available options are:
      • Exit Command: The agent exits the action command and continues executing the next command. The agent skips all the sub-commands of the action command.
      • Retry Command: The agent retries the action command a specified number of times, and if the action command does not succeed, it skips all sub-commands of the action command and continues executing the next command. Set the property Retry Count to specify the number of retries. If Retry Count is set to zero, the agent will keep retrying the command indefinitely.
      • Retry With No Error Handling: The agent retries the command the specified number of times, but then continues with no error handling.
      • Restart and Resume Agent: The agent restarts and resumes where it left off. This option is useful if an error puts the website into a state where the agent cannot continue.
      • Restart Agent and Retry Command: The agent retries the action command a specified number of times, and if the action command does not succeed, it restarts the agent.
      • Stop Agent: The agent will stop.
    • Error Retry Clear On Success: This property clears the counter if the action succeeds. The default value is set to True.
    • Error Retry Count: Specifies the number of times the agent should retry the command when an error occurs while executing the command action. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is 5, which means the agent will retry the command 5 times.
    • Error Retry Delay: Specifies the number of milliseconds the agent waits before a retry. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is 5000, which means the agent will delay a retry by 5000 milliseconds.
    • Error Retry Proxy Rotation: Specifies whether the proxy should rotate before retrying the action and whether the current proxy should be removed. This property is only applicable if “Error Handling” is set to “Retry Command”. The default value is “Keep and Rotate”.
        • None: The proxy is not rotated before or after retrying the action.
        • Remove and Rotate: The current proxy is removed and the proxy is rotated before retrying the action.
        • Remove and Rotate After Retry: The current proxy is removed and the proxy is rotated after retrying the action.
        • Keep and Rotate: The current proxy is kept and the proxy is rotated before retrying the action.
    • Events: A list of events to fire on the selected web element. The events can be mouseenter, mouseover, mousemove, focus, mousedown, mouseup and click.
    • Fixed Encoding: This property specifies the encoding when “Detect Encoding” is set to “Fixed”. You must reopen the browser window or the agent for this change to take effect.
    • Ignore Error Codes: Some websites return incorrect status codes, so this property allows users to ignore status codes by default unless the web server is not returning any content.

    • JSON Transformation: Regular expression used to transform JSON content loaded with a static parser.

    • Limit Number Of Scrolls: This property set to true allows users to limit the number of scrolls to a specified number. This property can be useful when a page can scroll indefinitely. The default value is set to false.

    • Max AJAX Calls: Stops waiting for AJAX after the specified number of AJAX Calls. The default value is set to 10.

    • Maximum Number Of Scrolls: Limits the number of scrolls to the specified number. This property can be useful when a page can scroll indefinitely. It is only applicable if “Limit Number Of Scrolls” is set to “True”. The default value is “50”.

    • Never Open New Browser: When set to True, action discovery never configures an action to open a web page in a new web browser. Some web pages may not function correctly when a link is opened in a new browser. The default value is False.

    • No Parse: The web pages are not parsed if existing parsed pages exist. The default value is set to “False”.

    • Page Not Found Handling: Specifies the action to take if a web page was not found. The default is to do nothing and let normal error handling deal with the error.
      • Continue: If a web page is not found, the agent continues executing.
      • ExitCommand: If a web page is not found, the agent exits the command.
    • Redirect First Request: This property allows users to redirect the first request to a new browser window when Target Browser is set to “New”, even if the first request is coming from a frame within the current browser window. If this property is set to “False”, requests from frames within the current browser window will not be redirected.
    • Rotate Proxies: When set to True, rotates the proxy before executing the action. The default value is False.
    • Scroll Steps: The number of pixels the page will be scrolled in each step. The default value of “0” will scroll the page all the way to the bottom and all the way to the right in each step.
    • Scroll Until End of Page: When set to True, scrolls to the end of the web page after an action. The command scrolls repeatedly until it is unable to scroll any further, and waits for AJAX calls to complete if scrolling triggers AJAX calls. The default value is “False”.
    • Separate Parser Proxies: Creates a separate parser proxy loop if the action opens in a parser. The default value is set to “False”.
    • Target Browser: Specifies the web browser where a new web page should be loaded. The available options are:

      • New: Loads the web page in a new browser window. This is the default value.
      • Current: Loads the web page in the same browser window.
      • Parent: Loads the web page in the parent browser window.
      • Popup: Loads the web page in a popup window.
    • Timeouts: Specifies timeout values for the action. Timeouts specified in activities override these values.

        • Ajax Completed: The default number of milliseconds to wait for an AJAX call to complete.
        • Ajax Content Render Delay: The default number of milliseconds to wait for ajax loaded content to render on a web page.
        • Ajax Content Render Delay After Scroll: The default number of milliseconds to wait for ajax loaded content to render on a web page after triggering a scroll. The command will stop waiting as soon as it can scroll the page further down. This property is different from “Ajax Content Render Delay” which always waits a fixed amount of time. A fixed timeout is slower but is required on some websites. 
        • Asynchronous Completed: The default number of milliseconds to wait for an asynchronous action to complete.
        • Discover First Activity: The default number of milliseconds to wait for the first activity when discovering new activities.
        • Discover First URL Activity: The default number of milliseconds to wait for the first URL to start loading.
        • Discover Next Activity: The default number of milliseconds to wait for the next activity when discovering new activities.
        • File Download Completed: The default number of milliseconds to wait for a file download to complete.
        • File Download Started: The default number of milliseconds to wait for a previous file download to complete to start downloading the next file.
        • Frame Completed: The default number of milliseconds to wait for frame content to complete loading. This timeout applies to internal frames, and only to external frames if the property WaitForExternalFrames is set to True.
        • JavaScript Parser Timeout: The number of milliseconds to wait for JavaScript DOM operations to complete.
        • Main Page Redirect: The number of milliseconds to wait for the main page to redirect to another page.
        • Page Completed: The default number of milliseconds to wait for a page load to complete.
        • Wait For Content Timeout: The default number of milliseconds to wait for web content to appear on a web page, or for a URL to load that matches a specified Regex.
        • Wait Times: The default timeout values are multiplied by this value. This can be a quick way to test whether issues with an action are caused by timeout values being too short. Default timeouts are used when discovering activities, and when scrolling a page.
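
The retry-related properties described above (Error Handling set to Retry Command, Error Retry Count and Error Retry Delay) can be pictured with the following Python sketch. It is a minimal model of the documented behavior, not the product's implementation; the function name and the injectable `sleep` parameter are hypothetical, and it assumes the first attempt does not count as a retry.

```python
import time
from typing import Callable


def run_with_retry(action: Callable[[], bool],
                   retry_count: int = 5,
                   retry_delay_ms: int = 5000,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run an action and retry it on failure, pausing between attempts.

    A retry_count of 0 retries indefinitely, as described for Retry Command.
    Returns False when retries are exhausted; the caller would then skip the
    sub-commands and continue with the next command.
    """
    if action():
        return True
    retries = 0
    while retry_count == 0 or retries < retry_count:
        sleep(retry_delay_ms / 1000)  # Error Retry Delay is given in milliseconds.
        retries += 1
        if action():
            return True
    return False
```

With the documented defaults (5 retries, 5000 ms delay), a command that keeps failing would be attempted six times in total under this model before the agent moves on.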

Capture

Act as System Value: Acts as a system value that is guaranteed to be present, and does not participate in an empty data row check. Default value is set to False.

Allow Empty Value: When set to True, empty or missing values are allowed. When set to False, null values are not allowed. The default value is True.

Always Update Design Value: When set to True, the design value is updated whenever possible, and not just when editing the command. The default value is False.

Change Tracking: Specifies whether the captured content takes part in change tracking. When set to Include, a change in the captured content causes change tracking to record a change. When set to Exclude, the capture command is excluded from change tracking, so a change in the captured data does not change the last change status of the data entry. The default value is Include.

Create Index: Creates an index in the internal database for the column holding this content. This can improve performance when a duplicate check is performed on this content.

Data Consumer: Specifies the input data to use when processing this command.

  • Captured Data Command: Specifies the name of the previously captured data column that you want to use as input data.
  • Data Source: The source of the data consumed.
  • Data Transformation Script: The data transformation script. It is disabled by default, which is reflected by the "Enabled" property being False. To enable the data transformation script, set the "Enabled" property to True.
  • Input Parameter Name: Specifies the input parameter name to use.
  • Provider Column Name: Specifies a column from the data source that should provide the data to this command.
  • Provider Container: Specifies a command that provides data to this command. A command can provide data to itself.

Data Format: This property specifies the data format of captured content.

Data Format Style: This property specifies the style of data format for captured content. The default value is ‘None’.

Data Type: The data type of captured content.

  • Short Text: All content will be captured as Short Text by default. Short Text content can be up to 4000 characters long.
  • Long Text: Long Text content can be any length, but cannot always be used in comparisons, so you may not be able to include Long Text content in duplicate checks.
  • Integer: A whole number.
  • Float: A floating point number.
  • Date/Time: A date and/or time value.
  • Boolean: A value that can be true or false. Boolean values are stored as 1 or 0 integer values.
  • Binary: A variable-length stream of binary data ranging between 1 and 8,000 bytes.
  • Big Integer: A 64-bit signed integer.
  • Decimal: Represents a decimal number. A fixed precision and scale numeric value between -(10^38 - 1) and 10^38 - 1.
  • GUID: A globally unique identifier (or GUID). A GUID is a 128-bit integer (16 bytes) that can be used across all computers and networks wherever a unique identifier is required. Such an identifier has a very low probability of being duplicated.
  • Document: The captured data is a document in binary form. This can be used in capture commands that store a downloaded document from the web.
  • Image: The captured data is an image in binary form. This can be used in capture commands that store a downloaded image from the web.
  • Temporary: The captured data is not stored in the internal database, and also not exported. Can be used as temporary storage during agent run time.

Date/Time Conversion: Specifies how date/time values are interpreted during conversion. For example, if AssumeLocalTime is set, the value is explicitly treated as local time, regardless of how the script that sets the field produced it (UTC now or local now). If the universal time option is set, the value is explicitly treated as universal time in the same way.
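
To make the difference concrete, the following Python sketch shows how the same timestamp without time zone information ends up as two different instants depending on which assumption is applied. It is an illustration only; the function name, the `assume` values and the fixed UTC+02:00 "local" zone are hypothetical and chosen so the example is reproducible.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone for the example: UTC+02:00.
LOCAL_ZONE = timezone(timedelta(hours=2))


def interpret(naive_text: str, assume: str) -> datetime:
    """Attach a time zone to a zone-less timestamp, then normalize to UTC."""
    naive = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S")
    # "local" treats the value as local time; anything else treats it as UTC.
    zone = LOCAL_ZONE if assume == "local" else timezone.utc
    return naive.replace(tzinfo=zone).astimezone(timezone.utc)
```

Under these assumptions, "2024-03-01 12:00:00" interpreted as local time corresponds to 10:00 UTC, while the same text interpreted as universal time stays at 12:00 UTC.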

Decimal Precision: Specifies the decimal precision. The default value is 19.

Decimal Scale: Specifies the decimal scale. The default value is 4.
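
Precision is the total number of digits and scale is the number of digits after the decimal point. The Python sketch below illustrates what the defaults of 19 and 4 imply for a captured value. The function name is hypothetical, and both the half-up rounding and the rejection of oversized values are assumptions; the actual behavior depends on the chosen database.

```python
from decimal import Decimal, ROUND_HALF_UP


def fit_decimal(value: str, precision: int = 19, scale: int = 4) -> Decimal:
    """Round a value to `scale` fractional digits and check the total digit count."""
    quantum = Decimal(1).scaleb(-scale)  # scale=4 gives Decimal("0.0001")
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    digits = len(rounded.as_tuple().digits)
    if digits > precision:
        raise ValueError(f"{value} needs {digits} digits, more than precision {precision}")
    return rounded
```

With the defaults, a value can have at most 15 digits before the decimal point, because 4 of the 19 digits are reserved for the fractional part.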

Design Value: The value to use for this capture command in the agent editor. This value can be important when testing scripts in the editor if the scripts depend on captured data.

Key Column: The captured content is used to identify a data entry if this option is set to true.

Make Data Available to Parent Commands: Copies the extracted data to all parent data rows, making the data available to parent commands executed after this command.

Max. Data Length: The maximum length of captured content when using the Short Text data type. The maximum possible length depends on the chosen database. Default value is set to 4000.

Raise Validation Error: Adds a page load error if value validation fails. The default value is True.

Transformation Script: A script used to transform the captured content.

Use Data Value: Captures a data value instead of a property of the selected web element. The web selection is ignored when this option is true.

User Defined Design Value: This property value set to True indicates the design value for this capture command is user defined instead of set automatically when the command is saved.

Validation Time: Specifies when data validation takes place. The default value is Runtime. To validate data at export time instead of run time, set this property to “Export”.

Command

Command Description: A custom description for the command. The default value is empty.

Command Transformation Script: A script used to change command properties at run time. The default value is disabled.

Disabled: When set to True, the command is disabled. A disabled command is ignored. The default value is False.

ID: The internal unique ID of the command. It is always auto-generated, e.g. 58c8e4ac-e4c0-48f7-a63d-77064945380b.

Increase Data Count: When set to True, the data count is increased every time this command is processed. The data count is increased during data extraction, so it can be used to measure agent progress, and the agent uses it when evaluating its success criteria. The default value is False.

Name: This property specifies the name of the command.

Notify On Critical Error: A notification email is sent at the end of an agent run if the command encounters a critical error, and the agent has been configured to notify on critical errors. Critical errors include page load errors and missing required web selections. The default value is False.

Download Page

Download Type: Specifies whether downloaded pages are saved as PDF or HTML.

File Name Column Name: Specifies the name of the export column containing the file name. A default column name will be used if this property is empty.

File Name Design Value: The value to use for the file name capture in the agent editor. This value can be important when testing scripts in the editor, if the scripts depend on captured data.

File Name Page Attribute: The web element attribute to use as file name for the downloaded file.

File Name Transformation Script: A script used to generate the file name for the downloaded page.

Export

Excel/PDF/CSV Column Format: Specifies the format of the data column holding the captured data when exporting to Excel, PDF or CSV. For Excel and PDF this format string is the same used in Excel under Custom format when formatting a cell. For CSV this is a standard .NET format string. This is useful in cases where one needs to apply particular format like NUMBER, DATE, CURRENCY etc.
This property has no effect when the export target is set to anything other than Excel, CSV or PDF.

Excel Column Width: Specifies the width of the data column holding the captured data when exporting to Excel or PDF.

Export Enabled: A command with Export Enabled set to False will not save any data to the data output. The default value is True, which means the data is output.

Merge Rows Method: When the parent list Container command option "Export Method" is set to "Add Columns And Merge Rows", this option specifies how to combine row values.

Merge Rows Value Separator: When "Merge Rows Method" is set to "Concatenate", this separator is used to separate the extracted values.
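
The effect of the Concatenate method can be sketched in a few lines of Python. This is an illustration only; the function name is hypothetical, and skipping empty values is an assumption rather than documented behavior.

```python
from typing import Iterable


def merge_rows(values: Iterable[str], separator: str = ", ") -> str:
    """Combine the values captured across child rows into a single cell."""
    # Assumption: empty values are skipped so the output has no dangling separators.
    return separator.join(value for value in values if value)
```

For example, the values "red", "" and "blue" merged with the separator "; " would produce "red; blue".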
