Using the HTML Parser

When Sequentum Enterprise extracts data from a web page, it first loads the web page into a web browser. The web browser parses and renders the web page, and executes any JavaScript the page contains. This is a very safe approach since Sequentum Enterprise uses a customized version of Chromium as its web browser. Therefore, if the target website works in Chromium, Sequentum Enterprise can usually extract data from it. However, the approach is also slow and may cause instability.

If you have been using Chromium to browse the web, you may have sometimes experienced problems, such as hanging websites or program crashes. This may occur very rarely (say once a year), so it may not be a problem during normal usage of Chromium. When Sequentum Enterprise uses Chromium to browse a website, it may access more web pages in a few hours than you access in a year, so stability issues are magnified significantly.

The main source of website instability is JavaScript. A website developer can use JavaScript to implement dynamic features on the website, but JavaScript bugs may lead to memory leaks, hanging websites or even program crashes.

All action commands in a Sequentum Enterprise agent that open a new web browser can be configured to open a specific type of web browser. The default browser is a customized version of Chromium, but you can change this to an HTML Parser. The HTML Parser does not use Chromium at all, and it completely ignores JavaScript, so it's generally much more reliable.

JavaScript is single-threaded, so many operations cannot be performed simultaneously when using Chromium web browsers. Since the HTML Parser does not execute JavaScript, it can often process web pages much faster than a Chromium web browser.

Many websites don't work properly if JavaScript is disabled, so the HTML Parser will not work for all websites. However, many websites can be at least partly processed with an HTML Parser, so you should always switch to an HTML Parser when a particular web page can be processed without JavaScript.
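To illustrate what "ignoring JavaScript" means in practice, here is a minimal sketch using Python's standard html.parser module. It is not how Sequentum Enterprise implements its HTML Parser; the class name, the parse_static helper, and the sample page are made up for this example. The sketch reads only the markup delivered by the server, so content that a script would add at run time is never seen.

```python
from html.parser import HTMLParser


class StaticPageParser(HTMLParser):
    """Hypothetical example class: collects links and visible text,
    skipping the bodies of <script> elements (nothing is executed)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is passed here as plain data; ignore it.
        if not self._in_script and data.strip():
            self.text.append(data.strip())


# Made-up sample page: the price is only written by JavaScript.
SAMPLE_PAGE = """
<html><body>
  <h1>Products</h1>
  <a href="/item/1">Item 1</a>
  <div id="prices"></div>
  <script>
    document.getElementById("prices").textContent = "Price: 9.99";
  </script>
</body></html>
"""


def parse_static(html):
    """Parse the markup without running any script it contains."""
    parser = StaticPageParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.text


links, text = parse_static(SAMPLE_PAGE)
print(links)  # ['/item/1']
print(text)   # ['Products', 'Item 1']
```

The link and heading are present in the static markup and are extracted, while the price never appears because the script that would insert it is not run. A page like this can be partly processed without JavaScript, but the price would require a full web browser.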

[Figure: Configure an Action command to use a specific web browser type]

If you want an agent to use the HTML Parser by default, you can set the web browser type under Agent Settings > Browser > HTML Parser:

[Figure: Setting up Default HTML Parser]
