Activity Monitor and Web Request Editor
Many modern websites use web frameworks that separate layout from data, with JavaScript generating the final web page you see in your web browser. When you open such a website, it often loads only the layout and some JavaScript at first, then fetches the data asynchronously and inserts it into the layout.
Simple web scraping tools that don’t execute JavaScript cannot extract data from these websites at all, and even advanced tools will struggle with many of them unless the web scraping bots are built carefully.
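To see why a scraper that skips JavaScript comes back empty-handed, consider what it actually receives. The sketch below uses Python's standard `html.parser` on a hypothetical page shell (the `products` list and `/static/app.js` script are invented for illustration): the list that the browser would later fill from a Web API is still empty in the raw HTML, so counting its items yields nothing.

```python
from html.parser import HTMLParser

# Hypothetical HTML as a JavaScript-driven site might first deliver it:
# an empty list plus a script that fills it in later from a Web API.
INITIAL_HTML = """
<html>
  <body>
    <ul id="products"></ul>
    <script src="/static/app.js"></script>
  </body>
</html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> start tags, the way a simple non-JavaScript scraper sees them."""

    def __init__(self):
        super().__init__()
        self.items = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.items += 1


def count_items(html: str) -> int:
    """Parse raw HTML without executing any scripts and count list items."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.items


# In a browser, app.js would add <li> elements to the list, but a scraper
# that never executes JavaScript only ever sees the empty shell.
print(count_items(INITIAL_HTML))  # 0
```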
Advanced web scraping tools can use embedded web browsers to load websites and execute JavaScript, which lets them process most modern websites. However, web browsers are slow and occasionally crash, so you should avoid using them whenever possible. Furthermore, many modern websites load data asynchronously, sometimes depending on how far you scroll down a web page. High-end web scraping tools can handle these scenarios, but it can be very difficult to create reliable bots for such websites, and those bots are likely to be slow.
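When a page loads more data as you scroll, each scroll step usually triggers another Web API request for the next batch of records. Calling that request directly in a loop avoids simulating scrolling in a browser. The sketch below assumes a hypothetical paging scheme with an offset and a limit; real sites may use page numbers, cursors, or other parameters, and the `fake_fetch` function merely stands in for the site's API so the example runs offline.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]


def collect_all(fetch_page: Callable[[int, int], List[Item]], page_size: int = 20) -> List[Item]:
    """Request successive pages until the API returns a short or empty page.

    fetch_page(offset, limit) is a hypothetical stand-in for whatever request
    the site sends when the user scrolls; its parameters are assumptions.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    items: List[Item] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            break  # a short page means there is nothing left to load
        offset += page_size
    return items


# Offline stand-in for the site's Web API: 45 records served in pages.
DATA = [{"id": i} for i in range(45)]


def fake_fetch(offset: int, limit: int) -> List[Item]:
    return DATA[offset:offset + limit]


all_items = collect_all(fake_fetch, page_size=20)
print(len(all_items))  # 45
```

Stopping on a short page is one common convention; some APIs instead report a total count or a "has more" flag, which would be a more reliable stop condition where available.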
The solution lies in the asynchronous calls modern websites make to load data. The web server functionality that provides the data is often called a Web API, so these asynchronous calls are often referred to as Web API requests. A Web API normally returns structured data in JSON format, which is very easy to work with, and Web API requests are very fast compared to loading a full web page. Sequentum Desktop has a built-in activity monitor that lets you see and interact with a website's Web API requests.
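Outside of any particular tool, calling a Web API and reading its JSON takes only a few lines. The sketch below uses Python's standard `urllib.request` and `json` modules; the `fetch_json` helper is defined for illustration but not called, and the sample payload with its `products`, `name`, and `price` fields is invented, since every site structures its JSON differently.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body (not called below, so this runs offline)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


# Hypothetical response body, as a product listing API might return it.
SAMPLE = '{"products": [{"name": "Desk", "price": 199.0}, {"name": "Chair", "price": 89.5}]}'

payload = json.loads(SAMPLE)
# Structured data can be read directly; no HTML parsing is needed.
names = [p["name"] for p in payload["products"]]
print(names)  # ['Desk', 'Chair']
```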
The activity monitor is located in the bottom right corner of Sequentum Desktop. When you open it, you will see several columns describing the requests the website makes behind the scenes to run. Activity is the name of the process and gives a brief description of the request's status. Url is the URL the website uses to send data to and receive data from the server. AJAX content shows the content POSTed to the URL, if there is any. Lastly, Timing indicates when each call was made.
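It can help to think of each row in the activity monitor as a small record with the four columns described above. The sketch below models a row as a Python dataclass and picks out the requests that carried POSTed content, which are often the data-loading calls worth editing. The `ActivityRow` class, the `posted_requests` helper, the sample URLs, and the assumption that Timing is a number of seconds are all hypothetical; Sequentum Desktop does not expose its rows this way.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivityRow:
    """One activity monitor row, modelled on its four columns (hypothetical)."""
    activity: str                # process name and brief status description
    url: str                     # URL the request was sent to
    ajax_content: Optional[str]  # content POSTed to the URL, if any
    timing: float                # when the call was made (seconds; unit assumed)


def posted_requests(rows: List[ActivityRow]) -> List[ActivityRow]:
    """Return rows that carried POSTed content, ordered by when they were made."""
    return sorted((r for r in rows if r.ajax_content), key=lambda r: r.timing)


rows = [
    ActivityRow("Load page", "https://example.com/", None, 0.0),
    ActivityRow("Request completed", "https://example.com/api/search", '{"query": "chairs"}', 1.2),
    ActivityRow("Request completed", "https://example.com/static/app.js", None, 0.4),
    ActivityRow("Request completed", "https://example.com/api/config", "{}", 0.8),
]

print([r.url for r in posted_requests(rows)])
# ['https://example.com/api/config', 'https://example.com/api/search']
```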
You can do several things from the activity monitor, but one of the most common techniques is to edit a web request. To open the Web Request Editor, double-click the desired web request in the activity monitor.
The Web Request Editor shows the URL the web request is sent to, the request headers, and the POST data. The headers are part of the website's API, and most of them are required for the web request to succeed. The POST data is usually the part you manipulate so that the request returns the data you need.
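The three parts shown in the editor map directly onto an ordinary HTTP POST request. The sketch below assembles one with Python's standard `urllib.request.Request` without sending it. The URL, the headers, and the `query`, `page`, and `pageSize` fields are hypothetical placeholders for values you would copy from the Web Request Editor, and the sketch assumes the site expects a JSON body; form-encoded POST data would use `urllib.parse.urlencode` instead of `json.dumps`.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(url: str, headers: Dict[str, str], post_data: Dict[str, Any]) -> urllib.request.Request:
    """Assemble a POST request from a URL, headers, and POST data (JSON body assumed)."""
    body = json.dumps(post_data).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical values; copy the real ones from the Web Request Editor.
req = build_request(
    "https://example.com/api/search",
    {"Content-Type": "application/json", "Accept": "application/json"},
    {"query": "chairs", "page": 1, "pageSize": 50},
)

print(req.get_method(), req.full_url)  # POST https://example.com/api/search
```

Changing a value such as `pageSize` in the POST data is the kind of edit that can make a single request return more records than the website itself asks for, provided the server accepts it.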
Click the Test button in the bottom left corner of the Web Request Editor to send the web request and verify that it works properly and returns the desired output.
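The same kind of check can be scripted when you work with a Web API outside the editor. The sketch below separates sending the request from validating the response, so the validation logic can run offline. The helper names, the requirement of HTTP status 200, and the `results` field are assumptions chosen for illustration; adapt the check to whatever the API actually returns.

```python
import json
import urllib.request


def check_response(status: int, body: bytes, required_key: str) -> dict:
    """Accept only an HTTP 200 response whose body is a JSON object containing required_key."""
    if status != 200:
        raise ValueError(f"unexpected HTTP status {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("response is not JSON") from exc
    if not isinstance(payload, dict) or required_key not in payload:
        raise ValueError(f"response has no {required_key!r} field")
    return payload


def run_request_check(req: urllib.request.Request, required_key: str, timeout: float = 10.0) -> dict:
    """Send the request and validate the response (urlopen raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return check_response(resp.status, resp.read(), required_key)


# Offline examples of what the check accepts and rejects.
print(check_response(200, b'{"results": [1, 2]}', "results"))  # {'results': [1, 2]}
```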