
Automating Google’s reCAPTCHA v2 with Sequentum Enterprise

Be careful when using Sequentum Enterprise to automate a CAPTCHA. There is no legal precedent clarifying whether a CAPTCHA should be considered a technological barrier under the U.S. Computer Fraud and Abuse Act (CFAA), or whether information that resides behind a CAPTCHA is in the public domain. It’s up to the operator to determine whether automating a CAPTCHA aligns with their organization's compliance guidelines.

Most people working with web scraping have come across web pages protected by Google’s reCAPTCHA v2. Recent versions of this CAPTCHA are nearly impossible to automate using image or voice recognition and usually need a human to solve them. However, just because manual work is required doesn’t mean YOU have to do it yourself. You can outsource it to companies that specialize in solving CAPTCHAs and employ staff around the clock to solve them for a low fee (approximately $3 per 1,000 CAPTCHAs).

In this blog post I’ll show how to use the 2CAPTCHA.com (http://2CAPTCHA.com) and Death By Captcha services with Sequentum Enterprise to automate Google’s reCAPTCHA v2. I’ll use Google’s reCAPTCHA v2 demo website for this exercise. This website requires you to click the familiar “I’m not a robot” checkbox before you can submit a web form. The web page looks like this:

reCaptchaFirstPage.png

reCAPTCHA v2 can be implemented on a web page in many different ways, and each implementation requires special consideration, which is why Sequentum Enterprise has no single command that resolves reCAPTCHA v2. In this blog post I’ll only deal with the most common way reCAPTCHA v2 is used on a web page. You can read more about reCAPTCHA v2, and the different ways it’s implemented on websites, on the 2CAPTCHA.com website (https://2captcha.com/2captcha-api#solving_recaptchav2_new).

Resolving reCAPTCHA with the 2CAPTCHA.com Service

Here’s the process required to resolve reCAPTCHA v2 with the 2CAPTCHA.com service (as described on their website).

Step 1

Find a link that begins with www.google.com/recaptcha/api2/anchor, or find an HTML tag with the parameter “data-sitekey”.

Step 2

Copy the value of the “k” parameter of the link (or the value of the data-sitekey parameter).
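Steps 1 and 2 can be sketched in C#, the language this article uses for its Death By Captcha script. This is a minimal illustration, not part of either service’s API: the helpers SiteKeyFromAnchorUrl and SiteKeyFromHtml are hypothetical, the sample anchor URL is made up, and the regular expression is a rough heuristic that assumes the attribute value is quoted (a real HTML parser is more robust).

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Returns the "k" query parameter of a reCAPTCHA anchor URL, or null if absent.
static string? SiteKeyFromAnchorUrl(string anchorUrl)
{
    var uri = new Uri(anchorUrl);
    // uri.Query starts with '?'; split it into name=value pairs.
    foreach (var pair in uri.Query.TrimStart('?').Split('&', StringSplitOptions.RemoveEmptyEntries))
    {
        var parts = pair.Split('=', 2);
        if (parts.Length == 2 && parts[0] == "k")
            return Uri.UnescapeDataString(parts[1]);
    }
    return null;
}

// Returns the first quoted data-sitekey attribute value in an HTML string, or null.
static string? SiteKeyFromHtml(string html)
{
    var match = Regex.Match(html, "data-sitekey\\s*=\\s*[\"']([^\"']+)[\"']");
    return match.Success ? match.Groups[1].Value : null;
}

// Illustrative inputs (the site key is the sample value used later in this article).
var fromUrl = SiteKeyFromAnchorUrl(
    "https://www.google.com/recaptcha/api2/anchor?ar=1&k=6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-&co=abc");
var fromHtml = SiteKeyFromHtml(
    "<div class=\"g-recaptcha\" data-sitekey=\"6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-\"></div>");
Console.WriteLine(fromUrl);
Console.WriteLine(fromHtml);
```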

Step 3

Submit a request to the 2CAPTCHA API at http://2captcha.com/in.php with method set to “userrecaptcha”, the value found in the previous step as the value for googlekey, and the full page URL (URL-encoded) as the value for pageurl. Example:

http://2captcha.com/in.php?key=1abc234de56fab7c89012d34e56fa7b8&method=userrecaptcha&googlekey=6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-&pageurl=http%3A%2F%2Fmysite.com%2Fpage%2Fwith%2Frecaptcha%3Fappear%3D1%26here%3Dnow&json=1
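The sketch below builds this request URL in C#. It uses only the parameters named in this step plus json=1 (which the agent later in this article also sends); BuildInUrl is a hypothetical helper. Uri.EscapeDataString encodes each value, so a page URL that carries its own ?/& query string is sent as one pageurl value instead of being split into extra parameters.

```csharp
using System;

// Builds the 2CAPTCHA in.php URL from its parts, URL-encoding every value.
static string BuildInUrl(string apiKey, string siteKey, string pageUrl) =>
    "http://2captcha.com/in.php" +
    "?key=" + Uri.EscapeDataString(apiKey) +
    "&method=userrecaptcha" +
    "&googlekey=" + Uri.EscapeDataString(siteKey) +
    "&pageurl=" + Uri.EscapeDataString(pageUrl) +
    "&json=1"; // ask for a JSON reply instead of plain text

var url = BuildInUrl(
    "1abc234de56fab7c89012d34e56fa7b8",
    "6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-",
    "http://mysite.com/page/with/recaptcha?appear=1&here=now");
Console.WriteLine(url);
```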

Step 4

If the request succeeds and includes json=1, the server returns the ID of your CAPTCHA as JSON: {"status":1,"request":"2122988149"}. Without json=1, the reply is plain text such as OK|2122988149.
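A minimal C# sketch of reading this reply with System.Text.Json, assuming the request included json=1. ParseReply is a hypothetical helper, and the convention it relies on (a status other than 1 means request holds an error code such as ERROR_WRONG_USER_KEY) should be checked against the 2CAPTCHA API documentation.

```csharp
using System;
using System.Text.Json;

// Reads a reply of the form {"status":1,"request":"..."}.
// Ok is true when status == 1; Request then holds the CAPTCHA ID,
// otherwise it holds an error code.
static (bool Ok, string Request) ParseReply(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    bool ok = root.GetProperty("status").GetInt32() == 1;
    string request = root.GetProperty("request").GetString() ?? "";
    return (ok, request);
}

var reply = ParseReply("{\"status\":1,\"request\":\"2122988149\"}");
Console.WriteLine(reply.Ok ? $"CAPTCHA ID: {reply.Request}" : $"Error: {reply.Request}");
```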

Step 5

Wait 15 to 20 seconds, then submit a request to the API at http://2captcha.com/res.php to get the result.

If the CAPTCHA has already been solved, the server returns JSON with a token that looks like this:

03AHJ_Vuve5Asa4koK3KSMyUkCq0vUFCR5Im4CwB7PzO3dCxIo11i53epEraq-uBO5mVm2XRikL8iKOWr0aG50sCuej9bXx5qcviUGSm4iK4NC_Q88flavWhaTXSh0VxoihBwBjXxwXuJZ-WGN5Sy4dtUl2wbpMqAj8Zwup1vyCaQJWFvRjYGWJ_TQBKTXNB5CCOgncqLetmJ6B6Cos7qoQyaB8ZzBOTGf5KSP6e-K9niYs772f53Oof6aJeSUDNjiKG9gN3FTrdwKwdnAwEYX-F37sI_vLB1Zs8NQo0PObHYy0b0sf7WSLkzzcIgW9GR0FwcCCm1P8lB-50GQHPEBJUHNnhJyDzwRoRAkVzrf7UkV8wKCdTwrrWqiYDgbrzURfHc2ESsp020MicJTasSiXmNRgryt-gf50q5BMkiRH7osm4DoUgsjc_XyQiEmQmxl5sqZP7aKsaE-EM00x59XsPzD3m3YI6SRCFRUevSyumBd7KmXE8VuzIO9lgnnbka4-eZynZa6vbB9cO3QjLH0xSG3-egcplD1uLGh79wC34RF49Ui3eHwua4S9XHpH6YBe7gXzz6_mv-o-fxrOuphwfrtwvvi2FGfpTexWvxhqWICMFTTjFBCEGEgj7_IFWEKirXW2RTZCVF0Gid7EtIsoEeZkPbrcUISGmgtiJkJ_KojuKwImF0G0CsTlxYTOU2sPsd5o1JDt65wGniQR2IZufnPbbK76Yh_KI2DY4cUxMfcb2fAXcFMc9dcpHg6f9wBXhUtFYTu6pi5LhhGuhpkiGcv6vWYNxMrpWJW_pV7q8mPilwkAP-zw5MJxkgijl2wDMpM-UUQ_k37FVtf-ndbQAIPG7S469doZMmb5IZYgvcB4ojqCW3Vz6Q

If the CAPTCHA has not been solved yet, the server returns JSON with the result CAPCHA_NOT_READY (2CAPTCHA spells it this way). Retry the request after 5 seconds.
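Step 5’s wait-and-retry logic can be sketched as a polling loop. Here fetch is a hypothetical placeholder for an HTTP GET of the res.php URL (with key, action=get, json=1 and id); passing it in keeps the sketch testable offline. Stopping on any reply other than CAPCHA_NOT_READY is an assumed error policy, not something 2CAPTCHA prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;

// Waits firstWait, then calls fetch until the reply has status 1,
// sleeping retryWait between CAPCHA_NOT_READY replies.
static string PollForToken(Func<string> fetch, TimeSpan firstWait, TimeSpan retryWait, int maxAttempts)
{
    Thread.Sleep(firstWait);
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        using var doc = JsonDocument.Parse(fetch());
        var root = doc.RootElement;
        string request = root.GetProperty("request").GetString() ?? "";
        if (root.GetProperty("status").GetInt32() == 1)
            return request;                      // solved: request holds the token
        if (request != "CAPCHA_NOT_READY")
            throw new InvalidOperationException("2CAPTCHA error: " + request);
        if (attempt < maxAttempts)
            Thread.Sleep(retryWait);             // not ready yet: wait, then retry
    }
    throw new TimeoutException($"CAPTCHA not solved after {maxAttempts} attempts");
}

// Simulated replies: not ready twice, then solved (SAMPLE_TOKEN is a placeholder).
var replies = new Queue<string>(new[]
{
    "{\"status\":0,\"request\":\"CAPCHA_NOT_READY\"}",
    "{\"status\":0,\"request\":\"CAPCHA_NOT_READY\"}",
    "{\"status\":1,\"request\":\"SAMPLE_TOKEN\"}",
});
int calls = 0;
string token = PollForToken(() => { calls++; return replies.Dequeue(); },
    TimeSpan.Zero, TimeSpan.Zero, 15);
Console.WriteLine($"Token after {calls} requests: {token}");
```

With a real HTTP call, fetch could be `() => http.GetStringAsync(resUrl).Result` using a shared System.Net.Http.HttpClient, with a 20-second first wait, a 5-second retry wait and 15 attempts, matching the timings used in this article.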

Step 6

Locate the form field with ID “g-recaptcha-response” and set its value to the token retrieved in the previous step.

Step 7

Submit the web form.
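The agents in this article perform Steps 6 and 7 in the browser. If you instead submit the form over plain HTTP, the token is sent as an ordinary form field named g-recaptcha-response (the reCAPTCHA widget’s textarea uses that name as well as the ID). This C# sketch assumes the form’s other fields and its action URL are already known from the page’s form element; BuildFormBody and the sample field "name" are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Copies the form's existing fields and adds the solved token under
// the g-recaptcha-response field name, URL-encoded as a form body.
static FormUrlEncodedContent BuildFormBody(IDictionary<string, string> fields, string token)
{
    var all = new Dictionary<string, string>(fields)
    {
        ["g-recaptcha-response"] = token
    };
    return new FormUrlEncodedContent(all);
}

var body = BuildFormBody(new Dictionary<string, string> { ["name"] = "test" }, "SAMPLE_TOKEN");
string encoded = body.ReadAsStringAsync().Result;
Console.WriteLine(encoded);
// To submit (formActionUrl is hypothetical): await httpClient.PostAsync(formActionUrl, body);
```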

Creating a Sequentum Enterprise Agent (2CAPTCHA.com)

https://youtu.be/O3pADpAR2Vs

I now want to create a Sequentum Enterprise agent that performs the steps required to work with the 2CAPTCHA.com (http://2CAPTCHA.com) service. This is not as hard as it may first look, but it does require some investigative work to find the HTML tag that contains the “data-sitekey” parameter and the form field with the ID “g-recaptcha-response”. These tags are not visible on the page, so you need to look in the HTML source code or use the Nodes panel in the Sequentum Enterprise editor.

Step 1

To find the HTML tag that contains the data-sitekey parameter, I’ve selected the reCAPTCHA box in the Sequentum Enterprise editor and then opened the Nodes panel to find the correct HTML tag. I’ve also found the form field with ID “g-recaptcha-response”, which is required in a later step.

reCaptchaSiteKey.png

Step 2

I need the data-sitekey parameter and the page URL in my first call to the 2CAPTCHA.com service, so I’ll add two Capture commands to capture those two data values.

Because the data from these two commands becomes parameters of the call to the 2CAPTCHA.com service, I’ll set the option “Update Design Value Automatically” on both Capture commands to make things easier at design time. With this option set, the design-time values are extracted and updated the next time I open the agent in the editor. Otherwise, the editor would keep the data-sitekey value from when I created the agent, which would not work if the data-sitekey parameter changes between page loads.

My agent now looks like this:

reCaptchaCommands1.png

Step 3

I now need to make the call to the 2CAPTCHA.com service. The service returns JSON, so I’ll add a URL command that opens a new JSON parser.

I want to load a URL that uses the data from my Capture commands as parameters. The easiest way to do this is to use a URL template like this:

http://2captcha.com/in.php?key={$key}&method=userrecaptcha&json=1&googlekey={$sitekey}&pageurl={$pageUrl}

I can then use the Content Transformation “insert_data” to replace the placeholders {$sitekey} and {$pageUrl} with extracted data. I also need to specify the authentication key, which you receive when you sign up for the 2CAPTCHA.com service. I’ve added the authentication key as an Input Parameter, so {$key} is replaced with the authentication key passed to the agent as an input parameter.
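To check the URLs the agent will request, it can help to reproduce the template substitution outside the editor. This C# sketch is a hypothetical stand-in for the insert_data transformation, not Sequentum’s implementation: it replaces each {$name} placeholder with a URL-encoded value and fails on unknown names. Encoding is an assumption here; check how insert_data treats values such as page URLs that contain their own query strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Replaces every {$name} placeholder with the URL-encoded value for "name".
static string FillTemplate(string template, IReadOnlyDictionary<string, string> values) =>
    Regex.Replace(template, @"\{\$(\w+)\}", m =>
        values.TryGetValue(m.Groups[1].Value, out var v)
            ? Uri.EscapeDataString(v)
            : throw new KeyNotFoundException("No value for " + m.Value));

// Sample values taken from the examples in this article.
var data = new Dictionary<string, string>
{
    ["key"] = "1abc234de56fab7c89012d34e56fa7b8",
    ["sitekey"] = "6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-",
    ["pageUrl"] = "http://mysite.com/page/with/recaptcha?appear=1&here=now",
    ["requestId"] = "2122988149",
};
string inUrl = FillTemplate(
    "http://2captcha.com/in.php?key={$key}&method=userrecaptcha&json=1&googlekey={$sitekey}&pageurl={$pageUrl}", data);
string resUrl = FillTemplate(
    "http://2captcha.com/res.php?key={$key}&action=get&json=1&id={$requestId}", data);
Console.WriteLine(inUrl);
Console.WriteLine(resUrl);
```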

My agent now looks like this:

reCaptchaCommands2.png

Step 4

I’ll now execute the URL command that makes the first API call, and that should open a new JSON parser with JSON that looks like this:

reCaptchaApi1.png

Step 5

I now need to make a second API call with the request ID as a parameter, so I need to add a Capture command that extracts the request ID from the JSON Parser. Again, the request ID changes every time I process a new CAPTCHA, so I set the Capture option “Update Design Value Automatically” to make sure the design time value is always current.

Step 6

I’ll now add a URL command that makes the second API call and uses the request ID as a parameter. Again, I use a URL template and a Content Transformation script to insert the request ID and the authentication key.

The second call looks like this:

http://2captcha.com/res.php?key={$key}&action=get&json=1&id={$requestId}

Step 7

If the CAPTCHA has not yet been resolved (this can take a minute or more, since real people are involved), you’ll see a result like this:

reCaptchaJson1.png

I now have to wait a little while and try again. I’ll add an “Exit or Retry” sub-command that pauses for 5 seconds and retries the parent command if the result is CAPCHA_NOT_READY, and I’ll retry up to 15 times before exiting. This gives the 2CAPTCHA.com service more than a minute (15 retries × 5 seconds = 75 seconds) to resolve the CAPTCHA, which works for me, but you may have to increase this number in some situations.

Step 8

Once the CAPTCHA has been resolved, you’ll see a JSON response like this:

reCaptchaApi2.png

I’ll add a Capture command to extract the CAPTCHA token. I need the captured value to be available to the parent commands that are processing the original web form, so I set the Capture option “Make Data Available to Parents”, and I also set the Capture option “Update Design Value Automatically”, so the extracted value is always current at design time.

The agent commands dealing with the JSON responses from the 2CAPTCHA.com service now look like this:

resolveCaptcha1.png

Step 9

I can now go back to the web form and add a Form Field command that sets the value of the form field with ID “g-recaptcha-response” to the CAPTCHA token extracted in step 8.

Step 10

Finally, I’ll add a Link command that clicks on the Submit button to submit the form, and you should see the following page when you execute the command:

reCaptchaFinalPage.png

My final agent looks like this:

finalAgent.png

You can download the agent here:


googleRecaptchaTest.scgx

This agent is also attached at the bottom of this article.

To run this agent, import it into the Sequentum Enterprise editor and assign your 2CAPTCHA.com authentication key to the agent input parameter named key.

 

Resolving reCAPTCHA with the Death By Captcha Service

Here’s the process required to resolve reCAPTCHA v2 with the Death By Captcha service. To simplify the process, we recommend using the Death By Captcha API, which can be downloaded from their website.

Step 1

Find a link that begins with www.google.com/recaptcha/api2/anchor, or find an HTML tag with the parameter “data-sitekey”.

Step 2

Copy the value of the “k” parameter of the link (or the value of the data-sitekey parameter).

Step 3

Write a script that solves the reCAPTCHA. See the C# example below.

// Do not forget to reference DeathByCaptcha.dll in your project!
using System.Collections; // Hashtable
using DeathByCaptcha;

// Death By Captcha credentials, read from the agent's input parameters.
string userName = args.GlobalData["username"].ToString();
string password = args.GlobalData["password"].ToString();
// Values extracted by the two Capture commands.
string googleKey = args.DataRow["siteKey"].ToString();
string pageUrl = args.DataRow["pageUrl"].ToString();

// DeathByCaptcha.HttpClient (not System.Net.Http.HttpClient).
Client client = (Client)new HttpClient(userName, password);

// JSON payload with the site key and page URL of the protected page.
string tokenParams = "{\"googlekey\": \"" + googleKey + "\"," +
                     "\"pageurl\": \"" + pageUrl + "\"}";

// Upload the token CAPTCHA and poll for its status until it is solved or
// the solving timeout (in seconds) expires. Type 4 is a reCAPTCHA v2 token.
Captcha captcha = client.Decode(Client.DefaultTimeout,
    new Hashtable()
    {
        { "type", 4 },
        { "token_params", tokenParams }
    });

// If a result came back, you receive a DeathByCaptcha.Captcha object.
if (captcha != null)
{
    if (captcha.Solved && captcha.Correct)
    {
        // Store the token so the Form Field command can use it.
        args.DataRow["captchaToken"] = captcha.Text;
    }
    else
    {
        // The CAPTCHA was not solved correctly.
    }
}
else
{
    // Handle errors here (no result was returned).
}
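The script builds token_params by string concatenation, which produces invalid JSON if the page URL contains a double quote or a backslash. A safer variant is sketched below, assuming the script host can use System.Text.Json (built into .NET Core 3.0 and later, and available as a NuGet package for .NET Framework); BuildTokenParams is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Serializes the googlekey/pageurl pair so special characters are escaped.
static string BuildTokenParams(string googleKey, string pageUrl) =>
    JsonSerializer.Serialize(new Dictionary<string, string>
    {
        ["googlekey"] = googleKey,
        ["pageurl"] = pageUrl,
    });

string tokenParams = BuildTokenParams(
    "6Le-wvkSVVABCPBMRTvw0Q4Muexq1bi0DJwx_mJ-",
    "http://mysite.com/page/with/recaptcha?appear=1&here=now");
Console.WriteLine(tokenParams);
```

The serializer’s default encoder escapes HTML-sensitive characters, so the & in the page URL may appear as \u0026 in the output; this is still valid JSON and any standard JSON parser decodes it back to &.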

Step 4

If the CAPTCHA is solved successfully, the Death By Captcha API returns a DeathByCaptcha.Captcha object whose Text property holds a token that looks like this:

03AHJ_Vuve5Asa4koK3KSMyUkCq0vUFCR5Im4CwB7PzO3dCxIo11i53epEraq-uBO5mVm2XRikL8iKOWr0aG50sCuej9bXx5qcviUGSm4iK4NC_Q88flavWhaTXSh0VxoihBwBjXxwXuJZ-WGN5Sy4dtUl2wbpMqAj8Zwup1vyCaQJWFvRjYGWJ_TQBKTXNB5CCOgncqLetmJ6B6Cos7qoQyaB8ZzBOTGf5KSP6e-K9niYs772f53Oof6aJeSUDNjiKG9gN3FTrdwKwdnAwEYX-F37sI_vLB1Zs8NQo0PObHYy0b0sf7WSLkzzcIgW9GR0FwcCCm1P8lB-50GQHPEBJUHNnhJyDzwRoRAkVzrf7UkV8wKCdTwrrWqiYDgbrzURfHc2ESsp020MicJTasSiXmNRgryt-gf50q5BMkiRH7osm4DoUgsjc_XyQiEmQmxl5sqZP7aKsaE-EM00x59XsPzD3m3YI6SRCFRUevSyumBd7KmXE8VuzIO9lgnnbka4-eZynZa6vbB9cO3QjLH0xSG3-egcplD1uLGh79wC34RF49Ui3eHwua4S9XHpH6YBe7gXzz6_mv-o-fxrOuphwfrtwvvi2FGfpTexWvxhqWICMFTTjFBCEGEgj7_IFWEKirXW2RTZCVF0Gid7EtIsoEeZkPbrcUISGmgtiJkJ_KojuKwImF0G0CsTlxYTOU2sPsd5o1JDt65wGniQR2IZufnPbbK76Yh_KI2DY4cUxMfcb2fAXcFMc9dcpHg6f9wBXhUtFYTu6pi5LhhGuhpkiGcv6vWYNxMrpWJW_pV7q8mPilwkAP-zw5MJxkgijl2wDMpM-UUQ_k37FVtf-ndbQAIPG7S469doZMmb5IZYgvcB4ojqCW3Vz6Q

Step 5

Locate the form field with ID “g-recaptcha-response” and set its value to the token retrieved in the previous step.

Step 6

Submit the web form.

Creating a Sequentum Enterprise Agent (Death By Captcha)

I now want to create a Sequentum Enterprise agent that performs the steps required to work with the Death By Captcha service. This is not as hard as it may first look, but it does require some investigative work to find the HTML tag that contains the “data-sitekey” parameter and the form field with the ID “g-recaptcha-response”. These tags are not visible on the page, so you need to look in the HTML source code or use the Nodes panel in the Sequentum Enterprise editor.

Step 1

To find the HTML tag that contains the data-sitekey parameter, I’ve selected the reCAPTCHA box in the Sequentum Enterprise editor and then opened the Nodes panel to find the correct HTML tag. I’ve also found the form field with ID “g-recaptcha-response”, which is required in a later step.

reCaptchaSiteKey.png

Step 2

I need the data-sitekey parameter and the page URL in my first call to the Death By Captcha service, so I’ll add two Capture commands to capture those two data values.

Because the data from these two commands becomes parameters of the call to the Death By Captcha service, I’ll set the option “Update Design Value Automatically” on both Capture commands to make things easier at design time. With this option set, the design-time values are extracted and updated the next time I open the agent in the editor. Otherwise, the editor would keep the data-sitekey value from when I created the agent, which would not work if the data-sitekey parameter changes between page loads.

My agent now looks like this:

Death_By_Captcha1.jpg

Step 3

Copy DeathByCaptcha.dll, which you downloaded as part of the Death By Captcha API, to the agent’s Assemblies folder.

Death_By_Captcha5.jpg

Step 4

Add a reference to DeathByCaptcha.dll.

Death_By_Captcha6.jpg

Step 5

I now add an Execute Script command and rename it SolveCaptcha. This script calls the Death By Captcha API to submit and solve the CAPTCHA, and the API returns a DeathByCaptcha.Captcha object.

My agent now looks like this:

Death_By_Captcha2.jpg

My SolveCaptcha script looks like this:

Death_By_Captcha3.jpg

Step 6

Once the CAPTCHA has been resolved, our SolveCaptcha script will update the captchaToken field with the CAPTCHA token it received back from the Death By Captcha API.

Step 7

I can now go back to the web form and add a Form Field command that sets the value of the form field with ID “g-recaptcha-response” to the CAPTCHA token extracted earlier.

Step 8

Finally, I’ll add a Link command that clicks on the Submit button to submit the form, and you should see the following page when you execute the command:

reCaptchaFinalPage.png

My final agent looks like this:

Death_By_Captcha4.jpg

You can download the agent here:

googleRecaptchaDeathByCaptcha.scgx


This agent is also attached at the bottom of this article.

To run this agent, import it into the Sequentum Enterprise editor and set the username and password input parameters to your Death By Captcha username and password.

  • googleRecaptchaTest.scgx (6 KB)


  • googleRecaptchaDeathByCaptcha.scgx (20 KB)

