THE LAB #87: Bypassing ReCAPTCHAs with open source and commercial tools
History, Technical details, Alternatives, and Bypass Methods of ReCAPTCHA
Recently, both in the TWSC Discord server and on Reddit, I’ve noticed that more and more people are struggling with ReCAPTCHA, especially with its V3, so I decided to dive into it with this article.
We’ll see what ReCAPTCHA is, a bit of its history, how it works, and alternatives. Then we’ll test both some open-source packages and commercial solutions against a real-world use case.
Before proceeding, let me thank NetNut, the platinum partner of the month. They have prepared a juicy offer for you: up to 1 TB of web unblocker for free.
History
ReCAPTCHA
ReCAPTCHA is a CAPTCHA system originally developed at Carnegie Mellon University in 2007 and acquired by Google in 2009.
Early CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) typically presented challenges like distorted text or simple puzzles that users had to solve to prove they were human. ReCAPTCHA’s innovation was to use this effort for a meaningful purpose: the original ReCAPTCHA v1 showed users two words from scanned texts (one known control word and one unknown word from old books/newspapers) to help digitize those texts.
By solving the CAPTCHA, users not only verified they were human but also helped transcribe difficult words that optical character recognition (OCR) could not read.
This “mass collaboration” approach meant that millions of CAPTCHA responses were simultaneously improving Google’s OCR projects (like Google Books and old newspaper archives).
ReCAPTCHA v1 was extremely successful, at one point serving over 100 million CAPTCHAs per day on popular sites like Facebook, Twitter, and many others. However, it relied on explicit human effort to solve each challenge.
ReCAPTCHA V2
As bot technology improved and user experience became a bigger concern, Google evolved ReCAPTCHA to reduce the friction for legitimate users. ReCAPTCHA v2, introduced around 2014, moved away from always requiring users to decipher text. Instead, Google introduced the famous “I’m not a robot” checkbox (NoCAPTCHA) along with behind-the-scenes behavioral analysis. If a user’s browser activity and Google’s risk analysis deemed them likely human, checking the box was enough. Only if the system was uncertain would it present the familiar image-selection challenges (e.g. “select all images with a bus”).
This dramatically improved usability, as many users could pass with a single click instead of typing words. Over time, the challenges also shifted toward content with practical machine learning value. Google had already started using Street View photos in 2012, initially images of house numbers, and the v2 image grids later asked users to identify objects like crosswalks and traffic lights. Those tasks may have helped train Google’s autonomous vehicle vision systems, though Google said they were used to improve Maps. Either way, ReCAPTCHA was no longer about digitizing text, but about distinguishing bots from humans with minimal user effort.
This episode is brought to you by our Gold Partners. Be sure to have a look at the Club Deals page to discover their generous offers available for the TWSC readers.
🧞 - Reliable APIs for the hard to knock Web Data Extraction: Start the trial here
💰 - Use the coupon WSC50 for a 50% off on mobile proxies and proxy builder software
ReCAPTCHA V3
The latest evolution, ReCAPTCHA v3, launched in 2018, eliminates interactive puzzles altogether in normal operation. It runs completely in the background, monitoring user interactions and other signals to produce a “risk score” for each request. Site owners receive this score, from 0.0 (likely bot) to 1.0 (likely human), and decide what action to take (allow, throttle, or challenge with an alternate method). Under v3’s model, the user is typically not interrupted by any prompt at all. This was a response to growing frustration with CAPTCHAs: even the creator of the original CAPTCHA has lamented that these tests ended up “frittering away… millions of hours” of human time. Google’s aim with v3 was to preserve security while making the verification nearly invisible to legitimate users.
It’s worth noting that Google’s reCAPTCHA now dominates the CAPTCHA market (around 98% share as of 2022), but this ubiquity has raised concerns (discussed later) about privacy and dependency.
In 2020, for example, Cloudflare (a major internet infrastructure company) dropped Google reCAPTCHA in favor of a competitor, citing both privacy issues and Google’s newly introduced fees for high-volume usage.
ReCAPTCHA Alternatives (Competitors and New Approaches)
Despite Google reCAPTCHA’s dominance, several alternative CAPTCHA and bot mitigation solutions have emerged – some aiming to improve user experience (no human puzzles), and others focusing on privacy or different challenge types. Here are a few notable ones:
- hCaptcha: Arguably the biggest direct competitor to ReCAPTCHA. hCaptcha also presents image identification challenges, very similar in appearance to Google’s (often asking users to pick objects in images), though it can be integrated invisibly as well. Cloudflare switched from Google to hCaptcha in 2020, largely due to privacy concerns and cost – they didn’t want Google harvesting data on their millions of users, and Google’s pricing for heavy use was significant. hCaptcha, by contrast, doesn’t funnel data to an ad company. It even has a model where website owners can earn a tiny reward for each captcha solved (because those image classifications help train machine learning models for hCaptcha’s clients).
For users, hCaptcha feels much like reCAPTCHA v2 in practice, though some find its image puzzles occasionally more difficult or frequent. Technically, hCaptcha is just as challenging to bots – it uses similar concepts of random image challenges and analysis of user behavior. One difference: hCaptcha is more privacy-focused and does not consider a user’s Google login cookie (unlike ReCAPTCHA, which gives a higher score to users with Google cookies). In fact, hCaptcha touts that it doesn’t track users across sites and complies with strict privacy laws.
- Cloudflare Turnstile: In 2022, Cloudflare went a step further and released Turnstile, a user-friendly CAPTCHA replacement that is invisible for most visitors. Turnstile can be used by any website (not just Cloudflare customers) as a drop-in replacement for reCAPTCHA. Its approach is to never show a puzzle to the user; if a visitor’s fingerprint looks suspicious, the most it asks for is a single click on a checkbox.
When a user needs to be verified, Turnstile instead runs a series of lightweight, rotating challenges in the background: things like proof-of-work calculations, probing for certain browser behaviors or quirks, and other non-interactive tests.
It adapts these challenges based on the visitor – for a suspicious browser, it might do slightly more work, for a known good browser, hardly any. Turnstile also employs machine learning on the back-end to identify patterns of legitimate vs. malicious traffic, and it continually updates the challenge mechanisms to stay ahead of bots. A key point is privacy: Turnstile never requires or uses cookies (it explicitly does not look for a Google login cookie or any login cookie), and it integrates with a new standard called Private Access Tokens.
These tokens, developed in collaboration with Apple, allow devices running iOS/macOS to attest they are real users at the OS level, without revealing personal data. Turnstile can use such a token to automatically let an Apple user through with zero challenge. This concept is a novel alternative to CAPTCHAs: instead of asking the user or scanning the browser, ask the device or OS to vouch for the user.
- Friendly Captcha: Another approach that avoids human input is Friendly Captcha, which uses a proof-of-work puzzle solved by the user’s device. When a user visits a site with Friendly Captcha, a small web worker in the browser quietly performs a computational puzzle (a few seconds of hashing) to prove that the client has spent real CPU time.
There’s no checkbox and no images: by the time the user clicks “Submit” on a form, the background puzzle is done and a token is ready. This increases the cost for bots (they’d need to expend CPU power for each attempt) while legitimate users only experience a minor delay. Friendly Captcha markets itself as “the first truly invisible CAPTCHA based on proof-of-work”. Technically, it ensures that any client making requests has skin in the game (CPU time), which is costly for bot operators to scale up. It also doesn’t track or fingerprint the user, addressing privacy concerns (the cost to solve isn’t tied to identity, just computation). Friendly Captcha is used on some sites where user privacy is paramount and even Google’s tracking is considered unacceptable.
- Geetest adaptive CAPTCHA: Geetest is a popular CAPTCHA solution originally from China, known for its “slider” CAPTCHA. Instead of picking images, the user is asked to slide a puzzle piece into place or perform a simple drag-and-drop action. The idea is that a human can do this easily, but a bot would struggle to mimic the exact cursor movement and speed. It’s a more game-like challenge and generally quicker than clicking multiple images. Geetest also offers an “invisible” mode and various difficulty levels. It gained adoption on many websites (especially in Asia) as an alternative to the more tedious image grids. The sliding puzzle can be more accessible to international users since it doesn’t rely on recognizing specific objects or text.
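To make the proof-of-work idea behind Friendly Captcha (and the proof-of-work calculations Turnstile runs in the background) more concrete, here is a minimal hashcash-style sketch in Python. It is a generic illustration, not Friendly Captcha’s actual puzzle format: the use of SHA-256, the 8-byte nonce encoding, and a `difficulty` measured in leading zero bits are all assumptions made for the example. The point is the asymmetry: solving takes about 2^difficulty hashes on average, while verifying takes exactly one.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of a digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_hash(challenge: bytes, nonce: int) -> bytes:
    # Assumption for this sketch: SHA-256 over challenge || 8-byte big-endian nonce
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: try nonces in order until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(pow_hash(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted solution costs a single hash."""
    return leading_zero_bits(pow_hash(challenge, nonce)) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # random, single-use challenge issued by the server
    nonce = solve(challenge, difficulty=16)  # about 65,000 hashes on average
    print(nonce, verify(challenge, nonce, difficulty=16))
```

Raising `difficulty` by one bit roughly doubles the client’s average work, which is one way such a scheme could ask suspicious visitors to do more work than known-good ones.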
In addition to the above, there are broader bot management solutions (from companies like Akamai, DataDome, PerimeterX, etc.) that don’t use traditional CAPTCHAs but rather fingerprinting and behavioral analysis to silently drop bot traffic. These often integrate with or replace CAPTCHAs entirely by scoring the client behind the scenes (much like ReCAPTCHA v3, but proprietary). For example, a service might compute a device fingerprint and a trust score; if the score is too low, it might block the request or require a secondary verification like an emailed code, all without a classic “click images” CAPTCHA.
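The fingerprint-and-score flow in the previous paragraph can be sketched as follows, treating the score as a trust score where low values mean bot-like. Everything here is hypothetical: the signal names, the penalty weights, and the two thresholds are invented for illustration, and commercial vendors rely on proprietary models rather than a fixed table of weights.

```python
import hashlib
import json

# Hypothetical signals and penalty weights, for illustration only.
PENALTIES = {
    "webdriver": 0.6,         # navigator.webdriver reported as true
    "headless_ua": 0.5,       # "HeadlessChrome" found in the User-Agent
    "datacenter_ip": 0.3,     # request comes from a hosting provider's IP range
    "seen_on_many_ips": 0.2,  # same fingerprint observed from many IPs
}


def fingerprint(attributes: dict) -> str:
    """Hash client attributes into a stable ID; sort_keys makes it order-independent."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def trust_score(signals: set) -> float:
    """Start from 1.0 (human-like), subtract a penalty per suspicious signal, clamp to [0, 1]."""
    score = 1.0 - sum(PENALTIES.get(s, 0.0) for s in signals)
    return max(0.0, min(1.0, score))


def decide(score: float, block_below: float = 0.2, verify_below: float = 0.6) -> str:
    """Map a trust score to an action; both thresholds are hypothetical."""
    if score < block_below:
        return "block"
    if score < verify_below:
        return "secondary_verification"  # e.g. email a one-time code
    return "allow"


if __name__ == "__main__":
    fp = fingerprint({"ua": "Mozilla/5.0 (X11; Linux x86_64)", "screen": "1920x1080"})
    print(fp[:16], decide(trust_score({"datacenter_ip", "seen_on_many_ips"})))
```

Sorting the keys before hashing matters: the same browser should produce the same ID regardless of the order in which attributes were collected.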
The trend in the industry is clearly toward reducing or eliminating direct user challenges. Even Google has hinted at the end of CAPTCHA-solving as we know it, looking toward devices and trusted identities to authenticate humanity in the future, rather than puzzles.
Before continuing with the article, I wanted to let you know that I've started my community in Circle. It’s a place where we can share our experiences and knowledge, and it’s included in your subscription. Give it a try at this link.
How ReCAPTCHA Works (v1 vs v2 vs v3)
ReCAPTCHA v1 (2007–2018): The original ReCAPTCHA presented two words as an image. The technical insight was that one word was already known to the system and used to check correctness, while the second was an unknown word that needed deciphering. If the user typed the known control word correctly, the assumption was that their attempt at the unknown word was likely correct as well. By aggregating answers from multiple users, the system could determine the unknown word’s text with high confidence. Technically, this was a clever blend of security and crowdsourced OCR. However, as bots grew more sophisticated, even distorted text CAPTCHAs became vulnerable: researchers demonstrated ways to reverse-engineer the distortions, and machine learning models solved text CAPTCHAs with decent success.
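The control-word logic of v1 can be modeled in a few lines. This is a toy sketch, not Google’s implementation: the class name `WordDigitizer`, the minimum of three votes, and the 75% agreement threshold are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


class WordDigitizer:
    """Toy model of reCAPTCHA v1: a known control word gates each answer,
    and answers for the unknown word are aggregated until users agree."""

    def __init__(self, min_votes: int = 3, agreement: float = 0.75):
        self.min_votes = min_votes    # hypothetical: votes needed before deciding
        self.agreement = agreement    # hypothetical: share the top answer must reach
        self.votes = defaultdict(Counter)  # unknown word id -> transcription counts

    def submit(self, control_truth: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes; only then count their unknown-word vote."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # failed the known word: treat as a bot, discard the guess
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id: str) -> Optional[str]:
        """Return the agreed transcription, or None while consensus is missing."""
        counts = self.votes.get(unknown_id, Counter())
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        word, top = counts.most_common(1)[0]
        return word if top / total >= self.agreement else None


if __name__ == "__main__":
    d = WordDigitizer()
    for guess in ["inquiry", "inquiry", "inquiry"]:
        d.submit("overlooks", "Overlooks", "scan-42", guess)
    print(d.resolved("scan-42"))  # inquiry
```

The key design choice is that a wrong control word discards the whole submission, so bots that cannot read text also cannot pollute the votes on the unknown word.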
ReCAPTCHA v2 (2014): Google’s pivot in v2 was toward behavioral analysis and conditional challenges. The v2 widget (the “No CAPTCHA reCAPTCHA”) loads a Google JavaScript file that observes user behavior on the page: how the user moves the mouse, how quickly the checkbox is clicked, browser environment data such as the canvas fingerprint, and especially any Google cookies present in the browser. All these signals are sent to Google’s risk analysis engine.
If the user has a Google account cookie and a good reputation, for example, Google is more likely to consider them human and let them pass. In such cases, when the user checks “I’m not a robot,” the back-end validation (via Google’s API) returns a success without further input. If signals are missing or suspicious (e.g., traffic from a Tor exit node or an unfamiliar device), ReCAPTCHA v2 serves a challenge instead: typically a grid of images where the user must identify specific objects, or alternatively an audio puzzle.
The site integration uses a client-side API (`grecaptcha` JavaScript) to render the widget and a server-side API where the website sends Google the secret key along with the user’s response token for verification. Only if Google’s response indicates success does the site proceed with the user’s action.
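A minimal sketch of the server-side half in Python, using only the standard library. The endpoint `https://www.google.com/recaptcha/api/siteverify` and its `secret`, `response`, and optional `remoteip` form fields come from Google’s public reCAPTCHA documentation; the function names are hypothetical, and production code would also need handling for network errors and timeouts.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the form-encoded POST the backend sends to Google."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional: the end user's IP address
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(
        SITEVERIFY_URL,
        data=body,  # a non-None body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None,
                 timeout: float = 10.0) -> dict:
    """Send the user's token to Google and return the parsed JSON verdict."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def is_human(verdict: dict) -> bool:
    """Proceed with the user's action only if Google reports success."""
    return verdict.get("success") is True
```

In a form handler, the token is the `g-recaptcha-response` field the widget adds to the submitted form, so a call might look like `is_human(verify_token(secret_key, form["g-recaptcha-response"]))`, where `secret_key` and `form` are placeholders for your own configuration and request data. Keeping request construction separate from the network call makes the payload testable without contacting Google.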
ReCAPTCHA v3 (2018): In v3, Google took the invisible approach further. There is no checkbox and no challenge prompt in the normal flow. Instead, when a page loads (or when a specific action is triggered), the site calls `grecaptcha.execute()` in JavaScript with a site-specific key and an “action name.” This causes the ReCAPTCHA script to run quietly, collecting telemetry on the user’s interactions and environment, and then return a token for that action. The website backend sends the token to Google’s verification API, which responds with the score and the action name. The score ranges from 0.0 to 1.0; a higher score means the user is more likely not a bot.
Site owners set a threshold (e.g. 0.5) and define what to do if the score is below that – for instance, require additional verification (like 2FA or a traditional CAPTCHA), or block the action, or flag for review. A common practice is to run v3 on multiple pages (login, register, etc.) and even include it site-wide, so Google’s model can learn typical user behavior on that site. In fact, Google’s guidance is to embed ReCAPTCHA v3 on all pages to get the best risk assessments.
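The threshold logic can be sketched on top of the JSON returned by the verification API. For v3 keys, that response includes `success`, `score`, `action`, `hostname`, and `challenge_ts`; the function name `decide_v3`, the default threshold of 0.5 (the example value above), and the mapping to “allow”, “challenge”, or “block” are assumptions for illustration, not Google’s prescribed policy.

```python
from typing import Optional


def decide_v3(verdict: dict, expected_action: str, threshold: float = 0.5,
              expected_hostname: Optional[str] = None) -> str:
    """Turn a parsed v3 siteverify response into "allow", "challenge", or "block"."""
    if verdict.get("success") is not True:
        return "block"  # invalid, expired, or already-used token
    if verdict.get("action") != expected_action:
        return "block"  # token was generated for a different action name
    if expected_hostname is not None and verdict.get("hostname") != expected_hostname:
        return "block"  # token was obtained on another site
    score = float(verdict.get("score", 0.0))
    if score >= threshold:
        return "allow"
    # Below the threshold: ask for extra verification, e.g. 2FA or a v2 checkbox
    return "challenge"


if __name__ == "__main__":
    verdict = {"success": True, "score": 0.3, "action": "login",
               "hostname": "example.com", "challenge_ts": "2024-05-01T10:00:00Z"}
    print(decide_v3(verdict, "login"))  # challenge
```

Checking the action name alongside the score matters: without it, a high-scoring token obtained on a harmless page could be replayed against a sensitive action such as login.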
Technically, this means Google is continuously tracking user activity on the site, which raised some privacy concerns when v3 was announced (since it effectively allows Google to monitor users even on non-Google websites). Google ReCAPTCHA Enterprise (the paid enterprise version of v3) extends this by allowing custom tuning and integration with bot management systems. It can present “reCAPTCHA v2” challenges in suspicious cases – a sort of failover challenge if purely invisible scoring isn’t confident. However, for most users of v3, the experience is seamless: pages simply display a badge stating “Protected by reCAPTCHA,” and no puzzles are shown.
The scripts mentioned in this article are in the GitHub repository's folder 87.RECAPTCHA, available only to paying readers of The Web Scraping Club.
If you’re one of them and cannot access it, please use the following form to request access.
Bypassing ReCAPTCHA with Open-Source Tools