The Lab #36: Bypassing Cloudflare with anti-detect browsers
Configuring GoLogin to bypass Cloudflare bot detection
Cloudflare Bot Protection is one of the most widely used anti-bot solutions, and we’ve already seen in past articles from The Lab how it relies on device fingerprinting as one of its techniques to detect bots.
In this post, we’ll see how to use a commercial solution like GoLogin to change our device fingerprint and bypass the anti-bot. To be completely honest, this article was initially meant to compare different anti-detect browsers against Cloudflare, but due to time constraints I’ve decided to write only about GoLogin, because it’s the one I’m most familiar with. A comparison between different anti-detect browsers will come soon.
How do you know you’re getting blocked because of your device fingerprint?
Before trying to modify your device fingerprint, you need to understand whether it is the real reason your scraper is being blocked. Luckily, this is easy to find out.
Let’s say your scraper works on your local device (PC, laptop) but it does not work when you deploy it on your production environment, typically inside a data center.
As a first step, add a residential proxy to your scraper and run it again, both locally and in production. If it works in both environments, you were getting blocked only because the server’s data center IP was not trusted enough by the target website, and a residential IP is all you need.
If the scraper keeps working on your machine but not on the server, then we have a device fingerprinting issue.
This is the case with our baseline scraper for indeed.com, a job listing website protected by Cloudflare. On my machine it works: it can search for Python job positions and iterate through the result pages. The same scraper, with the same residential IP provider, gets blocked by Cloudflare Turnstile on an AWS machine.
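The diagnostic steps above can be sketched as a small helper. This is only an illustration: the function names, the marker strings, and the status codes are assumptions chosen for the example, not an official Cloudflare contract or the code from the repository.

```python
# Heuristic markers of a Cloudflare challenge page (assumptions for this sketch).
# Pages that legitimately embed a Turnstile widget may also match.
CHALLENGE_MARKERS = ("challenges.cloudflare.com", "just a moment")


def looks_blocked(status_code: int, html: str) -> bool:
    """Guess whether a response is a challenge or block page."""
    body = html.lower()
    return status_code in (403, 503) or any(m in body for m in CHALLENGE_MARKERS)


def diagnose(local_ok: bool, server_ok: bool) -> str:
    """Classify the block after rerunning the scraper with a residential
    proxy both locally and on the server."""
    if local_ok and server_ok:
        # The proxy fixed both runs: the server IP was the problem.
        return "ip_reputation"
    if local_ok and not server_ok:
        # Same IP provider, different outcome: the device is the difference.
        return "device_fingerprint"
    # Failing locally too: the proxy or the scraper itself may be at fault.
    return "inconclusive"
```

In practice you would feed `looks_blocked` the status code and HTML returned by each run, then pass the two negated results to `diagnose`.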
As always, all the code can be found in our GitHub repository for paying subscribers. If you’re one of them and cannot get access, please write to me at pier@thewebscraping.club with your GitHub username, since I need to add you manually to the repository.
How does GoLogin allow me to change my fingerprint?
GoLogin, like many other anti-detect browsers, allows you to create multiple profiles, each with its own customized fingerprint.
You can create them either from the desktop app or via the API, although it seems to me that the desktop app has more fine-tuning options available.
Once the profile is created, you need to configure your Playwright scraper to connect to it via CDP, as we have done many times in the past.
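from gologin import GoLogin
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Start the GoLogin profile; start() returns the browser's debugger address
    gl = GoLogin({
        "token": "MYTOKEN",
        "profile_id": "MYPROFILEID",
    })
    debugger_address = gl.start()
    # Attach Playwright to the running browser over CDP
    browser = p.chromium.connect_over_cdp("http://" + debugger_address)
    page = browser.new_page()
    gl.normalizePageView(page)
    page.goto("https://www.google.it", timeout=0)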
What we’re going to do now is create different profiles with different layers of device masking, to see which features are key to bypassing the anti-bot.
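A minimal sketch of how such a comparison could be organized, assuming one profile ID per masking level. The `compare_profiles` helper, the `check` callable, and the profile labels are hypothetical names introduced for this example; `check` would wrap the GoLogin and Playwright snippet above and return whether the target page loaded without a challenge.

```python
from typing import Callable, Dict


def compare_profiles(
    profiles: Dict[str, str],
    check: Callable[[str], bool],
) -> Dict[str, bool]:
    """Run the same check against every profile and collect the outcomes.

    profiles maps a human-readable label to a profile ID.
    check receives a profile ID and returns True when the page was reachable.
    """
    results: Dict[str, bool] = {}
    for label, profile_id in profiles.items():
        try:
            results[label] = check(profile_id)
        except Exception:
            # A crash or timeout counts as a failed attempt for that profile.
            results[label] = False
    return results
```

Keeping the browser logic inside `check` means the same loop can be reused for every target website, which makes it easier to see at which masking level the block disappears.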
Profile 1: no masking
We’re creating a new profile for a Linux machine, since our AWS machine runs Linux, but we’re turning off all the masking options.
The only setting we’re customizing is the Screen Definition, since we cannot avoid it.
To my great surprise, using Orbita, the browser developed by GoLogin, instead of Chrome, is enough to bypass indeed.com protection, even if we’re not masking anything about the underlying device.
Let’s test the same solution on a website I know has higher barriers set up by Cloudflare: harrods.com.
In this case, our plain vanilla profile is not enough. This is because we shouldn’t think of anti-bots as monoliths: once they’re installed on a website, rules and red flags are set by the website owner, so the same solution can lead to different results on different websites, as in this case.
Let’s see what’s the turning point to bypass this level of protection by adding more and more noise to our fingerprints.
Profile 2: Audio Context and Client Rects with noise
Let’s try the same profile as before, but with noise added to the Client Rects and the Audio Context.
The Client Rects fingerprint is obtained by measuring the exact pixel position and size of text rendered inside a given element. These values change slightly with the fonts, the rendering engine, and the display settings of the device.
The Audio Context fingerprint is instead a hash of how your browser processes audio. Through the Web Audio API, a website asks the browser to generate a signal, typically a sine wave, and process it without playing any sound. The resulting samples vary slightly with the browser’s audio stack and the available hardware, so they are hashed and sent to the server as an additional source of entropy for browser fingerprinting.
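Both fingerprints boil down to hashing floating-point measurements, which is why a tiny amount of noise is enough to change them. The following is a pure Python simulation of that idea, not GoLogin’s implementation and not what runs in the browser: the rectangle values, the sine-wave parameters, and the helper names are assumptions made up for the example.

```python
import hashlib
import math
import random
from typing import List, Sequence


def fingerprint_hash(values: Sequence[float]) -> str:
    """Hash a list of measurements into a stable fingerprint string."""
    payload = ",".join(f"{v:.10f}" for v in values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_noise(values: Sequence[float], magnitude: float, seed: int) -> List[float]:
    """Perturb every measurement by a small pseudo-random amount."""
    rng = random.Random(seed)
    return [v + rng.uniform(-magnitude, magnitude) for v in values]


# Hypothetical x, y, width, height of a rendered text box (Client Rects).
rects = [8.0, 16.4375, 213.265625, 18.5]

# Samples of a 1 kHz sine wave at 44.1 kHz, standing in for the audio signal.
audio = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(256)]

print(fingerprint_hash(rects))
print(fingerprint_hash(add_noise(rects, 1e-6, seed=1)))
print(fingerprint_hash(audio))
print(fingerprint_hash(add_noise(audio, 1e-6, seed=1)))
```

Without noise the same device always produces the same hash, so it can be recognized across sessions. With noise, the hash no longer matches the one tied to the real hardware.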
Unfortunately, activating both of them does not solve our issue: we’re still getting blocked.