The Web Scraping Club

THE LAB #10: Bypass Cloudflare Bot Protection with GoLogin

A new way to scrape Cloudflare-protected websites using anti-detect browsers

Pierluigi Vinciguerra
Jan 19, 2023

This article is sponsored by Serply, the solution to scrape search engine results easily.

Web Scraping Club readers can save 25% on all SERP scraping plans by using the code TWSC25.


Cloudflare anti-bot detection

If you google “Cloudflare bypass”, you will find hundreds of articles and GitHub repositories explaining how to bypass Cloudflare (or selling a solution for doing it). I also wrote another post on this topic some months ago, and it’s one of the most successful in terms of readers coming from search engines.

The reason is pretty straightforward: Cloudflare's Bot Management solution is one of the strongest and most widely used anti-bot protections on the internet.

Why it is difficult

Unlike traditional security measures, which rely on IP blocking or CAPTCHAs, Cloudflare's Bot Management solution uses advanced machine learning algorithms to analyze the requests made to a website. This allows it to identify bots by looking for patterns in their behavior that are commonly associated with bots. For example, bots may make a large number of requests in a short period of time, or they may use a specific type of user agent or IP address or have inconsistent/suspect fingerprints.
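Cloudflare's actual models are not public, so the following is only a toy illustration of one signal mentioned above: a client making many requests in a short period of time. The class name RateWindow, its parameters, and the thresholds are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict, deque


class RateWindow:
    """Toy sliding-window counter (NOT Cloudflare's algorithm).

    Flags a client once it has made more than `max_requests`
    requests within the last `window_seconds` seconds.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent timestamps

    def is_suspicious(self, client_id, timestamp):
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


if __name__ == "__main__":
    window = RateWindow(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 20.0):
        print(t, window.is_suspicious("203.0.113.7", t))
```

A real system would combine many such signals (headers, fingerprints, IP reputation) rather than rely on a single counter, which is part of why slowing a scraper down is rarely enough on its own.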

Another reason why Cloudflare's Bot Management solution is hard to bypass is that it is constantly updated to detect new types of bots. The company uses machine learning algorithms to continuously update its detection methods, so it can quickly identify and block new types of bots as they appear.

Last but not least, there’s no silver bullet against Cloudflare Bot Management since it’s a highly customized solution, so what works for one website might not work for another.

As proof of this, in my previous post about Cloudflare I wrote three similar solutions for three different websites, but only two of them still work. In fact, over the past few weeks I have struggled to get past Cloudflare on the Antonioli website with Playwright: I kept getting blocked after a few pages, especially when the scraper ran inside a VM on AWS.

A new approach: anti-detect browsers

Having tried Playwright with different browsers and contexts, and on several cloud providers, without any success, I decided to try launching an anti-detect browser from Playwright.

What are anti-detect browsers?

Anti-detect browsers are usually forks of Chromium with extra features that enhance the user's privacy. Typically they obfuscate or randomize the fingerprints and the location of the user, and this is the main difference from a classic execution of Playwright or Selenium.

To simplify the comparison: when you use Chrome with Playwright, the server knows you’re using a genuine version of Chrome, but its device fingerprint reveals that it is running on a datacenter machine. With an anti-detect browser, you’re using a version of Chromium set up for maximum privacy, which connects using a custom profile that sends custom device fingerprints (e.g. you pretend to be running the browser on a Mac while it is actually running on a server).
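To make the idea of a consistent fingerprint more concrete, here is a minimal sketch of the kind of cross-check a server could run between the User-Agent header and the navigator.platform value reported by the browser. The helper names and the matching rules are assumptions made for illustration; real anti-bot systems look at far more attributes.

```python
def os_from_user_agent(user_agent):
    """Very rough OS guess from a User-Agent string (illustrative only)."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if "macintosh" in ua or "mac os x" in ua:
        return "mac"
    if "linux" in ua or "x11" in ua:
        return "linux"
    return "unknown"


def os_from_platform(platform):
    """Very rough OS guess from navigator.platform (illustrative only)."""
    value = platform.lower()
    if value.startswith("win"):
        return "windows"
    if value.startswith("mac"):
        return "mac"
    if value.startswith("linux"):
        return "linux"
    return "unknown"


def is_consistent(user_agent, platform):
    """True when both values point to the same, recognized OS."""
    ua_os = os_from_user_agent(user_agent)
    return ua_os != "unknown" and ua_os == os_from_platform(platform)


if __name__ == "__main__":
    windows_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    # A Windows User-Agent sent from a Linux server is an obvious mismatch.
    print(is_consistent(windows_ua, "Linux x86_64"))
    # A profile that spoofs both values together passes this simple check.
    print(is_consistent(windows_ua, "Win32"))
```

Changing only the User-Agent header leaves this kind of mismatch in place, whereas an anti-detect profile aims to change all related values together.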

GoLogin

I then needed to test whether this solution could work and, among the several browsers available, I had to choose one that met the following requirements:

  • Has a fully working free demo, to test my solution

  • Can quickly be integrated with Playwright, minimizing the impact on my production environment

  • Has a Unix client, again because of my production environment.

Technically, any Chromium-based browser can run with Playwright if the executable_path is specified as follows:

browser = playwrights.chromium.launch(executable_path='/opt/path_to_bin')

Given that, I chose GoLogin because it met all the requirements above and because I could create different profiles (and therefore different fingerprints) to use in my experiments.
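For readers who want to try the executable_path option on its own, the following is a minimal sketch using Playwright's synchronous Python API. The helper names build_launch_options and fetch_title are hypothetical, and /opt/path_to_bin is the same placeholder used above, not a real path. Playwright is imported inside the function so the rest of the file can be loaded even where it is not installed.

```python
def build_launch_options(executable_path, headless=False):
    """Keyword arguments passed to chromium.launch()."""
    return {"executable_path": executable_path, "headless": headless}


def fetch_title(url, executable_path):
    """Open `url` with a custom Chromium-based binary and return the page title."""
    # Lazy import: requires `pip install playwright`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_options(executable_path))
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()


if __name__ == "__main__":
    # Placeholder path: replace it with the binary of the browser to test.
    print(fetch_title("https://example.com", "/opt/path_to_bin"))
```

Launching a binary this way only swaps the browser executable. It does not load any fingerprint profile by itself.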

The configuration

After the onboarding for the trial, I used the interface to create my first profile, which mimics a Windows workstation.

Then I downloaded the browser’s client and the Python source code from their repository, which is needed for interacting with Playwright using their API.

Using the tests on amiunique.org, we can see the differences between the standard Playwright execution of Chrome from my Mac laptop and the one with a custom profile of a Windows machine using the GoLogin browser.
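The wiring between the two tools can be sketched roughly as follows. This is not the code from the article's repository: it assumes the GoLogin Python package exposes a GoLogin class that takes a token and a profile_id and whose start() method returns a host:port debugger address, as shown in GoLogin's published examples, and that Playwright attaches to that address over CDP. The helper names, the token, and the profile ID are placeholders.

```python
def cdp_endpoint(debugger_address):
    """Turn a 'host:port' debugger address into a URL for connect_over_cdp()."""
    if debugger_address.startswith(("http://", "https://", "ws://", "wss://")):
        return debugger_address
    return "http://" + debugger_address


def open_with_gologin(token, profile_id, url):
    """Start a GoLogin profile, attach Playwright to it, return the page title."""
    # Lazy third-party imports. The GoLogin calls below are an ASSUMPTION
    # based on its public examples and may differ between versions.
    from gologin import GoLogin
    from playwright.sync_api import sync_playwright

    gl = GoLogin({"token": token, "profile_id": profile_id})
    debugger_address = gl.start()  # assumed to return e.g. "127.0.0.1:3500"
    try:
        with sync_playwright() as p:
            browser = p.chromium.connect_over_cdp(cdp_endpoint(debugger_address))
            # Reuse the profile's existing context if there is one.
            context = browser.contexts[0] if browser.contexts else browser.new_context()
            page = context.new_page()
            page.goto(url)
            return page.title()
    finally:
        gl.stop()


if __name__ == "__main__":
    # Placeholder credentials: use the values from your own GoLogin account.
    print(open_with_gologin("YOUR_TOKEN", "YOUR_PROFILE_ID", "https://amiunique.org"))
```

Attaching over CDP, rather than launching the binary directly, lets the GoLogin client apply the selected profile before Playwright takes control.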

Figure: Playwright headers

In the first case, we can see the MacIntel platform and the macOS headers, which could easily be changed anyway.
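The same values can also be read programmatically instead of from a screenshot. The sketch below assumes a Playwright page object (anything with an evaluate() method works) and uses hypothetical helper names. It collects a few navigator properties and shows which ones differ between two runs.

```python
# JavaScript evaluated in the page to collect a few navigator properties.
NAVIGATOR_SNAPSHOT_JS = """() => ({
    platform: navigator.platform,
    userAgent: navigator.userAgent,
    languages: navigator.languages,
    hardwareConcurrency: navigator.hardwareConcurrency,
    webdriver: navigator.webdriver,
})"""


def read_navigator(page):
    """Return a dict of navigator values from a Playwright page."""
    return page.evaluate(NAVIGATOR_SNAPSHOT_JS)


def diff_snapshots(first, second):
    """Map each differing key to its (first, second) pair of values."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


if __name__ == "__main__":
    # Hand-written sample values standing in for two real runs.
    plain = {"platform": "MacIntel", "webdriver": True}
    spoofed = {"platform": "Win32", "webdriver": False}
    print(diff_snapshots(plain, spoofed))
```

Calling read_navigator once on a standard Playwright page and once on a page opened through a custom profile, then passing both results to diff_snapshots, gives a quick list of the attributes the profile actually changes.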

Figure: GoLogin headers

Using GoLogin instead, I am faking the execution from a Windows machine. Of course, there are many more differences than the ones in my screenshots, but you can easily check them yourself using the code I’ve shared in The Web Scraping Club GitHub repository reserved for paying readers.
