The Lab #37: Bypassing Cloudflare with anti-detect browsers - Part 2
Using Kameleo to bypass Cloudflare bot detection
In the previous article of The Web Scraping Club, we saw how to configure GoLogin to bypass Cloudflare Bot Protection. We also saw how device fingerprinting works: our scraper ran fine from our local machine but failed on a server in an AWS data center, even when using residential proxies.
Thanks to the Kameleo team, I had the opportunity to test their anti-detect browser, so we can try a new tool to add to our toolbelt.
What is Kameleo and how does it work?
Kameleo is an anti-detect tool that allows you to create different profiles you can use in your scrapers.
Every profile can be seen as a separate device, with its own OS, canvas fingerprint, WebGL, and so on.
You can choose between two custom-built browsers, one to simulate Chrome profiles and one for Firefox. You can also install an app on an Android device to use and automate your browser on it.
All these profiles, created via the web interface or the API, can be used inside the most popular web automation frameworks like Selenium, Puppeteer, or Playwright.
First of all, you need to download and run the Kameleo installer. Unfortunately, it's only available for Windows, which is a significant limitation. This is the machine where the browser will be opened and where the requested pages will be loaded.
On the machine where you launch the scraper, instead, you need to install the API client, which connects to the Kameleo server.
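Since the scraper and the Kameleo server can sit on two different machines, a quick connectivity check before launching anything can save some debugging time. Here's a minimal sketch using only the Python standard library. The function name `kameleo_reachable` is my own, the host is a placeholder, and 5050 is simply the default port used in the Kameleo examples. It only tells us that the TCP port accepts connections, not that the service behind it is healthy.

```python
import socket


def kameleo_reachable(host: str, port: int = 5050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage, replacing the placeholder with the Windows machine's address:
# kameleo_reachable("YOURSERVERIP")
```

If this returns False, the usual suspects are the firewall on the Windows machine or a different port configured on the Kameleo side.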
Explaining how Kameleo works is quite easy: its team has collected a database of fingerprints from real devices (called base profiles), which are used as templates when we create the profiles we're going to use for scraping (virtual profiles). Since base profiles come from real devices, we can modify only a few options when creating virtual profiles, so that the overall coherence of the fingerprint isn't broken. It seems a smart move to me, especially if the database of base profiles is big enough to create many different devices.
But let’s play with different profiles and see how they perform against the traditional fingerprint tests.
Profile 1: A Windows profile
First of all, we need to install the Kameleo API client, in our case the Python one.
python3.10 -m pip install kameleo.local-api-client
After that, following the examples in the documentation, we’ll create our first profile mimicking a desktop Windows machine.
import time
from random import randrange

from kameleo.local_api_client import KameleoLocalApiClient
from kameleo.local_api_client.builder_for_create_profile import BuilderForCreateProfile
from playwright.sync_api import sync_playwright

# This is the port Kameleo.CLI is listening on. Default value is 5050, but can be overridden in appsettings.json file
kameleo_port = 5050

# Replace YOURSERVERIP with the address of the Windows machine running Kameleo
client = KameleoLocalApiClient(
    endpoint=f'http://YOURSERVERIP:{kameleo_port}',
    retry_total=0
)

# Search Chrome Base Profiles
base_profiles = client.search_base_profiles(
    device_type='desktop',
    browser_product='chrome',
    os_family='windows'
)

# Create a new profile with recommended settings
# Choose one of the Base Profiles
create_profile_request = BuilderForCreateProfile \
    .for_base_profile(base_profiles[0].id) \
    .set_recommended_defaults() \
    .build()
profile = client.create_profile(body=create_profile_request)
I only selected the browser, the machine type, and the OS family, allowing Kameleo to create the profile with its default configurations.
In the second part of the script, which you can always find in the GitHub repository available for paying subscribers, I’ve used the profile to load the Sannysoft test collection.
# Start the browser profile
client.start_profile(profile.id)
print(profile.id)

# Connect to the browser with Playwright through CDP
browser_ws_endpoint = f'ws://YOURSERVERIP:{kameleo_port}/playwright/{profile.id}'
with sync_playwright() as playwright:
    browser = playwright.chromium.connect_over_cdp(endpoint_url=browser_ws_endpoint)
    context = browser.contexts[0]
    page = context.new_page()

    # Use any Playwright command to drive the browser
    # and enjoy full protection from bot detection products
    page.goto('https://browserleaks.com/javascript')

    # Keep the page open for a random interval (10 to 29 seconds)
    interval = randrange(10, 30)
    time.sleep(interval)

    # Wait for 5 seconds
    time.sleep(5)

# Stop the browser by stopping the Kameleo profile
client.stop_profile(profile.id)
The profile looks perfectly legit, even though the browser was launched on a data center machine on AWS running Windows Server 2022.
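One detail the script above doesn't handle: if Playwright raises an exception before the last line, `stop_profile` is never called and the profile is presumably left running on the Kameleo machine. A possible way around it is a small context manager, sketched below. The helper names (`playwright_endpoint`, `running_profile`, `human_pause`) are my own and not part of the Kameleo client. The only things they rely on are the `start_profile` and `stop_profile` calls and the WebSocket URL pattern already used in the script.

```python
import random
import time
from contextlib import contextmanager


def playwright_endpoint(host: str, port: int, profile_id: str) -> str:
    """Build the WebSocket URL used in the script to attach Playwright."""
    return f"ws://{host}:{port}/playwright/{profile_id}"


@contextmanager
def running_profile(client, profile_id):
    """Start a profile and always stop it, even if the scraping code fails."""
    client.start_profile(profile_id)
    try:
        yield profile_id
    finally:
        client.stop_profile(profile_id)


def human_pause(min_s: float = 10, max_s: float = 30, sleep=time.sleep) -> float:
    """Sleep for a random number of seconds between min_s and max_s."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

With these in place, the second part of the script could be wrapped in `with running_profile(client, profile.id):`, keeping the Playwright code inside the block.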
Profile 2: A Mac OS profile running on a Windows Server
Let's test the consistency of the profile by creating a Mac OS profile while keeping Kameleo running on the same Windows machine, then reload the Sannysoft test collection and look for any macroscopic inconsistency.
We can see that not only has the user agent changed, but so have the navigator.platform attribute and the WebGL renderer.
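The kind of macroscopic inconsistency we're looking for can also be checked in code. Below is a minimal sketch that compares the user agent with navigator.platform. The function name and the mapping are my own simplification and cover only Windows and Mac desktops, where browsers typically report "Win32" and "MacIntel" respectively. A real anti-bot solution checks far more signals than this.

```python
# Expected navigator.platform value for each OS marker found in the user agent.
# Deliberately limited to the two desktop families discussed in this article.
EXPECTED_PLATFORM = {
    "Windows NT": "Win32",
    "Macintosh": "MacIntel",
}


def platform_is_coherent(user_agent: str, platform: str):
    """Return True/False if the pair can be judged, None if the OS is unknown."""
    for marker, expected in EXPECTED_PLATFORM.items():
        if marker in user_agent:
            return platform == expected
    return None


mac_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(platform_is_coherent(mac_ua, "MacIntel"))  # True
print(platform_is_coherent(mac_ua, "Win32"))     # False
```

Inside the Playwright script, the two values could be read from the live page with `page.evaluate("() => navigator.userAgent")` and `page.evaluate("() => navigator.platform")`.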
The interesting part of the profile creation process is that different profiles created on the same OS all differ in some details, like the screen size or the sound devices, resulting in different but plausible hardware configurations.
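To see which details change between two profiles, we can collect a few attributes from each one and diff them. The sketch below assumes the values have already been gathered into plain dictionaries (for example with `page.evaluate`). The attribute names and the values are made up for illustration and don't come from real Kameleo profiles.

```python
def differing_attributes(profile_a: dict, profile_b: dict) -> list[str]:
    """Return the sorted names of the attributes whose values differ."""
    keys = profile_a.keys() | profile_b.keys()
    return sorted(k for k in keys if profile_a.get(k) != profile_b.get(k))


# Hypothetical values, for illustration only
first = {"platform": "Win32", "screen": "1920x1080", "audio_outputs": 2}
second = {"platform": "Win32", "screen": "2560x1440", "audio_outputs": 1}

print(differing_attributes(first, second))  # ['audio_outputs', 'screen']
```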
Can we finally bypass Cloudflare Anti-Bot?
Now that we have seen how Kameleo works, let's get back to the main topic of the post and see if we can bypass Cloudflare with it.
I've created the script playwright_kameleo_harrods.py in The Lab GitHub repository, available for paying subscribers. If you're one of them but don't have access to it, please write to me at pier@thewebscraping.club to be added to the repository.