Making Playwright scrapers undetected with open source solutions
Let's see the latest libraries for making your Playwright scrapers unstoppable
In last Sunday’s article, we saw some common setups that improve the reliability of our Playwright scrapers by making their configuration closer to that of an average human web user.
By passing a few arguments when launching Playwright's browser, we can mitigate the most common red flags that reveal a browser automation tool. However, as mentioned in that article, anti-bot solutions are aware of these tricks and have become more sophisticated at detecting scrapers: they can analyze the browser fingerprint, the mouse movements, and other configuration details that may betray a bot.
Since the demand for data is growing exponentially, and businesses rely more and more on web scraping to fuel analytics, pricing strategies, and market insights, there’s a need for tools that bypass these anti-bot solutions. More and more companies offer their own unblockers or "super APIs" for this purpose, and the open-source community is also more vibrant than ever. Today, we’ll look at some libraries that help our scrapers with this task: some of them we already know from previous articles, while others will be covered in detail in the coming weeks.
Human-like mouse movements
When scraping public data from e-commerce websites, we rarely use libraries that mimic human-like mouse movements, unless the website is protected by DataDome.
In that case, there are two libraries that can easily be integrated with Playwright:
python-ghost-cursor: we covered this library in a previous article. It’s the Python port of ghost-cursor, which uses Bézier curves to create more realistic trajectories when moving the mouse between two points on the screen.
Oxymouse: released by Oxylabs, it provides different algorithms for computing mouse movements, as we saw in another article.
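To give an idea of what these libraries do under the hood, here is a minimal sketch of a Bézier-curve trajectory in plain Python. It is not the API of python-ghost-cursor or Oxymouse: the helper names bezier_path and human_move, the random control-point offsets, and the ease-in/ease-out timing are illustrative assumptions. The only Playwright calls used are page.mouse.move and page.wait_for_timeout.

```python
import math
import random


def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point of a cubic Bézier curve at parameter t (0 <= t <= 1)."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def bezier_path(start, end, steps=30, spread=0.3, rng=None):
    """Hypothetical helper: steps + 1 points on a randomly bent curve from start to end."""
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    # Unit vector perpendicular to the straight line, used to bend the curve
    nx, ny = (-dy / dist, dx / dist) if dist else (0.0, 0.0)

    def control_point(fraction):
        # A point along the straight line, pushed sideways by a random offset
        offset = rng.uniform(-spread, spread) * dist
        return (start[0] + dx * fraction + nx * offset,
                start[1] + dy * fraction + ny * offset)

    c1, c2 = control_point(1 / 3), control_point(2 / 3)
    points = []
    for i in range(steps + 1):
        # Ease in/out: the curve parameter advances slowly at the ends, faster in the middle
        t = (1 - math.cos(math.pi * i / steps)) / 2
        points.append(cubic_bezier(start, c1, c2, end, t))
    return points


def human_move(page, start, end, steps=30, rng=None):
    """Hypothetical helper: move a Playwright page's mouse along a Bézier path."""
    rng = rng or random.Random()
    for x, y in bezier_path(start, end, steps=steps, rng=rng)[1:]:
        page.mouse.move(x, y)
        page.wait_for_timeout(rng.uniform(5, 20))  # small random pause, in milliseconds
```

With a page obtained from sync_playwright(), calling human_move(page, (100, 200), (640, 380)) before page.mouse.click(640, 380) moves the cursor along a slightly curved path with uneven speed, instead of jumping straight to the target as a plain click would.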
Spoofing browser fingerprinting
Exploiting the browser’s APIs to detect the hardware and software stack of the client connecting to a website is one of the most common techniques anti-bots use to detect scrapers. It’s relatively easy to implement, and there are several red flags that can be used to block bots. Is the client using a WebGL renderer typical of servers? Does the client expose any audio and video devices? Is the browser's timezone consistent with the timezone of its IP address?
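The checks listed above are simple to express in code. The following is a toy sketch of the server-side logic, assuming the fingerprint has already been collected by a JavaScript probe in the page and that the IP's timezone comes from some geo-IP lookup. The Fingerprint fields, the red_flags function, and the list of renderer substrings are hypothetical and far cruder than what commercial anti-bots actually do.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative (not exhaustive) substrings of software or virtualized GPU renderers,
# which are common on servers without a real graphics card
SERVER_RENDERER_HINTS = ("swiftshader", "llvmpipe", "mesa offscreen", "virtualbox", "vmware")


@dataclass
class Fingerprint:
    """Hypothetical subset of what a fingerprinting script could report."""
    webgl_renderer: str
    media_devices: list[str] = field(default_factory=list)  # e.g. "audioinput", "videoinput"
    timezone: str = "UTC"  # IANA name reported by the browser


def red_flags(fp: Fingerprint, ip_timezone: str) -> list[str]:
    """Return the reasons why this fingerprint looks like a bot (empty list if none)."""
    flags = []
    renderer = fp.webgl_renderer.lower()
    if any(hint in renderer for hint in SERVER_RENDERER_HINTS):
        flags.append("server-like WebGL renderer")
    if not fp.media_devices:
        flags.append("no audio/video devices")
    if fp.timezone != ip_timezone:
        flags.append("browser timezone differs from IP timezone")
    return flags
```

Seen from the scraper's side, every string this function returns is a reason to be blocked, which is why a headless browser running on a GPU-less server (typically reporting a software renderer such as SwiftShader) needs more than a fake user agent.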
A relatively new (at least in the web scraping arena) family of tools addresses these issues: anti-detect browsers. Kameleo, NSTbrowser, GoLogin, and many others are built to create a new “digital identity” for each of your browser sessions, mimicking plausible fingerprints of consumer-grade devices instead of servers. These tools are commercial, but there are also some alternatives in the open-source space.
Browserforge is one of the libraries that generate a legitimate-looking browser fingerprint for your Playwright scrapers. It has recently been integrated and improved in Camoufox, an open-source anti-detect browser that also includes human-like mouse movements and patches to bypass the best-known anti-bot challenges.
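The sketch below is not Browserforge's API, just a simplified illustration of the underlying idea: pick a whole, internally coherent device profile instead of mixing random values, and align the session's locale and timezone with the proxy's exit IP. The two PROFILES entries and the context_options helper are hypothetical; user_agent, screen, viewport, device_scale_factor, locale, and timezone_id are real keyword arguments of Playwright's browser.new_context().

```python
import random

# Hypothetical hand-written profiles: within each entry, user agent,
# screen size and pixel density are meant to be mutually plausible
PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "screen": {"width": 1920, "height": 1080},
        "device_scale_factor": 1,
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        "screen": {"width": 1440, "height": 900},
        "device_scale_factor": 2,
    },
]


def context_options(ip_timezone, locale, rng=None):
    """Build new_context() kwargs whose timezone and locale match the exit IP."""
    rng = rng or random.Random()
    profile = rng.choice(PROFILES)
    screen = profile["screen"]
    # The viewport is shorter than the screen: tabs, address bar and taskbar take space
    viewport = {"width": screen["width"], "height": screen["height"] - rng.randint(80, 140)}
    return {
        "user_agent": profile["user_agent"],
        "screen": dict(screen),
        "viewport": viewport,
        "device_scale_factor": profile["device_scale_factor"],
        "locale": locale,
        "timezone_id": ip_timezone,
    }
```

A call such as browser.new_context(**context_options("Europe/Rome", "it-IT")) would then open a session whose timezone matches an Italian exit IP. Note that this only changes what Playwright can override: hardware signals such as the WebGL renderer are left untouched, which is where the anti-detect browsers mentioned above come in.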
Playwright patches
If you just want a patched version of Playwright for your scraping tasks, have a look at Patchright, a modified Playwright client that ships by default with fixes for the best-known detection vectors:
it removes or overrides the command-line flags that reveal browser automation (such as --enable-automation), so that the session looks like it comes from a legitimate user;
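it avoids enabling the Runtime and Console domains of the Chrome DevTools Protocol (Runtime.enable and Console.enable), preventing the so-called CDP leak, which has lately become a popular way to detect browser automation tools.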
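For comparison, the first of these fixes (the launch flags, not the CDP leak) can be approximated in vanilla Playwright. This is a minimal sketch: stealthier_launch_options and launch_chromium are hypothetical helpers, while headless, args, and ignore_default_args are real parameters of Playwright's chromium.launch(). The CDP leak, instead, cannot be fixed from launch options, because it depends on the commands the Playwright driver itself sends to the browser.

```python
def stealthier_launch_options(headless=False, extra_args=None):
    """Hypothetical helper: keyword arguments for Playwright's chromium.launch()."""
    args = [
        # Disables the AutomationControlled blink feature, so navigator.webdriver reports false
        "--disable-blink-features=AutomationControlled",
    ]
    args.extend(extra_args or [])
    return {
        "headless": headless,
        # Playwright passes --enable-automation by default; drop it from the command line
        "ignore_default_args": ["--enable-automation"],
        "args": args,
    }


def launch_chromium(playwright, **kwargs):
    """Launch Chromium with the options above; playwright comes from sync_playwright()."""
    return playwright.chromium.launch(**stealthier_launch_options(**kwargs))
```

Since Patchright is designed as a drop-in replacement, switching to it should mostly mean changing the import from playwright.sync_api to patchright.sync_api, after which flags like these are handled for you.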
I hope you liked the article! If you’re using other tools and want to share them, please mention them in the comments section.