How to Improve the Performance of Puppeteer Stealth Evasions
How to modify the Puppeteer library to avoid bot detection
Welcome to this new series in partnership with ZenRows, a leading web scraping API to extract data from any website, where we’ll see some techniques to bypass bot detection when scraping.
Puppeteer Stealth is a plugin for Puppeteer Extra, a Node.js library that extends Puppeteer with extra plugin functionalities. The Stealth plugin, specifically, features evasion techniques for avoiding anti-bot detection during web scraping.
Stealth is quite popular, with an average of 290k weekly downloads and 519 dependents.
Source: npm trends
This series will give you a solid understanding of how Puppeteer Stealth works internally to avoid getting blocked. In this first article, you'll learn how Stealth evasions operate and how to develop custom evasions that improve the bypass rate of the plugin, one of the most popular open-source bypass tools.
How Powerful Is This Plugin Nowadays?
The main Puppeteer library (vanilla Puppeteer) leaves obvious bot-like tracks that can get you flagged while scraping. Therefore, the plugin patches those loopholes to reduce the likelihood of bot detection.
A test comparing vanilla Puppeteer and the Stealth plugin on CreepJS gives better insight into their evasive capabilities. The results below compare their headless behavior.
Vanilla Puppeteer scores 33% headless on CreepJS, which makes it look like an automated headless browser. That information alone is enough for an anti-bot system to detect and block it.
Vanilla Puppeteer also reports the WebDriver property as true (webDriverIsOn: true). That makes it easy for a target website to flag your scraper as a potential bot. See the result below:
webDriverIsOn: true
hasHeadlessUA: false
hasHeadlessWorkerUA: false
So, how does the Stealth plugin fix these issues?
Puppeteer Stealth has a higher chance of evading detection because it has a headless score of 0%. That indicates the Stealth plugin doesn’t appear to a server as automation software.
Puppeteer Stealth also handles the worker differently by patching its WebDriver behavior and setting its presence as false (webDriverIsOn: false), which reduces the likelihood of bot detection:
webDriverIsOn: false
hasHeadlessUA: false
hasHeadlessWorkerUA: false
As you can see, the Stealth plugin enhances Puppeteer's bypassing capabilities by patching some properties. However, it only works well with websites using basic protection. Therefore, learning what the Stealth plugin does behind the scenes will help improve it with custom evasions.
The Evasions You Must Know
These are some critical evasions you must know to understand how Puppeteer Stealth works.
UA Override
The User-Agent Override evasion modifies Puppeteer's default user agent information, including the platform data. This evasion is most useful when running Puppeteer in headless mode. For instance, it ensures that the browser token in the user agent string reads Chrome instead of the default HeadlessChrome.
navigator.webdriver
The navigator.webdriver evasion modifies Puppeteer's WebDriver property. Some websites treat the presence of a WebDriver as a sign of bot-like activity. This evasion reduces the likelihood of getting blocked by changing the WebDriver property from true to false in the Puppeteer browser instance.
Media Codecs
Some media codecs are proprietary and present only in Chrome, not Chromium. The media codec evasion patches the missing audio and video codecs in Puppeteer's Chromium instance, allowing you to mimic native Google Chrome. It improves your chances of passing the codec support test during anti-bot browser fingerprinting.
Plugins
The plugins evasion emulates the real Chrome browser's plugins while in headless mode. Puppeteer's headless mode lacks plugins by default, which can get you blocked, so this evasion patches that loophole.
WebGL Vendor and Renderer
The WebGL vendor evasion lets you modify the vendor and renderer names that the Web Graphics Library (WebGL) reports for your machine's GPU. For instance, a machine with an Intel GPU reports a renderer such as Intel Iris by default, and you can change the vendor to Apple Inc. to look like an Apple GPU. That can help you bypass anti-bot systems that block specific WebGL vendors, and it also masks the virtual GPUs that are common on cloud machines.
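The idea behind the WebGL evasion can be sketched roughly as follows. This is a simplified illustration, not the Stealth plugin's actual code: `patchWebGLVendor` is a hypothetical helper, and the stand-in object at the bottom only exists so the snippet runs in plain Node.js. The two numeric constants are the `UNMASKED_VENDOR_WEBGL` and `UNMASKED_RENDERER_WEBGL` values from the `WEBGL_debug_renderer_info` extension.

```javascript
// Hypothetical helper: wrap getParameter on a WebGL-like prototype so the
// two "unmasked" queries return spoofed strings. Everything the function
// needs lives inside it, so it could also be injected into a page.
function patchWebGLVendor(proto, vendor, renderer) {
  const UNMASKED_VENDOR_WEBGL = 37445; // 0x9245
  const UNMASKED_RENDERER_WEBGL = 37446; // 0x9246
  const original = proto.getParameter;

  proto.getParameter = function (parameter) {
    if (parameter === UNMASKED_VENDOR_WEBGL) return vendor;
    if (parameter === UNMASKED_RENDERER_WEBGL) return renderer;
    // Every other query is forwarded to the real implementation.
    return original.call(this, parameter);
  };
}

// Stand-in for WebGLRenderingContext.prototype so the sketch runs in Node.js.
const demoProto = {
  getParameter(parameter) {
    return "real value for " + parameter;
  },
};

patchWebGLVendor(demoProto, "Apple Inc.", "Apple GPU");
console.log(demoProto.getParameter(37445)); // prints "Apple Inc."
```

In a browser, the same wrapper would be applied to `WebGLRenderingContext.prototype` from a script injected before the page's own code runs. A plain function replacement like this one can itself be detected (for example through `toString()`), so a production evasion needs additional care to hide the patch.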
These are the common evasions that the Stealth plugin deals with. But you can add more custom evasions to improve Puppeteer Stealth.
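Before moving on to custom evasions, here is a rough sketch of the idea behind two of the built-in ones: the UA Override and navigator.webdriver. Both helpers (`stripHeadlessFromUA` and `hideWebdriver`) are hypothetical names used for illustration, the sample user agent string is made up, and the real plugin does considerably more than this.

```javascript
// Hypothetical helper: make a headless Chrome user agent look like regular
// Chrome by renaming the browser token.
function stripHeadlessFromUA(userAgent) {
  return userAgent.replace("HeadlessChrome/", "Chrome/");
}

// Hypothetical helper: make `webdriver` read as false on a navigator-like
// object. In a page, `nav` would be the global `navigator`.
function hideWebdriver(nav) {
  Object.defineProperty(nav, "webdriver", {
    get: () => false,
    configurable: true,
  });
}

// Illustrative user agent string (an assumption, not captured from a real run).
const sampleUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) " +
  "HeadlessChrome/120.0.0.0 Safari/537.36";

// Logs the same string with "Chrome/120.0.0.0" in place of "HeadlessChrome/120.0.0.0".
console.log(stripHeadlessFromUA(sampleUA));

// Stand-in navigator so the sketch runs in plain Node.js.
const demoNavigator = { webdriver: true };
hideWebdriver(demoNavigator);
console.log(demoNavigator.webdriver); // prints false
```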
Step-by-step to Improve Puppeteer Stealth's Efficacy
Adding custom evasions can further boost your chances of avoiding detection with Puppeteer. Next, you'll learn how to patch the platform and the device's memory properties.
Before you begin, ensure you install Puppeteer and the Stealth plugin with npm:
[Code: Link]
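Assuming the standard npm package names (`puppeteer`, `puppeteer-extra`, and `puppeteer-extra-plugin-stealth`), the install step would look something like this:

```shell
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
```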
Demo 1: How to Patch navigator.platform
The navigator.platform property defines the client's operating system. Patching the platform helps you mimic any operating system environment during scraping and can enhance Puppeteer's bypass capability.
The default navigator.platform value on Windows is Win32. See the test result from CreepJS:
Say you want to appear as a Mac user while running on Windows. You can patch the platform property so it reports MacIntel instead of Win32. Let's see how to achieve this.
First, tie Puppeteer Extra to the Stealth plugin:
[Code: Link]
Define an anonymous function and start Puppeteer in non-headless mode:
[Code: Link]
Extend that function with a dedicated function that changes the platform property as the page document loads:
[Code: Link]
Finally, visit the test website to load it in non-headless mode and wait for the page to load. Feel free to extend the timeout to help you navigate the test page properly.
Note: The target website (CreepJS) is only for testing purposes. You can replace it with your chosen website.
[Code: Link]
Now, put everything together, and the final code should look like this:
[Code: Link]
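A minimal sketch of such a script is shown below, assuming the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages and the public CreepJS URL. The function names, the 30-second wait, and the `--run` flag are illustrative choices, not the article's exact listing. The flag is a hypothetical convention that keeps the browser from launching unless you ask for it, so the patch helper can be imported or checked without Chromium.

```javascript
// Runs inside the page before any site script. It is a standalone function
// so Puppeteer can serialize it; `nav` defaults to the page's navigator and
// only needs to be passed explicitly when trying the helper outside a browser.
function patchPlatform(platform, nav = navigator) {
  Object.defineProperty(nav, "platform", {
    get: () => platform,
    configurable: true,
  });
}

async function main() {
  // Loaded lazily so the helper above works without the packages installed.
  const puppeteer = require("puppeteer-extra");
  const StealthPlugin = require("puppeteer-extra-plugin-stealth");

  // Tie Puppeteer Extra to the Stealth plugin.
  puppeteer.use(StealthPlugin());

  // Start the browser in non-headless mode.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Apply the patch on every new document, before the page's scripts run.
  await page.evaluateOnNewDocument(patchPlatform, "MacIntel");

  // Visit the test page and leave time to inspect the result by hand.
  await page.goto("https://abrahamjuliot.github.io/creepjs/");
  await new Promise((resolve) => setTimeout(resolve, 30000));

  await browser.close();
}

// Hypothetical convention: only launch the browser when asked to.
if (process.argv.includes("--run")) {
  main().catch((error) => {
    console.error(error);
    process.exitCode = 1;
  });
}
```

Saved as, say, `platform-patch.js`, the sketch would be started with `node platform-patch.js --run`. Increase the wait if you need more time on the test page.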
The script visits CreepJS in non-headless mode. You can navigate to the Workers section to view the platform details. The platform now shows MacIntel:
This patch can help bypass anti-bot systems that target a specific platform. You can also patch the hardware properties to improve stealth. Let's see how to do that next.
Demo 2: How to Patch Device Memory and CPU Cores
Patching the device memory (deviceMemory) and CPU cores (hardwareConcurrency) involves modifying them directly on the navigator object. With this patch, you can give each request a different memory value to appear as a different browser and avoid getting tracked.
It can also help you evade fingerprinting and detection systems that analyze a machine's hardware components to identify bots.
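One way to vary these values between sessions is to draw them from a small pool of plausible profiles. The sketch below is only an illustration: `HARDWARE_PROFILES` and `pickHardwareProfile` are hypothetical names, and the pool is an assumption kept to values Chrome has historically reported (deviceMemory is rounded to a power of two and was long capped at 8).

```javascript
// Hypothetical pool of plausible (deviceMemory, hardwareConcurrency) pairs.
const HARDWARE_PROFILES = [
  { deviceMemory: 4, hardwareConcurrency: 4 },
  { deviceMemory: 8, hardwareConcurrency: 4 },
  { deviceMemory: 8, hardwareConcurrency: 8 },
  { deviceMemory: 8, hardwareConcurrency: 12 },
];

// Pick one profile at random. The random source is injectable so the choice
// can be made deterministic when needed.
function pickHardwareProfile(random = Math.random) {
  const index = Math.floor(random() * HARDWARE_PROFILES.length);
  return HARDWARE_PROFILES[index];
}

// Each run logs one of the profiles above.
console.log(pickHardwareProfile());
```

Picking the pair as a unit, rather than each value independently, avoids unlikely combinations such as very little memory paired with many cores.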
Running the fingerprinting test on CreepJS shows that the default device memory used in this tutorial is 8GB RAM with 4 CPU cores:
Let's modify these values to show that your request is from a machine with 16GB RAM and 8 CPU cores.
Start by tying Puppeteer Extra to the Stealth plugin. Then, start an anonymous function that launches the browser instance in non-headless mode.
[Code: Link]
Next, define a patch function to modify the device memory and hardware concurrency once the page document loads:
[Code: Link]
Visit the test page to view the fingerprinting result:
[Code: Link]
Here's the final code:
[Code: Link]
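A minimal sketch of such a script follows, under the same assumptions as before: the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages, the public CreepJS URL, and illustrative names (`patchHardware`, `main`) with a hypothetical `--run` flag that gates the browser launch. The values 16 and 8 follow the article. Note that Chrome has historically never reported a deviceMemory above 8, so a fingerprinting script may treat 16 as unusual on some browser versions.

```javascript
// Runs inside the page before any site script. `nav` defaults to the page's
// navigator and only needs to be passed explicitly outside a browser.
function patchHardware(deviceMemory, hardwareConcurrency, nav = navigator) {
  Object.defineProperty(nav, "deviceMemory", {
    get: () => deviceMemory,
    configurable: true,
  });
  Object.defineProperty(nav, "hardwareConcurrency", {
    get: () => hardwareConcurrency,
    configurable: true,
  });
}

async function main() {
  // Loaded lazily so the helper above works without the packages installed.
  const puppeteer = require("puppeteer-extra");
  const StealthPlugin = require("puppeteer-extra-plugin-stealth");
  puppeteer.use(StealthPlugin());

  // Launch in non-headless mode so the result can be inspected by hand.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Report 16 GB of RAM and 8 CPU cores (values taken from the article).
  await page.evaluateOnNewDocument(patchHardware, 16, 8);

  await page.goto("https://abrahamjuliot.github.io/creepjs/");
  await new Promise((resolve) => setTimeout(resolve, 30000));

  await browser.close();
}

// Hypothetical convention: only launch the browser when asked to.
if (process.argv.includes("--run")) {
  main().catch((error) => {
    console.error(error);
    process.exitCode = 1;
  });
}
```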
The code modifies the device memory to 16GB RAM and 8 CPU cores, as shown on the test website:
You're now one step closer to avoiding anti-bot systems that block specific memory and hardware concurrency profiles.
What's Next: How to Keep Improving
In this part of the series, you've seen the complexities of the Puppeteer Stealth Plugin and learned how to patch some evasions to improve your scraper's ability to bypass blocks. However, the easiest solution to bypass any anti-bot system is to use a web scraping API like ZenRows. It's the fastest solution when facing advanced and evolving anti-bot measures.
The next article will feature the navigator object and the WebDriver. You'll also see how to modify these properties to improve stealth performance.
Stay tuned for the next article in the series!
This article is written in collaboration with ZenRows.