The Lab #39: Mouse movements in Playwright
How to move the mouse in Playwright to mimic human behavior
When is human-like mouse movement important in web scraping?
When we approach a web scraping project, we may find anti-bot protection software installed on the target website. How common this is depends on the industry in which the website operates: in my experience with 200+ fashion e-commerce websites, where I scrape public data like prices and products, around 20% of them have some sort of bot protection. This is especially true if you target large players with the budget to spend on these solutions, while lesser-known websites usually prefer to focus on the security of the purchasing process.
If there’s no anti-bot protection, a Scrapy spider will probably do the job and you don’t have to care about mouse movements, since nobody on the server side is watching them.
Even when an anti-bot requires you to provide a plausible device fingerprint to get past it, in many cases this is enough and you don’t need to worry about movements. Several anti-bot solutions prefer to rely primarily on signals about the browser and the device configuration rather than tracking every event sent by the browser, like the mouse movement, at least when they’re set at a low “aggressivity” level.
Other anti-bot software, like Datadome, when set to a high security level, is very sensitive to the behavior of the scraper: anything that goes off the rails is marked as suspicious and could lead to a block, even when a human is browsing. Try it yourself on the Hermes.com website, which in my experience is one of the most protected websites I’ve encountered. Enter a random product category and start browsing all the products quickly, opening images and links in new tabs: you will get blocked after a few minutes.
This happens because you’re not browsing the website the way someone genuinely interested in buying would. It’s the result of the so-called “behavioral analysis”, a claim you can find on the Datadome page.
Datadome will analyze not only the sequence of the requests you’re making to the target website but also how your scraper transitions from one page to another: is it reaching even the internal pages via direct links? Is it clicking around with the mouse? And if so, how does the mouse move on the page?
The standard Playwright mouse movement
Playwright, being a browser automation tool for app testing, has no native function to emulate human-like mouse movement. Its core features are built for creating and executing tests on web apps: it provides a set of functions to move the mouse inside a Playwright browser instance, but the resulting movements are quite unnatural.
In the GitHub repository reserved for paying subscribers, you’ll find a simple program (draw_play.py) that draws a line between two points in the way Playwright would move between them: fast and straight.
It’s difficult to believe this is a mouse movement between two points generated by a human. Every time we tell Playwright to click on a point on the screen, it goes there immediately, following a straight line.
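To make the “fast and straight” behavior concrete, here is a minimal sketch (not the draw_play.py script from the repository) of the points a plain linear move would visit between two coordinates. Playwright’s `page.mouse.move(x, y, steps=...)` accepts a `steps` argument for intermediate mousemove events; the assumption here is that those intermediate points are evenly spaced on the straight segment. The helper name `linear_path` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def linear_path(start: Point, end: Point, steps: int = 1) -> List[Point]:
    """Return evenly spaced points on the straight segment start -> end.

    Assumption: this approximates how a plain automated move is interpolated.
    With steps=1 the pointer jumps directly to the target.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]


if __name__ == "__main__":
    # Every printed point lies on the same straight line.
    for point in linear_path((100, 100), (500, 300), steps=4):
        print(point)
```

No matter how many steps we ask for, all the points sit on the same line and are equally spaced, which is exactly the regularity a behavioral check can look for.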
Advanced anti-bots listen to the mousemove event, which is triggered, as you can imagine, when the mouse moves on a page. As the W3C standards documentation puts it:
“The frequency rate of events while the pointing device is moved is implementation-, device-, and platform-specific, but multiple consecutive mousemove events SHOULD be fired for sustained pointer-device movement, rather than a single event for each instance of mouse movement.”
This means that, on average, every page we browse manually triggers hundreds of events, while a scraper that jumps from selector to selector generates only a few. Even if we force Playwright to move the mouse around, these movements will be too fast and too straight to pass as human.
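If you want to see how many mousemove events your own scraper produces, a rough sketch like the following can help. It assumes `page` is a Playwright sync-API `Page` that has already loaded a document; the function name and the `window.__mouseMoveCount` variable are made up for this example, and this is not how any specific anti-bot collects its data.

```python
# JavaScript injected in the page: counts every mousemove event on the document.
# The variable name window.__mouseMoveCount is hypothetical.
INSTALL_COUNTER_JS = """
() => {
    window.__mouseMoveCount = 0;
    document.addEventListener('mousemove', () => {
        window.__mouseMoveCount += 1;
    });
}
"""

READ_COUNTER_JS = "() => window.__mouseMoveCount"


def count_mousemove_events(page, target_x: float, target_y: float, steps: int = 1) -> int:
    """Move the mouse once and return how many mousemove events the page saw.

    `page` is assumed to be a Playwright sync-API Page with a loaded document.
    """
    page.evaluate(INSTALL_COUNTER_JS)                 # start counting
    page.mouse.move(target_x, target_y, steps=steps)  # the automated move
    return page.evaluate(READ_COUNTER_JS)             # read the counter back
```

Comparing the number returned here with what you get by moving the mouse by hand over the same page gives a feel for the gap an anti-bot can measure.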
The python_ghost_cursor mouse movement implementation
We already talked about Bezier curves in the past: they’re a way to draw curved lines programmatically. Multiple Python packages use them to mimic human-like interactions with web pages, together with some adjustments to the speed of the mouse.
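As a generic illustration of the idea (this is not the code of python_ghost_cursor or of any other package), the sketch below builds a path on a cubic Bezier curve with two randomly placed control points, and uses an easing function so the points are denser near the start and the end, which reads as slower movement there. All names and default values (`human_like_path`, `spread`, `steps`) are assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(t: float) -> float:
    """Smoothstep easing: slow start, faster middle, slow end."""
    return t * t * (3.0 - 2.0 * t)


def human_like_path(
    start: Point,
    end: Point,
    steps: int = 50,
    spread: float = 100.0,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Return `steps` points on a randomly bent curve from start to end."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    rng = rng or random.Random()

    def random_control(fraction: float) -> Point:
        # A point along the straight line, pushed aside by a random offset.
        base_x = start[0] + (end[0] - start[0]) * fraction
        base_y = start[1] + (end[1] - start[1]) * fraction
        return (
            base_x + rng.uniform(-spread, spread),
            base_y + rng.uniform(-spread, spread),
        )

    c1, c2 = random_control(1 / 3), random_control(2 / 3)
    return [
        cubic_bezier(start, c1, c2, end, ease_in_out(i / steps))
        for i in range(1, steps + 1)
    ]
```

With Playwright, each point could then be passed to `page.mouse.move(x, y)`, with a short pause such as `page.wait_for_timeout(...)` between calls, so the pointer also takes a plausible amount of time to reach the target.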