THE LAB #63: Oxymouse and Playwright for human-like mouse movements
Testing Oxylabs’ new open-source package for human-like mouse movements
When talking about web scraping, you often hear it described as a cat-and-mouse game with anti-bot vendors. The phrase fits today’s article perfectly, since we’ll use OxyMouse, a new open-source package released by Oxylabs. Its purpose is simple: pairing browser automation tools like Playwright and Selenium with a library that creates more human-like mouse movements from point A to point B.
But before diving into the package, let’s understand how websites can track their users’ mouse movements.
Events and listeners
When we browse a website, we move the mouse around the page, click somewhere, resize the window, and so on. All these actions, and many more, are called events; to get an idea of how many events can be fired, here’s the full list.
Events are fired inside the browser window and are usually attached to a specific item. This might be a single element, a set of elements, the HTML document loaded in the current tab, or the entire browser window.
Using JavaScript, websites can add so-called listeners, which detect when an event is fired and can trigger an action in response: for example, changing the page background when the right mouse button is pressed.
As happened with other browser APIs, these events, initially designed to make life easier for web developers, later became a tool for many other use cases: advertising software, user behavior tracking, and, of course, anti-bot software.
But how can we find out which listeners are in place on a website? Here, our browser’s Developer Tools give us some hints.
This is the home page of Google.com. In the Elements panel, the Event Listeners tab lists the listeners active on the page. We can see that every mouse movement, click with any button, and keypress is collected, I’m assuming to gather data for improving the page’s usability.
By expanding a listener, we can jump to the portion of code where it’s defined, even if most of the time that code is obfuscated and unreadable.
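To make the element/document/window relationship concrete, here’s a toy Python model showing why a single listener on the document can see events fired on any element: events bubble up from their target through each ancestor. It’s a simplified sketch under stated assumptions (bubbling only, no capture phase, no stopPropagation), and the Node class and its methods are hypothetical stand-ins for the browser’s EventTarget and addEventListener, not a real API.

```python
from collections import defaultdict
from typing import Callable, Optional

Handler = Callable[[dict], None]


class Node:
    """Hypothetical toy stand-in for a DOM EventTarget: an element, the document or the window."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.listeners = defaultdict(list)  # event type -> list of handlers

    def add_event_listener(self, event_type: str, handler: Handler) -> None:
        # Same idea as addEventListener: store the handler under its event type.
        self.listeners[event_type].append(handler)

    def dispatch(self, event_type: str, **details) -> None:
        # Bubbling only: run the target's handlers, then each ancestor's, up to the window.
        event = {"type": event_type, "target": self.name, **details}
        node: Optional[Node] = self
        while node is not None:
            for handler in node.listeners[event_type]:
                handler(event)
            node = node.parent


window = Node("window")
document = Node("document", parent=window)
buy_button = Node("button#buy", parent=document)

# A page-wide tracker needs a single listener on the document to see every mouse movement.
recorded: list[tuple[int, int]] = []
document.add_event_listener("mousemove", lambda e: recorded.append((e["x"], e["y"])))

# The right-click example: MouseEvent.button == 2 is the secondary (usually right) button.
page_style = {"background": "white"}


def on_mousedown(event: dict) -> None:
    if event.get("button") == 2:
        page_style["background"] = "black"


window.add_event_listener("mousedown", on_mousedown)

buy_button.dispatch("mousemove", x=10, y=20)
buy_button.dispatch("mousemove", x=12, y=25)
buy_button.dispatch("mousedown", button=2)
print(recorded)                  # [(10, 20), (12, 25)]
print(page_style["background"])  # black
```

Events travel up the tree, never down: a listener attached only to the button would not fire for an event dispatched on the document.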
But what happens if we examine a website protected by an anti-bot solution?
Here, a website protected by DataDome listens not only for mouse movements but also for other events and properties that can reveal Selenium or other browser automation tools driven through WebDriver.
On this other website, protected by Cloudflare, user behavior doesn’t seem to be tracked, at least not through listeners. In fact, in my experience, I don’t need any package to mimic human-like mouse movements to bypass this protection when scraping.
With Kasada, things look a bit different.
My guess is that they use this listener to monitor the results of their challenges and of their fingerprinting and behavioral detection algorithms.
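We can’t see what DataDome actually does with the mouse data it collects. Purely as a hypothetical illustration of why raw automation movements stand out, here’s the kind of feature a detector could compute from recorded mousemove coordinates and timestamps: how straight the path is and how constant the speed is. The function name and both features are my own assumptions, not any vendor’s logic.

```python
import math
from statistics import pstdev

Point = tuple[float, float]


def trajectory_features(points: list[Point], timestamps_ms: list[float]) -> dict[str, float]:
    """Hypothetical features a detector *could* derive from recorded mousemove events."""
    if len(points) < 3 or len(points) != len(timestamps_ms):
        raise ValueError("need at least 3 points, each with a timestamp")
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    path_length = sum(steps)
    direct = math.dist(points[0], points[-1])
    # 1.0 = perfectly straight path; curved, hand-made paths score lower.
    straightness = direct / path_length if path_length else 1.0
    # Speed of each segment in px/ms (timestamps must be strictly increasing).
    speeds = [d / (t1 - t0) for d, t0, t1 in zip(steps, timestamps_ms, timestamps_ms[1:])]
    # 0.0 = perfectly constant speed, which is unusual for a human hand.
    return {"straightness": straightness, "speed_stdev": pstdev(speeds)}


# A "robotic" move: straight line, equal spacing, one event every 16 ms.
robot = [(100 + 40 * i, 200 + 30 * i) for i in range(11)]
times = [16.0 * i for i in range(11)]
print(trajectory_features(robot, times))  # {'straightness': 1.0, 'speed_stdev': 0.0}
```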
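<script type="application/javascript">
  // Runs once the Kasada SDK fires its 'kpsdk-load' event
  document.addEventListener('kpsdk-load', function () {
    // Presumably tells the SDK which requests to cover: any method, any path on this domain
    KPSDK.configure([ { method: '*', domain: 'www.canadagoose.com', path: '*' } ]);
  });
</script>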
Scraping can be hard and expensive, and finding the right tech stack at a reasonable price can be a challenge. With so many providers and services, staying up to date with everything on the market is a job in itself. But it’s our job at Re Analytics.
How can we mimic human-like mouse movement?
Now that we know how to detect when a website is monitoring our mouse movements, we also know when it’s worth adding a mouse movement library to a scraper built on a browser automation framework like Playwright or Selenium.
In the past, we’ve already seen how to mimic human-like mouse movements with ghost-cursor and Playwright.
The package uses Bezier curves, a standard computer-graphics approach for drawing smooth curves programmatically, to move from A to B along a curved line, instead of the straight line Playwright draws by default.
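A minimal sketch of the difference, assuming a cubic Bezier curve with two randomly placed control points (the control-point strategy and the function names are my own illustration, not ghost-cursor’s code): the linear path mimics the evenly spaced straight line Playwright produces with mouse.move(x, y, steps=N), while the Bezier path bends but still starts and ends exactly at A and B.

```python
import random
from typing import Optional

Point = tuple[float, float]


def linear_path(a: Point, b: Point, steps: int) -> list[Point]:
    """Evenly spaced points on the segment A->B, similar to Playwright's mouse.move(..., steps=N)."""
    return [(a[0] + (b[0] - a[0]) * i / steps, a[1] + (b[1] - a[1]) * i / steps)
            for i in range(steps + 1)]


def cubic_bezier_path(a: Point, b: Point, steps: int, spread: float = 100.0,
                      rng: Optional[random.Random] = None) -> list[Point]:
    """Illustrative cubic Bezier from A to B with two randomly jittered control points."""
    rng = rng or random.Random()
    # Control points sit near 1/3 and 2/3 of the segment, shifted by up to `spread` pixels.
    c1 = (a[0] + (b[0] - a[0]) / 3 + rng.uniform(-spread, spread),
          a[1] + (b[1] - a[1]) / 3 + rng.uniform(-spread, spread))
    c2 = (a[0] + 2 * (b[0] - a[0]) / 3 + rng.uniform(-spread, spread),
          a[1] + 2 * (b[1] - a[1]) / 3 + rng.uniform(-spread, spread))
    path = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Bernstein form: B(t) = u^3*A + 3u^2t*C1 + 3ut^2*C2 + t^3*B
        path.append(tuple(u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3
                          for p0, p1, p2, p3 in zip(a, c1, c2, b)))
    return path


rng = random.Random(42)
curve = cubic_bezier_path((0, 0), (800, 600), steps=50, rng=rng)
line = linear_path((0, 0), (800, 600), steps=50)
print(curve[0], curve[-1])  # (0.0, 0.0) (800.0, 600.0)
```

Because the control points are random, every call yields a different curve between the same two endpoints, while the straight line is always identical.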
Improvements in OxyMouse
OxyMouse improves on ghost-cursor by adding two more algorithms for drawing routes between two points on the screen.
One additional algorithm is called Gaussian and, from the code, we can see it combines Gaussian-based random walks with Bezier curves to produce a movement pattern that closely resembles how humans naturally move the mouse. By adjusting the degree of randomness and smoothness, the algorithm can create movements that are both unpredictable and smooth.
The use of Gaussian smoothing helps to reduce the sharp transitions that might occur with purely random movements, while the Bezier curves ensure the path is not a straight line, introducing a more organic flow to the movement.
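To illustrate the idea, here’s a hedged sketch of a Gaussian random walk, smoothed with a Gaussian kernel and added on top of a quadratic Bezier curve. It is not OxyMouse’s implementation: the function names and default parameters (step_sigma, smooth_sigma, spread) are hypothetical, and the linear correction that forces the offsets to zero at both ends is my own choice to guarantee the path still lands on B.

```python
import math
import random
from typing import Optional

Point = tuple[float, float]


def gaussian_kernel(sigma: float) -> list[float]:
    """Normalized Gaussian weights covering +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values: list[float], sigma: float) -> list[float]:
    """Gaussian smoothing; indices are clamped at the edges so the length is preserved."""
    kernel = gaussian_kernel(sigma)
    r, n = len(kernel) // 2, len(values)
    return [sum(w * values[min(max(i + k - r, 0), n - 1)] for k, w in enumerate(kernel))
            for i in range(n)]


def gaussian_walk_path(a: Point, b: Point, steps: int, step_sigma: float = 3.0,
                       smooth_sigma: float = 4.0, spread: float = 80.0,
                       rng: Optional[random.Random] = None) -> list[Point]:
    """Hypothetical sketch, not OxyMouse's code: smoothed random walk on a Bezier base curve."""
    rng = rng or random.Random()
    # 1. Random walk: each offset is the previous one plus Gaussian noise.
    walk_x, walk_y, ox, oy = [], [], 0.0, 0.0
    for _ in range(steps + 1):
        ox += rng.gauss(0, step_sigma)
        oy += rng.gauss(0, step_sigma)
        walk_x.append(ox)
        walk_y.append(oy)
    # 2. Gaussian smoothing removes the sharp jumps between consecutive samples.
    walk_x, walk_y = smooth(walk_x, smooth_sigma), smooth(walk_y, smooth_sigma)
    # 3. Quadratic Bezier base curve with one random control point near the midpoint.
    c = ((a[0] + b[0]) / 2 + rng.uniform(-spread, spread),
         (a[1] + b[1]) / 2 + rng.uniform(-spread, spread))
    path = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Subtract a linear ramp so the offsets are exactly zero at A and at B.
        dx = walk_x[i] - (walk_x[0] * u + walk_x[-1] * t)
        dy = walk_y[i] - (walk_y[0] * u + walk_y[-1] * t)
        bx = u * u * a[0] + 2 * u * t * c[0] + t * t * b[0]
        by = u * u * a[1] + 2 * u * t * c[1] + t * t * b[1]
        path.append((bx + dx, by + dy))
    return path


path = gaussian_walk_path((100, 100), (700, 400), steps=60, rng=random.Random(7))
```

Raising step_sigma makes the path more unpredictable, while raising smooth_sigma trades that jitter for smoother curvature, which mirrors the randomness/smoothness trade-off described above.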
The other algorithm is based on Perlin noise, a technique typically used in video games for procedural content generation.
The main idea behind using Perlin noise in this context is to generate smooth, continuous, and natural-looking randomness. Perlin noise avoids the harsh transitions that pure randomness might cause, which helps in producing more human-like movements.
Perlin noise functions return values roughly in the range [-1, 1]. These values are then scaled and translated to fit the screen’s resolution, ensuring that the generated mouse movements stay within the visible screen boundaries.
Parameters such as octaves, persistence, and lacunarity give fine-grained control over the behavior of the generated movements. Each octave adds a layer of noise at a higher frequency: lacunarity sets how quickly the frequency grows from one layer to the next, and persistence sets how quickly the amplitude shrinks. Increasing the number of octaves makes the movement more complex, while adjusting persistence and lacunarity fine-tunes the smoothness and the frequency of changes in direction and speed.
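The following sketch shows how these three parameters interact in a minimal 1D Perlin-style gradient noise summed over several octaves, then mapped from [-1, 1] onto pixel coordinates. Everything here (Perlin1D, perlin_wander, the default values) is a hypothetical illustration rather than OxyMouse’s code; x and y come from two independently seeded noise generators so the cursor wanders across the whole viewport.

```python
import math
import random

Point = tuple[float, float]


class Perlin1D:
    """Minimal 1D gradient (Perlin-style) noise; a hypothetical illustration, not OxyMouse's code."""

    def __init__(self, seed: int = 0, size: int = 256):
        rng = random.Random(seed)
        self.size = size
        # One random gradient (slope) per integer lattice point, drawn from [-1, 1].
        self.gradients = [rng.uniform(-1.0, 1.0) for _ in range(size)]

    @staticmethod
    def fade(t: float) -> float:
        # Perlin's quintic fade curve 6t^5 - 15t^4 + 10t^3.
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(self, x: float) -> float:
        i0 = math.floor(x)
        t = x - i0
        g0 = self.gradients[i0 % self.size]
        g1 = self.gradients[(i0 + 1) % self.size]
        n0, n1 = g0 * t, g1 * (t - 1)
        # The blend stays within [-0.5, 0.5]; doubling it maps the output to [-1, 1].
        return 2 * (n0 + (n1 - n0) * self.fade(t))

    def fbm(self, x: float, octaves: int = 4, persistence: float = 0.5,
            lacunarity: float = 2.0) -> float:
        """Sum several octaves of noise, then normalize back to [-1, 1]."""
        total, amplitude, frequency, max_amplitude = 0.0, 1.0, 1.0, 0.0
        for _ in range(octaves):
            total += amplitude * self.noise(x * frequency)
            max_amplitude += amplitude
            amplitude *= persistence  # each octave weighs less...
            frequency *= lacunarity   # ...but varies faster
        return total / max_amplitude


def to_screen(value: float, size_px: int) -> float:
    """Scale and translate a value in [-1, 1] to a pixel coordinate in [0, size_px - 1]."""
    return (value + 1) / 2 * (size_px - 1)


def perlin_wander(n_points: int, width: int, height: int, seed: int = 0,
                  speed: float = 0.02, **fbm_params) -> list[Point]:
    """Hypothetical cursor path wandering across a width x height viewport."""
    noise_x, noise_y = Perlin1D(seed), Perlin1D(seed + 1)
    # `speed` is how far we advance in noise space for each generated point.
    return [(to_screen(noise_x.fbm(i * speed, **fbm_params), width),
             to_screen(noise_y.fbm(i * speed, **fbm_params), height))
            for i in range(n_points)]


points = perlin_wander(300, 1920, 1080, seed=3, octaves=6, persistence=0.35)
```

A lower persistence mutes the high-frequency octaves and smooths the path, while a higher lacunarity pushes those octaves to change direction more often, making the movement more erratic.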
By manipulating these parameters, the behavior of the mouse movement can be adjusted to simulate different types of user interaction, from smooth scrolling to more erratic movements.
In the OxyMouse GitHub repository, you can see how the three algorithms behave when drawing a path between two points.
Integrating OxyMouse in a real-world example
Now that we understand how the package works, we can use it to generate human-like mouse movements during a scraping activity.
In the file test_oxymouse.py, inside the 63.OXYMOUSE folder of the repository for paying subscribers, you’ll see how I used OxyMouse to generate random mouse movements on the page and then ghost-cursor to click on a selector, something that isn’t currently possible with OxyMouse.
If you’re a paying subscriber but don’t have access, please write to me at pier@thewebscraping.club with your GitHub username, since I need to add you to the repository manually.
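The subscriber code isn’t reproduced here. As a hedged sketch of the glue only, assuming the movement generator returns a list of (x, y) coordinates, the points can be replayed one by one through Playwright’s page.mouse.move(x, y), optionally with irregular pauses between moves. The names wander and SupportsMove are hypothetical.

```python
import random
from typing import Callable, Optional, Protocol

Point = tuple[float, float]


class SupportsMove(Protocol):
    """Anything with move(x, y): Playwright's sync page.mouse fits this shape."""

    def move(self, x: float, y: float) -> None: ...


def wander(
    mouse: SupportsMove,
    path: list[Point],
    pause: Optional[Callable[[float], None]] = None,
    max_delay_ms: float = 25.0,
    rng: Optional[random.Random] = None,
) -> None:
    """Hypothetical helper: replay precomputed coordinates one by one."""
    rng = rng or random.Random()
    for x, y in path:
        mouse.move(x, y)
        if pause is not None:
            # Irregular delays between moves, in milliseconds.
            pause(rng.uniform(0, max_delay_ms))


# With Playwright (sync API), something along these lines:
#   with sync_playwright() as p:
#       page = p.chromium.launch(headless=False).new_page()
#       page.goto(url)
#       wander(page.mouse, points, pause=page.wait_for_timeout)
# where `points` is the list of (x, y) tuples produced by the movement generator.
```

Keeping the replay loop separate from the path generator means any of the three algorithms can be swapped in without touching the browser code.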
Subscribe to The Web Scraping Club to keep reading this post and get 7 days of free access to the full post archives.