THE LAB #8: Using Bezier curves for human-like mouse movements
What are Bezier curves and why are they important in web scraping?
Here’s another post of “THE LAB”: in this series, we'll cover real-world use cases, with code and an explanation of the methodology used.
Being a paying user gives:
Access to paid content, like the post series called “The LAB”, where we dive deep into real-world cases with code (view an example here).
Access to the GitHub repository with the code seen on “The LAB”
Access to private channels on our Discord server
But in case you want to read this newsletter for free, you will always get a post per week about:
News about web scraping
Anti-bot software and techniques insights
Interviews with key people in the industry
And you can always join the Web Scraping Club Discord server
Enough housekeeping for now, let’s start.
What is a Bezier curve?
In computer graphics, the lines connecting point A to point B can be either straight or curved. Straight lines are easy to implement in software, while curves, although easy for humans to draw, are much harder for computers.
In 1962, Pierre Bezier, a French engineer working for Renault, published his studies on using mathematical functions to draw curves well suited to design work.
Bezier curves are parametric curves: you define a set of control points that determine the shape and curvature, and the points in between are obtained by interpolation.
A much more detailed explanation can be found at this link, where you can deep dive into all the mathematical aspects.
Why are Bezier curves interesting for web scraping?
As said before, Bezier curves create smooth paths from point A to point B, which makes them a good fit for mouse movement. While the native move function in Playwright goes from A to B in a straight line, following a Bezier curve makes the movement look more human-like. This becomes interesting for web scraping when we’re facing anti-bot solutions that track user behavior to detect anomalies: reproducing a more human-like mouse movement should trigger fewer red flags.
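To make the idea concrete, here is a minimal sketch of how a list of precomputed points could be replayed with Playwright, instead of a single straight-line move. It assumes a `page` object from Playwright's sync API, which exposes `page.mouse.move(x, y)`. The helper name `replay_path` and the `pause` parameter are made up for illustration and are not part of Playwright.

```python
import time
from typing import Iterable, Tuple

Point = Tuple[float, float]


def replay_path(page, points: Iterable[Point], pause: float = 0.01) -> int:
    """Move the mouse through each point in order and return the number of moves.

    `page` is assumed to expose Playwright's sync `page.mouse.move(x, y)`.
    `pause` (seconds between moves) is an illustrative parameter, not a Playwright one.
    """
    moves = 0
    for x, y in points:
        page.mouse.move(x, y)  # one small hop along the path
        moves += 1
        if pause > 0:
            time.sleep(pause)
    return moves
```

The points passed in can come from any path generator. With points taken from a Bezier curve, the pointer follows the curve in many small hops rather than one straight segment.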
Warning: this paragraph will contain some math!
As we noted before, what we need to implement a Bezier curve is:
A set of control points: the curve passes through some of them (the first and the last), while the others shape its curvature.
The ratio R, which represents the density of the interpolation: it is the step between consecutive values of the parameter t, which runs from 0 to 1. A ratio of 0.1 splits the curve into 10 equal steps of t, giving 9 points between the start and the end of the curve. With a ratio of 0.5, there will be only 1 point in between.
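The relation between the ratio and the number of points can be checked with a few lines of plain Python. This is only a sketch: `t_values` is a hypothetical helper name, and it assumes the ratio divides the 0 to 1 interval evenly.

```python
from typing import List


def t_values(ratio: float) -> List[float]:
    """Return evenly spaced t values from 0 to 1 inclusive, `ratio` apart."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    steps = round(1 / ratio)  # number of intervals between 0 and 1
    return [i / steps for i in range(steps + 1)]


for ratio in (0.1, 0.5):
    ts = t_values(ratio)
    # The first and last values are the endpoints of the curve.
    print(ratio, "->", len(ts), "values,", len(ts) - 2, "in between")
```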
Bezier curves come in several types, depending on the number of control points (linear, quadratic, cubic, and so on). Here we’ll see how to implement a cubic one, which requires 4 control points.
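For reference, the standard formulas for the first three types can be written as follows. The symbols are a notation chosen here: P_0 to P_3 are the control points and t is the parameter running from 0 to 1.

```latex
\begin{aligned}
\text{Linear:}\quad    B(t) &= (1-t)\,P_0 + t\,P_1 \\
\text{Quadratic:}\quad B(t) &= (1-t)^2 P_0 + 2(1-t)\,t\,P_1 + t^2 P_2 \\
\text{Cubic:}\quad     B(t) &= (1-t)^3 P_0 + 3(1-t)^2 t\,P_1 + 3(1-t)\,t^2 P_2 + t^3 P_3
\end{aligned}
\qquad 0 \le t \le 1
```

Each formula is applied separately to the X and to the Y coordinates of the control points.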
Setting the control points and ratio
I’ve chosen the following four control points. The first and the last are the start and the end of the curve, so we’ll get a curve that roughly resembles a semi-circle but, given the coordinates of the second and third points, will likely be steeper in the first half and smoother in the second.
control_points = [[200, 200], [230, 400], [280, 300], [300, 200]]
Since I’d like to see the curve drawn on the monitor, I want many points belonging to it.
Using the following command from the NumPy package, we’re generating 100 evenly spaced values between 0 and 1
t = np.linspace(0, 1, 100)
Since np.linspace includes both endpoints, the 100 values are spaced 1/99 (about 0.0101) apart, and we get an array like the following:
[0, 0.0101, 0.0202, 0.0303] and so on, up to 1.
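A quick way to check what the array actually contains is to print its length, its endpoints, and its spacing. The variable `t_exact` below is only an illustration of what to use if steps of exactly 0.01 are wanted.

```python
import numpy as np

t = np.linspace(0, 1, 100)  # 100 values, both endpoints included

print(len(t))        # number of samples
print(t[0], t[-1])   # first and last value
print(t[1] - t[0])   # spacing: 1/99, slightly more than 0.01

# Steps of exactly 0.01 need 101 samples, because both ends are counted.
t_exact = np.linspace(0, 1, 101)
print(t_exact[1] - t_exact[0])
```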
Calculating the intermediate points
Using the cubic formula, we can now calculate the coordinates of every point on the curve.
For the X coordinate, we’ll use the X values of the 4 control points, while t is a value from the array calculated before.
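So, given the generic cubic formula, the 4 control points set up before, and taking t = 0.01 as an example of a point near the start of the curve, the X coordinate becomes
x(0.01) = (1 - 0.01)^3 * 200 + 3 * (1 - 0.01)^2 * 0.01 * 230 + 3 * (1 - 0.01) * 0.01^2 * 280 + 0.01^3 * 300
and it equals approximately 200.906.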
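The same goes for Y, using the Y values of the control points:
y(0.01) = (1 - 0.01)^3 * 200 + 3 * (1 - 0.01)^2 * 0.01 * 400 + 3 * (1 - 0.01) * 0.01^2 * 300 + 0.01^3 * 200
which equals approximately 205.910.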
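The arithmetic for a single point can be reproduced with a small function. This is a sketch of the standard cubic formula applied to one coordinate at a time. The function name `cubic_bezier` is chosen here for illustration.

```python
control_points = [[200, 200], [230, 400], [280, 300], [300, 200]]


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic Bezier curve at parameter t."""
    u = 1 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3


xs = [p[0] for p in control_points]  # X values of the 4 control points
ys = [p[1] for p in control_points]  # Y values of the 4 control points

x = cubic_bezier(*xs, 0.01)
y = cubic_bezier(*ys, 0.01)
print(round(x, 3), round(y, 3))
```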
Repeating this for every value of t gives us the list of points on the curve.
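With NumPy, this repetition does not need an explicit loop. The sketch below evaluates the cubic formula for all values of t at once, relying on broadcasting. The variable names are illustrative and this is not necessarily how the full post implements it.

```python
import numpy as np

control_points = np.array(
    [[200, 200], [230, 400], [280, 300], [300, 200]], dtype=float
)
# Column vector of t values, so each t is combined with both x and y.
t = np.linspace(0, 1, 100)[:, None]

p0, p1, p2, p3 = control_points
curve = (
    (1 - t) ** 3 * p0
    + 3 * (1 - t) ** 2 * t * p1
    + 3 * (1 - t) * t ** 2 * p2
    + t ** 3 * p3
)

print(curve.shape)           # one (x, y) pair per value of t
print(curve[0], curve[-1])   # the curve starts at p0 and ends at p3
```

Each row of `curve` is one point on the path, in the order in which the mouse would visit it.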