Last Wednesday, Oxylabs held the 2024 edition of Oxycon, its annual virtual event where the company showcases new products, success stories, and best practices for web scraping.
I attended almost the whole conference and want to highlight three talks that intrigued me.
Ensuring Scalability in Data Collection: Key Components, Challenges, and Advancements
Žydrūnas Tamašauskas, CTO of Oxylabs, talked about best practices for large-scale web scraping operations. It's an evergreen topic, and companies running scraping at scale can use this presentation to quickly benchmark their own infrastructure and tools.
The talk showed how complex today's web scraping environment has become: anti-bots are more sophisticated, bypassing them costs more, and scraping a wide range of websites requires more tools. The result is a more complex infrastructure, lower reliability, and more spaghetti code.
This is why a growing number of proxy providers are promoting advanced unblocker solutions: you can write basic scrapers with Scrapy (or other browserless frameworks) and delegate all the anti-bot bypassing to third parties like Oxylabs.
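As a rough sketch of that split, assume the unblocker is exposed as an authenticated proxy endpoint. The host `unblocker.example.com:60000`, the credentials, and the helper names below are hypothetical placeholders, not Oxylabs' actual values. To keep it dependency-free, this uses Python's standard-library `urllib` instead of Scrapy; in Scrapy, the same idea is usually expressed by setting `request.meta["proxy"]`.

```python
import urllib.parse
import urllib.request

# Hypothetical unblocker endpoint: replace it with your provider's host and port.
UNBLOCKER_HOST = "unblocker.example.com:60000"


def make_unblocker_opener(user: str, password: str,
                          host: str = UNBLOCKER_HOST) -> urllib.request.OpenerDirector:
    """Build an opener that routes every request through the unblocker proxy."""
    # Percent-encode credentials so characters like "@" or ":" don't break the proxy URL.
    proxy_url = "http://{}:{}@{}".format(
        urllib.parse.quote(user, safe=""),
        urllib.parse.quote(password, safe=""),
        host,
    )
    # The same proxy handles plain HTTP and HTTPS (the latter tunnelled via CONNECT).
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy_handler)


def fetch(opener: urllib.request.OpenerDirector, url: str, timeout: float = 60.0) -> str:
    """Download a page; the scraper itself does nothing about anti-bots."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With real credentials, `fetch(make_unblocker_opener("USER", "PASS"), "https://example.com")` would return the page HTML. The scraper stays a plain HTTP client, and swapping providers only means changing the proxy URL.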
Oxymouse
Another interesting talk was given by Tadas Gedgaudas, who described how anti-bots use mouse movements to tell humans apart from bots.
We covered this topic in a previous episode of The Lab, where we used the ghost-cursor Python package to bypass DataDome.
Tadas gave a brief history of this technique, which first came to light during a trial against Meta in 2018. The company publicly admitted that it tracked mouse movements on its website to block bots, a practice many anti-bot vendors follow today.
After this introduction, Tadas showed how to use the browser's developer tools to check whether a website is tracking your mouse events, and then presented a new open-source package released by Oxylabs: Oxymouse.
Oxymouse goes further than the ghost-cursor library: it lets you choose among different algorithms to generate mouse movements, and it integrates easily with Playwright and Selenium.
As you can imagine, this package will be tested in one of the upcoming articles of this newsletter.
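The DevTools check is interactive (in Chrome, for example, the Console utility `getEventListeners(document)` lists the listeners registered on an element). As a crude, complementary heuristic, and my own illustration rather than the method shown in the talk, you can also search a page's JavaScript for mouse-event listener registrations. The event list and regexes are assumptions about common coding patterns: minified, obfuscated, or dynamically built event names will slip through, so an empty result proves nothing.

```python
import re

# Mouse/pointer events often used for behavioural checks (illustrative, not exhaustive).
MOUSE_EVENTS = ("mousemove", "mousedown", "mouseup", "mouseover", "pointermove", "click", "wheel")

_EVENT_ALT = "|".join(MOUSE_EVENTS)
# addEventListener("mousemove", ...) with double quotes, single quotes, or backticks.
_LISTENER_RE = re.compile(r"""addEventListener\(\s*["'`](""" + _EVENT_ALT + r""")["'`]""")
# Handler properties such as `el.onmousemove = ...` or `onclick=...`.
_PROPERTY_RE = re.compile(r"\bon(" + _EVENT_ALT + r")\s*=")


def find_mouse_listeners(script_source: str) -> set:
    """Return the mouse-related event names a script appears to listen for."""
    found = set(_LISTENER_RE.findall(script_source))
    found.update(_PROPERTY_RE.findall(script_source))
    return found
```

You would run it on the text of each script the page loads; any hit is a hint worth confirming in DevTools.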
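Oxymouse's own API isn't described here. To illustrate the general idea of algorithm-generated movements, here is a minimal sketch of one common approach: a cubic Bézier curve with randomised control points, ease-in-out timing, and small jitter. The function name `bezier_path` and its parameters are hypothetical and are not Oxymouse's interface.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 50, spread: float = 0.3,
                jitter: float = 1.0, rng: Optional[random.Random] = None) -> List[Point]:
    """Generate a cursor path from start to end along a randomised cubic Bézier curve."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Control points sit roughly 1/3 and 2/3 along the segment, then get a random offset
    # proportional to the distance, which bends the trajectory.
    offset = spread * math.hypot(x3 - x0, y3 - y0)
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-offset, offset)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-offset, offset)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-offset, offset)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-offset, offset)

    path: List[Point] = []
    for i in range(steps + 1):
        # Ease-in-out: t advances slowly near 0 and 1, so points cluster near the endpoints
        # (a slower start and finish if each point is replayed at a fixed interval).
        t = (1 - math.cos(math.pi * i / steps)) / 2
        u = 1 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        if 0 < i < steps:
            # Small tremor on intermediate points only; endpoints stay exact.
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        path.append((x, y))
    return path
```

Each point could then be replayed with Playwright's `page.mouse.move(x, y)` (or Selenium's `ActionChains`) with a short pause between calls. Passing a seeded `random.Random` makes a path reproducible for debugging.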
OxyCopilot
The centerpiece of the event was the presentation of OxyCopilot, an AI-powered web scraping assistant.
This is not a new concept: many tools applying “AI powers” to web scraping are coming onto the market, and just a few weeks ago I tried to map them.
Most of them focus primarily on AI-driven HTML parsing, which is great, but unfortunately it’s only half of the game.
OxyCopilot, instead, has two main features:
a custom parser builder, which creates a parser for a specific website that you can use in your scrapers;
a request builder, which helps users write the request code for the Scraper API without having to learn Oxylabs’ documentation and field logic.
Both rely on the Oxylabs Scraper API infrastructure, so you don’t have to worry about anti-bot measures: the API takes care of JavaScript rendering and everything else needed.
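To give an idea of what a Scraper API request with a custom parser looks like, here is a hedged sketch. The endpoint URL and the payload fields (`source`, `url`, `render`, `parse`, `parsing_instructions`, `_fns`, `xpath_one`) reflect my understanding of Oxylabs’ public documentation and may differ from the current API; the helper name, target URL, and XPath are hypothetical. The code only builds the request and does not send it.

```python
import base64
import json
import urllib.request

# Assumed realtime endpoint of the Oxylabs Scraper API: verify it against the current docs.
API_ENDPOINT = "https://realtime.oxylabs.io/v1/queries"


def build_scraper_api_request(user: str, password: str,
                              target_url: str) -> urllib.request.Request:
    """Build (but do not send) a request asking for JS rendering plus custom parsing."""
    payload = {
        "source": "universal",  # generic website scraping (assumed value)
        "url": target_url,
        "render": "html",       # ask the API to render JavaScript before returning the page
        "parse": True,          # apply the parsing instructions below
        "parsing_instructions": {
            # Hypothetical parser: extract the main heading with an XPath expression.
            "title": {"_fns": [{"_fn": "xpath_one", "_args": ["//h1/text()"]}]},
        },
    }
    # HTTP Basic authentication with the API credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would return the API’s JSON response. Note that the scraper never deals with proxies, headers, or rendering; that work sits behind the API, which is the point of the OxyCopilot approach.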
The premise of this tool is excellent, and I can’t wait to test it: if it meets expectations, OxyCopilot will open web scraping to more and more people, since it’s designed to lower the barriers to entry.
Of course, a dedicated article on it will soon be available in the newsletter.
Did you attend Oxycon this year? What did you like the most? Feel free to share it in the comments section!