What to expect from The Lab posts in 2024

Why I'm writing the "The Lab" articles and what to expect this new year

Jan 04, 2024

If you’ve been reading this newsletter for some time, you should be familiar with “The Lab” series of articles, but since many of you joined only recently, here’s a brief description. They are hands-on guides and solutions to common issues in the web scraping world: how can I bypass anti-bot X, how can I make my bot indistinguishable from a human browsing, and so on.

[Image: three clear beakers placed on a tabletop. Photo by Hans Reniers on Unsplash]

In most cases, together with the description of the techniques, the code used in them is available in the private GitHub repository.

Both the repository and the full articles are behind a paywall, since these articles are the most time-consuming to write and test.

Generally speaking, writing articles for The Web Scraping Club requires hours every day and, while I’m doing my best to let you extract value from this newsletter for free, every kind of support through paid subscriptions is greatly appreciated.

Why am I writing The Lab articles?

To get rich, of course! 🤑🤑🤑

Just kidding. The main reason I write The Web Scraping Club is to share my experience in web scraping with other professionals who are facing similar challenges. This is the kind of content I would have liked to read in the past, since it could have been useful for my career, and I hope my notes are interesting for you now.

In my opinion, the biggest gap in online content about web scraping was solutions to real issues on real websites. Since it’s a cat-and-mouse game, there’s a common idea that if you talk openly about detailed web scraping solutions, your words will be used by anti-bot companies to fix their products.

While I keep the secret Cola recipe to myself, I think that sharing these techniques in public brings more benefit than harm to the web scraping community. Is it better for you to read a solution to your web scraping issues today even if, maybe, it won’t work anymore in the future? Or would you prefer to spend more weeks without delivering data to your final user or customer?

In the past, I really needed articles that clearly described what to do to bypass anti-bot X, but the only ones I could find were from commercial solution providers, which ended by suggesting you buy their services. This is perfectly legitimate, and we’ll keep covering commercial solutions on these pages too, but it was not what I was looking for at that moment.

So I decided to write them myself, and to me this is the most satisfying task, since it forces me to stay up to date, study, do some research and, in the end, become a better web scraping professional. And I hope that by sharing the results of my studies with you, I’m also helping you somehow in your daily tasks.

What did we cover in 2023 with The Lab?

Over the past 35 articles, written mostly in 2023, we’ve faced several challenges, most of them created by anti-bots.

We started 2023 by scraping OpenSea, the NFT marketplace, to understand how the sales of the Bored Ape Yacht Club evolved.

The Web Scraping Club

THE LAB #9: Scraping OpenSea NFT's data

The NFT hype cycle Welcome back to The Web Scraping Club for a new episode of The Lab. In the past week, a scandal that involves the famous influencer Logan Paul and his crypto project called “Criptozoo” exploded, thanks to the Cofeezilla investigations (you can see…

3 years ago · Pierluigi Vinciguerra

This involved bypassing its Cloudflare protection and, to keep track of the transactions, also scraping Etherscan which, at that time, was protected by Cloudflare too.

From this example, you can understand the pros and cons of writing accurate solutions to scrape real websites: these websites change, and what worked one year ago might not work anymore. But this also means that we always have something to write about and study, since the web is constantly evolving.

In February I wrote the first edition of “The anti-detect anti-bot matrix”, where I compared 5 solutions (undetected-chromedriver, pyppeteer, and 3 different Playwright setups) against 5 anti-bots to see which one works against which.

The Web Scraping Club

THE LAB #11: The Anti-Detect Anti-Bot matrix

This post is sponsored by Oxylabs, your premium proxy provider. Sponsorships help keep The Web Scraping Free and it’s a way to give back to the readers some value. In this case, for all The Web Scraping Club Readers, using the discount code WSC25 you can…

2 years ago · 1 like · Pierluigi Vinciguerra

This article was extremely interesting and will probably become a recurring one during 2024.

Also in February, we had a great article by Fabien Vauchelles, where he explains how to scrape data from a mobile app using Charles Proxy and Android Studio.

The Web Scraping Club

THE LAB #12: Reverse-engineering Mobile API

This post is sponsored by Smartproxy, the premium proxy and web scraping infrastructure focused on the best price, ease of use, and performance. In this case, for all The Web Scraping Club Readers, using the discount code WEBSCRAPINGCLUB10 you can save 10% OFF…

2 years ago · 4 likes · Fabien Vauchelles

This is the power of creating a community: everyone brings in their knowledge and expertise, and Fabien is a master of web scraping techniques.

Of course, the core of The Lab is the articles where we bypass anti-bot solutions like Cloudflare, PerimeterX, Kasada and Akamai. They are the biggest challenges for web scrapers today, so they are the main focus of this series.

The most successful The Lab Articles in 2023

Confirming this, the podium of the top 3 articles of The Lab for 2023 is occupied by exactly these topics.

In third position we have our article about hRequests, where we benchmark this tool against the top 5 anti-bots on the market, and see where it works and where it’s not enough.

The Web Scraping Club

THE LAB #32: hRequests vs anti-bots: a full benchmark

In one of the past articles, I wrote about hRequests (human requests), a Python package that enhances traditional HTTP requests with more features, like headless browsing, real TLS fingerprints from browsers, and many others. We tested the hRequests package against Akamai…

2 years ago · 4 likes · Pierluigi Vinciguerra

In second position we have the already mentioned Anti-Detect Anti-Bot matrix, yet another benchmark against these anti-bots.

The Web Scraping Club

THE LAB #11: The Anti-Detect Anti-Bot matrix

2 years ago · 1 like · Pierluigi Vinciguerra

In first position we have our early 2023 update of the scraping solution against Cloudflare, which will probably need an update for 2024 soon.

The Web Scraping Club

THE LAB #14: Scraping Cloudflare Protected Websites (early 2023 version)

This article is sponsored by MobileHop, your mobile IP proxy provider. MobileHop provides native mobile IPs on dedicated 4G/5G modems via Verizon and AT&T Wireless to bypass almost all website blocks. A single multihop license gives you access to 50 USA markets and growing…

2 years ago · 2 comments · Pierluigi Vinciguerra

What to expect in 2024

Talking about 2024, what’s cooking for this year in The Web Scraping Club kitchen?

As mentioned in our previous 2023 recap, The Lab will become a weekly issue, up from bi-weekly in 2023. The goal is to bring more value to the readers, since I’ve realized that these articles are the ones the community wants most.

As already mentioned, there will be some recurring issues like the Anti-Detect Anti-Bot matrix (maybe with a more readable name), and some other deep dives on anti-bots.

And you, what would you like to read on these pages?
