Celebrating the 50th article of The Lab series
A brief review of the first 50 episodes of The Lab series
I started The Web Scraping Club newsletter at the end of August 2022, almost two years ago.
So far, I have published 180 articles, almost all written by me, and today we reach the 50th episode of The Lab series.
But let’s take a step back, and let me share my journey again with the new readers who joined the community only recently (thanks to all of you, from the bottom of my heart!).
Who’s writing The Web Scraping Club?
Hey everyone, it’s me, Pier.
I started doing some arcane web scraping in C++ back in 2009 and some years later, together with
, we started our first company, Re Analytics. It’s basically a web data factory, focused on the fashion industry, that sells insights to investors, brands, and retailers. We scored some goals, like being counted among the best-known market researchers for investors and partnering with famous magazines like Vogue Business, which borrowed our data for some articles. But over all these years we understood that web scraping is still an inefficient industry: too many small players fighting each other instead of networking to serve more customers.
Although selling web data is a business conducted online, it still works like an artisan boutique. A customer enters your lab, asks for a custom pair of shoes, and then you craft a model that fits only that customer. Of course, you reuse your own tools for every customer, but you need to charge a lot since the work is time-consuming and requires great effort.
Sounds familiar?
Well, it was certainly our case, and then we thought: why do all the companies like ours need to scrape the same websites? I’m sure each of us has its own particular data model, but can’t we agree on some common columns that cover 80% of the cases so that at least our basic needs are satisfied?
With Databoutique.com, a marketplace for web-scraped data, it’s enough for a single seller to scrape the data correctly: by creating an economy of scale, that seller can sell the same dataset multiple times at a fraction of the extraction costs.
In this way, we’re productizing the production of datasets by leveraging an almost infinite workforce of freelancers and companies.
Going back to the shoemaker’s example, we’re bringing the mass production of Nike’s shoes to the world of web scraping. If you need a custom shoe for a wedding, you can definitely ask for a custom one, but now you’ve got the option to buy ready-made shoes at a fraction of the cost, for your everyday life.
We started with Databoutique.com some months ago and the response has been quite good for now, considering that it’s a completely new business model for the web scraping industry. There are other data marketplaces, but we’re focused on web data so that we can apply a series of controls on both the data quality and the products listed.
The number I’m most proud of is this: when you’re looking for a website already available on Databoutique, you can get the data in 3 clicks. And if you ask for a legitimate website that is not yet listed on the marketplace, most of the time you can get the data in one or two days, thanks to our community of sellers.
We’re working hard to increase the number of buyers, in order to make the flywheel go faster, so feel free to contact me if you’re interested in more info.
So, this is what I do in my life: 24/7 involved in the web scraping ecosystem, from coding to writing articles, passing through its business aspects.
Why do I choose to write The Web Scraping Club?
Well, like every programmer, I look for information on the internet during my bug-fixing sessions. Unfortunately, unlike any other traditional programming job, it seems that the less you talk about scraping techniques online, the better it is.
This is because web scraping has always been seen as a sort of black market of data, while it’s just a tool like any other.
Since I was used to taking some notes for my daily job, I decided to start sharing them online and found out that my need for information is common to many web scraping professionals who are struggling to find a bypass solution for their tasks.
Some people may think that sharing these solutions online could make life easier for anti-bot providers. You know what? Solutions like Cloudflare, Datadome, and others were constantly updated even before I started writing this newsletter, so even if some of their employees are subscribed to The Web Scraping Club (hi guys! 👋🏻👋🏻), I’ll keep writing about solutions for bypassing anti-bots.
This is appreciated by the community, with almost 3k subscribed readers in less than two years.
I think that these numbers are explained by the fact that I’m sharing my experience in web scraping without a hidden agenda. I’m certainly not the smartest person in the industry, but at least I’m writing genuine content without the need to sell you anything. I’m not getting paid by any company to write, which leaves me free to write openly about what works and what doesn’t.
Last but not least, these numbers are also a sort of appreciation for my commitment.
In 2024 I planned to write two articles per week, giving space for free to partner companies for a third day. It takes some time and energy, but I love doing it.
What is The Lab series?
The articles called The Lab, which I try to write every Thursday, are special articles available for paying subscribers only.
In these posts, I usually test some tools and try new techniques to bypass anti-bot solutions, along with a private repository accessible to subscribers.
Here are the top five articles of The Lab by number of readers:
In the fifth position, we find our recent article about bypassing Cloudflare using Kameleo.
In the fourth position, there’s another Cloudflare bypass article, an older one this time.
On the lowest step of the podium is my first article about bypassing Datadome. Unfortunately, the solution is two years old and no longer useful, but I’m planning an update soon.
In the second position, there’s Cloudflare again. It’s quite an old article, but I wrote more about it recently.
And in the first place, thanks to a great success also on Hacker News, the first The Lab episode.
The first spike in the newsletter’s growth that you see in the readers chart is due to this article being featured on the front page of Hacker News for almost a whole day.
As you can see from the topics, these are the most time-consuming articles but also the most valuable, and I decided to put them behind a paywall because there’s no free lunch in this world. I prefer to be honest with the community and say that I monetize this blog with a paid subscription, rather than keeping all the content free and then needing to accept money from other companies to write good reviews about them.
But did you know there are some ways to read all The Lab articles for free?
The most immediate one is using the free 7-day trial option. You can redeem it once for your account and then cancel the subscription if the articles don’t satisfy you. It’s sad for me when it happens, but that’s life.
You can recommend The Web Scraping Club to your friends. The more friends subscribe as readers, the longer your free period is. Just check the recommendation page to see how it works.
You can ask your company to pay for the subscription for you and your colleagues. You and your colleagues will get the articles for free, and your company will get a 20% discount on the subscription.
If your company wants more than 5 subscriptions, please write to me at pier@thewebscraping.club for an extra discount.
Of course, the Sunday articles will always be free, and I try to provide some value for readers there too.
Any feedback on the content is welcome, so if you have any suggestions or comments, please write to me. It’s very important to me.