Welcome to the last post of The Web Scraping Club for 2024. I’ll take a small break for the holiday season and return on January 5th with a new post.
In this episode, I’ll discuss some of the milestones The Web Scraping Club reached and missed and give a brief outlook on what to expect in 2025.
How was 2024 for The Web Scraping Club?
I can summarize The Web Scraping Club’s 2024 in one word: growth!
The total number of newsletter readers has more than doubled this year with zero budget allocated for advertising, just through word of mouth and SEO. I’d love to thank each of you for your help in this process: every time you share an article, you help this newsletter grow.
For this reason, at the end of each article, you can find the button for the TWSC referral program. The more friends you invite, the bigger the reward you get.
Looking at the numbers, the growth was steady rather than explosive. It took sixteen months to reach the first 2,000 readers, while 2024, with a few days left in the year, brought 2,231 new subscribers.
Also, the number of paying readers grew by 150% this year (really, thank you!), so The Web Scraping Club became a real company, with a VAT ID and other bookkeeping stuff.
This means more work for me in accounting, but I can also invest more money in tools and people.
The main expenses I incur for running TWSC are:
AI and other tools (OpenAI, Cursor, Make.com, Grammarly, Notion, Riverside for the videos, and others)
Social channels and SEO (Ahrefs, LinkedIn Premium, X, Discord)
Scraping tools
Bookkeeping
People (video editor)
During the summer, I worked on a new writing process that improved the quality of the articles, or at least I think it did. I was able to write and review them more efficiently, and I hope you’re enjoying them.
AI is also helping me with this: all the content you see on this blog is original and not written by an AI, but I use AI in three different phases:
Idea generation
Generating the initial structure of the article, but most of the time, I change it to reflect my writing style.
Double-checking for grammar errors and rephrasing some difficult sentences, together with Grammarly.
Just for fun, I ran the first part of the latest article, about RabbitMQ, through some online tools that detect the usage of AI in a text, and the results were quite funny:
One solution stated that I used every AI tool on earth to write it (and then tried to sell me its AI content generator, which is undetectable by the other tools).
The Grammarly test, instead, said that 20% of the text seems to be AI-generated. The number could be pretty accurate if this includes fixes and rephrases made by Grammarly itself.
The ZeroGPT test, instead, stated that GPT wrote 6% of the text I submitted (basically, two sentences).
The two sentences marked as GPT-written are the ones I asked ChatGPT to rewrite because I was unhappy with my own version. Therefore, I think this is the most accurate number.
In the end, AI helps me explain my ideas better, but it does not write them down, which leads me to another point I’d like to introduce.
A call for authors
This is a club, but the band playing the music is almost always the same: me. In 2024, including this article, I published 107 posts, of which 93 were written by me.
This can be interesting for some time, but I’d like to add more variety of voices and content to these pages.
If you have experience and a strong background in web scraping and are willing to share your expertise with us, feel free to write to me at pier@thewebscraping.club.
It can be a one-off or a continuous collaboration, with topics ranging from technical web scraping aspects to more architectural and business solutions.
The Web Scraping Club YouTube Channel
This year, I launched the TWSC YouTube Channel with the first batch of five interviews. The ambitious target for 2025 is to have at least 50 videos on the channel, so I’m looking for more interesting people in the industry to get in touch with.
In particular, I’d like to interview people who are:
developing solutions to bypass anti-bots
working in the anti-bot industry
working on unique products used for web scraping
working at companies that use a large amount of web data
working at AI companies that need web data for training their models
If you fall into one of these categories (or you can introduce me to someone who does), please write to me at pier@thewebscraping.club.
My hidden dream? Interviewing someone at OpenAI / Google / Perplexity to understand how they fulfill their models’ need for data. Will you help me with this?
Missed objectives for 2024
While 2024 has been a great year for TWSC in many respects, I would have liked to do more in others.
First, I would have liked to maintain more continuity in the benchmarks for unblockers and anti-detect browsers.
I wanted to update each of them every three months, but the anti-detect browser article is very demanding and time-consuming, so I could not do it.
I also wanted to record more interviews for the YouTube channel during 2024, but due to time constraints, I could not.
Lastly, I wanted to make the TWSC Discord server a more interactive space with more valuable features, but I still haven’t found a way to improve it.
What to expect for 2025
We talked enough about 2024, so let’s start thinking about the future. I wish to see more variety of authors and ideas on TWSC and more videos on YouTube.
I also wish to focus more on tests of tools and solutions, even if these articles take longer to create and require more people and companies to be involved.
For example, I’m starting a benchmark of Shopee scraping solutions: if you work for a company that offers one, please write to me, since I’d be interested in including it in this article.
Thanks to LLMs, web scraping is becoming cool again (even if it always seems like a secret). We can already see two trends that will be part of 2025’s articles for sure:
advanced browser automation tools for creating AI Agents (but also scrapers)
new solutions for transforming websites into data for LLMs
These two topics, along with AI in general, will appear more frequently on these pages over the next year.
What would you like to see on these pages in 2025? Please let me know in the comments section.
That’s it for 2024. I hope you have a great end of the year. I’ll be back in two weeks with more energy and ideas for this newsletter.
Like this article? Share it with your friends who might have missed it or leave feedback for me about it. It’s important to understand how to improve this newsletter.
You can also invite your friends to subscribe to the newsletter. The more you bring, the bigger the prize you get.
To be clear, each user will have a referral link; people access the link, and I get credit for the referral. So you log each visit to the website through the link and update it in the database, right? I don't know much about backend stuff, I'm just asking.