End of year recap for The Web Scraping Club

What happened in 2023 and what are the plans for 2024

Dec 31, 2023

Hi everyone and welcome to the last post of 2023. By tradition, this is the time of year when we look back at what happened and plan our next steps for 2024.

The 2023 recap

The Web Scraping Club is a newsletter created by me, Pierluigi Vinciguerra, co-founder of Databoutique.com, a web-scraped data marketplace. If we’re not yet connected on LinkedIn, let’s connect: I’m always happy to connect with people in the same industry. I’ve been working in the web scraping industry for 10+ years, and I felt there was a need for a community where web scraping experts could openly share tools, techniques, and experiences. As web scraping became harder and harder, that need grew more and more urgent, so I decided to create the community myself.

I’ve never been a tech writer, especially in English, which is not my mother tongue, so please be indulgent if some posts are not that great. The opinions expressed are always my own, but since I’m not the ultimate expert on every web scraping topic, you’ll also find guest posts from other key people who agreed to share their expertise with us.

In any case, every post is written with great passion and with the intention of providing value to readers, and I think the audience, which keeps growing, perceives this.

The growth of The Web Scraping Club

During 2023 we passed two important subscriber milestones for this newsletter: we started the year at 510 subscribers and ended it with more than 2,000.

Subscriber growth on The Web Scraping Club

This is a great achievement, considering the niche we operate in, but I think this community could reach and surpass 10,000 members, as web scraping becomes more and more mainstream.

For this reason, if you know someone who could benefit from the insights in this newsletter, please share it with them.


Another great result: our Discord server passed 600 members. That’s where we share suggestions, news, and offers from our partners.

A big thank you

Speaking of partners, I’d like to thank all the companies that shared their offers and discount codes with our community; you can find them on this page.

In addition, until the 6th of January, you can take advantage of these two promotions on the paid plan of The Web Scraping Club, which gives you access to the whole archive of The Lab articles and to its exclusive GitHub repository.

Group Annual subscriptions: 30% off for groups of more than 2 people, forever. Read The Web Scraping Club together with your friends and colleagues and save money! Valid only until the 6th of January.


Annual subscriptions: 20% off your annual subscription to The Web Scraping Club, forever. Redeem by the 6th of January.


The most popular content of the year

Needless to say, 2023 was the year of artificial intelligence, and the podium of our most-read articles confirms it.

In third place is the guest post by Shawn Rushefsky from Salad, with his in-depth article about LLM-powered web scraping on a distributed infrastructure. It’s a particularly interesting case: running scrapers on a distributed cloud of gaming PCs lets us get past device fingerprinting, since each scraper runs on a consumer device with a residential IP.

The Web Scraping Club

Web DRAGON - LLM-powered web scraping on a distributed cloud

By Shawn Rushefsky

In second place is one of the earliest articles from The Lab, about bypassing Cloudflare, which, as you know, is one of the hottest topics in web scraping.

This is a typical article from The Lab, a series where we dive into practical solutions to scraping problems, such as bypassing anti-bot solutions or trying out new techniques and tools.


THE LAB #14: Scraping Cloudflare Protected Websites (early 2023 version)

This article is sponsored by MobileHop, your mobile IP proxy provider. MobileHop provides native mobile IPs on dedicated 4G/5G modems via Verizon and AT&T Wireless to bypass almost all website blocks. A single multihop license gives you access to 50 USA markets and growing…

By Pierluigi Vinciguerra

The most-read article of the year is about writing a scraper with ChatGPT, published a few weeks after its launch. Long story short: at that time, ChatGPT had no plugin to fetch live data from the web, so when we asked it to write a scraper for specific websites, the results contained generic selectors that didn’t match the live sites’ actual HTML.

Since then, a lot has changed in the AI-powered web scraping landscape, with tools becoming more and more sophisticated, and we’ll surely see more of them in 2024.


Writing a web scraper with ChatGPT. Is it a good idea?

By Pierluigi Vinciguerra

What will happen in 2024

New projects and publishing calendar

In 2024 there will be some changes to the publishing plan, to provide more value to the community and to turn The Web Scraping Club from a one-man band into a real club, as the name says.

As you may have noticed in recent weeks, there will be more room for guest posts from key people in the web scraping industry, to help me cover topics I’m not familiar with.

We’ve already published the first edition of The Legal Zyte-geist, a monthly column by Sanaea Daruwalla, Chief Legal & People Officer at Zyte, who will help us navigate the legal aspects of web scraping.


Legal Zyte-geist #1: Step-by-Step Guide to Compliant Web Scraping

Welcome to the monthly column about web scraping and legal themes by Sanaea Daruwalla. She is the Chief Legal & People Officer at Zyte. Sanaea has over 15 years of experience representing a wide variety of clients and is one of the leading experts on web data extraction laws…

By Sanaea Daruwalla

We’ve also already hosted the guest post by Shawn Rushefsky from Salad, about LLMs and web scraping on Salad’s cloud architecture, and more guest posts will follow during the year.

My goal is to host a guest writer every Tuesday, to bring new ideas and add more value to the club.

Another, smaller change concerns our premium article series, The Lab. I’ve noticed these are the most appreciated and most shared articles, so I’ll do my best to write more of them, moving from the current bi-weekly frequency to a weekly issue of The Lab, every Thursday.

They are also the most time-consuming articles for me to write, since they usually involve writing code and thinking up useful use cases to show, and this is why they are the only paywalled content. I think the price is more than affordable, and I’ll keep it that way, since I’m not writing The Web Scraping Club to become a millionaire. At the same time, I spend hours on it every day, and this must be taken into account.

That said, if you’re a university student anywhere in the world, or in a difficult economic situation, and you think this content will help you become a better professional web scraper, feel free to reach me at pier@thewebscraping.club for a free plan.

As usual, every Sunday there will be posts available to everyone, with reviews of new tools, our Web Scraping from 0 to Hero course every two weeks, new techniques, and web scraping news.

If you want to stay updated on the weekly plan, I’ve just added a new channel to our Discord server where I’ll post updates on upcoming articles.

That’s all for today. I hope you enjoyed the journey we took together in 2023, and I hope you’ll enjoy the one we’re planning for 2024 even more.
