Introducing the Web Scraping 101 Wiki
A collaborative way to share basic knowledge about web scraping
This article is sponsored by Serply, the solution to scrape search engine results easily.
Web Scraping Club readers can save 25% on all SERP scraping plans by using the code TWSC25.
Web Scraping 101 Wiki project description
The Web Scraping Club was created with the purpose of sharing and collecting experiences, tutorials, news, and real-world use cases about the web scraping industry and all its nuances.
As the name Club suggests, it’s not a top-down knowledge base but a collaborative environment where we exchange ideas via our Discord Server or other means. Every industry expert can contribute to the community, sharing their expertise via detailed articles on Substack (this is what Fabien did with this article, for example) or simply helping others on Discord.
Currently, via Substack, we have in-depth articles about various aspects of web scraping, interviews with key people involved, and a monthly news recap to stay up-to-date with what has happened in the industry. But while interacting with the community, I felt we were missing something in this offering: a common knowledge base about web scraping.
I’m aware there are hundreds of tutorials on the web about “What is web scraping?” but since The Web Scraping Club promotes education about web scraping in a free and unbiased way, we cannot leave out the basic questions that come to mind when people approach this industry.
It’s like building the Wikipedia of web scraping: there are surely hundreds of pages on the web that explain who Napoleon Bonaparte was, but this doesn’t prevent Wikipedia from having its own page about Napoleon, since there are still people who don’t know who he was.
How will this work?
Just like for Wikipedia, all the knowledge about web scraping cannot be collected by a single author. The only way to achieve decent coverage of all aspects of web scraping is to collaborate in a curated environment, where all the domain experts and partners of The Web Scraping Club can select a topic and contribute to the community by writing down their knowledge.
Practically speaking, here in Notion there will be:
A public database of approved topics and their status, showing whether each one is already covered or not. If there’s still no article about a topic, an author can pick it up and ask to write about it.
A public database of authors, basically a list of approved writers who can submit articles to the Wiki. You can volunteer to participate by writing to firstname.lastname@example.org. Every author will have a dedicated section here on Notion in which to share articles.
A public database of suggested topics, not yet included among the approved ones, where anyone can suggest the next topics to work on.
What does a wiki article look like?
Having different authors on the same platform, if left without rules, could lead to a messy wiki whose articles differ in style, format, and structure.
For this reason, I’ve already started writing some articles and I think I’ve found the best format for a wiki-style page to share inside a dedicated section of The Web Scraping Club Substack, obviously for free. You can see the current state of the 101 Wiki on this page, linked directly on The Web Scraping Club homepage.
I’ll soon release a template for the articles and some writing guidelines, but I’m open to suggestions.
What do you think about this 101 Wiki project? Do you think it will be useful? Are you willing to participate? Send me your feedback here in the comments or via email.
The Lab - premium content with real-world cases
THE LAB #8: Using Bezier curves for human-like mouse movements
THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting
THE LAB #4: Scrapyd - how to manage and schedule a fleet of scrapers
THE LAB #2: scraping data from a website with Datadome and xsrf tokens