The Web Scraping Club

THE LAB #3: Scraping Cloudflare protected websites

Without buying any external software, for real.

Pierluigi Vinciguerra
Sep 27, 2022

Here’s another post of “THE LAB”: in this series, we'll cover real-world use cases, with code and an explanation of the methodology used.

In the future, this kind of content will be available only to paying subscribers. As one of the first posts in the series, this one will be free until October 2, 2022, and will then go behind the paywall.


Being a paying user gives:

  • Access to paid content, like the “THE LAB” post series, where we dive deep into real-world cases with code (view here as an example).

  • Access to the GitHub repository with the code seen in “THE LAB”

  • Access to private channels on our Discord server

But in case you want to read this newsletter for free, you will always get a post per week about:

  • News about web scraping

  • Anti-bot software and techniques insights

  • Interviews with key people in the industry

And you can always join the Web Scraping Club Discord server

Enough housekeeping for now. Let’s start.

What is Cloudflare?

Cloudflare is an American company based in San Francisco. It offers services such as DDoS mitigation, distributed DNS, a content delivery network (CDN), and anti-bot protection for websites.

For its anti-bot protection, it uses both passive bot detection techniques, like TCP, TLS, and HTTP fingerprinting, and active ones, like canvas fingerprinting and CAPTCHAs. On top of this, it queries the browser to identify any automation tool and monitors what happens on the page, tracking mouse movements and any other actions that can make a bot detectable.
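To make the idea of passive HTTP fingerprinting more concrete, here is a minimal sketch of how a server could derive a fingerprint from the order of the request headers alone. This is an illustrative assumption, not Cloudflare's actual algorithm: the function name `header_fingerprint`, the choice of SHA-256, and the sample headers are all hypothetical.

```python
import hashlib


def header_fingerprint(headers):
    """Hash the ordered, lowercased header names of a request.

    Hypothetical helper: header values are ignored on purpose, so only
    the ordering chosen by the HTTP client influences the result.
    """
    names = [name.lower() for name, _ in headers]
    digest = hashlib.sha256(",".join(names).encode("utf-8")).hexdigest()
    return digest[:16]


# Two clients sending identical headers, but in a different order.
browser_like = [
    ("Host", "example.com"),
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US"),
]
script_like = [
    ("Host", "example.com"),
    ("Accept", "text/html"),
    ("User-Agent", "Mozilla/5.0"),
    ("Accept-Language", "en-US"),
]

# The User-Agent string is the same, yet the fingerprints differ.
fingerprints_differ = (
    header_fingerprint(browser_like) != header_fingerprint(script_like)
)
```

The point of the sketch is that copying a browser's User-Agent is not enough: details the scraper author never thinks about, like header order, can still set an HTTP library apart from a real browser.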

At the moment, it's one of the toughest solutions to bypass in a web scraping project. I think anyone with some experience in this field has run into its challenge screen at least once.
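In practice, a scraper needs to notice when it has been served a Cloudflare interstitial instead of the real page. The function below is a rough heuristic sketch under stated assumptions: the function name is hypothetical, and the status codes and body markers are based on commonly observed challenge pages, so they may change over time and should be verified against the target website.

```python
def looks_like_cloudflare_challenge(status, headers, body):
    """Guess whether a response is a Cloudflare challenge page.

    status  -- HTTP status code (int)
    headers -- dict of response headers
    body    -- response body as text
    """
    normalized = {k.lower(): v.lower() for k, v in headers.items()}

    # Cloudflare documents this header as a signal for challenged requests.
    if normalized.get("cf-mitigated") == "challenge":
        return True

    # Fallback heuristics (assumptions, not guarantees).
    served_by_cloudflare = "cloudflare" in normalized.get("server", "")
    blocked_status = status in (403, 503)
    text = body.lower()
    marker_in_body = any(
        marker in text for marker in ("just a moment", "checking your browser")
    )
    return served_by_cloudflare and blocked_status and marker_in_body
```

A scraper could call this on every response and, when it returns `True`, stop and log the event instead of parsing a page that contains no product data.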

Since there's no silver bullet to avoid being blocked, we'll see 3 similar but not identical solutions for scraping 3 different websites:

  • brownsfashion.com

  • antonioli.eu

  • ssense.com

As usual, we’re scraping public product price data, without logging in, and at a speed that doesn't harm the business of the target website.
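Keeping the request rate low can be as simple as enforcing a minimum pause between requests. Here is a minimal sketch, assuming a single-threaded scraper; the `Throttle` class and the two-second interval in the usage note are illustrative choices, not values taken from the scrapers discussed in this post.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable, which makes testing easy
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

To use it, create `throttle = Throttle(2.0)` once and call `throttle.wait()` right before each request. The first call returns immediately, and later calls sleep only for the time still missing.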
