The Web Scraping Club

THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting

In other words: fake it until you scrape it

Pierluigi Vinciguerra
Nov 08, 2022
∙ Paid

Here’s another post in “THE LAB”: in this series, we'll cover real-world use cases, with code and an explanation of the methodology used.

The Web Scraping Club is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Being a paying user gives:

  • Access to Paid Content, like the post series called “The LAB”, where we dive deep into real-world cases with code (see an example here).

  • Access to the GitHub repository with the code seen in “The LAB”

  • Access to private channels on our Discord server

If you prefer to read this newsletter for free, you will still get one post per week about:

  • News about web scraping

  • Anti-bot software and techniques insights

  • Interviews with key people in the industry

And you can always join the Web Scraping Club Discord server.

Enough housekeeping for now: let’s start.


As you surely know, the most advanced anti-bot solutions act on different levels:

  • at a behavioral level, they check how the scraper acts and try to distinguish a bot from a human.

  • at a browser level, they try to distinguish a genuine browser from an automated one, looking for inconsistencies in its setup.

  • at a network level (HTTP and the TLS layer beneath it), they try to identify the client's configuration to detect suspicious setups.

A recent discussion on our Discord server focused on this last case, so today we'll explain how it can be done via TLS fingerprinting and what we can do as a countermeasure in our scrapers.

Understanding TLS Fingerprinting

TLS fingerprinting is a passive (or server-side) fingerprinting technique that servers use to identify the configuration of the clients connecting to them.

The fingerprints are built from the parameters the client sends while the TLS connection is being established, first of all the list of cipher suites it supports.

To better understand how this technique works, let's borrow the image from this Cloudflare blog post.

[Image: HTTP protocol]

When a client connects to a server, the first exchange happens at the TCP level. It's called the three-way handshake, and it's how the client and server confirm that both are willing and able to connect:

  • The client sends a SYN packet asking the server whether it is available for a new connection.

  • If the server is available, it replies with a SYN/ACK packet.

  • The client then replies with an ACK packet, and the connection is established. From now on, the two can exchange data.
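To make the three steps concrete, here is a toy, in-memory model of the handshake. It is only an illustration under simplifying assumptions: it sends no packets (in practice the operating system performs the handshake when a socket connects), and the `Segment` class and `three_way_handshake` function are hypothetical names that keep just the flags and the sequence/acknowledgement numbers.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Simplified TCP segment: only the fields the handshake needs."""
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(rng: random.Random) -> list:
    # Each side picks a random 32-bit initial sequence number (ISN).
    client_isn = rng.randrange(2**32)
    server_isn = rng.randrange(2**32)
    # 1. Client -> server: SYN carrying the client's ISN.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server -> client: SYN/ACK with its own ISN, acknowledging client ISN + 1.
    syn_ack = Segment("SYN/ACK", seq=server_isn, ack=(syn.seq + 1) % 2**32)
    # 3. Client -> server: ACK for server ISN + 1; the connection is now open.
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % 2**32)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(random.Random(42)):
    print(segment)
```

Each acknowledgement number is the other side's sequence number plus one, which is how both ends confirm they received the opening packet.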

Without going into the details of the full TLS protocol, let's focus on what happens once the TCP connection is established.

The "Client Hello" message, the first one the client sends after the TCP handshake, is where the data used for fingerprinting travels. It includes the TLS version the client supports, the list of cipher suites it supports, a string of random bytes known as the "client random", and a list of TLS extensions.
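You can get a rough idea of what your own client puts in the cipher list of its Client Hello with Python's standard `ssl` module. This is a sketch with some assumptions: it inspects Python's default context, not Scrapy's (Scrapy goes through Twisted and pyOpenSSL, so its list can differ), the result depends on the local OpenSSL build, and OpenSSL may append extra signalling values such as `00FF` on the wire. `client_hello_ciphers` is a hypothetical helper name.

```python
import ssl


def client_hello_ciphers():
    """Return (IANA code, OpenSSL name) for each cipher suite this Python
    build enables by default, in OpenSSL's preference order."""
    ctx = ssl.create_default_context()
    # OpenSSL cipher ids look like 0x0300XXXX: the low 16 bits are the
    # two-byte code that is actually sent in the Client Hello.
    return [(c["id"] & 0xFFFF, c["name"]) for c in ctx.get_ciphers()]


# Print in the same "[code]  name" layout as the lists below.
for code, name in client_hello_ciphers():
    print(f"[{code:04X}]\t{name}")
```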

The key point is that the cipher list differs from client to client: even from the same machine, Chrome offers a different set of cipher suites than Safari or Scrapy.

Here are the cipher suites offered by Chrome on a Mac laptop when connecting to google.com:

	[8A8A]	Unrecognized cipher - See https://www.iana.org/assignments/tls-parameters/
	[1301]	TLS_AES_128_GCM_SHA256
	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA

Safari:

	[2A2A]	Unrecognized cipher - See https://www.iana.org/assignments/tls-parameters/
	[1301]	TLS_AES_128_GCM_SHA256
	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C00A]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
	[C009]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[C008]	TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA
	[C012]	TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
	[000A]	SSL_RSA_WITH_3DES_EDE_SHA

Scrapy:

	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[1301]	TLS_AES_128_GCM_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[009F]	TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[CCAA]	TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[009E]	TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
	[C024]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
	[C028]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
	[006B]	TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
	[C023]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
	[C027]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
	[0067]	TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
	[C00A]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[0039]	TLS_DHE_RSA_WITH_AES_256_CBC_SHA
	[C009]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[0033]	TLS_DHE_RSA_WITH_AES_128_CBC_SHA
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[003D]	TLS_RSA_WITH_AES_256_CBC_SHA256
	[003C]	TLS_RSA_WITH_AES_128_CBC_SHA256
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[00FF]	TLS_EMPTY_RENEGOTIATION_INFO_SCSV

The three lists differ in both the number and the order of the cipher suites. This means that, from these ciphers and the other parameters sent, the server gets an idea of my client's setup as soon as I try to connect, and it can use this data to create fingerprints and block the suspicious ones.
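A quick way to see why both number and order matter is to hash the three lists above. The codes below are copied from those lists (the random `8A8A`/`2A2A` entries left out), while `cipher_hash` is a hypothetical, deliberately simplified order-sensitive fingerprint, not a real anti-bot algorithm.

```python
import hashlib

# Cipher suite codes from the Chrome, Safari and Scrapy lists above,
# in the order each client offered them.
CHROME = [0x1301, 0x1302, 0x1303, 0xC02B, 0xC02F, 0xC02C, 0xC030, 0xCCA9,
          0xCCA8, 0xC013, 0xC014, 0x009C, 0x009D, 0x002F, 0x0035]
SAFARI = [0x1301, 0x1302, 0x1303, 0xC02C, 0xC02B, 0xCCA9, 0xC030, 0xC02F,
          0xCCA8, 0xC00A, 0xC009, 0xC014, 0xC013, 0x009D, 0x009C, 0x0035,
          0x002F, 0xC008, 0xC012, 0x000A]
SCRAPY = [0x1302, 0x1303, 0x1301, 0xC02C, 0xC030, 0x009F, 0xCCA9, 0xCCA8,
          0xCCAA, 0xC02B, 0xC02F, 0x009E, 0xC024, 0xC028, 0x006B, 0xC023,
          0xC027, 0x0067, 0xC00A, 0xC014, 0x0039, 0xC009, 0xC013, 0x0033,
          0x009D, 0x009C, 0x003D, 0x003C, 0x0035, 0x002F, 0x00FF]


def cipher_hash(codes):
    """Toy fingerprint: MD5 of the dash-joined decimal codes, order included."""
    return hashlib.md5("-".join(str(c) for c in codes).encode()).hexdigest()


for name, codes in [("Chrome", CHROME), ("Safari", SAFARI), ("Scrapy", SCRAPY)]:
    print(f"{name:7} {len(codes):2} ciphers  {cipher_hash(codes)}")

# Every Chrome cipher is also offered by Safari and by Scrapy...
print(set(CHROME) <= set(SAFARI), set(CHROME) <= set(SCRAPY))
# ...but keeping only those ciphers in Scrapy's own order still yields a
# different sequence, so an order-sensitive hash tells the clients apart.
scrapy_order = [c for c in SCRAPY if c in set(CHROME)]
print(scrapy_order == CHROME, cipher_hash(scrapy_order) == cipher_hash(CHROME))
```

In other words, offering the "right" ciphers is not enough: to look like Chrome, a client also has to offer them in Chrome's order and without extra ones.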

Credits to LWT Hiker Blog

This great LWT Hiker blog post, where the lists above come from, goes into more detail and also presents two of the best-known algorithms used today to create fingerprints: JA3 and TS1.
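As an illustration of how such an algorithm works, here is a minimal sketch of the JA3 recipe: five Client Hello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) written as decimal numbers, values joined by dashes, fields joined by commas, then MD5-hashed. GREASE values (RFC 8701), the random `8A8A`/`2A2A` entries seen in the Chrome and Safari lists, are dropped. The cipher list is Chrome's from above; the extension, curve and point-format values are hypothetical and were not captured from a real handshake.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. JA3 skips them.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return the JA3 string and its MD5 digest for one Client Hello."""
    def field(values):
        # Decimal values joined by dashes, GREASE entries removed.
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_str = ",".join([str(version), field(ciphers), field(extensions),
                        field(curves), field(point_formats)])
    return ja3_str, hashlib.md5(ja3_str.encode()).hexdigest()


# Cipher suites from the Chrome list above, GREASE value included.
chrome_ciphers = [0x8A8A, 0x1301, 0x1302, 0x1303, 0xC02B, 0xC02F, 0xC02C,
                  0xC030, 0xCCA9, 0xCCA8, 0xC013, 0xC014, 0x009C, 0x009D,
                  0x002F, 0x0035]
# Hypothetical extension / curve / point-format values, for illustration only.
ja3_str, ja3_md5 = ja3(0x0303, chrome_ciphers, [0, 10, 11, 13],
                       [0x2A2A, 29, 23, 24], [0])
print(ja3_str)
print(ja3_md5)
```

The version field is the legacy Client Hello version, `0x0303` (771), which TLS 1.3 clients still send there.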

Countermeasures
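
A minimal sketch of one such change, assuming Scrapy 1.8 or later and its default HTTP/1.1 download handler: the `DOWNLOADER_CLIENT_TLS_CIPHERS` setting accepts a cipher list in OpenSSL's colon-separated format, so Chrome's TLS 1.2 suites from the list above can be written out in Chrome's order. The `CHROME_TLS12_CIPHERS` list and the `check_cipher_string` helper are hypothetical names; in a real `settings.py` only the `DOWNLOADER_CLIENT_TLS_CIPHERS` line is needed, and the helper just checks locally that OpenSSL accepts the string.

```python
import ssl

# Chrome's TLS 1.2 cipher suites from the list above, in Chrome's order,
# with their OpenSSL names (TLS 1.3 suites are configured separately).
CHROME_TLS12_CIPHERS = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",  # C02B
    "ECDHE-RSA-AES128-GCM-SHA256",    # C02F
    "ECDHE-ECDSA-AES256-GCM-SHA384",  # C02C
    "ECDHE-RSA-AES256-GCM-SHA384",    # C030
    "ECDHE-ECDSA-CHACHA20-POLY1305",  # CCA9
    "ECDHE-RSA-CHACHA20-POLY1305",    # CCA8
    "ECDHE-RSA-AES128-SHA",           # C013
    "ECDHE-RSA-AES256-SHA",           # C014
    "AES128-GCM-SHA256",              # 009C
    "AES256-GCM-SHA384",              # 009D
    "AES128-SHA",                     # 002F
    "AES256-SHA",                     # 0035
]

# Scrapy setting: OpenSSL cipher-list format, names separated by colons.
DOWNLOADER_CLIENT_TLS_CIPHERS = ":".join(CHROME_TLS12_CIPHERS)


def check_cipher_string(cipher_string):
    """Raise ssl.SSLError if the local OpenSSL can select none of the names."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)
    return [c["name"] for c in ctx.get_ciphers()]


print(check_cipher_string(DOWNLOADER_CLIENT_TLS_CIPHERS))
```

Keep in mind that this only reorders and trims the TLS 1.2 cipher list: the TLS 1.3 suites, the extensions and the curves still come from OpenSSL's defaults, so the resulting JA3 fingerprint will be closer to Chrome's but not identical.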
