THE LAB #104: Bypassing AWS WAF on IMDB with Scrapling

A hands-on test of TLS-spoofing tools and Scrapling

May 14, 2026

AWS WAF is the protection we run into most often on Amazon’s public properties. It also sits in front of a long tail of third-party sites whose operators built on AWS and clicked the WAF checkbox. We wrote about it two years ago in The Lab #53: Bypassing AWS WAF, but that article was not about AWS WAF alone: Traveloka used DataDome on top of AWS WAF, and our analysis had to account for both systems at once.

This time, we wanted AWS WAF on its own, in front of a target with nothing else in front of it, and we wanted to see what changes when the 2024 Scrapy-Playwright stack is replaced with the 2026 toolbox.

The target we picked is imdb.com. It is an Amazon subsidiary, runs a standard AWS WAF deployment, and Wappalyzer confirms that there is no other anti-bot solution on the website. That makes IMDB a perfect test case for our article.

Before proceeding, let me thank NetNut, the platinum partner of the month. Their set of solutions covers all your scraping needs.

Today we’ll test three Python HTTP clients with strong TLS fingerprint impersonation: curl_cffi, the newer httpx-curl-cffi, and the Rust-backed rnet. Each one is designed to produce a TLS handshake that matches real Chrome. Is that enough to scrape an AWS WAF target without spinning up a browser? And if not, what is the smallest browser step that gets us past the gate so the rest of the work can run on a cheap HTTP client?

The tools we used

Four libraries are in scope. Three are HTTP-only, one runs a real browser.

curl_cffi is a Python binding for curl-impersonate, a patched build of curl. It exposes a requests-like API and ships impersonation profiles for recent Chrome, Firefox, and Safari builds. The impersonation works at the TLS layer: JA3 and JA4 fingerprints match the impersonated browser, along with HTTP/2 settings and header order. We tested with chrome142, the latest Chrome profile in version 0.14.0.
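To give an idea of the API shape, a single impersonated request with curl_cffi might look like the sketch below. The helper name fetch_with_curl_cffi is ours, not part of the library, and the import sits inside the function so the snippet loads even where curl_cffi is not installed. It assumes the installed version ships the chrome142 profile.

```python
def fetch_with_curl_cffi(url: str, profile: str = "chrome142"):
    """Send one GET with a browser-impersonating TLS handshake.

    Hypothetical helper: assumes curl_cffi is installed and that the
    installed version ships the requested impersonation profile.
    """
    # Imported lazily so this module loads without the dependency.
    from curl_cffi import requests

    # curl_cffi follows the requests naming: allow_redirects,
    # not follow_redirects as in httpx.
    return requests.get(
        url,
        impersonate=profile,
        allow_redirects=False,
        timeout=30,
    )
```

Calling fetch_with_curl_cffi("https://www.imdb.com/") returns a requests-style response, so status_code and headers can be read as usual.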

httpx-curl-cffi is a transport for httpx that delegates the actual HTTP work to curl_cffi. While it does not add new fingerprinting capability, it implements the httpx programming model: sync Client, async AsyncClient, event hooks, the same response object you get from the rest of an httpx-based codebase. We tested with the Chrome profile and default_headers=True.
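A minimal sketch of how the transport plugs into httpx, assuming both httpx and httpx-curl-cffi are installed. The function name build_httpx_curl_client is ours; the CurlTransport arguments mirror the settings mentioned above.

```python
def build_httpx_curl_client():
    """Return an httpx.Client whose HTTP work is delegated to curl_cffi.

    Hypothetical helper: assumes httpx and httpx-curl-cffi are installed.
    """
    # Imported lazily so this module loads without the dependencies.
    import httpx
    from httpx_curl_cffi import CurlTransport

    # The transport carries the impersonation settings; everything else
    # (hooks, response objects, context manager) is plain httpx.
    transport = CurlTransport(impersonate="chrome", default_headers=True)
    return httpx.Client(transport=transport, follow_redirects=False)
```

The returned object is a regular httpx.Client, so existing httpx code can use it without changes.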

rnet is a Rust HTTP client with Python bindings. It implements its own impersonation stack rather than wrapping curl-impersonate. The enum rnet.Impersonate exposes a wide range of Chrome, Firefox, Safari, Edge, Opera, and OkHttp profiles. We tested with Chrome137.
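For comparison, an rnet request could be sketched as follows. The client is asynchronous, and the function name fetch_with_rnet is ours. The sketch assumes rnet 2.x, where the profile enum is rnet.Impersonate; other major versions may name it differently.

```python
import asyncio


async def fetch_with_rnet(url: str) -> str:
    """Fetch a URL with rnet impersonating Chrome 137 and return the body.

    Hypothetical helper: assumes rnet 2.x is installed.
    """
    # Imported lazily so this module loads without the dependency.
    from rnet import Client, Impersonate

    client = Client(impersonate=Impersonate.Chrome137)
    resp = await client.get(url)
    return await resp.text()
```

It can be run with asyncio.run(fetch_with_rnet("https://www.imdb.com/")).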

Scrapling is the only browser-driven tool in the set. Our Scrapling: A Complete Hands-On Guide goes through the library in depth, with Cloudflare as the test target. Its StealthyFetcher drives a stealth-patched Chromium that runs JavaScript and applies fingerprint countermeasures. The library README only advertises Cloudflare Turnstile, but the same machinery handles AWS WAF’s challenge too.
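The browser step can be sketched in a few lines. This is an illustration under assumptions, not the exact script from our repository: the function name fetch_with_browser is ours, and the headless and network_idle arguments are choices we made for the sketch.

```python
def fetch_with_browser(url: str):
    """Load a page through Scrapling's stealth browser and return the response.

    Hypothetical helper: assumes scrapling and its browser
    dependencies are installed.
    """
    # Imported lazily so this module loads without the dependency.
    from scrapling.fetchers import StealthyFetcher

    # network_idle waits for network activity to settle, which gives
    # a JavaScript challenge time to run before the page is returned.
    return StealthyFetcher.fetch(url, headless=True, network_idle=True)
```

The returned object exposes the response status and the parsed page, so a caller can check whether the real content came back.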

How AWS WAF protects IMDB

A quick introduction to the system helps interpret the results that follow. AWS WAF is not a dedicated anti-bot platform like DataDome or Kasada. It is a general-purpose web application firewall with a bot-control module that operators can enable per rule. When the bot-control rule is in challenge mode, AWS WAF inserts a single JavaScript gate at the start of a session.

A request without a valid cookie returns HTTP 202 with x-amzn-waf-action: challenge and a short HTML body. The body defines window.gokuProps, which holds three base64 blobs (key, iv, context), includes a <script src> pointing to a customer-specific URL on *.token.awswaf.com, and carries a small inline script that calls AwsWafIntegration.saveReferrer(), AwsWafIntegration.checkForceRefresh(), and AwsWafIntegration.getToken(). The remote challenge.js tests the browser environment and posts a validation payload back to AWS; on success, the response sets Set-Cookie: aws-waf-token=.... The inline script then reloads the page, and the second request, now carrying the token, gets the real content.
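The signals described above are easy to check in code. The sketch below uses only the standard library; the helper names and the sample body are ours, and the sample values are placeholders rather than real AWS data. It assumes the gokuProps object literal is valid JSON, which may not hold for every deployment.

```python
import json
import re

# Assumption: gokuProps is assigned a flat, JSON-compatible object literal.
GOKU_PROPS_RE = re.compile(r"window\.gokuProps\s*=\s*(\{.*?\})\s*;", re.DOTALL)
CHALLENGE_SRC_RE = re.compile(
    r'<script[^>]+src="(https://[^"]+\.token\.awswaf\.com/[^"]+)"'
)


def is_waf_challenge(status: int, headers: dict) -> bool:
    """True when the response looks like the AWS WAF challenge page."""
    lowered = {k.lower(): v for k, v in headers.items()}
    action = lowered.get("x-amzn-waf-action", "").lower()
    return status == 202 and action == "challenge"


def parse_challenge_body(html: str) -> dict:
    """Extract gokuProps and the challenge script URL, if present."""
    props = GOKU_PROPS_RE.search(html)
    src = CHALLENGE_SRC_RE.search(html)
    return {
        "goku_props": json.loads(props.group(1)) if props else None,
        "challenge_js": src.group(1) if src else None,
    }


# Placeholder body for illustration only; not captured from IMDB.
SAMPLE_BODY = """<html><head>
<script type="text/javascript">
window.gokuProps = {
"key":"QUJD",
"iv":"REVG",
"context":"R0hJ"
};
</script>
<script src="https://example123.us-east-1.token.awswaf.com/example123/challenge.js"></script>
</head><body></body></html>"""
```

A check like is_waf_challenge is what lets a scraper tell a challenge apart from a genuine 202 or a plain block.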

This works very differently from systems that score every request. Once the token is in our jar, AWS WAF lets us through with no further behavioral checks beyond IP reputation and rate limits.

What we want to find out in this article is whether we can bypass AWS WAF with “convincing” requests, meaning a proper TLS fingerprint and set of headers, or whether we need a JS rendering engine.
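If a browser step turns out to be necessary, the token has to travel from the browser to the HTTP client. The sketch below shows that handoff with the standard library only. The cookie list shape (dicts with name and value keys) is an assumption modeled on what browser automation tools commonly return, and the helper names are ours. Whether AWS WAF accepts a token replayed from a different client is something a test has to verify; the code does not assume it.

```python
from urllib.request import Request

TOKEN_COOKIE = "aws-waf-token"


def extract_waf_token(cookies):
    """Return the aws-waf-token value from a list of cookie dicts, or None."""
    for cookie in cookies:
        if cookie.get("name") == TOKEN_COOKIE:
            return cookie.get("value")
    return None


def build_request_with_token(url: str, token: str, user_agent: str) -> Request:
    """Prepare (but do not send) a request that carries the token."""
    return Request(
        url,
        headers={
            "Cookie": f"{TOKEN_COOKIE}={token}",
            # Keeping the browser's User-Agent is a cautious choice here,
            # in case the token is tied to the client that earned it.
            "User-Agent": user_agent,
        },
    )
```

The same Cookie header can be passed to any of the HTTP clients in this article; urllib is used here only to keep the sketch dependency-free.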

Test setup

As always, the code can be found in our GitHub repository reserved for paying subscribers, inside the folder 104.IMDB. If you cannot access the repository, please use this form to request access.

The libraries we pinned at the time of writing are curl_cffi==0.14.0, httpx==0.28.1, httpx-curl-cffi==0.1.5, rnet==2.4.2, scrapling==0.4.7. Python is 3.11.

Each HTTP test issues a GET against two URLs: the IMDB home page, https://www.imdb.com/, and a title page, https://www.imdb.com/title/tt0111161/. We use two URLs to confirm the challenge fires the same way on both, not only on one entry point. We do not follow redirects (follow_redirects=False) because the AWS WAF response is a 202 with content rather than a redirect, and we want to see it raw.

We capture the status code, HTTP version, full response headers, any cookies, body length, and the first 600 characters of the body, and we save everything to JSON under aws_waf_imdb/responses/ for later inspection.
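The capture step can be sketched with the standard library. The function names and the record keys are ours and may differ from the scripts in the repository; the fields and the 600-character preview follow the description above.

```python
import json
from pathlib import Path

BODY_PREVIEW_CHARS = 600


def build_record(url, status, http_version, headers, cookies, body):
    """Collect the fields we keep for each response into a plain dict."""
    return {
        "url": url,
        "status": status,
        "http_version": http_version,
        "headers": dict(headers),
        "cookies": dict(cookies),
        # Length is counted on the decoded text, not on raw bytes.
        "body_length": len(body),
        "body_preview": body[:BODY_PREVIEW_CHARS],
    }


def save_record(record, out_dir, name):
    """Write one record as pretty-printed JSON and return its path."""
    out_path = Path(out_dir) / f"{name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_path
```

Because build_record takes plain values, the same function works for every client in the test, whatever its response object looks like.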

The baseline probe in probe_plain.py uses an unmodified httpx.Client(http2=True) with a generic Chrome User-Agent header and the standard Accept headers. This is the control: no TLS impersonation, no fingerprint trickery, just a normal Python HTTP client.
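A control probe along these lines could look like the sketch below. It is not the actual probe_plain.py: the header values, the timeout, and the function name are our assumptions for illustration. Note that http2=True requires httpx to be installed with its HTTP/2 extra.

```python
TARGET_URLS = [
    "https://www.imdb.com/",
    "https://www.imdb.com/title/tt0111161/",
]

# Illustrative header values; any generic Chrome-like set would do.
BASELINE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def run_baseline_probe(urls=TARGET_URLS):
    """GET each URL with plain httpx and report what came back.

    Hypothetical helper: assumes httpx is installed with HTTP/2 support.
    """
    # Imported lazily so this module loads without the dependency.
    import httpx

    results = []
    with httpx.Client(
        http2=True,
        headers=BASELINE_HEADERS,
        follow_redirects=False,
        timeout=30.0,
    ) as client:
        for url in urls:
            resp = client.get(url)
            results.append(
                (
                    url,
                    resp.status_code,
                    resp.http_version,
                    resp.headers.get("x-amzn-waf-action"),
                )
            )
    return results
```

Each tuple in the result pairs a URL with its status code, HTTP version, and the WAF action header, if any.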

The Web Scraping Club