THE LAB #100: Hybrid Scraping - One Browser Login, Thousands of HTTP Requests
Building a pipeline that uses Camoufox for authentication and curl_cffi for extraction on Akamai-protected targets.
Browser-based scraping tools have become the default answer when a website deploys anti-bot protection. When a target runs Akamai, Cloudflare, or Datadome, the natural reflex is to reach for Playwright, Puppeteer, or one of their stealth variants like Camoufox or Pydoll. And it works. A real browser renders JavaScript, solves challenges, and presents a legitimate fingerprint. The success rate is high.
But a browser does everything the hard way. It downloads the full page, parses HTML, executes JavaScript, renders the DOM, loads images, fonts, and stylesheets. For each request, it allocates hundreds of megabytes of RAM and takes seconds to complete what an HTTP client could do in milliseconds. When a pipeline needs to scrape ten pages, this overhead is irrelevant. When it needs to scrape ten thousand pages, the browser becomes the bottleneck.
Before proceeding, let me thank NetNut, the platinum partner of the month. They have prepared a juicy offer for you: up to 1 TB of web unblocker for free.
Consider a concrete scenario: we need to monitor the wishlist of an e-commerce account, pulling product data, stock levels, and price changes every hour across hundreds of items. Running Camoufox for every single API call would mean spinning up a full browser instance, navigating to each page, waiting for JavaScript to execute, extracting the data, and closing. For a hundred items, that is minutes of execution time and gigabytes of memory. The same API calls through an HTTP client would complete in seconds using a fraction of the resources.
As we measured in THE LAB #96, HTTP clients with TLS impersonation can be 27x faster than browsers on the same target. The difference is not marginal. It is the difference between a pipeline that runs on a single machine and one that requires a cluster.
For your scraping needs, having a reliable proxy provider like Decodo on your side improves the chances of success.
The problem is that these two approaches are usually treated as mutually exclusive. Either you use a browser for everything, accepting the overhead, or you try an HTTP client and hope the anti-bot system does not block it. But many websites only need a browser at the gate: for the login, the initial challenge, or the session establishment. Everything after that is plain API calls.
If we can use a browser to earn a valid session and then hand it off to an HTTP client, we get the reliability of browser automation where it matters and the speed of HTTP everywhere else. That is the pattern we want to build. But the handoff is not as simple as copying a few cookies, and the traps along the way are worth understanding before building a pipeline around this idea.
The hybrid pattern
The idea is simple in principle. As noted, the browser is only needed at the gate. Once the login flow, the initial anti-bot challenge, or the session establishment is behind us, subsequent requests are plain API calls or page fetches that do not require JavaScript execution. If we can extract the session state from the browser and replay it through an HTTP client, we skip the browser for 99% of the work.
The session state, in practice, means cookies. An authentication flow sets session cookies that the server trusts for subsequent requests. If we transfer those cookies from the browser to an HTTP client, the server should treat the HTTP client as the same authenticated user.
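To make the handoff concrete, here is a minimal sketch of the cookie-transfer step. It assumes the cookies come from Playwright's `context.cookies()` format (which Camoufox exposes, since it is driven through Playwright); the function name and the sample data are ours, for illustration only:

```python
# Sketch of the handoff: flatten Playwright/Camoufox cookie dicts into a
# plain name -> value mapping that an HTTP client can reuse.

def cookies_for_http_client(browser_cookies, domain_suffix):
    """Keep only cookies scoped to the target domain and flatten them."""
    jar = {}
    for c in browser_cookies:
        # Playwright cookie domains may start with a dot (e.g. ".example.com")
        if c["domain"].lstrip(".").endswith(domain_suffix.lstrip(".")):
            jar[c["name"]] = c["value"]
    return jar

# Fabricated cookie data, just to show the shape of the transformation:
sample = [
    {"name": "_abck", "value": "token123", "domain": ".example.com"},
    {"name": "other", "value": "x", "domain": ".unrelated.com"},
]
print(cookies_for_http_client(sample, "example.com"))
# {'_abck': 'token123'}
```

Filtering by domain matters: a browser context accumulates cookies from third-party domains too, and blindly forwarding all of them to the HTTP client is both noisy and a fingerprinting risk.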
But cookies alone are often not enough. Modern anti-bot systems like Akamai do not just check whether you have the right cookies. They also check whether the client presenting those cookies looks like the same client that earned them.
This is where TLS fingerprinting enters the picture: if the browser that logged in was Firefox, but the HTTP client reusing its cookies presents the stock TLS fingerprint of a Python HTTP library, the server may reject the request or simply drop the connection without responding.
So the real challenge is not just transferring cookies. It is maintaining continuity across two different execution models: the browser and the HTTP client must look like the same entity to the server.
Tool landscape
For this experiment, we used two tools.
Camoufox is a custom Firefox build designed for stealth. It spoofs fingerprints (WebGL, canvas, audio, navigator properties), patches headless detection vectors, and uses Playwright’s Juggler protocol for automation. We covered it extensively in THE LAB #65: Scraping Datadome-protected websites with Camoufox. Its role here is limited to one thing: logging in.
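For readers who have not used it, Camoufox ships a Python package whose sync API hands you a standard Playwright browser object. A hypothetical login step looks roughly like this; the URL and selectors are placeholders, not any real site's markup, and the import is kept inside the function so the sketch can be read without camoufox installed:

```python
# Hypothetical login step with Camoufox (pip install camoufox).
# Placeholder URL and selectors -- adapt to the real login form.

def login_and_export_cookies(email, password):
    from camoufox.sync_api import Camoufox  # lazy import for this sketch

    with Camoufox(headless=True) as browser:  # yields a Playwright Browser
        page = browser.new_page()
        page.goto("https://www.example.com/login")
        page.fill("input[name=email]", email)
        page.fill("input[name=password]", password)
        page.click("button[type=submit]")
        page.wait_for_load_state("networkidle")
        # Playwright-style cookie dicts, ready to hand to the HTTP client
        return page.context.cookies()
```

The return value is exactly the list of cookie dicts that the handoff step needs, so the browser can be closed immediately after this function returns.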
curl_cffi is a Python binding for curl-impersonate, a modified version of curl that mimics the TLS and HTTP/2 fingerprint of real browsers. It supports impersonating Chrome and Firefox at specific versions, which means it can present the same TLS fingerprint as the browser that established the session. Unlike a browser, it uses negligible resources per request and can process thousands of pages per minute.
The key property that makes this pairing work: Camoufox is Firefox-based, and curl_cffi can impersonate Firefox’s TLS fingerprint. The server sees a consistent Firefox identity across both steps.
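The HTTP side of the pairing can be sketched as a single helper, assuming curl_cffi is installed and the cookies were exported from the Camoufox context. The impersonation target string must be one your curl_cffi version supports and should match the Firefox version Camoufox is built on; again, the import is lazy so the sketch stands on its own:

```python
# Minimal sketch: replay the browser session over plain HTTP while
# presenting a Firefox TLS fingerprint (pip install curl_cffi).

def fetch_as_firefox(url, session_cookies, extra_headers=None):
    from curl_cffi import requests  # lazy import for this sketch

    resp = requests.get(
        url,
        cookies=session_cookies,          # the cookies earned by Camoufox
        headers=extra_headers or {},
        impersonate="firefox135",         # pick a target your version supports
        timeout=30,
    )
    resp.raise_for_status()
    return resp

# Usage (illustrative, not a real endpoint):
# resp = fetch_as_firefox("https://www.example.com/api/items",
#                         {"_abck": "...", "bm_sz": "..."})
# data = resp.json()
```

Note that the `impersonate` parameter covers the TLS and HTTP/2 layers only; matching the browser's `User-Agent` and other headers is still your responsibility via `extra_headers`.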
You can find the code in our GitHub repository reserved to paying users, inside the folder 100.HYBRID_SCRAPING.
The target: Net-a-Porter
We chose Net-a-Porter as our target. It is a luxury e-commerce platform protected by Akamai Bot Manager, with authenticated features (wishlists, account details) exposed through internal JSON APIs. This gives us a clean test case: the login requires a real browser (Akamai blocks automation tools at the login endpoint), but the authenticated API calls are plain HTTP requests that return structured JSON.
Please keep in mind that this is an experiment for study purposes; we are not encouraging you to scrape Net-a-Porter or any other website, especially the parts behind a login.
Before diving into code, we need to understand what we’re dealing with. Net-a-Porter’s architecture has three layers relevant to us:
Akamai Bot Manager sits in front of everything. It sets a cluster of tracking cookies (_abck, bm_sz, bm_s, ak_bmsc, and others) that are generated through JavaScript execution on the client side. These cookies prove that a real browser visited the page. Without them, API calls either fail or hang indefinitely.
The login API at /api/nap/wcs/resources/store/nap_il/loginidentity/v2 accepts a JSON payload with email and password. On success, it returns a 201 status with an Ubertoken in the response body. This token is the key to all authenticated endpoints.
Authenticated API endpoints like the wishlist API at /api/nap/wcs/resources/store/nap_il/wishlist/v2/{id} require both the session cookies and the Ubertoken passed as an x-ubertoken header. They return clean JSON with product details, stock levels, and metadata.
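The three layers above reduce to a handful of constants and helpers. The paths come straight from the description; the function names and the header-building convention are ours:

```python
# The endpoints and header described above, as reusable helpers.

BASE = "https://www.net-a-porter.com"
LOGIN_PATH = "/api/nap/wcs/resources/store/nap_il/loginidentity/v2"

def wishlist_url(wishlist_id):
    """Build the wishlist API URL for a given wishlist id."""
    return f"{BASE}/api/nap/wcs/resources/store/nap_il/wishlist/v2/{wishlist_id}"

def auth_headers(ubertoken):
    """The Ubertoken from the 201 login response goes in this header."""
    return {"x-ubertoken": ubertoken}

print(wishlist_url("12345"))
```

With the Akamai cookies in the jar and these headers attached, every authenticated call is an ordinary JSON request, which is precisely what makes the HTTP-client phase of the pipeline so cheap.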
The experiment: what worked and what did not
We did not arrive at the final solution directly. The investigation path itself reveals the constraints of session handoff, so it is worth walking through each attempt.




