THE LAB #96: Scraping Nike.com with 5 open source tools
Match your tool to the protection, not the brand
Nike.com is one of the most scraped e-commerce targets on the web. Competitors track pricing, researchers analyze catalog changes, and aggregators build product databases. Then there’s the sneaker resale market, which alone generates billions in annual revenue: much of that ecosystem relies on scraped data, from release dates to stock levels and price fluctuations across regions.
So it’s easy to see why this website is so popular among scraping professionals and, at the same time, so heavily protected by anti-bot measures.
For this reason, we tested five open-source scraping tools on 1000 Nike product URLs to measure success rate, speed, and reliability. The results challenge a common assumption in the scraping community: that modern e-commerce sites require browser automation to reliably extract data.
Before proceeding, let me thank NetNut, the platinum partner of the month. They have prepared a juicy offer for you: up to 1 TB of web unblocker for free.
Nike.com system model
Before testing, we need to understand what Nike.com actually serves and where potential blocking might occur.
Nike.com is protected by both Akamai Bot Manager and Kasada, but the two systems guard different parts of the site. Akamai handles the public-facing catalog, including product pages and search results. Kasada protects authenticated flows, such as login and checkout. This layered approach makes sense from Nike’s perspective: catalog data is semi-public anyway (they want customers to browse), while account actions carry real business risk.
We focus exclusively on public catalog data in this article. Scraping behind authentication raises legal and ethical concerns we prefer not to encourage. For the product catalog, we only need to bypass Akamai. We found no trace of Kasada challenges or fingerprinting scripts on product pages during our tests.
Nike product pages are server-side rendered. When you request a product URL, the server returns complete HTML with the product data already embedded in the DOM. This differs from many modern e-commerce sites that use client-side rendering, where the initial HTML is a shell and product data loads via JavaScript API calls. The practical consequence for scrapers: JavaScript execution is not required to extract product information, because the data is already in the first response.
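A quick way to check this, assuming you already have the raw HTML of a product page saved locally (how to fetch it without getting blocked is covered next), is to confirm that the product fields match before any JavaScript runs. This is a minimal sketch using the same data-testid selectors described in the Test Setup below; the filename is a placeholder.

# Sketch: confirm the product data is already in the server-rendered HTML.
# 'product.html' is a hypothetical saved copy of a Nike product page.
from bs4 import BeautifulSoup

with open("product.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# If these selectors match on the raw response, no JavaScript rendering is needed.
print(soup.select_one('h1[data-testid="product_title"]'))
print(soup.select_one('[data-testid="currentPrice-container"]'))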
So we can use simple HTTP requests, but with some tweaks: modern WAFs inspect TLS handshake characteristics (cipher suites, extensions, ordering) to identify non-browser clients. This is where a tool like Python’s requests library fails immediately: its TLS fingerprint looks nothing like Chrome’s or Firefox’s.
HTTP/2 fingerprinting adds another layer: header order, pseudo-header placement, and SETTINGS frames can reveal automation tools. Even if your TLS handshake passes, sending headers in the wrong order or with unusual HTTP/2 settings can trigger detection.
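To see what the WAF sees, you can point both a plain HTTP client and an impersonating one at a fingerprint echo service and compare the reported JA3/Akamai fingerprints. The sketch below is illustrative only: it assumes curl_cffi is installed and that tls.peet.ws, a public third-party fingerprint echo service, is still online (any similar endpoint works).

# Compare what a TLS/HTTP/2 fingerprinting endpoint sees from two clients.
# tls.peet.ws is a third-party echo service and may change or go offline.
import requests
from curl_cffi import requests as curl_requests

ECHO = "https://tls.peet.ws/api/all"

plain = requests.get(ECHO, timeout=30)                               # vanilla requests TLS stack
spoofed = curl_requests.get(ECHO, impersonate="chrome", timeout=30)  # Chrome-like TLS and HTTP/2

# The service echoes back the JA3/JA4 and Akamai HTTP/2 fingerprints it observed;
# compare the two payloads and note how different the clients look.
print(plain.text[:400])
print(spoofed.text[:400])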
First of all, you need IPs with a good reputation for scraping. For this reason, we’re using a proxy provider like our partner Ping Proxies, which is sharing this offer with TWSC readers:
💰 - Use TWSC for 15% OFF | $1.75/GB Residential Bandwidth | ISP Proxies in 15+ Countries
On top of that, IP reputation plays a role. Datacenter IPs receive more scrutiny than residential ones because most legitimate users browse from home or mobile networks.
For this test, we focused on TLS and HTTP/2 fingerprinting. We scraped from a residential IP, which neutralized IP reputation as a variable. We did not interact with the page, so behavioral signals were not applicable for HTTP clients. We observed no JavaScript challenges on Nike product pages during testing. This last point is crucial: Nike could deploy Akamai’s JavaScript challenge on product pages, but they have chosen not to. Whether this is a deliberate trade-off (challenges slow down real users) or an oversight, we cannot say. But it opens the door to HTTP-based scraping.
The Tool Landscape
The five tools we tested fall into two categories: browser automation and HTTP clients with fingerprint emulation.
On the browser side, Pydoll is an async Python library built on Chrome DevTools Protocol. It controls Chromium without WebDriver, avoiding the navigator.webdriver flag.
Camoufox takes a different approach: it is a custom Firefox build that spoofs fingerprints (WebGL, canvas, audio, navigator) and patches headless detection vectors.
Scrapling sits somewhere in between, offering multiple fetcher types, from simple HTTP to full browser automation via Playwright. Its StealthyFetcher wraps Chromium with anti-detection features, while the basic Fetcher sends plain HTTP requests with TLS impersonation.
On the HTTP client side, Rnet is a Rust-based Python client that emulates browser TLS and HTTP/2 fingerprints (JA3, JA4, Akamai). It supports Chrome, Firefox, Safari, Edge, and OkHttp profiles.
Undetected-httpx is built on `curl_cffi` and provides browser-identical TLS fingerprints without running a browser.
The fundamental difference: browser tools execute JavaScript and render pages, while HTTP clients make requests with browser-like signatures but cannot handle JS-dependent content. This distinction matters because it determines both what you can scrape and how fast you can do it. A browser spins up an entire rendering engine, consumes hundreds of megabytes of RAM per instance, and waits for network events, DOM parsing, and JavaScript execution. An HTTP client sends a request and receives bytes. The performance gap is enormous when the extra capability is not needed.
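To make the HTTP-client approach concrete, here is roughly what a single Rnet request looks like. This is a sketch: the class and profile names match the configuration listed in the Test Setup below, but treat the exact call signatures as assumptions and check them against the Rnet version you install.

# Minimal sketch of the HTTP-client approach with Rnet.
# The product URL is a placeholder; BlockingClient and Impersonate.Chrome137
# follow the test configuration, other details may differ between Rnet releases.
from rnet import BlockingClient, Impersonate

client = BlockingClient(impersonate=Impersonate.Chrome137)

resp = client.get("https://www.nike.com/at/en/t/some-product-slug/STYLE-CODE")
html = resp.text()  # the response body, ready to be parsed with BeautifulSoup

print(len(html))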
If you’re still struggling with browser automation, you can try out rayobrowse, a self-hosted Chromium stealth browser from Rayobyte. Have a look at it here.
💰 - Rayobyte is offering an exclusive 55% discount with the code WSC55 on all of their static datacenter & ISP proxies, only for web scraping club visitors.
You can also claim a 30% discount on residential proxies by emailing sales@rayobyte.com.
Test Setup
We extracted 1000 product URLs from Nike’s sitemap (Austria EN locale). The sitemap is publicly accessible and provides a clean list of product URLs without requiring crawling. Each tool scraped the same URL set sequentially, with no delays between requests. This aggressive pacing represents a worst-case scenario for detection: real scrapers would typically add delays to reduce load and avoid rate limits.
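For reference, pulling the URL list out of a sitemap takes only a few lines of standard-library XML parsing. The sketch below is a generic pattern rather than the exact script from the repository: the sitemap path is a placeholder (the locale-specific product sitemaps are usually discoverable via robots.txt), and curl_cffi is used just to put a browser-like TLS fingerprint on the request.

# Sketch: extract product URLs from a sitemap and keep the first 1000.
# SITEMAP_URL is a placeholder; look up the real product sitemap for your locale.
import xml.etree.ElementTree as ET
from curl_cffi import requests

SITEMAP_URL = "https://www.nike.com/sitemap-example.xml"  # placeholder path
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

resp = requests.get(SITEMAP_URL, impersonate="chrome", timeout=30)
root = ET.fromstring(resp.content)

urls = [loc.text for loc in root.findall(".//sm:loc", NS)]
product_urls = [u for u in urls if "/t/" in u][:1000]  # Nike product URLs typically contain /t/

print(len(product_urls))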
Extraction logic: All tools used identical HTML parsing. We targeted stable data-testid attributes, which Nike uses for internal testing. These selectors are more reliable than class names, which often change with CSS updates:
from bs4 import BeautifulSoup

# 'html' is the raw product page HTML returned by whichever tool fetched it
soup = BeautifulSoup(html, "html.parser")

# Title
title = soup.select_one('h1[data-testid="product_title"]')
# Price
price = soup.select_one('[data-testid="currentPrice-container"]')
# Color
color = soup.select_one('[data-testid="product-description-color-description"]')
# Style code (SKU)
style = soup.select_one('[data-testid="product-description-style-color"]')

This approach keeps the comparison fair: differences in results reflect fetching capability, not parsing logic.
Tool configurations:
- Pydoll: Headless Chromium, 3-second wait after page load, new browser instance per URL
- Camoufox: Headless Firefox, networkidle wait, single browser session with new pages
- Scrapling: Fetcher.get() with impersonate='chrome' and stealthy_headers=True
- Rnet: BlockingClient with Impersonate.Chrome137, 30-second timeout
- Undetected-httpx: httpx.Client with browser-like headers, 30-second timeout
The browser tools opened new page contexts for each URL, while the HTTP clients reused the same session. As an example, the Camoufox setup is sketched below.
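This is a minimal sketch of that Camoufox configuration: headless Firefox, networkidle wait, one browser session reused with a new page per URL. The camoufox Python package wraps Playwright, so the page-level calls are Playwright’s; treat the import path and options as assumptions and check them against the version you install.

# Sketch: Camoufox setup used in the test, per the configuration above.
# Import path and options are assumptions; verify against the installed camoufox package.
from camoufox.sync_api import Camoufox

urls = ["https://www.nike.com/at/en/t/some-product-slug/STYLE-CODE"]  # placeholder URL list

with Camoufox(headless=True) as browser:        # single browser session
    for url in urls:
        page = browser.new_page()               # new page context per URL
        page.goto(url, wait_until="networkidle")
        html = page.content()                   # rendered HTML, passed to the shared parser
        page.close()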
As always, the full code can be found in The Lab GitHub private repository, inside the folder 96.NIKE, available only to paid subscribers of TWSC.