The Web Scraping Club

The Web Scraping Club

Share this post

The Web Scraping Club
The Web Scraping Club
THE LAB #92: scraping Depop in a cost-effective way

THE LAB #92: scraping Depop in a cost-effective way

Delegate or in house solution? A cost-wise perspective.

Pierluigi Vinciguerra's avatar
Pierluigi Vinciguerra
Aug 29, 2025
∙ Paid
2

Share this post

The Web Scraping Club
The Web Scraping Club
THE LAB #92: scraping Depop in a cost-effective way
Share

In these pages, especially in The Lab articles, we often provide solutions to bypass specific anti-bot protections on particular websites, just as we’re doing today.

However, instead of focusing on the code for bypassing Cloudflare on Depop (which I’ll provide in the usual repository anyway), I wanted to share a broader perspective on how to approach anti-bot solutions in general, along with some numerical examples.


Before proceeding, let me thank Decodo, the platinum partner of the month. They are currently running a 50% off promo on our residential proxies using the code RESI50.

Claim your discount here


Depop: history and dimension of the website

Depop is a social shopping app and online marketplace where users can buy, sell, and discover secondhand and vintage fashion, streetwear, and unique items. Just like other second-hand websites, such as Vinted or Mercari, it blends elements of e-commerce and social media, allowing sellers to create profiles, post listings with photos, and interact with buyers through likes, comments, and direct messages. Its focus is mainly on clothing and accessories, with a catalog of over 30 million items across the men's and women's categories.

Given its dimension and relevance in the second-hand fashion industry, Depop is a gold mine of data, and it doesn’t surprise me that it’s a requested target for scraping (and even a well-protected one).


This episode is brought to you by our Gold Partners. Be sure to have a look at the Club Deals page to discover their generous offers available for the TWSC readers.


🛜 Buy private Socks5 & HTTPS proxies from 0.9 USD per IP


💰 - 50% Off Residential Proxies


💰 - 1 TB of web unblocker for free at this link

Depop anti-bot protection

Depop is protected by Cloudflare Bot Detection, configured in a quite restrictive way. In fact, just like most e-commerce websites, it has internal APIs to paginate items, which you can easily detect by inspecting the network traffic with the browser.

Depop Internal APIs

However, if you attempt to replicate this request via cURL, you encounter an error nine times out of ten, even if you don’t have to pass a Cloudflare cookie, as it usually happens when CF is involved.

curl 'https://webapi.depop.com/api/v3/search/products/?what=&cursor=MnwyNHwxNzU2NDU3OTE5&items_per_page=24&country=us&currency=USD&groups=tops%2Cbottoms%2Ccoats-jackets%2Cfootwear%2Cjumpsuit-and-playsuit%2Cdresses%2Csuits&is_kids=false&gender=male&force_fee_calculation=false&from=in_country_search' \
  -H 'accept: */*' \
  -H 'accept-language: en-US,en;q=0.6' \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -H 'depop-device-id: 323e03e3-4a96-4f48-b00d-d10e3dd19147' \
  -H 'depop-search-id: b24dc603-9470-47fd-9f8a-f15ecccf5c49' \
  -H 'depop-session-id: f0f1c081-d064-4491-8128-f4bacfb59362' \
  -H 'dnt: 1' \
  -H 'origin: https://www.depop.com' \
  -H 'pragma: no-cache' \
  -H 'priority: u=1, i' \
  -H 'referer: https://www.depop.com/' \
  -H 'sec-ch-ua: "Not)A;Brand";v="8", "Chromium";v="138", "Brave";v="138"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'sec-gpc: 1' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36' \
  -H 'x-cached-sizes: true'

So, how can we scrape data from it, given that there’s a bot protection?


Need help with your scraping project?


Camoufox: a bazooka to kill mosquitoes

The first idea that comes to mind when dealing with an anti-bot is to create a solution that lasts over time; therefore, we might be tempted to use a browser and extract data from it.

In this case, I’m using Camoufox to load the men’s catalog page and intercept the API calls happening at the network level (thanks to

Antonello Zanini
for the inspiration for the code).

from camoufox.sync_api import Camoufox
import json
import time
import random
import os

with Camoufox(
    headless=False, # To save resource
    geoip="86.62.29.238" # An IP from the US to geolocate the browser in the US
) as browser:
    page = browser.new_page()

    # Intercept the AJAX API request to the Depop APIs
    def handle_response(response):
      # If the API request is the one responsible for getting product data
      if response.url.startswith("https://webapi.depop.com/api/v3/search/products/"):
          try:
              # Export product data to JSON
              json_data = response.json()
              with open("mens.json", "w", encoding="utf-8") as f:
                  json.dump(json_data, f, indent=2)
              print(f"✅ mens.json file created successfully!")
              print(f"Captured API response from: {response.url}")
          except Exception as e:
              print(f"Failed to get JSON from response: {e}")

    # Attach the response handler
    page.on("response", handle_response)

  
    response = page.goto("https://www.depop.com/category/mens/")

    # Wait for the product to appear on the page
    page.locator("#main ol").wait_for(timeout=60000)

    # Scroll down until mens.json is created
    print("Starting to scroll down to capture products...")
    
    scroll_count = 0
    max_scrolls = 50  # Safety limit to prevent infinite scrolling
    
    while scroll_count < max_scrolls and not os.path.exists("mens.json"):
        # Scroll down using mouse wheel
        page.mouse.wheel(0, 1000)  # Scroll down by 1000 pixels
        
        # Wait for content to load
        time.sleep(random.uniform(2, 4))
        
        scroll_count += 1
        print(f"Scroll attempt {scroll_count}...")
        
        # Check if file was created
        if os.path.exists("mens.json"):
            print("🎉 mens.json file found! Stopping scroll.")
            break
    
    if os.path.exists("mens.json"):
        print("✅ Success! mens.json file has been created.")
    else:
        print("❌ mens.json file was not created after maximum scroll attempts.")

    # Release the browser resources
    browser.close() 

It’s quite self-explanatory: we load the page and, since the API call does not immediately happen, we scroll down the page until new products are loaded and the mens.json file is created. using the handle_response method, we intercept the API calls and, when we find the right one, we save the results in the JSON file.


Before continuing with the article, I wanted to let you know that I've started my community in Circle. It’s a place where we can share our experiences and knowledge, and it’s included in your subscription. Give it a try at this link.

Join the community


This simple scraper is working, but is it the best option for scraping the whole website?

Every API call returns 24 items. Let’s suppose that the information retrieved from that is enough for our needs. Considering that, including only men's and women's clothing, we have 34.4 million items, it means we need to make 1.43 million API calls.

The scraper needs to load the page, scroll down, and wait for the items to be shown, so we can estimate an average of 3 seconds per call, which, when multiplied by the total number of calls needed, makes a total of 4.3 million seconds of execution time (around 1195 hours).

If we deploy this scraper to AWS, using a t3.medium instance (below that, a browser barely runs), which costs 0.0416 USD per hour, we’re spending 50 USD for a single run, just for the execution runtime. This does not consider any potential technical issues, such as the degradation of performance during the usage of a browser (the more pages you load, the more memory you’ll need to run it).

Then you probably also need a proxy. Considering that just the HTML of the home page of the woman category, without the images (which can be excluded), weighs 450Kb, the JSON response of the API is 350KB, and there are many other API loaded when scrolling, we can estimate a rough 1MB per request of bandwidth. Multiplying by 1.43 million requests, we could easily pass the 1.5 TB of bandwidth for a single snapshot. For such a volume of bandwidth, according to the ProxyPrice calculator, the cost starts from 2000 USD, definitely something for deep pockets.

Delegating a browserless solution: ScraperAPI

But what if we find a way to use the internal API and delegate the bypass of the antibot to external tools, in this case, ScraperAPI?

First of all, we need to understand how the API works: when calling the API, we need to pass a few values as string parameters:

what=&cursor=MnwyNHwxNzU2Mzg5NzI2&items_per_page=24&country=us&currency=USD&groups=tops%2Cbottoms%2Ccoats-jackets%2Cfootwear%2Cjumpsuit-and-playsuit%2Cdresses%2Csuits&is_kids=false&gender=male&force_fee_calculation=false&from=in_country_search

As you can see, there’s no mention of a page number, but just a cursor.

In fact, the first values we get in the response are

{
    "meta": {
        "result_count": 24,
        "cursor": "MTR8MzEyfDE3NTY0NzMwMDY",
        "has_more": true
    },

The cursor is a placeholder that indicates the starting point for the following API call.


The scripts mentioned in this article, like the Scrapy scraper and the Camoufox one, are in the GitHub repository's folder 92.DEPOP, available only to paying readers of The Web Scraping Club.

GitHub Repository

If you’re one of them and cannot access it, please use the following form to request access.


So, leveraging the ScraperAPI scraping API, we’re able to create a simple Scrapy script that iterates through the API calls using this instruction:

return scrapy.Request(
            url=api_url,
            method='GET',
            headers=headers,
            callback=self.parse_search_response,
            meta={'params': params, 'cursor': cursor, 'proxy': YOURPROXYURL},
            dont_filter=True  # Prevent deduplication
        )

Since we don’t need to render the JS on the website to have a correct response, each request costs one credit, so we’re well inside the “Business” credit plan, which costs 299 USD for 3 Million credits.

Since proxies are included, we’re basically spending 143 USD in credits for 1.43 million API calls.

Build your own browserless solution with Scrapy

Keep reading with a 7-day free trial

Subscribe to The Web Scraping Club to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Pierluigi
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share