THE LAB #82: How to scrape Vinted using their internal APIs
Why analyzing both the app and the website is key for a successful web scraping project
Marketplaces like eBay or Vinted are always an interesting target for a variety of actors: from brands wanting to monitor the second-hand market to scalpers looking to buy discounted items and resell them for profit.
Given all this interest, these marketplaces have had to protect themselves from bots and fraudulent activity, and they have become hard targets to scrape.
In this episode of The Lab, our series of in-depth articles, we’ll focus on Vinted, and we’ll see how we can ethically scrape it after studying its backend.
What is Vinted?
Vinted is a second-hand marketplace founded in Lithuania in 2008, where people from 21 different countries can buy and sell used items. According to Similarweb, the website has almost three million visits, while the app was downloaded 32.8 million times in 2023, and the platform counts 105 million registered users.
With hundreds of millions of items on sale on the platform, there’s a vibrant ecosystem of sellers and buyers, and, like in every efficient market, people are trying to make some money out of the platform by buying low and selling high, aided by bots in some cases.
To mitigate bot traffic, the platform introduced anti-bot solutions such as Datadome. In the next sections, we’ll see how we can scrape public data from Vinted by using its internal APIs and a technique called “cookie factory”.
Before proceeding, let me thank NetNut, the platinum partner of the month. They have prepared a juicy offer for you: up to 1 TB of web unblocker for free.
How the Vinted anti-bot system works
Vinted has both a website and an app, so let’s see how each of them works and choose the most stable and easiest approach for scraping.
The website
Let’s say we want the data from a specific category of items on the platform. While browsing the website, we can see that the pagination of these items is handled by an internal API with the following endpoint:
https://www.vinted.it/web/api/core/catalog/items?page=1&per_page=96&time=1745925814&search_text=&catalog_ids=1920&catalog_from=0&brand_ids=&status_ids=&color_ids=&material_ids
Of course, if we copy this URL and paste it into a new browser tab, we get an error message.
This is because the request needs a set of authorization cookies to return the data.
In particular, our focus is on:
access_token_web: an authorization token that lasts two hours, as we can see by pasting it into an online JWT debugger.
refresh_token_web: a token passed as a parameter when we need to refresh the access token above.
datadome: the token we get when we pass the Datadome security check. Its presence tells us that Datadome protects these endpoints, so our API calls won’t work until we get a valid Datadome token, which can be pretty challenging.
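If you’d rather not paste a token into an online debugger, the same check can be done locally. The following is a minimal sketch using only the Python standard library: it decodes the JWT payload without verifying the signature (so it’s for inspection only), and it assumes the token carries the standard iat (issued at) and exp (expiry) claims.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url: restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def token_lifetime_seconds(token: str) -> int:
    """Return exp - iat in seconds, assuming both standard claims are present."""
    claims = decode_jwt_payload(token)
    return claims["exp"] - claims["iat"]
```

For a token that lasts two hours, token_lifetime_seconds(cookie_value) would return 7200.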
Thanks to the gold partners of the month: Smartproxy, Oxylabs, Massive, Scrapeless, Rayobyte, SOAX, ScraperAPI and IPRoyal. They prepared great offers for you, have a look at the Club Deals page.
The mobile app
Inspecting the traffic of the mobile app with HTTP Toolkit gave us another point of view on the backend.
The endpoint for extracting the products of a certain category is the following:
https://www.vinted.fr/api/v2/homepage/women?column_count=2&homepage_session_id=f2c89e7b-27b5-4d99-8e76-3e2bf1d55c23&version=3
To retrieve more items, we add the next_page_token URL parameter:
https://www.vinted.fr/api/v2/homepage/women?column_count=2&homepage_session_id=f2c89e7b-27b5-4d99-8e76-3e2bf1d55c23&next_page_token=feed_max_score:1&version=3
As you can see, there’s no traditional concept of a page: the more times we call this endpoint with the same session ID, the more items we get.
In this case, the authorization token is sent in the request headers.
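The pagination logic of the app endpoint can be sketched as a small URL builder. This is an illustration under assumptions: the parameter names and order are copied from the captured URLs, the session ID is assumed to be a random UUID4 generated once and reused (the captured one has that shape), and next_page_token is assumed to be taken from the previous response, which this sketch doesn’t parse.

```python
import uuid
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.vinted.fr/api/v2/homepage/women"


def build_feed_url(session_id: str, next_page_token: Optional[str] = None) -> str:
    """Build the homepage feed URL seen in the app traffic.

    The first call has no next_page_token; later calls add it
    while keeping the same homepage_session_id.
    """
    params: dict = {"column_count": 2, "homepage_session_id": session_id}
    if next_page_token is not None:
        params["next_page_token"] = next_page_token
    params["version"] = 3
    # Keep ":" unescaped so the URL matches the captured one (e.g. feed_max_score:1).
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


# One session ID reused across calls (assumption: a client-generated UUID4 is accepted).
session_id = str(uuid.uuid4())
first_page_url = build_feed_url(session_id)
```

Reusing the same session_id for every call is the point: it is what lets the backend keep serving new items instead of restarting the feed.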
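A minimal sketch of preparing such a request with Python’s urllib. The header name and scheme are assumptions: a standard Authorization: Bearer header is used here, so check the captured traffic for the exact header the app sends, along with any other headers it includes.

```python
import urllib.request


def build_app_request(url: str, access_token: str) -> urllib.request.Request:
    """Prepare a GET request carrying the token in the headers.

    Assumption: the token goes in an "Authorization: Bearer ..." header;
    the real app may use a different header or send extra ones.
    """
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Usage (performs a real network call, so it's left commented out):
# req = build_app_request(feed_url, token)
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```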
There’s no refresh_token and, even more interestingly, no Datadome token. This means that Datadome is configured to protect the endpoints used by the website, but not those used by the mobile app. This isn’t unheard of: mobile endpoints are sometimes left without bot protection, mainly for two reasons:
They’re usually trickier to find, so less skilled scrapers don’t use them, and website owners choose not to protect them.
Most of the traffic comes from the app, and since these anti-bot systems are billed per request, using them only on the website and not on the app saves the company a lot of money.
From analyzing the website and the app, it seems easier to use app endpoints since Datadome does not protect them, but we still need to figure out how to obtain the authorization tokens.
How to get an authorization token
This is the first real issue we have to solve to scrape Vinted: when inspecting both the website and the app, we don’t see any call to a token issuer.
On some apps, it’s not rare to see a call to a specific endpoint that issues a token used in later requests. In these cases, all you have to do is reproduce the same call, grab the token, and then modify the request headers to include it.
In the Vinted case, the token seems to be generated on the fly by the backend: from the very first calls, both on mobile and on the website, the tokens are already present in the headers (or in the cookies, in the case of the website).
The good news is that the tokens generated by the website can also be used for the app, and this gave me an idea for our scraper.
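For apps that do expose a token issuer, the pattern looks roughly like this. The URL and the access_token field are hypothetical placeholders, not a Vinted API, and the HTTP call is passed in as a function so the logic can be tested without the network.

```python
import json
from typing import Callable, Dict

# Hypothetical token-issuer URL, for illustration only.
TOKEN_URL = "https://api.example.com/oauth/token"


def fetch_token(post: Callable[[str], bytes]) -> str:
    """Reproduce the token call and return the token.

    `post` sends the request to the given URL and returns the raw body;
    the "access_token" field name is an assumption and varies between apps.
    """
    body = json.loads(post(TOKEN_URL))
    return body["access_token"]


def with_token(headers: Dict[str, str], token: str) -> Dict[str, str]:
    """Return a copy of the request headers with the token added."""
    updated = dict(headers)
    updated["Authorization"] = f"Bearer {token}"
    return updated
```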
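Since the website sets the tokens as cookies, one way to collect them is to read the Set-Cookie headers of the home page response. Here’s a minimal sketch with the standard library’s http.cookies module; it assumes you already have those header values (for example from a browser session). Per the observation above, the access_token_web value is the one that can be reused for app requests.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable

# Cookie names observed on the website.
TOKEN_COOKIES = ("access_token_web", "refresh_token_web", "datadome")


def tokens_from_set_cookie(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Extract the token cookies from the Set-Cookie headers of a response."""
    jar = SimpleCookie()
    for header in set_cookie_headers:
        jar.load(header)  # each header sets one cookie plus its attributes
    return {name: jar[name].value for name in TOKEN_COOKIES if name in jar}
```

Cookies that aren’t in TOKEN_COOKIES are simply ignored, so the function can be fed every Set-Cookie header the page returns.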
The scripts mentioned in this article are in the GitHub repository's folder 82.VINTED, which is available only to paying readers of The Web Scraping Club.
The cookie factory
The idea is pretty simple: what if I build an API endpoint that, when called, returns a valid token to use on the Vinted mobile endpoints?
I could call it at the start of my script, get the token, and then use the Vinted mobile API endpoints to scrape data.
It sounds like a brilliant idea, but I still haven’t figured out how to generate these tokens: when I load the Vinted home page, the cookies are set right away, without any separate API call.
But what if my cookie factory did exactly the same thing, loading the home page and reading the cookies it receives?
Code example
Thanks to Cursor, I’ve created this small API endpoint running locally on my Mac.
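What follows is not the script from the repository but a rough, standard-library sketch of the same idea. The token source is passed in as a function (get_token) because obtaining the cookies is exactly the open question: in practice it would load the Vinted home page, probably through a real browser given that the website is behind Datadome, and return the access_token_web cookie. The /token path and the JSON field name are arbitrary choices for this sketch.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_handler(get_token: Callable[[], str]) -> type:
    """Build a request handler class that serves a fresh token on GET /token."""

    class TokenHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/token":
                self.send_error(404)
                return
            # get_token is a placeholder for the real cookie-collection logic.
            body = json.dumps({"access_token": get_token()}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args) -> None:
            pass  # keep the console quiet

    return TokenHandler


def start_factory(get_token: Callable[[], str], host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Start the cookie factory in a background thread and return the server."""
    server = HTTPServer((host, port), make_handler(get_token))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_token_from_factory(url: str = "http://127.0.0.1:8000/token") -> str:
    """Client side: what the scraper calls before hitting the app endpoints."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read())["access_token"]
```

A scraper would call fetch_token_from_factory() before its first request and again when the token expires, roughly every two hours.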