The Web Scraping Club

The Web Scraping Club

Share this post

The Web Scraping Club
The Web Scraping Club
THE LAB #1: Scraping data from an app
Copy link
Facebook
Email
Notes
More

THE LAB #1: Scraping data from an app

How to inspect the network traffic of an app with Fiddler Everywhere and scrape the data from its servers.

Pierluigi Vinciguerra's avatar
Pierluigi Vinciguerra
Sep 04, 2022
∙ Paid
14

Share this post

The Web Scraping Club
The Web Scraping Club
THE LAB #1: Scraping data from an app
Copy link
Facebook
Email
Notes
More
4
Share

This is the first post of “THE LAB”: in this series, we'll cover real-world use cases, with code and an explanation of the methodology used.

The Web Scraping Club is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Why scrape data from an app?

I usually write in this newsletter about how to extract data from websites but what if our target is an app with no web interface? Or if the website is too complex to scrape and, knowing the target has an app, we want to test if there's another door to access the data.

Since we cannot see the source code of the app, we must intercept the requests made from it to its servers and try to replicate them in our scraper.

To do so, we need a network traffic analysis tool. In this post, we'll use Fiddler Everywhere to support us, but there's plenty of similar software on the market, like Wireshark.


How Fiddler Everywhere works

Fiddler works like a "man-in-the-middle" between your target app and its servers.

How Fiddler works
How Fiddler works

First of all, we need to install Fiddler on our computer and set it up to decrypt HTTPS traffic, as described on this page.

The second step is to find out the IP of our computer and configure the network on our mobile phone, where the app is installed, to use our computer as a proxy, using the port opened by Fiddler to route the network traffic.

From now on, every request made by our mobile to external servers will be shown in Fiddler. This is extremely interesting to understand what happens after the installation of a new app and to find out hidden calls to external services we are not aware of.

Start scraping from an app: a real case

Going back to our web scraping project, let's see a real-world case of how to use Fiddler.

We'll use the app of the well-known streetwear brand Off White as an example, even if they have a website to scrape data from.

Before starting our test, let's close all the apps on the phone in the background to reduce the noise in the requests made by our phones.

As soon as we open the app, we will see all these requests to Facebook and the Off-White internal API.

List of requests from the app
List of requests from the app

Let's try to open a catalog page to see what happens.

I browsed to man's ready-to-wear and, after entering the section, noticed this request.

API requests made by the app

Bingo!

There's an API that retrieves the product information from the server.

Let's try to replicate the same request, with the same headers, on a Scrapy project to see if it works.

I'll start with a basic scraper, that requests only this API call, so after starting a new project in Scrapy, my spider will look like the following.

Scrapy first lines
Scrapy first lines

While the settings.py file will have the variable DEFAULT_REQUEST_HEADERS set with almost all the headers we read on Fiddler.

Scrapy default headers
Scrapy default headers

As you may have noticed, I've deleted the "Accept" and "Accept-Encoding" voices, otherwise, Scrapy would mess out with the response output format.

I've deleted also the Cookie parameter because I wanted to be sure that this configuration will work also in the future and not only now because we have the right Cookie assigned.

At the moment we’ll use the Bearer token read from Fiddler even if probably it won’t work in few hours, but we’ll see later how to get it dynamically.

The scraper should print the resulting JSON on the terminal.

Scrapy results
Scrapy results

It works!

Now let's define the Item properties and fields and read the JSON. I suggest copying the whole JSON output into any online formatter tool, like https://jsonformatter.curiousconcept.com/, it helps a lot to understand its structure.

Item class declared in Scrapy
Item class declared in Scrapy
JSON formatted result
JSON formatted result

Let’s use the JSON structure to assign fields to Item fields, the scraper will run like the following.

Scrapy first 60 results
Scrapy first 60 results

60 items scraped, perfect, the same number as the items in the JSON!

Keep reading with a 7-day free trial

Subscribe to The Web Scraping Club to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Pierluigi
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More