The Lab #58: Intercepting traffic from an App - part 1
Discover API endpoints called by an App to scrape its data
Sometimes a website is so hard to scrape that we’d like to test how its app works and emulate its behavior, hoping that the underlying endpoints are less protected. In other cases, we simply don’t have a web version to scrape for the data we need.
In the past few days, I’ve worked again on this topic and wanted to share with you some tricks and techniques used to intercept traffic sent from an app and discover its underlying endpoints.
The Man in the Middle Approach
The title is self-explanatory: instead of letting the app connect directly to the target server, we place third-party software between them to intercept the traffic.
So we install the app on our Android or iPhone device, or on a virtual device created with Android Studio.
Then, still on this device, we typically need to install a root certificate generated by the network monitoring tool we’ll use. This certificate allows the tool to decrypt the HTTPS traffic between the device and the target server.
But let’s take a step back and see how HTTPS works. A good starting point is this great post, where you can understand all the details of the protocol.
How HTTPS Works
HTTPS (Hypertext Transfer Protocol Secure) is an extension of HTTP (Hypertext Transfer Protocol) designed to provide secure communication over a computer network. The primary objective of HTTPS is to ensure the confidentiality, integrity, and authenticity of data exchanged between a client (such as a web browser or mobile app) and a server. HTTPS achieves this through the use of SSL/TLS (Secure Sockets Layer/Transport Layer Security) protocols.
Key Components of HTTPS
SSL/TLS Protocols: SSL (Secure Sockets Layer) and its successor, TLS (Transport Layer Security), are cryptographic protocols that provide secure communication. They use a combination of symmetric and asymmetric encryption to secure data transmission.
Certificates and Public Key Infrastructure (PKI): HTTPS relies on digital certificates issued by Certificate Authorities (CAs) to authenticate the identity of servers. These certificates are part of the Public Key Infrastructure (PKI) that ensures the validity and integrity of public keys.
Steps in an HTTPS Connection
Client Hello: The client initiates a secure connection by sending a "Client Hello" message to the server. This message includes information about the supported SSL/TLS versions, cipher suites, and other encryption-related parameters.
Server Hello: The server responds with a "Server Hello" message, selecting the SSL/TLS version, cipher suite, and other parameters from the client's list.
Server Certificate: The server sends its digital certificate, which contains the server's public key. The client uses this certificate to verify the server's identity.
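Key Exchange: The client and server agree on shared key material. In the older RSA key exchange, the client generates a pre-master secret, encrypts it with the server's public key, and sends it to the server. Modern TLS (TLS 1.3, and most TLS 1.2 deployments) uses an ephemeral Diffie-Hellman exchange instead, in which both sides contribute a key share. In either case, both parties end up with the same secret and use it to derive a session key.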
Session Key Establishment: The derived session key is used for symmetric encryption of the data exchanged between the client and server. Symmetric encryption is faster and more efficient for bulk data transfer.
Secure Data Transmission: With the session key in place, the client and server can securely exchange data. Each message is encrypted, ensuring confidentiality, integrity, and authenticity.
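To make the first step of the list above concrete, here is a minimal sketch that uses Python's standard `ssl` module to produce a Client Hello without contacting any server. The function name `capture_client_hello` and the hostname `example.com` are only illustrative. The TLS engine is attached to in-memory buffers, so the handshake stops as soon as the client has written its first message and is waiting for a Server Hello that never arrives.

```python
import ssl


def capture_client_hello(hostname: str) -> bytes:
    """Return the raw bytes of the first TLS message a client would send."""
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # bytes coming "from the server" (none here)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to put on the wire
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has sent its Client Hello and is now
        # waiting for the Server Hello, which never arrives in this sketch.
        pass
    return outgoing.read()


hello = capture_client_hello("example.com")
# Byte 0 is the TLS record type (0x16 = handshake record).
print(f"record type:    {hello[0]:#04x}")
# Byte 5 is the handshake message type (0x01 = Client Hello).
print(f"handshake type: {hello[5]:#04x}")
# The requested hostname (SNI) travels unencrypted in this message.
print("hostname in clear text:", b"example.com" in hello)
```

Note that the hostname is readable in this first message even though everything after the handshake is encrypted. This is why an intercepting proxy knows which domain the app is trying to reach before it has decrypted anything.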
One of the tools I use to monitor network traffic is Fiddler Everywhere, which needs a few setup steps before it can do its job.
The Role of Root Certificates in Fiddler
Fiddler is a web debugging proxy tool that can intercept and inspect HTTP and HTTPS traffic between a client and the internet. To decrypt and inspect HTTPS traffic, Fiddler uses a man-in-the-middle (MITM) approach, which involves generating and using root certificates.
Root Certificates and Certificate Authorities (CAs)
A root certificate is a self-signed digital certificate that belongs to a trusted Certificate Authority (CA). Root certificates sit at the top of the certificate hierarchy in the Public Key Infrastructure (PKI) and are trusted implicitly by clients (browsers, operating systems, and other applications). They are pre-installed on devices and are used to validate the certificates issued by the root CA itself or by its subordinate CAs.
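If you are curious about which root certificates your own machine already trusts, the following sketch lists them using only Python's standard `ssl` module. The helper names `_flatten` and `list_trusted_roots` are hypothetical, and the result depends on the operating system: on platforms that load CA certificates lazily from a directory, the list may be short or even empty.

```python
import ssl


def _flatten(name):
    """Turn the nested tuples used by the ssl module into a plain dict."""
    return {key: value for rdn in name for key, value in rdn}


def list_trusted_roots():
    """Return the names of the self-signed CA certificates Python trusts by default."""
    ctx = ssl.create_default_context()
    roots = []
    for cert in ctx.get_ca_certs():
        subject = _flatten(cert.get("subject", ()))
        issuer = _flatten(cert.get("issuer", ()))
        if subject == issuer:  # a root certificate is signed by itself
            name = subject.get("commonName") or subject.get("organizationName", "?")
            roots.append(name)
    return sorted(roots)


roots = list_trusted_roots()
print(f"{len(roots)} root certificates loaded")
for name in roots[:5]:
    print(" -", name)
```

A root certificate generated by an interception tool only joins a list like this once it has been installed and trusted on the device.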
How Fiddler Uses Root Certificates to Intercept HTTPS Traffic
Generating a Root Certificate: When Fiddler is installed, it generates a unique root certificate. This certificate is used to sign other certificates dynamically created by Fiddler for the domains it intercepts.
Installing the Root Certificate: The generated root certificate must be installed and trusted on the client device. This step ensures that the client will trust the certificates signed by Fiddler's root certificate.
Intercepting HTTPS Traffic: Once the root certificate is installed, Fiddler can intercept HTTPS traffic. When a client attempts to establish a secure connection to a server, Fiddler intercepts the connection and presents its own dynamically generated certificate to the client.
Decryption and Inspection: Because the client trusts Fiddler's root certificate, it accepts the dynamically generated certificate and establishes a secure connection with Fiddler. Fiddler then opens its own secure connection to the actual server, validating the server's real certificate just as a normal client would. With one TLS connection on each side, Fiddler can decrypt and inspect the HTTPS traffic.
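Re-encryption and Forwarding: After inspecting a request, Fiddler re-encrypts it with the session key of its own connection to the server and forwards it. Responses travel the opposite way: Fiddler decrypts them, then re-encrypts them with the session key it established with the client. As long as the client trusts Fiddler's root certificate, this process is transparent to both the client and the server, so Fiddler can inspect the data without breaking either secure connection.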
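A quick way to check that the proxy and its root certificate are working, before touching a phone, is to route a small script through it. The sketch below uses only Python's standard library and rests on a few assumptions: the proxy address `http://127.0.0.1:8866` is a placeholder to replace with whatever your Fiddler instance listens on, `build_proxied_opener` is a hypothetical helper name, and the exported root certificate is assumed to be a PEM file.

```python
import ssl
import urllib.request

# Assumption: address and port where the intercepting proxy listens.
FIDDLER_PROXY = "http://127.0.0.1:8866"


def build_proxied_opener(proxy_url=FIDDLER_PROXY, ca_file=None):
    """Build a urllib opener that sends HTTP and HTTPS traffic through a proxy.

    ca_file is the path to the proxy's exported root certificate (PEM).
    Without it, certificates generated by the proxy fail verification,
    which is exactly what happens on a device that does not trust the root.
    """
    ctx = ssl.create_default_context()
    if ca_file:
        # Trust the proxy's root in addition to the system roots.
        ctx.load_verify_locations(cafile=ca_file)
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    https = urllib.request.HTTPSHandler(context=ctx)
    return urllib.request.build_opener(proxy, https)


opener = build_proxied_opener()
# With the proxy running and its root certificate exported, a request like
# the following should show up in the proxy's session list:
# opener = build_proxied_opener(ca_file="path/to/exported-root.pem")
# opener.open("https://example.com", timeout=10)
```

If the request succeeds only when `ca_file` is supplied, the interception is working as described above: the certificate presented to the script was signed by the proxy's root, not by a public CA.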
As always, if you want to have a look at the code, you can access the GitHub repository, available for paying readers. For this article, there’s no code to show but you can see the full repository here.
If you’re one of them but don’t have access to it, please write me at pier@thewebscraping.club to get it.
If you prefer open-source alternatives, there are mitmproxy, Wireshark, and Frida, each with its own nuances and peculiarities.
But let’s continue with Fiddler and see how we can intercept network traffic from one app.
A real-world use case: Saks Fifth Avenue app
The first thing to do is to set up our device, virtual or physical.