One of the things that fascinates me most about web scraping (and life in general) is finding a creative but pragmatic solution to complex challenges.
In web scraping, one of these challenges is bypassing anti-bot systems without emptying your pockets on commercial solutions.
I wrote about my journey creating a few homemade proxies some time ago. These proxies helped save us thousands of dollars in the early days of our firm, Re-Analytics.
With just a Raspberry Pi, and later with some mini PCs, we were able to avoid buying mobile proxies from providers, which at that time charged 40 USD per GB.
Today, with prices lowered to 6-8 USD per GB and our usage level, it makes no sense to keep this infrastructure, as it was unreliable in the long term and required too much attention.
This experiment did not sate my thirst for building this stuff, and recently, I’ve been sucked again into the rabbit hole of building an in-house mobile farm. Unfortunately, there’s not much information about it online, but the BlackHatWorld Forum is a good starting point.
In this article, I’m writing up an initial brainstorm, combining ideas on how to create a mobile farm. It’s a project I’d love to complete, updating both these pages and the YouTube channel along the way.
Why a mobile proxy farm?
Have you ever seen those videos with thousands of remote-controlled mobile phones opening apps and browsing the web?
Well, every time I see this stuff, there’s the inner child's voice in me saying that I should own such a thing.
But vocations aside, it’s an interesting way to learn how things work and could also be convenient economically.
Crunching some numbers
Nowadays, at least in Italy, mobile plans with unlimited GB cost 29 EUR per month. These plans are usually not limitless, but they have a high cap, usually around 500GB, which, if bought at 5 EUR per GB, would cost 2,500 EUR. Imagine having just 10 mobile SIMs; you could get 5TB per month for 290 EUR, while buying the same traffic at 5 EUR per GB would cost 25,000 EUR, more than eighty times as much.
Of course, there is a start-up fee, as you need to buy devices, the server(s) to manage them, USB hubs, and so on, but these costs are negligible compared to the long-term gains.
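To make the comparison easy to replay with your own numbers, here is a minimal sketch of the arithmetic. The plan price, cap, and per-GB price are the figures quoted above; the 3,000 EUR start-up cost and all the function names are assumptions made up for illustration, not real quotes.

```python
import math
from typing import Optional


def sims_needed(monthly_traffic_gb: float, cap_gb: float) -> int:
    """Number of SIMs required to cover the traffic, given a per-SIM cap."""
    return math.ceil(monthly_traffic_gb / cap_gb)


def farm_monthly_cost(monthly_traffic_gb: float, cap_gb: float, plan_eur: float) -> float:
    """Recurring cost of the farm: one mobile plan per SIM."""
    return sims_needed(monthly_traffic_gb, cap_gb) * plan_eur


def provider_monthly_cost(monthly_traffic_gb: float, price_per_gb_eur: float) -> float:
    """Cost of buying the same traffic from a proxy provider."""
    return monthly_traffic_gb * price_per_gb_eur


def break_even_months(setup_eur: float, farm_eur: float, provider_eur: float) -> Optional[int]:
    """Months needed to repay the hardware; None if the farm never pays off."""
    saving = provider_eur - farm_eur
    if saving <= 0:
        return None
    return math.ceil(setup_eur / saving)


# Figures from the text: 10 SIMs x 500 GB = 5,000 GB per month.
farm = farm_monthly_cost(5000, cap_gb=500, plan_eur=29.0)
provider = provider_monthly_cost(5000, price_per_gb_eur=5.0)
ratio = provider / farm

# Hypothetical start-up cost (devices, server, USB hubs): an assumption.
months = break_even_months(3000.0, farm, provider)
```

The comparison only holds if you actually consume the traffic: with 100GB per month, one SIM is enough and the provider bill drops to 500 EUR, so the gap shrinks accordingly.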
But it’s not just a matter of money.
Consumer-grade devices browsing the web
If we wanted to monetize our infrastructure by reselling GB of traffic, we could use dongles instead of mobile phones. However, if we want to build an infrastructure for scraping, we should set up a real device farm with mobile phones. If we distribute our scraping operations through them, the target website will see an actual device visiting it from a mobile IP, making it almost indistinguishable from a real user. That would be a first-class scraping infrastructure.
How a mobile proxy farm works
Developing a robust mobile proxy farm requires careful selection and configuration of several key components. In this section, we dive into the specifics of each component, discussing the pros and cons of using mobile devices versus dongles, the role of USB hubs, and additional details to help you design an efficient infrastructure.
Mobile Devices and Dongles
Generally, we have two options for generating mobile IP addresses: smartphones (mobile devices) and dedicated dongles (modems).
Mobile devices have some pros: They can browse the web via their browser, creating a perfect fingerprint for scraping. You can control them using apps and install scripts for their management. Last but not least, they’re usually more reliable than dongles. Of course, they’re also more expensive than dongles and require more power to work.
On the other hand, dongles are cheap, simple to deploy, and smaller, so they’re probably a better choice for larger deployments if you need just IPs.
In any case, whether you choose dongles or mobiles, you need SIM cards (physical or virtual) to provide connectivity.
The Need for a USB Hub
When deploying multiple dongles or mobile devices, you may face connectivity constraints on your control server or PC. In these cases, a USB hub becomes essential:
Expansion of Ports: A USB hub allows you to connect multiple dongles to a single machine, overcoming the limited number of USB ports available on standard hardware.
Powered USB Hubs: For stable operation, especially with devices that draw significant power, a powered USB hub ensures that each dongle receives a consistent and adequate power supply.
Organized Setup: Using a hub can help organize and manage the physical connections, simplifying troubleshooting and maintenance.
Proxy Server Hardware and Software
To manage all the dongles or phones, you need to connect them via the USB hub to a central server (or a cluster of them).
The central server manages operations, restarts the devices, monitors the infrastructure status, and routes traffic over the SIMs.
In fact, once the mobiles are up and running, you cannot connect to them directly from outside: mobile carriers typically use carrier-grade NAT, so each public IP address is shared by hundreds of different devices.
To connect to your devices and execute your scraping operations, you need to use the server address and a defined port. Depending on your chosen port, traffic will be directed to a device via a reverse SSH tunnel.
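The port-to-device routing described above can be sketched as follows. This is only an illustration under stated assumptions: the base port 10000, the device names, the local proxy port 8080, and the `tunnel@farm.example.com` address are all hypothetical. The only real piece is the OpenSSH syntax `ssh -N -R <server_port>:localhost:<local_port> <user@host>`, which asks the server to forward one of its ports back to the machine that opened the connection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Device:
    name: str               # hypothetical label for a phone or dongle
    local_proxy_port: int   # port where the proxy listens on the device side


def assign_ports(devices: List[Device], base_port: int = 10000) -> Dict[int, Device]:
    """Give every device its own port on the central server."""
    return {base_port + i: device for i, device in enumerate(devices)}


def reverse_tunnel_command(
    server_port: int,
    device: Device,
    server: str = "tunnel@farm.example.com",  # assumption: placeholder address
) -> List[str]:
    """Build the command that opens the reverse tunnel from the device side."""
    forward = f"{server_port}:localhost:{device.local_proxy_port}"
    return ["ssh", "-N", "-R", forward, server]


routes = assign_ports([Device("phone-a", 8080), Device("phone-b", 8080)])
commands = {port: reverse_tunnel_command(port, dev) for port, dev in routes.items()}
```

With this mapping, a scraper that uses the server on port 10001 as its proxy would exit through phone-b. Note that, by default, sshd binds remote forwards to the loopback interface, so reaching these ports from another machine requires enabling `GatewayPorts` on the server or putting another proxy layer in front.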
We can choose to build our own device management software, or we can rely on existing solutions like:
localtonet: This service creates reverse SSH tunnels for you, so you can expose the IP of your devices without the need to centralize it all on your server (2 USD per month per tunnel).
Iproxy: This makes your mobile devices act like proxies and lets you manage them via a dashboard.
ProxySmart: A proxy and manager for both mobiles and dongles.
Proxidize: You can rent both hardware and software from them.
This is a high-level description of the architecture and of the hardware and software needed to run your farm. Depending on the use case, some pieces of software or hardware are a better fit than others.
Use cases and challenges
If we want to enter the proxy industry as IP sellers, we can probably use dongles and connect them via one of the paid software solutions we’ve seen before.
We did the math before, and from the numbers, it seems a great business model, but we need to consider several aspects:
Is your country attractive enough for the proxy market? The proxies you put on the market will have IPs from the country where your rack is physically located, so you need to do some market research to understand whether your country is interesting enough for buyers. Luckily, you can start small and then scale as demand rises.
You need to find buyers: the proxy market is quite crowded, and it’s difficult to stand out without a proper marketing effort. You can also sell your IPs to bigger companies, but of course, they would take a cut of the final price.
Legal issues: If you open a business to the public, people could use your IP addresses to commit malicious actions. Be prepared to keep the logs and actions of your customers to help authorities since the first person they would go to is the owner of the IP from where the action started.
The configuration is more complex if you want to use this infrastructure for internal scraping with mobiles. You need to install the Playwright server on the Android mobiles, keep it up and running, and connect your scrapers to it. You also still have to manage the devices, rebooting them to get a new IP while being careful not to interrupt the scrapers’ execution.
In this case, you’ll probably need to code the managing software, but it’s also true that you can schedule IP changes less frequently.
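As a starting point for that managing software, here is a minimal sketch of the rotation rule: reboot a phone only when its rotation interval has elapsed and no scraper is using it. The `Phone` fields, the serials, and the 10-minute interval are assumptions for illustration. The command built at the end is the standard `adb -s <serial> reboot`; the sketch only builds it and leaves execution to you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    serial: str           # device serial as listed by adb (hypothetical values below)
    last_rotation: float  # epoch seconds of the last reboot
    active_jobs: int      # scrapers currently routed through this phone


def due_for_rotation(phones: List[Phone], now: float, interval_s: float) -> List[Phone]:
    """Phones that are idle and whose IP is older than the rotation interval."""
    return [
        p for p in phones
        if p.active_jobs == 0 and now - p.last_rotation >= interval_s
    ]


def reboot_command(phone: Phone) -> List[str]:
    """Build (but do not run) the adb command that reboots one phone."""
    return ["adb", "-s", phone.serial, "reboot"]


now = 10_000.0
phones = [
    Phone("PHONE_A", last_rotation=9_000.0, active_jobs=0),  # idle and old: rotate
    Phone("PHONE_B", last_rotation=9_000.0, active_jobs=2),  # busy: leave alone
    Phone("PHONE_C", last_rotation=9_800.0, active_jobs=0),  # rotated recently
]
to_rotate = due_for_rotation(phones, now, interval_s=600)
commands = [reboot_command(p) for p in to_rotate]
```

A real manager would also have to stop routing new jobs to a phone before rebooting it and wait for the device to come back online, which this sketch leaves out.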
In any case, this is a project I’d like to start sooner or later. Please let me know in the comments section if you’re interested in following it from beginning to end, and where: YouTube, the newsletter, or both?
I imagine this goes against most carrier ToS? You'll probably need each SIM to be under a different name too.
That's a very good article, Pierluigi.
I do not recommend Raspberry PI, but any desktop with a decent CPU (over 5000 rank on cpubenchmark).
For the USB hub, I would like to recommend any of the ORICO/SIPOLAR industrial hubs.
For the software management, go with ProxySmart.