What is a web unblocker and how does it work?
A brief introduction to unblocker solutions, which are becoming a must-have in any modern web scraping infrastructure
A few weeks ago I participated as a guest in a webinar hosted by NetNut, where we introduced their new unblocker solutions (you can still register to see the recording).
Reading some of the questions raised during the webinar, I felt that for some people it's still not clear what an unblocker is and how it differs from a traditional proxy.
In this article, I'll explore these aspects and make them clear for anyone approaching these tools for the first time.
What is a web unblocker?
In a broad sense, a web unblocker is a tool that allows people to circumvent internet blocks, for example in the case of censorship or geofencing. VPNs could be considered a sort of web unblocker, since they allow you to send and receive encrypted data and bypass browsing restrictions.
In the web scraping industry, we define a web unblocker as an API that allows our scraper to bypass anti-bot protections by using a set of features aimed at this purpose.
Since their development and maintenance require a lot of R&D, and running them needs computational power, I'm not aware of any unblocker that is not a commercial solution. Let me know in the comments section if you know of any open-source unblockers, since I could have missed them.
How does a web unblocker work?
As mentioned before, web unblockers are commercial solutions, so we're not allowed to know all the technical details behind them, but we can certainly deduce the main features from their providers' websites.
Some common characteristics we can find are:
IP rotation via multiple types of proxies
Browser fingerprinting
CAPTCHA solving
JavaScript rendering
What we can assume from this list is that the requests we send to a web unblocker are probably first routed to the target website in browserless mode, through a simple datacenter or residential proxy. If that's enough, the result is sent back to us; otherwise, a browser session, customized with real browser fingerprints and CAPTCHA-solving capabilities, is instantiated and used to try to get the result.
This is a smart way to avoid wasting computing power on simple requests while keeping a flexible approach to scraping.
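To make this assumed flow more concrete, here's a minimal Python sketch of the escalation logic described above. Keep in mind this is only a guess at what happens inside an unblocker: every name and check here is illustrative, not any provider's actual implementation.

```python
import requests

# Markers that suggest the browserless attempt was blocked (illustrative only)
BLOCK_MARKERS = ("captcha", "access denied", "attention required")


def fetch_with_escalation(url: str, proxy: str) -> str:
    # Step 1: the cheap path, a plain HTTP request routed through a
    # datacenter or residential proxy, with no browser involved
    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
    body = response.text
    if response.ok and not any(marker in body.lower() for marker in BLOCK_MARKERS):
        return body

    # Step 2: the expensive path, a real browser session with patched
    # fingerprints and CAPTCHA solving (left as a placeholder here)
    return fetch_with_browser_session(url, proxy)


def fetch_with_browser_session(url: str, proxy: str) -> str:
    # In a real unblocker this would drive a full browser with realistic
    # fingerprints; it's only stubbed here to keep the sketch short.
    raise NotImplementedError("browser fallback not implemented in this sketch")
```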
What are the differences between proxies and web unblockers?
Since most unblockers are provided by proxy companies, there could be some misunderstanding about the concept of a web unblocker.
However, scrolling through the feature lists of web unblockers is enough to understand the differences between the two tools. While the task of a proxy is limited to changing the IP address visible to the target website, web unblockers do this too and, on top of it, add more features, like the ones listed before.
We can say that proxies are a component of a web unblocker's tech stack, which also explains why many unblockers are sold by proxy providers.
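This similarity also shows up in how the two are used: many unblockers are exposed as proxy-style endpoints, so the client code looks the same and only the amount of work done behind the gateway changes. Here's a short Python sketch of that idea; the hosts, ports, and credentials below are hypothetical placeholders, and details like TLS verification vary by provider.

```python
import requests

# Both gateways are configured the same way from the client's side;
# the values below are made-up placeholders, not real endpoints
PLAIN_PROXY = "http://user:pass@proxy.example.com:8000"
UNBLOCKER = "http://user:pass@unblocker.example.com:60000"


def fetch(url: str, gateway: str) -> requests.Response:
    return requests.get(
        url,
        proxies={"http": gateway, "https": gateway},
        timeout=60,
        # Some unblocker gateways re-sign TLS traffic, so providers may
        # instruct you to disable certificate verification; check the docs
        verify=False,
    )


# Same client code, a very different amount of work behind the gateway:
# fetch("https://example.com", PLAIN_PROXY)  # only IP rotation
# fetch("https://example.com", UNBLOCKER)    # full anti-bot bypass stack
```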
Is there a web unblocker capable of bypassing every anti-bot solution?
You've probably already heard that web scraping is a cat-and-mouse game against anti-bot solutions. This is true not only when we decide to build our scrapers entirely by ourselves but also for the companies building and maintaining unblockers. Companies like Cloudflare are pouring money into their R&D departments to build more and more effective solutions for blocking bots, and it's not always easy to keep pace with them.
Additionally, with some solutions every website can decide the level of protection it needs, or even combine two anti-bots together, so, unfortunately, there's no silver bullet for scraping every website out there.
Since every web unblocker has its own success rate against certain anti-bots, as well as its own pricing model, I've created the Great Web Unblocker Benchmark, a recurring article where I test different solutions against the most well-known anti-bots; the first edition went live in March.
The unblockers were measured on success rate, scraping time, and overall cost of extraction.
The results are quite interesting and confirm the thesis that, in a large web scraping project, one unblocker is probably not enough to tackle all the challenges we face.
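For the curious, here's a minimal Python sketch of the kind of harness that could compute those three metrics. The pricing model is simplified to a flat per-request price, and every figure and endpoint is a placeholder, not a number from the actual benchmark.

```python
import time

import requests


def benchmark(name: str, gateway: str, urls: list[str], price_per_request: float):
    """Measure success rate, average scraping time, and cost per success."""
    successes, total_time = 0, 0.0
    for url in urls:
        start = time.perf_counter()
        try:
            response = requests.get(
                url, proxies={"http": gateway, "https": gateway}, timeout=60
            )
            if response.ok:
                successes += 1
        except requests.RequestException:
            pass  # a timeout or connection error counts as a failure
        total_time += time.perf_counter() - start

    success_rate = successes / len(urls)
    # Cost of extraction: what you actually pay per page retrieved
    cost_per_success = (price_per_request * len(urls)) / max(successes, 1)
    print(
        f"{name}: {success_rate:.0%} success, "
        f"{total_time / len(urls):.1f}s avg, "
        f"${cost_per_success:.4f} per successful page"
    )
```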
Do we really need a web unblocker for our scraping project?
If you had asked me this question a few years ago, I would probably have said no. The web was different, anti-bots were easier to bypass, and in my own web scraping ventures I started using unblockers only recently.
Of course, they have a cost, and you need to select your target solution carefully if you don't want to see your annual scraping budget disappear in a few days. On the other side, how much does it cost, in terms of hours spent and possible disruption of the data feed for your customers, to reverse engineer an anti-bot solution by yourself?
There's no correct answer: companies or freelancers with tighter budgets are probably forced to find a home-made solution, while bigger ones can more easily afford these commercial tools.