Hands On #5: Testing the Oxylabs Web Unblocker
Testing the Oxylabs Web Unblocker against the most used anti-bot solutions.
Hi everyone, this is a new series of posts from The Web Scraping Club, where I try out products related to web scraping and review them. I hope this helps you evaluate products before spending money and time on testing them. Feel free to write to me at pier@thewebscraping.club with any feedback, or if you want me to test other products or solutions.
These Hands On episodes are not sponsored and the ideas expressed are my own, backed by quantitative tests, which vary with the kind of product I’m testing. There might be some affiliate links in the article, which help The Web Scraping Club stay free and able to test even paid solutions.
If there’s one trend in the web scraping industry in 2023, it’s unblockers. Almost every month a new unblocker is released and, while this makes life easier for professionals, choosing the right one for your case is becoming more and more difficult. For this month’s episode of the Hands-on series, we’re testing the Oxylabs Web Unblocker.
What is the Web Unblocker
As other companies have been doing in recent months, Oxylabs launched its Web Unblocker, a super API that, through a single URL used as a proxy, also gives you Javascript rendering, rotating IPs, and session management.
Let’s see how powerful it is!
Our testing methodology
As we did with the other “unblocker” APIs, we’ll use a plain Scrapy spider that retrieves 10 pages from each of 5 different websites, one per anti-bot solution tested (Datadome, Cloudflare, Kasada, F5, PerimeterX). It returns the HTTP status code, a string from the page (needed to check if the page was loaded correctly), the website, and the anti-bot name.
The base scraper is unable to retrieve any record correctly, so the benchmark is 0.
As a result of the test, we’ll assign a score from 0 to 100, depending on how many URLs are retrieved correctly over two runs of 50 URLs each, one in a local environment and the other from a server. A score of 100 means that the anti-bot was bypassed for every input URL in both runs, while our starting scraper has a score of 0 since it could not avoid an anti-bot for any of the records.
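The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name and its parameters are hypothetical and are not taken from the test repository.

```python
def unblocker_score(local_ok, server_ok, urls_per_run=50):
    """Return a 0-100 score from the number of correct URLs in each run."""
    for count in (local_ok, server_ok):
        if not 0 <= count <= urls_per_run:
            raise ValueError("successes must be between 0 and urls_per_run")
    total_urls = 2 * urls_per_run  # local run + server run
    return 100 * (local_ok + server_ok) / total_urls


print(unblocker_score(0, 0))    # baseline scraper: nothing retrieved
print(unblocker_score(50, 50))  # every URL retrieved in both runs
```

With 50 URLs per run, each correctly retrieved URL is worth exactly one point.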
You can find the code of the test scraper in our GitHub repository open to all our readers.
Preparing for the test
First of all, you need to create an account on the Oxylabs website and then look for the Web Unblocker plans, which range from 15 down to 11 USD per GB, depending on your needs. Following this link, you can get an extra 35% discount on the Unblocker and other Oxylabs services.
After changing your password, you can have a look at the documentation, which includes integration examples in different programming languages, and at the usage dashboard. In my opinion, it’s one of the best recap dashboards I’ve seen, because of a tiny detail you don’t often see.
You can see a recap of traffic usage by domain, which is very useful when using the same proxy URL for multiple websites. I would also add a recap of the costs per website, but it’s already very helpful.
Setting up the Scrapy scraper
As said before, I’ve manually chosen fifty URLs, ten per website, as a benchmark and input for our Scrapy spider.
The scraper basically returns the anti-bot and website names given as input, the status code of the response, and a field populated via an XPath selector, to make sure that we reached the product page and were not blocked by some challenge.
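As a rough sketch of that check, the helper below assembles one output row and marks it as successful only when the status code is 200 and the XPath field is not empty. The function and field names are hypothetical; the real spider lives in the GitHub repository.

```python
def build_result(website, antibot, status, extracted_text):
    """Assemble one output row and flag whether the page really loaded."""
    text = (extracted_text or "").strip()
    return {
        "website": website,
        "antibot": antibot,
        "status": status,
        "test_field": text,
        # A 200 with an empty field usually means a challenge page was served.
        "success": status == 200 and bool(text),
    }


# Placeholder values, not real test data.
row = build_result("example-shop", "SomeAntibot", 200, "")
print(row["success"])
```

Checking the extracted field matters because a blocked request does not always come back with an error code.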
There’s no particular configuration to apply to the scraper, only the call to a proxy in the settings.py file.
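One possible way to wire a proxy through settings.py is a small downloader middleware that copies a proxy URL from the settings into every request. This is a sketch and not necessarily the setup used in the test repository: the setting name UNBLOCKER_PROXY, the module path, and the class are hypothetical, and the endpoint and credentials are placeholders to be checked against the Oxylabs documentation.

```python
# settings.py (sketch): every value below is a placeholder or an assumption.
UNBLOCKER_PROXY = "http://USERNAME:PASSWORD@unblock.oxylabs.io:60000"

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical module path. The priority is lower than 750 so that this
    # middleware runs before Scrapy's built-in HttpProxyMiddleware.
    "myproject.middlewares.UnblockerProxyMiddleware": 350,
}


# middlewares.py (sketch)
class UnblockerProxyMiddleware:
    """Route every request through the proxy URL found in the settings."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.get("UNBLOCKER_PROXY"))

    def process_request(self, request, spider):
        # HttpProxyMiddleware later reads request.meta["proxy"] and turns any
        # credentials in the URL into a Proxy-Authorization header.
        request.meta["proxy"] = self.proxy_url
        return None  # let Scrapy continue processing the request
```

Setting meta={'proxy': ...} directly on each Request would work as well; the middleware simply keeps the spider code free of proxy details.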
First run: no Oxylabs Web Unblocker
With this run, we’re setting the baseline, so we’re running the Scrapy spider without the Web Unblocker.
As expected, the results after the first run are the following.
Basically, every website returned errors except Nordstrom, which returned a 200 status code but without showing the product we requested.
Second run: using the Web Unblocker with only raw HTML requested
As we have already seen in other similar products, we can run the scraper without enabling Javascript rendering, which makes it faster, or enable it when needed.
In this second test, we’re not enabling it, and here are the results.
Surprisingly, we already got a great result even without Javascript rendering, with almost all the requests responding successfully. Only a few requests to the Cloudflare and PerimeterX websites returned errors but, with some retries handled in the scraper, we can get the results from them too.
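If you rely on Scrapy’s built-in RetryMiddleware for those retries, a settings.py fragment along these lines would be enough. The number of retries and the list of status codes are assumptions made for illustration, not values taken from the test.

```python
# settings.py (sketch): Scrapy's built-in RetryMiddleware reads these settings.
RETRY_ENABLED = True
RETRY_TIMES = 3  # extra attempts per URL after the first download (assumed value)
RETRY_HTTP_CODES = [403, 429, 500, 502, 503]  # assumed codes, not from the test logs
```

Keep in mind that this only retries responses with an error status code: a challenge page served with a 200 would still need a check on the extracted field.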
Third run: using Web Unblocker with Javascript rendering
To enable Javascript rendering, as written in the documentation, adding a custom header to the request is enough.
yield Request(url, callback=self.test_url, meta={'website':website, 'antibot':antibot.strip()}, headers={'X-Oxylabs-Render': 'html'}, dont_filter=True)
In this case, it’s redundant, since the unblocker already passed all the anti-bot challenges even without it, but for the completeness of the results we run these tests as well.
Of course, results are great in this case too, with only two URLs needing a retry to get the correct response.
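Since rendering is optional, one way to switch it on only when needed is a tiny helper that builds the extra headers for each request. The helper name is hypothetical; the header name and value are the ones used in the request above.

```python
def unblocker_headers(render_js=False):
    """Return the extra request headers for one Web Unblocker call."""
    headers = {}
    if render_js:
        # Same header name and value as in the article's request.
        headers["X-Oxylabs-Render"] = "html"
    return headers


print(unblocker_headers())                # raw HTML only
print(unblocker_headers(render_js=True))  # Javascript rendering requested
```

In the spider, the returned dictionary can be passed as the headers argument of the Request, so the same code covers both the raw HTML run and the rendered one.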
Final remarks
The Oxylabs Web Unblocker is one of the newest super APIs to arrive on the market this year. By now, we’re quite familiar with this kind of product, and the list of features is the usual one:
Embedded proxy rotation
Javascript rendering
Session handling
Proxy-like integration
What really matters for the end user, then, are two things: price and effectiveness. On both counts, the Oxylabs Web Unblocker is one of the best choices.
Pros
Very effective against all the tested anti-bot solutions
Competitive pricing: the smallest plan is priced at 15 USD per GB
Effective dashboard, with traffic split per domain
Javascript rendering included in the basic price
Cons
Nothing relevant
Rating
As we have seen, the Oxylabs Web Unblocker performed very well, with only a few hiccups on some URLs. Given that it resolved 96 URLs out of 100 on the first try, its final score is 96/100. If you want to test it yourself and get a 35% discount on the Unblocker and other services, you can follow this link.