How to bypass Kasada bot mitigation?
Now that we’ve seen what Kasada bot mitigation is and how it works, let’s see how we can bypass it with both free and commercial solutions.
Free solutions
Playwright with Chrome
In our Anti-Detect Anti-Bot matrix, I tested Kasada with Chrome without success, using a setup that had proven reliable over time.
Thanks again to Dimitar and his post on the Crawlio page, I decided to give it another try.
So I modified my test scraper, reviewed the parameters passed to Chrome, and it worked!
I kept only a few options, to disable the Chrome sandbox, the first-run notice, and the automation-controlled flag, and added:
ignore_default_args=["--enable-automation"]
This behaves like the old --disable-infobars option, which was used to hide the “Chrome is being controlled by automated test software” notice. I had been struggling to disable that notice with Playwright, and this turned out to be the way to do it.
I also added the option
--disable-blink-features=AutomationControlled
and, as a result, I could open the first page of our target website without any trouble and without the infobar.
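Here is a minimal sketch of that Chrome setup using Playwright’s Python sync API. The post doesn’t show the full script, so the exact flag list, the `channel="chrome"` choice (use the installed Google Chrome rather than bundled Chromium), the headful mode, and the `TARGET_URL` placeholder are assumptions; only `ignore_default_args=["--enable-automation"]` and `--disable-blink-features=AutomationControlled` come from the text above.

```python
TARGET_URL = "https://example.com"  # hypothetical placeholder for the Kasada-protected site


def build_chrome_launch_options() -> dict:
    """Keyword arguments for chromium.launch(); the flag list is an assumption."""
    return {
        "channel": "chrome",  # assumption: drive the locally installed Google Chrome
        "headless": False,    # assumption: headful window, like the other browser-based tests
        "args": [
            "--no-sandbox",    # disable the Chrome sandbox
            "--no-first-run",  # skip the first-run notice
            "--disable-blink-features=AutomationControlled",  # hide the automation-controlled flag
        ],
        # Drop Playwright's default --enable-automation switch, which also hides the infobar.
        "ignore_default_args": ["--enable-automation"],
    }


def main() -> None:
    # Imported here so the options above can be inspected without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_chrome_launch_options())
        page = browser.new_page()
        page.goto(TARGET_URL)
        print(page.title())
        browser.close()
```

Call `main()` after setting `TARGET_URL`. Keeping the options in a separate function makes it easy to toggle one flag at a time when checking which of them actually matters against a given target.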
Playwright with Firefox
Another way to use Playwright against Kasada is to launch a Firefox instance and load the target website from it.
We have already seen this working in the Anti-Detect Anti-Bot matrix, and the setup is pretty straightforward.
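As a rough sketch, the Firefox variant only swaps the browser type; no extra flags are assumed here beyond a headful window, and `TARGET_URL` is a hypothetical placeholder.

```python
TARGET_URL = "https://example.com"  # hypothetical placeholder for the Kasada-protected site


def build_firefox_launch_options() -> dict:
    """Keyword arguments for firefox.launch(): a headful window with Playwright's defaults."""
    return {"headless": False}  # assumption: headful, as in the matrix tests


def main() -> None:
    # Imported here so the module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(**build_firefox_launch_options())
        page = browser.new_page()
        page.goto(TARGET_URL)
        print(page.title())
        browser.close()
```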
Commercial solutions
Playwright with GoLogin
Instead of opening a Chrome or Firefox browser, we can use GoLogin’s browser and the multiple profiles it offers to bypass Kasada.
In this case, we attach the Playwright script to an already open GoLogin instance, but the result is the same.
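A minimal sketch of the attach step, assuming the running GoLogin profile exposes a Chrome DevTools Protocol endpoint on a local port. The port value and the `cdp_endpoint` helper are hypothetical; check GoLogin’s own documentation for how to obtain the real endpoint. Playwright’s `connect_over_cdp` is what attaches to an existing Chromium-based browser instead of launching a new one.

```python
TARGET_URL = "https://example.com"  # hypothetical placeholder for the Kasada-protected site
GOLOGIN_DEBUG_PORT = 9222  # hypothetical: remote-debugging port of the running GoLogin profile


def cdp_endpoint(port: int, host: str = "127.0.0.1") -> str:
    """HTTP URL of the DevTools endpoint Playwright attaches to (hypothetical helper)."""
    return f"http://{host}:{port}"


def main() -> None:
    # Imported here so the module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the already open GoLogin browser rather than launching a new one.
        browser = p.chromium.connect_over_cdp(cdp_endpoint(GOLOGIN_DEBUG_PORT))
        # Reuse the profile's existing context, if any, so its settings and cookies apply.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(TARGET_URL)
        print(page.title())
```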
Bright Data Web Unblocker
The last solution is also a commercial one, but it’s the only one that doesn’t require a headful browser.
As we have seen in its product review, the Bright Data Web Unblocker is a proxy API we can integrate into our Scrapy projects (or any other tool), and it automatically handles all the settings and features needed to bypass the Kasada challenge.
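Since the Web Unblocker is used as a proxy, integrating it mostly means routing requests through a proxy URL with credentials. The sketch below uses only the standard library; the host, port, and credentials are hypothetical placeholders to be replaced with the values from your own Bright Data account, and any extra settings the product may need (for example around TLS certificates) are not covered here.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical placeholders: use the values from your own Bright Data dashboard.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080
PROXY_USER = "your-username"
PROXY_PASS = "your-password"


def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Return http://user:pass@host:port with the credentials percent-encoded."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def fetch(url: str) -> str:
    """Download a page through the unblocker proxy and return its body as text."""
    proxy_url = build_proxy_url(PROXY_USER, PROXY_PASS, PROXY_HOST, PROXY_PORT)
    # Route both HTTP and HTTPS traffic through the proxy.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

In a Scrapy project, the same proxy URL can be set per request in `request.meta["proxy"]`, which Scrapy’s built-in HttpProxyMiddleware picks up.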
Final remarks
In my experience, Kasada was a real issue for our web scraping activities until 2022, but thanks to further research on my side and the tools and settings now available, we have plenty of options to bypass it.
Needless to say, all web scraping activities should be done ethically, and these techniques should be used only to scrape data we are allowed to collect.
In this landscape, it will be interesting to see how far this cat-and-mouse game goes, at the cost of user experience and search engine ranking for heavily protected websites.
This post is written by Pierluigi Vinciguerra (pier@thewebscraping.club)