13 Comments
Neural Foundry

This is a really solid breakdown of Scrapling's capabilities! The Cloudflare bypass section is particularly interesting, because most libraries simply give up when they hit Turnstile. The fact that Scrapling can solve it in headless mode using Camoufox is impressive, since that's usually where stealth approaches break down. I'm curious about long-term maintainability, though, given that Cloudflare is constantly evolving its detection methods. Does the library have a regular update cycle to keep pace with those changes, or does the fingerprint-spoofing approach provide enough flexibility to stay ahead naturally?

Antonello Zanini

Thanks! Scrapling is definitely a fantastic library.

As for maintainability, I'll leave that to Karim, the library's author.

Karim Shoair

Hello, Scrapling author here.

Thanks for all the kind words. Yes, I've kept maintaining the solver since I released it, and it actually works better now than when this article was published (thanks to the updates).

The solver has been working since last December, even though I only added it to Scrapling in v0.3 (before that, I was using it in my day job). So I've been maintaining it for nearly a year now, and I intend to keep doing the same with the rest of the library :D

shivham

I really liked your suggestion.

I tried Option 2 and used a web unlocker API. It works and lets me scrape data that doesn't require a login without hitting a login wall. However, to get complete and accurate data, we need to log in first to see the full information.

I'm not in favor of using a provider's LinkedIn API: it's costly and doesn't give good results for my requirements. I'd rather build my own LinkedIn solution.

I'd value your input. Thank you!

Also, where can I find the Discord community link?

Antonello Zanini

You are welcome. Here is the link:

https://discord.com/invite/zpV3UvAhYu

(As for scraping data while logged in, I don't recommend it for legal and ethical reasons.)

shivham

Hello, I've requested to join the community and am waiting for approval.

shivham

I'm starting to use it and trying to parse LinkedIn public data, which does not require a login. But when I open a page with StealthySession, it redirects me to the login page, while with other proxies the request gets past that page and returns the real HTML.

Antonello Zanini

Scraping LinkedIn is always tricky, mostly because of the known login wall. You can bypass it on some pages with a simple trick, which I documented in a previous post:

https://substack.thewebscraping.club/p/scraping-linkedin-public-data

Keep in mind that Scrapling's StealthySession only tweaks the automated browser to look less “automated” and more human. The underlying IP is still yours (or one of your servers), so you can still get blocked.

The library itself can’t change your IP address, which is why, when scraping complex sites like LinkedIn, it’s always smart to pair good browser automation with high-quality proxies!

The IP’s location is also extremely important, sometimes even more than the IP’s quality itself. A European website is far more likely to accept a connection from a European IP (even if it comes from a datacenter) than from a very high-quality IP located in Asia or America.
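To make the location point concrete, here is a minimal sketch that orders a proxy pool so exits in the target site's region are tried first, with the rest kept as a fallback. The `Proxy` record, the `EUROPE` country set, and the proxy URLs are hypothetical placeholders; real providers expose geotargeting in their own, provider-specific ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    url: str      # hypothetical proxy endpoint, e.g. "http://user:pass@host:port"
    country: str  # ISO 3166-1 alpha-2 code of the exit IP


# Hypothetical region definition: a handful of European country codes
EUROPE = {"DE", "FR", "IT", "ES", "NL", "PL", "SE"}


def order_by_locality(proxies: list[Proxy], target_countries: set[str]) -> list[Proxy]:
    """Put proxies located in the target region first, preserving the original order within each group."""
    local = [p for p in proxies if p.country in target_countries]
    remote = [p for p in proxies if p.country not in target_countries]
    return local + remote


if __name__ == "__main__":
    pool = [
        Proxy("http://us.example:8000", "US"),
        Proxy("http://de.example:8000", "DE"),
        Proxy("http://jp.example:8000", "JP"),
        Proxy("http://fr.example:8000", "FR"),
    ]
    print([p.country for p in order_by_locality(pool, EUROPE)])  # ['DE', 'FR', 'US', 'JP']
```

For a European site, this means a German or French exit gets used before a possibly "better" IP in the US or Japan, which matches the advice above.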

shivham

Thank you for the reply. I've read your article, and that's the same approach I used to scrape LinkedIn jobs, via their API and proxies.

But when it comes to scraping profiles, the game is totally different. I have a list of profiles, and I'm trying to scrape their public data (available without logging in) using ISP proxies and Scrapling. Unfortunately, the login wall blocks the data and redirects me to the login page. Some proxies work through the APIs offered by proxy providers, but that doesn't seem like a good solution at scale. Any help or suggestions?

Antonello Zanini

Also, feel free to join our Discord and ask for help from other web scraping experts!

Antonello Zanini

Yes! Scraping LinkedIn profiles is indeed a different game. In this case, you can either try high-quality residential proxies (e.g., from Decodo, NetNut, etc.) with a retry mechanism for login challenges, or go straight to Web Unlocker APIs. If I were you, I'd try the first approach to see if it works. But, realistically, it probably makes more sense to go directly with the second one.
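A retry mechanism for login challenges could look roughly like the sketch below: fetch the page through one proxy, treat a redirect to a login-style URL as a failure, and move on to the next proxy with an exponentially growing delay. The `fetch` callable, the `FetchResult` type, the substrings in `LOGIN_MARKERS`, and the backoff values are all assumptions for illustration; you would plug in your own fetcher (for example, one built on Scrapling) and adjust the detection to the URLs you actually get redirected to.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FetchResult:
    final_url: str  # URL after following redirects
    status: int
    html: str


# Hypothetical substrings signalling a login wall; verify against real responses
LOGIN_MARKERS = ("/login", "/authwall", "/checkpoint")


def hit_login_wall(result: FetchResult) -> bool:
    return any(marker in result.final_url for marker in LOGIN_MARKERS)


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Callable[[str, str], FetchResult],
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[FetchResult]:
    """Try each proxy in turn; return the first non-login-wall 200 response, or None."""
    for attempt, proxy in enumerate(proxies):
        if attempt:
            # Backoff before each retry: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
        try:
            result = fetch(url, proxy)
        except Exception:
            # Treat any fetch error (network, proxy, browser) as a failed attempt
            continue
        if result.status == 200 and not hit_login_wall(result):
            return result
    return None


if __name__ == "__main__":
    # Fake fetcher standing in for a real browser or HTTP client
    def fake_fetch(url: str, proxy: str) -> FetchResult:
        if proxy == "http://bad.example:8000":
            return FetchResult("https://www.linkedin.com/authwall?trk=x", 200, "")
        return FetchResult(url, 200, "<html>profile</html>")

    res = fetch_with_rotation(
        "https://www.linkedin.com/in/someone",
        ["http://bad.example:8000", "http://good.example:8000"],
        fake_fetch,
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(res.html if res else "all proxies hit the login wall")
```

Injecting `sleep` keeps the backoff testable; in production you would leave the default `time.sleep`.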

Web unlockers handle the anti-bot obstacles for you, making scraping much easier. If you don't need custom data parsing, I'd opt for a LinkedIn Scraping API from a top provider (Bright Data, Apify, etc.) or a Web Unlocker API from providers like Zyte, Decodo, Oxylabs, Bright Data, etc.

Tamas Deak

Hi Karim, amazing work on Scrapling.

Quick question: would you be open to supporting Kameleo as a browser option as well? I can see how nicely you integrated Camoufox, but since it's no longer actively maintained by Daijiro, it may eventually hit limitations.

Kameleo is a paid solution, but that's precisely what allows us to continuously update our browser kernels and evolve our fingerprint masking, so we can reliably stay ahead in the anti-bot space - especially for serious web-scraping use cases.

Happy to chat anytime if you'd be interested in exploring this together.