Discussion about this post

Tamas Deak

Great article, Jason! Fantastic job highlighting the real factors that influence scraping success. I wanted to share some real-world observations from advanced teams. I hope you'll take my word for it, as we support several teams that successfully run enterprise-level scraping on Amazon, which gives us a solid view of what actually works at scale.

Respecting Rate Limits & Smart IP Rotation: Absolutely agree, engineers must take responsibility for respecting rate limits. One team we work closely with built a dynamic system to learn request thresholds per IP. They monitored how many requests each proxy could handle before being throttled or blocked, and just before an IP hit its calculated limit, they paused it for an extended period. If that IP returned later from their proxy provider, they reused it with the exact same browser context (saved earlier). This approach maintained session continuity and improved trust from the target site. When an old IP returned but couldn't safely continue its old session, they generated a new browser context and started fresh. My pro-tip: Kameleo can help you with this, as it saves the browsing context and the fingerprint that was used during the session. This file can be reloaded later.
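To make that concrete, here is a minimal sketch of the idea in Python. This is not the team's actual code and not Kameleo's API; ProxyRotator, ProxyState, and the default numbers are hypothetical placeholders for a per-proxy budget that is learned from observed throttling, paused before the limit is hit, and tied to a saved browser context so the same session can be resumed when the IP comes back.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ProxyState:
    """What we have learned so far about one proxy/IP."""
    learned_limit: int = 50             # requests tolerated before throttling, refined over time
    requests_made: int = 0
    paused_until: float = 0.0
    context_path: Optional[str] = None  # saved browser context/fingerprint to resume with

class ProxyRotator:
    def __init__(self, proxies, safety_margin: float = 0.8, cooldown_s: int = 3600):
        self.states: Dict[str, ProxyState] = {p: ProxyState() for p in proxies}
        self.safety_margin = safety_margin   # stop this far before the learned limit
        self.cooldown_s = cooldown_s

    def pick(self) -> Optional[str]:
        """Return a proxy that is neither resting nor near its learned limit."""
        now = time.time()
        for proxy, st in self.states.items():
            if now >= st.paused_until and st.requests_made < st.learned_limit * self.safety_margin:
                return proxy
        return None

    def record_success(self, proxy: str, context_path: str) -> None:
        st = self.states[proxy]
        st.requests_made += 1
        st.context_path = context_path       # remember the session we just used
        if st.requests_made >= st.learned_limit * self.safety_margin:
            self._rest(proxy)                # pause *before* the site throttles us

    def record_block(self, proxy: str) -> None:
        st = self.states[proxy]
        st.learned_limit = max(1, st.requests_made)  # estimate was too high; lower it
        st.context_path = None               # blocked, so start with a fresh context next time
        self._rest(proxy)

    def _rest(self, proxy: str) -> None:
        st = self.states[proxy]
        st.paused_until = time.time() + self.cooldown_s
        st.requests_made = 0
```

The point is simply that the per-IP limit is an estimate refined by feedback (record_block) rather than a hard-coded constant, and that the saved context travels with the IP it was built on.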

You also make a great point about how IP count alone is often overrated. This is something nearly every advanced web scraper discovers eventually. One pattern we've consistently noticed among our users is how their scraping architectures evolve over time. At first, they launch a single browser instance per proxy and execute just one request per session. But they soon run into RPM (requests per minute) bottlenecks and begin optimizing. The evolution tends toward using the same browser-proxy pair for multiple, well-managed requests, which improves both performance and cost-efficiency without compromising stealth or reliability.
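A tiny illustration of that later stage, assuming a generic already-open session object: the same browser-proxy pair serves a capped, randomly paced batch of requests instead of one request per browser launch. session.fetch is a hypothetical helper standing in for whatever automation layer is actually in use.

```python
import random
import time

def crawl_batch(urls, session, requests_per_session: int = 20,
                min_delay_s: float = 2.0, max_delay_s: float = 6.0):
    """Reuse one browser-proxy session for a capped, paced batch of requests
    instead of launching a fresh browser per URL."""
    results = []
    for i, url in enumerate(urls[:requests_per_session]):
        results.append(session.fetch(url))  # hypothetical fetch on the open session
        if i < requests_per_session - 1:
            time.sleep(random.uniform(min_delay_s, max_delay_s))  # human-ish pacing
    return results
```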

Fingerprinting, Headers & Browser Behavior: Managing headers and fingerprints isn't just about changing the user-agent; it's about consistency across the entire browser fingerprint. Anti-detect browsers help maintain this consistency by simulating real devices (OS, screen resolution, fonts, GPU, etc.) and allowing users to persist sessions across scraping runs. While stealth plugins like those used with Puppeteer try to mask automation, we've found they often fall short. Our solution was to ship two different custom-built browsers, both designed specifically for scraping scenarios. These browsers emulate full browser behavior more reliably than general-purpose automation tools.
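As a toy example of what "consistency" means in practice (my own illustration, not how any particular anti-detect browser works internally): the HTTP headers and the JavaScript-visible properties should tell the same story. A quick self-check like the one below catches the classic mismatch of a Windows User-Agent paired with a macOS navigator.platform, or an Accept-Language header that disagrees with navigator.languages.

```python
def fingerprint_is_consistent(profile: dict) -> bool:
    """Cheap self-check: do the HTTP headers agree with what the page's
    JavaScript would observe for the same "device"?"""
    ua = profile["headers"]["User-Agent"]
    platform = profile["navigator"]["platform"]
    languages = profile["navigator"]["languages"]
    accept_language = profile["headers"]["Accept-Language"]

    checks = [
        # OS named in the User-Agent should match navigator.platform.
        ("Windows" in ua) == platform.startswith("Win"),
        # Accept-Language should lead with the first navigator language.
        accept_language.split(",")[0].startswith(languages[0]),
    ]
    return all(checks)

profile = {
    "headers": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
        "Accept-Language": "en-US,en;q=0.9",
    },
    "navigator": {"platform": "Win32", "languages": ["en-US", "en"]},
}
print(fingerprint_is_consistent(profile))  # True: headers and JS surface agree
```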

Again, thanks for the insights—glad to see more thoughtful discussion on what actually matters for successful scraping.
