Web Scraping with Proxies: How Many IPs Do You Really Need?
Tips and techniques to better evaluate IP pool size
Do you actually need millions of IPs? Almost never.
Despite bold claims by residential proxy providers of having, for example, 70,000,000+ IPs, most scraping projects require only a fraction of that number. This guide breaks down what actually matters when determining proxy needs and why quality, diversity, and usage behavior outweigh sheer volume.
The Reality Behind Those Eye-Popping IP Numbers
When proxy providers advertise "70,000,000 IPs" or "the world's largest proxy pool", they typically refer to the total number of unique IP addresses that have passed through their network, often over many years. However, these advertised numbers do not indicate how many IPs are actually available to you right now.
The analogy is a rideshare app boasting that a million drivers have registered over its lifetime, when only 100 are currently active.
What truly matters goes beyond quantity:
IP Quality:
Located in your target regions
Unflagged by your target websites
Ethically sourced with proper consent
Concurrency:
Using multiple IPs simultaneously for fast and efficient data extraction
Method of Scraping:
Respecting rate limits of target websites
Using appropriate user-agent strings and other headers
Generating realistic browser fingerprints
Implementing effective IP rotation logic
Without these attributes, even the largest proxy pool becomes ineffective.
Most providers don't disclose current active IP counts, allocation limits, or success rates for specific websites. They don't need to—big numbers sound impressive and many customers mistakenly assume more is always better.
A well-maintained pool of 100 high-quality IPs, though, will consistently outperform thousands or tens of thousands of burned or unverified addresses for most scraping tasks.
How Many Proxies Do You Actually Need?
Instead of guessing or falling for marketing hype, calculate your proxy requirements rationally:
Understand your volume needs: Determine how many pages you need to scrape and how quickly. For example, if you're collecting 24,000 pages daily from an e-commerce site, that's 1,000 pages per hour if you’re scraping around the clock.
Assess website tolerance: Consider how many requests you can safely make per hour from a single IP before being rate-limited. The limit varies by website type:
Social media platforms might flag you after just 30-60 requests per hour from one IP
E-commerce sites generally tolerate 100-300 requests per hour
News and content sites often permit 300+ hourly requests without issue
Calculate your baseline: For our e-commerce example, assuming 200 requests per hour per IP is safe, dividing our hourly needs (1,000 pages) by our safe limit per IP (200 requests) gives a baseline requirement of just 5 proxies.
Add a safety buffer: Multiply by 2 for reliability, which brings us to 10 proxies, not hundreds or thousands.
This simple formula works for any project scale:
Hourly Request Volume ÷ Safe Limit Per IP × Safety Buffer
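To make the arithmetic concrete, the formula is easy to wrap in a few lines of code. The figures below are the e-commerce example from this article (1,000 pages per hour, roughly 200 safe requests per hour per IP, a 2x buffer); substitute your own numbers.

```python
import math

def proxies_needed(hourly_requests: int, safe_per_ip_per_hour: int, safety_buffer: float = 2.0) -> int:
    """Baseline = hourly volume / safe per-IP limit; multiply by the safety buffer and round up."""
    baseline = hourly_requests / safe_per_ip_per_hour
    return math.ceil(baseline * safety_buffer)

# E-commerce example from above: 24,000 pages/day ≈ 1,000 pages/hour,
# ~200 requests/hour assumed safe per IP, 2x safety buffer.
print(proxies_needed(1000, 200))  # -> 10
```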
Real-World Case Study: Amazon Scraping Requirements
Amazon represents one of the more challenging scraping targets due to its sophisticated anti-bot systems. Let's examine real proxy requirements at different scraping volumes to illustrate how intelligent scraping matters more than massive IP counts.
Scenario 1: Testing with a Single Product Page
When you're just testing your parser or checking a product's availability:
Infrastructure Needed:
1-3 residential IPs (not just your home connection)
Basic request headers with randomized user agents
Proper cookie and session management
JavaScript rendering capabilities
Risk Level: Low but still present; even single-page scraping can trigger throttling if done improperly.
Proxy Cost: $5-15 for testing (using a minimal residential plan)
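To illustrate how small this scenario really is, here is a minimal sketch using Python's requests library. The proxy gateway, credentials, target URL, and user-agent list are placeholders rather than real endpoints, and pages that rely heavily on JavaScript would need a headless browser instead of plain HTTP calls.

```python
import random
import requests

# Hypothetical residential proxy gateway; substitute your provider's endpoint and credentials.
PROXY = "http://username:password@residential-gateway.example.com:8000"

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

session = requests.Session()  # keeps cookies between requests for basic session management
session.proxies = {"http": PROXY, "https": PROXY}

response = session.get(
    "https://www.example.com/product/123",  # placeholder product URL
    headers={
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    },
    timeout=30,
)
print(response.status_code, len(response.text))
```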
Scenario 2: Light Monitoring (100 Pages/Day)
For monitoring 10-20 products across different categories:
Recommended Setup:
8-12 residential or ISP IPs (not 3-5 as often suggested)
Proxy rotation every 10-15 requests (not 25)
Full browser fingerprint management
Random delays between 3-15 seconds
Different browsing patterns per session
Risk Level: Moderate
Expected Monthly Cost:
Residential IPs: $50-100
ISP Proxies: $80-150
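The rotation cadence and randomized delays from the setup above can be combined in a short loop. This is only a sketch under this scenario's assumptions; the proxy endpoints and timing values are illustrative.

```python
import random
import time
import requests

PROXIES = [  # placeholder pool of 8-12 residential/ISP endpoints
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]

def scrape(urls):
    proxy = random.choice(PROXIES)
    requests_on_proxy = 0
    rotate_after = random.randint(10, 15)   # rotate every 10-15 requests

    for url in urls:
        if requests_on_proxy >= rotate_after:
            proxy = random.choice(PROXIES)
            requests_on_proxy = 0
            rotate_after = random.randint(10, 15)

        resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)
        requests_on_proxy += 1
        yield url, resp.status_code

        time.sleep(random.uniform(3, 15))   # random 3-15 second delay between requests
```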
Scenario 3: Category Monitoring (1,000 Pages/Day)
For comprehensive price tracking or inventory monitoring:
Recommended Setup:
50-100 residential or ISP IPs (more than typically suggested)
Session-based rotation with cooling periods
Full browser emulation (not just headers)
CAPTCHA handling solution
Distributed request architecture
Risk Level: High
Expected Monthly Cost:
Residential IPs: $200-400
ISP Proxies: $300-500
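Session-based rotation with cooling periods can be implemented as a small pool that tracks when each IP was retired and keeps it out of rotation until its rest period has passed. The sketch below is an in-memory illustration, not a production component; the 30-minute cooldown is an assumed value.

```python
import random
import time

class CoolingProxyPool:
    """Hand a proxy out for a full session, then rest it before it can be reused."""

    def __init__(self, proxies, cooldown_seconds=1800):
        self.available = list(proxies)
        self.resting = {}                  # proxy -> timestamp when it was retired
        self.cooldown = cooldown_seconds

    def acquire(self):
        now = time.time()
        # Return proxies whose cooling period has expired to the active pool.
        for proxy, retired_at in list(self.resting.items()):
            if now - retired_at >= self.cooldown:
                self.available.append(proxy)
                del self.resting[proxy]
        if not self.available:
            raise RuntimeError("all proxies are cooling down; slow the crawl or add IPs")
        return random.choice(self.available)

    def release(self, proxy):
        # Session finished (or IP flagged): retire it for the cooldown period.
        if proxy in self.available:
            self.available.remove(proxy)
        self.resting[proxy] = time.time()
```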
Scenario 4: Enterprise-Level Scraping (1,000,000+ Pages/Day)
For large-scale analytics or competitive intelligence:
Enterprise Infrastructure Required:
1,000-3,000 residential/ISP IPs across diverse subnets
Advanced scraping infrastructure with queue management
Real browser automation with stealth plugins
Geographic IP distribution matching your target audience
AI-based CAPTCHA solving
Sophisticated failure analysis and adaptation
Risk Level: Very High
Expected Monthly Cost:
$10,000-25,000+ for residential proxy infrastructure
Additional costs for engineering, maintenance, and scaling
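At this scale, the critical piece is the queue and worker layer that spreads requests across the pool rather than any single request. Here is a minimal asyncio sketch of that idea; the fetch step is a placeholder, and a real pipeline would add retries, failure analysis, and persistent queuing.

```python
import asyncio
import random

async def worker(name, queue, proxies):
    while True:
        url = await queue.get()
        proxy = random.choice(proxies)
        try:
            # Placeholder for a real fetch through `proxy` (aiohttp, headless browser, etc.).
            await asyncio.sleep(0.1)
            print(f"{name} fetched {url} via {proxy}")
        finally:
            queue.task_done()

async def crawl(urls, proxies, concurrency=100):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    workers = [asyncio.create_task(worker(f"w{i}", queue, proxies)) for i in range(concurrency)]
    await queue.join()              # wait until every queued URL has been processed
    for w in workers:
        w.cancel()

# asyncio.run(crawl(url_list, proxy_list))
```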
The Real Reason You're Getting Blocked
It's not because you don't have enough proxies.
Websites don't block scrapers because your IP count is too low—they block you because your behavior raises red flags: too many requests in too little time, nonstandard headers, identical request intervals, or recognized datacenter IPs.
Modern anti-bot systems rely on sophisticated behavioral analysis rather than simply counting IPs. They monitor request frequency, header consistency, and patterns across entire IP subnets.
When they detect patterns, they don't just block individual IPs—they raise suspicion for your entire IP range, which leads to a critical issue to be discussed in a future post: subnet reputation.
Questions Worth Asking Proxy Providers
When evaluating proxy services, don’t get distracted by flashy claims about pool size. What really matters is performance, reliability, and transparency. Instead of asking “How many IPs do you offer?”, ask the questions that actually impact your success:
"How many daily active IPs do you have in my target regions?" This question reveals actual availability, not lifetime totals.
"What's your subnet diversity like?" Understanding how IPs are distributed across different networks predicts performance against sophisticated anti-bot systems, which is especially important for static ISP proxies. For rotating residential proxies, a wide range of originating networks and geographical distribution is generally expected; can you confirm the diversity of your pool?
"Are your residential IPs ethically sourced?" Proper sourcing ensures compliance, reduces legal risks, and reflects responsible network practices.
"What session management options do you offer?" The ability to maintain consistent sessions or control rotation timing significantly impacts success rates.
"Can you share performance data for my specific target websites?" For popular sites, providers confident in their service should be able to share actual success metrics for similar use cases.
"What kind of trial are you able to offer?" What is the duration, what are the data usage limits, and are there any restrictions on the websites you can test? Also, what level of support is provided during the trial?
Most importantly, try to get a trial account or buy a small package first so you can confirm the proxy solution actually meets your requirements before committing to a larger purchase.
Great article Jason! Fantastic job highlighting the real factors that influence scraping success. I wanted to share some real-world observations based on what we’ve seen from advanced teams. I hope you'll take my word for it: we support several teams that successfully run enterprise-level scraping on Amazon, which gives us a solid view of what actually works at scale.
Respecting Rate Limits & Smart IP Rotation: Absolutely agree, engineers must take responsibility for respecting rate limits. One team we work closely with built a dynamic system to learn request thresholds per IP. They monitored how many requests each proxy could handle before being throttled or blocked, and just before hitting that calculated rate limit, they paused usage of the IP for an extended period. If that IP returned later from their proxy provider, they reused it with the exact same browser context (saved earlier). This approach maintained session continuity and improved trust from the target site. When an old IP returned but needed a clean slate, they simply generated a new browser context. My pro tip: Kameleo can help you with this, as it saves the browsing context and the fingerprint used during the session, and that file can be reloaded later.
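To make that concrete, a stripped-down version of the per-IP tracking could look like the snippet below. It is only an illustration with simplified thresholds and in-memory storage, not the system that team actually runs.

```python
from collections import defaultdict

class IpBudget:
    """Count requests per IP and stop using an IP just before its learned limit."""

    def __init__(self, default_limit=200, safety_margin=0.9):
        self.limits = defaultdict(lambda: default_limit)   # learned per-IP thresholds
        self.counts = defaultdict(int)
        self.margin = safety_margin

    def can_use(self, ip):
        return self.counts[ip] < self.limits[ip] * self.margin

    def record_request(self, ip):
        self.counts[ip] += 1

    def record_throttle(self, ip):
        # The site pushed back: remember the observed ceiling and retire the IP for now.
        self.limits[ip] = max(1, self.counts[ip])
        self.counts[ip] = 0
```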
You also make a great point about how IP count alone is often overrated. This is something nearly every advanced web scraper discovers eventually. One pattern we’ve consistently noticed among our users is how their scraping architectures evolve over time. They typically start by launching a single browser instance per proxy and executing just one request per session. Soon they run into RPM (requests per minute) bottlenecks and begin optimizing, and the evolution tends to move toward using the same browser-proxy pair for multiple, well-managed requests. This improves both performance and cost-efficiency without compromising stealth or reliability.
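As a generic illustration of that pattern (not the anti-detect setup our users actually run), reusing one browser-proxy pair for a whole batch looks roughly like this in Playwright:

```python
from playwright.sync_api import sync_playwright

def scrape_batch(urls, proxy_server):
    """Reuse one browser-proxy pair for a whole batch instead of one request per session."""
    with sync_playwright() as p:
        browser = p.chromium.launch(proxy={"server": proxy_server})
        context = browser.new_context()
        page = context.new_page()
        results = []
        for url in urls:
            page.goto(url, wait_until="domcontentloaded")
            results.append(page.content())
        browser.close()
    return results
```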
Fingerprinting, Headers & Browser Behavior: Managing headers and fingerprints isn’t just about changing the user-agent—it’s about consistency across the entire browser fingerprint. Anti-detect browsers help maintain this consistency by simulating real devices (OS, screen resolution, fonts, GPU, etc.) and allowing users to persist sessions across scraping runs. While stealth plugins like those used with Puppeteer try to mask automation, we’ve found they often fall short. Our solution was to ship two different custom-built browsers, both designed specifically for scraping scenarios. These browsers emulate full browser behavior more reliably than general-purpose automation tools.
Again, thanks for the insights—glad to see more thoughtful discussion on what actually matters for successful scraping.