The Web Scraping Club

THE LAB #98: Scraping Google Search Results in 2026: Device, Location, and Identity

Google does not have one set of results. It has millions. The hard part is knowing which one you are looking at.

Pierluigi Vinciguerra
Feb 19, 2026
∙ Paid

If you use a search engine, you have probably noticed that results are not static. The same query returns different results depending on where you are, what device you use, and whether you are logged into a Google account.
When it comes to SERP scraping, this adds several layers of complexity. For most scraping targets, you send a request and get the page; for search engines, you send a request and get *a version* of the page, shaped by signals you may not even be aware of.

Before proceeding, let me thank NetNut, the platinum partner of the month. They have prepared a juicy offer for you: up to 1 TB of web unblocker for free.

Claim your offer


This makes SERP scraping fundamentally different from conventional web scraping. The data you collect is only as reliable as your control over these variables. Scrape from a datacenter IP in Virginia with a desktop Chrome fingerprint while logged out, and you will get one set of results. Scrape the same query from a mobile device in Milan while logged into a Google account, and you will get something entirely different. Both are “correct” Google results. Neither tells the full story.

In this article of The Lab, we wanted to understand how much these variables actually change the output, and more importantly, how to control them reliably.

Google does not want you scraping its results

Before we get into the technical setup, we need to acknowledge something that changed the landscape significantly in early 2025.

Starting in January 2025, Google began rolling out SearchGuard, a technical protection measure designed to make scraping search results harder.

SearchGuard works by sending JavaScript challenges to search queries originating from unrecognized sources, as we covered on these pages when it started. When a query arrives, Google’s system transmits JavaScript code that requires the browser to compute and return a “solve”, a set of specific information about the browser environment and the user generating the request. For human users, the solution happens transparently in the browser. For automated systems, it is a wall.

This change in strategy put pressure on all “SEO tools” and the operators that needed to scrape Google search results, suddenly increasing their day-to-day operational costs.


Need public web data, not scraper headaches?

SerpApi turns search results into predictable JSON with built-in scale, location options, and speed. All with no maintenance.

Try for free


This change in strategy, and especially its timing, prompted professionals to raise questions that will likely never be answered. Does this have something to do with the AI race? Is it a way to make it harder for other AI companies to rely on Google searches for their answers?
We'll probably never know, but they're legitimate questions: SERP scraping is as old as Google Search itself, so why bother stopping bots in 2025 and not years earlier?
However, this is today’s reality, and we need to adapt to it. Let’s examine the specifics of SERP scraping on Google (as always, we’re showing this for educational purposes; be aware of current copyright and scraping laws).

What shapes a Google SERP response

To scrape Google Search reliably, we need to model the system we are interacting with. Google personalizes search results along several axes, each of which produces measurably different output.

Geographic location is one of the most impactful variables. Google determines your location through your IP address and, when available, browser geolocation permissions. A query for “pizza restaurant” from a New York IP returns local results for Manhattan. The same query from a Milan IP returns pizzerias in Milan. This extends beyond local searches: news results, shopping results, and even organic ranking order shift based on geography.

As we'll see in the test section of this article, changing location and mimicking another one is less trivial than expected, since not every proxy type behaves the same way.

Device type determines the structure and content of the SERP page itself. Mobile and desktop results are not just different layouts of the same data. Google serves genuinely different content. Mobile SERPs prioritize featured snippets, location-based answers, and nearby points of interest. Desktop SERPs give more space to organic links and Knowledge Panels. Some results appear exclusively on mobile or exclusively on desktop. For anyone collecting SERP data for analysis, this distinction is not cosmetic. It is structural.

Login state introduces personalization based on your Google account history. When you are logged in, Google uses your search history, location history, and account preferences to tailor results. When logged out, you get a more “generic” version of the results for your location and device. The difference can be subtle for generic queries and dramatic for anything Google considers personal.

Keywords, of course, are the main driver of change. But in addition to returning different results for different keywords, the answer layout also varies accordingly. If you look for “trousers”, you’ll see more shopping results and product data, while if you’re looking for “aspirin”, you’ll see a more traditional layout.

These four variables interact. A logged-in mobile user in Tokyo sees a fundamentally different page than a logged-out desktop user in London, even for the same query. Controlling all four simultaneously is what makes SERP scraping an infrastructure problem, not just a coding problem.
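As a mental model, the four axes described above can be captured in a small request descriptor. This is an illustrative sketch (not code from the article's repository); the field names and the `label` helper are our own invention, useful for tagging each experiment's output:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerpContext:
    """The four axes that shape which version of a SERP Google returns."""
    keyword: str      # the query itself
    country: str      # geography, normally implied by the exit IP
    device: str       # "desktop" or "mobile"
    logged_in: bool   # whether a Google account session is active

    def label(self) -> str:
        """A human-readable tag, handy for naming result files per run."""
        login = "auth" if self.logged_in else "anon"
        return f"{self.keyword}_{self.country}_{self.device}_{login}"


# Two runs of the same query are two different experiments:
a = SerpContext("weather", "US", "desktop", False)
b = SerpContext("weather", "IT", "mobile", True)
print(a.label())  # weather_US_desktop_anon
print(b.label())  # weather_IT_mobile_auth
```

Treating each combination as a distinct experiment makes it obvious when two scraped result sets are not actually comparable.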


Check the TWSC YouTube Channel


Tools: An Anti-Detect Browser and Playwright

Given the variables we need to control (device type and login state, specifically), and the fact that we are not building a massive scraping operation here, the best setup we can use is Playwright paired with an anti-detect browser.

We need a real browser, not just an HTTP request library like requests or httpx, because Google’s SearchGuard validates the browser environment through JavaScript challenges. A raw HTTP client has no JavaScript engine, no DOM, no window object. It cannot compute the “solve” that SearchGuard requires. The request simply fails or returns a challenge page. To pass these checks, we need something that renders JavaScript and exposes a complete browser environment.
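In practice, the first thing a scraping pipeline needs is a way to recognize that it received a challenge page instead of real results. The marker strings below are assumptions based on pages Google has historically served to clients without JavaScript; treat this as a heuristic sketch and adjust the markers to what you actually observe:

```python
def looks_like_challenge(html: str) -> bool:
    """Heuristic check for a Google JS challenge / block page.

    The markers are assumptions, not an official list: adapt them
    to the responses you see in your own runs.
    """
    markers = (
        "enablejs",        # "please enable JavaScript" interstitial
        "unusual traffic", # classic rate-limit page
        "/sorry/",         # redirect target for blocked clients
    )
    lowered = html.lower()
    return any(m in lowered for m in markers)


# A raw HTTP client typically gets one of these instead of results:
blocked = '<a href="/httpservice/retry/enablejs?sei=x">click here</a>'
print(looks_like_challenge(blocked))   # True
print(looks_like_challenge("<div id='search'>10 blue links</div>"))  # False
```

Failing fast on a detected challenge page is cheaper than parsing garbage downstream.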

But a standard browser is not enough either. Regular Chrome or Firefox, even when automated with Playwright or Selenium, carries detectable signals: the navigator.webdriver flag, predictable fingerprint values, and missing or inconsistent browser properties. Google’s systems can identify these inconsistencies and treat the session as automated.

That’s why we’re pairing Playwright with an anti-detect browser, which is a modified browser engine that spoofs the properties websites use for fingerprinting: navigator properties, screen resolution, WebGL parameters, canvas behavior, AudioContext values, font lists, language headers, and device type. Instead of presenting the same default fingerprint every time, an anti-detect browser generates a consistent, realistic identity that looks like a genuine user on a specific device and operating system.

The critical feature for our use case is persistent profiles. An anti-detect browser manages browser profiles that survive across sessions. Each profile stores its fingerprint configuration, cookies, local storage, proxy, and device settings. When we start a profile, it resumes exactly where it left off. This means we can log into a Google account through one profile, close the browser, and reopen it days later with the session still active. Without persistent profiles, we would need to authenticate on every run, which is both impractical and a red flag for Google’s security systems.
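Kameleo handles all of this internally, but the idea behind persistent profiles can be sketched in plain Python. This is a simplified illustration of the concept, not Kameleo's actual storage format or API:

```python
import json
from pathlib import Path


class ProfileStore:
    """Toy persistence layer: each profile is a directory holding its
    fingerprint config and cookies, so a session survives restarts."""

    def __init__(self, root: str = "profiles"):
        self.root = Path(root)

    def save(self, name: str, fingerprint: dict, cookies: list) -> None:
        d = self.root / name
        d.mkdir(parents=True, exist_ok=True)
        (d / "fingerprint.json").write_text(json.dumps(fingerprint))
        (d / "cookies.json").write_text(json.dumps(cookies))

    def load(self, name: str) -> tuple[dict, list]:
        d = self.root / name
        fingerprint = json.loads((d / "fingerprint.json").read_text())
        cookies = json.loads((d / "cookies.json").read_text())
        return fingerprint, cookies
```

Because the Google session cookie is part of what gets persisted, a logged-in profile reopened days later is still logged in, which is exactly what we rely on for the login-state tests.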

For this article, we use Kameleo as our anti-detect browser. It runs as a local service (Kameleo.CLI) exposing a REST API on port 5050, controllable via a Python client. It supports Chromium-based profiles (Chroma) for Chrome and mobile device emulation, and Firefox-based profiles (Junglefox). Each profile is an isolated browser session with its own fingerprint, proxy, and cookies.




Setting up the infrastructure: deploying Kameleo on AWS

Our Kameleo instance runs on a Windows EC2 instance in the US. This means that without a proxy, all traffic exits via a US-based AWS IP address. We will use this setup to demonstrate the difference between the instance’s own IP and a proxy claiming to be somewhere else. I’m sure you’ll be surprised by what we’ll find later.

Installing Kameleo on AWS

We installed Kameleo on a Windows EC2 instance using the standard graphical installer, no rocket science here. Once Kameleo is running on the AWS machine, it exposes its API on port 5050. Our Python scripts run locally and connect to the remote Kameleo instance over the network.

The architecture is straightforward: Kameleo manages browser profiles and runs the actual browsers on the AWS instance. Our local machine sends API commands (create profile, start browser, stop browser) and connects to the browser via WebSocket for Playwright automation. The AWS instance needs port 5050 open in its security group for this to work.
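The local-to-remote wiring can be sketched like this. The `/playwright/{profile_id}` WebSocket path follows the pattern Kameleo documents for attaching Playwright to a running profile, but verify it against your Kameleo version before relying on it; the host and profile ID below are placeholders:

```python
def playwright_ws_endpoint(host: str, port: int, profile_id: str) -> str:
    """Build the WebSocket URL Playwright uses to attach to a running
    Kameleo profile (path pattern assumed from Kameleo's docs)."""
    return f"ws://{host}:{port}/playwright/{profile_id}"


# Typical usage (requires a running Kameleo instance and a started profile):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as pw:
#     browser = pw.chromium.connect_over_cdp(
#         playwright_ws_endpoint("3.90.12.34", 5050, profile.id))
```

The important point is that Playwright never launches a browser itself here: it only attaches to the fingerprinted browser Kameleo already started on the AWS machine.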

Every script in this article follows the same initialization pattern. We read the remote IP from an environment variable:

from kameleo.local_api_client import KameleoLocalApiClient
import os

kameleo_ip = os.getenv('KAMELEO_IP')
kameleo_port = os.getenv('KAMELEO_PORT', '5050')

client = KameleoLocalApiClient(endpoint=f'http://{kameleo_ip}:{kameleo_port}')

Test 1: setting the right location

As we said, one of the keys to extracting SERP data is setting the location we’d like to know more about. Our Kameleo installation is on an AWS machine in the US, so we expect to get SERP data from there. But what if we want to change the location?

We run the same query, “weather”, three times from the same AWS instance in the US. First, without any proxy, the traffic exits from the instance’s own IP. Then, through a residential proxy geolocated in Italy. Finally, through a datacenter proxy also claiming to be in Italy. For each run, we first visit whatismyipaddress.com to verify the exit IP, then navigate to Google, type the query in the search bar with randomized keystroke delays, and capture the results.
You can find the code in our GitHub repository reserved for paying users, inside the folder 98.SERP-DATA. If you’re one of them but cannot access the repository, please fill out this form.

In the file test_location_comparison.py, we’ll see how Google responds to us when we’re using different types of proxies.
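The "randomized keystroke delays" step mentioned above can be sketched like this. This is an illustrative helper, not the repository code, and the delay bounds are arbitrary assumptions:

```python
import random


def keystroke_delays(text: str, low_ms: int = 60, high_ms: int = 220) -> list[int]:
    """One randomized delay (in ms) per character, to mimic human typing
    instead of pasting the whole query in a single instant."""
    return [random.randint(low_ms, high_ms) for _ in text]


# With Playwright, you would then type character by character, e.g.:
# for ch, d in zip("weather", keystroke_delays("weather")):
#     page.keyboard.type(ch)
#     page.wait_for_timeout(d)
```

Variable inter-key timing is one of the cheap behavioral signals that separates a hand-typed query from a scripted one.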
