How to Scrape Booking.com in Python

Automatically retrieve data from Booking.com using a custom Playwright-based scraper

Aug 31, 2025

Isn’t it time for a vacation? If you’re dreaming of Europe, Latin America, South Asia, or anywhere else you want to explore, it’s no secret that hotels are getting more expensive. That’s why it makes sense to monitor sites like Booking.com for the best deals.

Or, maybe, you just want to analyze the market or study your competitors. No matter what your goal is, you need data. That’s where a Booking.com scraper comes in!

In this post, I’ll guide you through building a Python script to automatically retrieve property listing data from Booking.com.


Data You Can Scrape From Booking.com

Booking.com is one of the world’s largest travel platforms, with over 28 million listings and millions of users booking accommodations every day. Its vast inventory makes it a popular source of data for travelers, market analysts, researchers, data analysts, and more.

By scraping Booking.com, you can retrieve a wide range of information, including:

Property details: Names, addresses, and descriptions.
Pricing information: Current prices, discounts, and original rates.
Availability: Dates and room options.
Guest reviews: Scores, comments, and number of reviews.
Images: Property photos and thumbnails.
Location data: Distance from city centers or landmarks.
Amenities: Features like Wi-Fi, parking, and breakfast options.

What Are the Best Technologies to Scrape Booking.com?

A quick look at a Booking.com property listing page reveals that the site is highly dynamic. For example, search for properties in your desired destination (Rome, in the example below) and start scrolling:

The infinite scroll pattern in Booking.com


You’ll notice that more results are automatically loaded as you scroll down. This, along with the loading spinner that appears when you apply filters, is enough to confirm that Booking.com is a dynamic site. In other words, it loads content on the fly in the browser.

Thus, you’ll need a browser automation tool like Playwright, Puppeteer, or Selenium to scrape Booking.com.

Scrape Booking.com in Python With Playwright

In this guided section, I’ll show you how to scrape Booking.com using Playwright in Python.

Specifically, you’ll learn how to automatically extract property listing data for Rome from the following Booking.com page:

Step #1: Connect to the Target Page

I’ll assume you already have a Playwright Python project set up. If not, follow the official Playwright installation guide to get started.

To keep things simple, I’ll also assume you already have a URL pointing to a Booking.com property listings page. I don't recommend simulating the full user interaction flow (e.g., filling out the search form on the homepage) to reach the target page: it’s tricky and not essential for scraping Booking.com.

Instead, perform the search manually in your browser, apply the filters you want, and copy the resulting URL. For example, here’s the URL I got for property listings in Rome:

https://www.booking.com/searchresults.html?label=gen173nr-1FCAQoggJCC3NlYXJjaF9yb21lSDNYBGinAogBAZgBMbgBF8gBDNgBAegBAfgBA4gCAagCA7gCgJfpwwbAAgHSAiRjNDA1NjNhMS1kYzQ2LTQ3MjYtYjc0YS1kZGM0NjRjYTQyZWHYAgXgAgE&aid=304142&ss=Rome&checkin=2025-08-01&checkout=2025-08-31&group_adults=2&no_rooms=1&group_children=0

Note: By the time you read this post, this URL will no longer produce the expected results, as it's based on past dates.
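Rather than editing the dates by hand each time, you can rebuild a shorter search URL from the query parameters visible above (ss, checkin, checkout, group_adults, no_rooms, group_children). Here’s a minimal sketch with a hypothetical build_search_url() helper. It assumes Booking.com still honors these parameters when tracking ones such as label, aid, and sid are left out, so compare the results with a manual search before relying on it.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_search_url(destination, checkin, nights, adults=2, rooms=1, children=0):
    # Hypothetical helper: only the search parameters seen in the URL above are
    # included; tracking parameters (label, aid, sid, ...) are deliberately omitted
    checkout = checkin + timedelta(days=nights)
    params = {
        "ss": destination,
        "checkin": checkin.isoformat(),    # YYYY-MM-DD, as in the original URL
        "checkout": checkout.isoformat(),
        "group_adults": adults,
        "no_rooms": rooms,
        "group_children": children,
    }
    # urlencode() percent-encodes values (spaces become "+", commas "%2C")
    return "https://www.booking.com/searchresults.html?" + urlencode(params)


# Example: a 7-night stay in Rome starting 30 days from today
booking_url = build_search_url("Rome", date.today() + timedelta(days=30), nights=7)
print(booking_url)
```

With destination="Rome, Lazio, Italy", urlencode() produces ss=Rome%2C+Lazio%2C+Italy, the same encoding Booking.com uses in its own search URLs.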

Now, use Playwright to load that page with the page.goto() method:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # Headful mode: Booking.com hides some data from headless browsers
    context = browser.new_context()
    page = context.new_page()

    # Visit the target Booking.com page
    booking_url = "https://www.booking.com/searchresults.html?ss=Rome%2C+Lazio%2C+Italy&efdco=1&label=gen173nr-1FCAQoggJCC3NlYXJjaF9yb21lSDNYBGinAogBAZgBMbgBF8gBDNgBAegBAfgBA4gCAagCA7gCgJfpwwbAAgHSAiRjNDA1NjNhMS1kYzQ2LTQ3MjYtYjc0YS1kZGM0NjRjYTQyZWHYAgXgAgE&sid=02599cbc4cb59fd631f047d5e547b490&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=-126693&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=127e60d050da05c0&ac_meta=GhAxMjdlNjBkMDUwZGEwNWMwIAAoATICZW46BHJvbWVAAEoAUAA%3D&checkin=2025-08-01&checkout=2025-08-31&group_adults=2&no_rooms=1&group_children=0"
    page.goto(booking_url)

    # Scraping logic...

    browser.close()

Keep in mind that disabling headless mode isn’t optional, as Booking.com appears to have some clever anti-bot mechanisms in place.

If it detects that you're running a headless browser, it will hide some data. As shown in the screenshot below (taken in headless mode), some of the property information is missing:

A screenshot taken in headless mode. Note the missing info from the property cards

Notice that the prices and other key details are missing from the property cards. That’s why you should configure Playwright to launch the browser in headful mode.


Step #2: Prepare to Scrape the Property Listings

The target page contains several property listings, so you’ll need to:

Define a data structure to store the scraped data (a Python list is ideal).
Select the property listing cards using the appropriate selector and iterate over them to extract the data of interest.

Before doing anything, you must first get familiar with the DOM of the page. Open the target page in your browser, right-click on a property card, and choose “Inspect” to open the DevTools:

Inspecting a property listing card in the DevTools

As you can see, the class names used by Booking.com appear to be randomly generated. That’s quite common: such class names are often generated at deploy time, so they change with each release. This helps avoid caching issues and also deters scrapers that depend on class names.

Fortunately, the most important HTML elements on Booking.com pages include custom data-* attributes. These attributes are generally used to attach extra metadata to HTML elements. Attributes like data-testid, in particular, are commonly added to support automated testing tools like Cypress, which need a reliable way to select and interact with the main elements on the page.

Because of that, data-* HTML attributes tend to remain consistent over time, which makes them great targets for CSS or XPath selectors. In particular, you can select all property listing cards with this CSS selector:

[data-testid="property-card"]

Since the property listings are loaded dynamically on the client, you need to wait for them to appear on the page. Do that with the Playwright locator’s wait_for() method, which by default waits until the element is visible.

Put it all together, and you’ll get:

# Where to store the scraped data
properties = []

# Wait for property listings to load
page.locator('[data-testid="property-card"]').first.wait_for()

# Iterate over each property card and scrape data from it
property_card_html_elements = page.locator('[data-testid="property-card"]')
for i in range(property_card_html_elements.count()):
    property_card_html_element = property_card_html_elements.nth(i)
    # Property listing scraping logic...


Step #3: Scrape the Property Listing Data

Inside the for loop, you have to define the data extraction logic to collect the relevant fields from each property card. In this example, I’ll focus on scraping the following data points:

url: Link to the property page.
image: Main image of the property.
title: Property name.
address: Location of the property.
distance: Distance from the city center or landmark.
review_score: Average user rating.
review_count: Number of reviews.
description: Room/unit description.
original_price: Price before discount (if available).
price: Final price shown to the user.

If you're familiar with Booking.com, you know that not every property card contains all of the above fields. Some data points are omitted depending on availability, property type, and other factors. Since Playwright locator methods like inner_text() wait for the element to appear and raise a TimeoutError if it never does (after 30 seconds by default), define a helper function that checks whether the element exists first:

def safely_get_data(locator, method, attr=None):
    # Return None if the element is not on the page, instead of waiting and timing out
    if locator.count() == 0:
        return None
    if method == "text":
        return locator.first.inner_text()
    elif method == "attr":
        return locator.first.get_attribute(attr)

This helper reads an element's content (its text or a specific HTML attribute) only if the element is present on the page. Otherwise, it safely returns None.

Now, I’ll guide you through the process of retrieving a single data point of interest from an HTML listing element. You can then apply the same logic to scrape all the other data points from each property card.

I’ll show you how to retrieve the property listing link. Begin by identifying the corresponding element on a property card and inspecting it in the browser:

Inspecting the property listing link HTML element

In the DevTools, you can see that the URL is stored in the href attribute of an <a> tag identified by this CSS selector:

[data-testid="property-card-desktop-single-image"]

So, the line of code to extract the URL from the property card is:

url = safely_get_data(property_card_html_element.locator('a[data-testid="property-card-desktop-single-image"]'), method="attr", attr="href")
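Depending on how the card is rendered, the href you get back may be relative or carry a long query string with search and tracking parameters. If you only need a stable link that identifies each property, a small standard-library sketch like the following could normalize it. The clean_property_url() helper is hypothetical, and dropping the query string is an assumption on my part: it also removes the check-in and check-out dates.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://www.booking.com"


def clean_property_url(href):
    """Return an absolute property URL without query string or fragment.

    Assumption: the query parameters on the scraped href are only search
    and tracking context, not needed to identify the property page.
    """
    if not href:
        return None
    absolute = urljoin(BASE_URL, href)  # leaves already-absolute URLs unchanged
    parts = urlsplit(absolute)
    # Rebuild the URL keeping only scheme, host, and path
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Keep the original href instead if you need the property page to open with the same search dates.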

Using similar logic, you can extract the remaining property data:

image = safely_get_data(property_card_html_element.locator('img[data-testid="image"]'), method="attr", attr="src")
title = safely_get_data(property_card_html_element.locator('[data-testid="title"]'), method="text")
address = safely_get_data(property_card_html_element.locator('[data-testid="address"]'), method="text")
distance = safely_get_data(property_card_html_element.locator('[data-testid="distance"]'), method="text")

# Handle the review scraping logic
review_score = None
review_count = None
review_text = safely_get_data(property_card_html_element.locator('[data-testid="review-score"]'), method="text")
# Extract the review score and count from the raw review text
if review_text:
    parts = review_text.split("\n")
    for part in parts:
        part = part.strip()
        if part.replace(".", "", 1).isdigit():
            review_score = float(part)
        elif "reviews" in part:
            try:
                review_count = int(part.split(" ")[0].replace(",", ""))
            except ValueError:
                pass

description = safely_get_data(property_card_html_element.locator('[data-testid="recommended-units"]'), method="text")

# Handle the price scraping logic
original_price = None
price = None
price_block_html_element = property_card_html_element.locator('[data-testid="availability-rate-information"]')
if price_block_html_element.count() > 0:
    original_price = safely_get_data(
        price_block_html_element.locator('[aria-hidden="true"]:not([data-testid])'), method="text"
    )
    if original_price:
        original_price = original_price.replace(",", "")
    price = safely_get_data(
        price_block_html_element.locator('[data-testid="price-and-discounted-price"]'), method="text"
    )
    if price:
        price = price.replace(",", "")

Note that some data points (such as prices and review attributes) require a bit of custom logic for element selection or data cleaning.

Step #4: Gather the Scraped Data

Still inside the for loop, gather the scraped data into a dictionary and append it to the properties list:

property_data = {
    "url": url,
    "image": image,
    "title": title,
    "address": address,
    "distance": distance,
    "review_score": review_score,
    "review_count": review_count,
    "description": description,
    "original_price": original_price,
    "price": price
}
properties.append(property_data)

Great! The properties list should now contain all the property listings loaded on the page, in a structured format. It only remains to export this data into a more useful format, such as CSV.
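Before exporting, you might also want to turn the raw text from Step #3 into numbers: the price fields, for instance, are still strings that include the currency symbol. Below is a sketch of two hypothetical helpers, parse_review_text() and parse_price(), that use regular expressions instead of the split()/isdigit() approach above. The sample formats in the comments are assumptions (en-us locale, comma as thousands separator), not text captured from Booking.com, so check them against what inner_text() actually returns.

```python
import re

# Assumed formats: "8.5" for scores, "1,234 reviews" for counts,
# and "€ 2,345" for prices (comma = thousands separator)
SCORE_LINE = re.compile(r"\d{1,2}(?:\.\d+)?")
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+reviews?\b", re.IGNORECASE)
AMOUNT_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_review_text(review_text):
    """Return (score, count) from the review block text, with None for missing parts."""
    score, count = None, None
    if not review_text:
        return score, count
    for line in review_text.splitlines():
        line = line.strip()
        # A line made only of a number (e.g. "8.5") is treated as the score
        if score is None and SCORE_LINE.fullmatch(line):
            score = float(line)
        # A line such as "1,234 reviews" or "1 review" carries the count
        count_match = COUNT_PATTERN.search(line)
        if count_match:
            count = int(count_match.group(1).replace(",", ""))
    return score, count


def parse_price(price_text):
    """Split a price string into (currency, amount), e.g. "€ 2,345" -> ("€", 2345.0)."""
    if not price_text:
        return None, None
    amount_match = AMOUNT_PATTERN.search(price_text)
    if not amount_match:
        return None, None
    amount = float(amount_match.group().replace(",", ""))
    # Whatever surrounds the number (symbol or currency code) is kept as the currency
    currency = (price_text[:amount_match.start()] + price_text[amount_match.end():]).strip()
    return currency or None, amount
```

Inside the for loop, you could then call review_score, review_count = parse_review_text(review_text) and apply parse_price() to price and original_price before building the dictionary.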

Step #5: Export to CSV

Outside the for loop, export the scraped data in properties to a CSV file:

if properties:  # properties[0] below would raise an IndexError on an empty list
    with open("properties.csv", mode="w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=properties[0].keys())
        writer.writeheader()
        writer.writerows(properties)

Don’t forget to import csv from the Python Standard Library:

import csv
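Deriving the CSV header from properties[0] only works if at least one card was scraped. If you'd prefer a fixed schema, one option is a dataclass. Here's a sketch with a hypothetical PropertyListing class and export_to_csv() function, whose fields mirror the data points listed in Step #3.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class PropertyListing:
    # Hypothetical record type: one field per data point listed in Step #3
    url: Optional[str] = None
    image: Optional[str] = None
    title: Optional[str] = None
    address: Optional[str] = None
    distance: Optional[str] = None
    review_score: Optional[float] = None
    review_count: Optional[int] = None
    description: Optional[str] = None
    original_price: Optional[str] = None
    price: Optional[str] = None


def export_to_csv(listings, path="properties.csv"):
    # The header comes from the dataclass, not from the first row,
    # so an empty list still produces a CSV with all column names
    fieldnames = [field.name for field in fields(PropertyListing)]
    with open(path, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        # None values are written as empty cells by the csv module
        writer.writerows(asdict(listing) for listing in listings)
```

As long as your dictionary keys match the field names, you can convert each scraped dictionary d with PropertyListing(**d) before calling export_to_csv().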

Step #6: Run Your Booking.com Scraper

The final Booking.com scraping script is:

# pip install playwright
# python -m playwright install

from playwright.sync_api import sync_playwright
import csv

def safely_get_data(locator, method, attr=None):
    # Return None if the element is not on the page, instead of waiting and timing out
    if locator.count() == 0:
        return None
    if method == "text":
        return locator.first.inner_text()
    elif method == "attr":
        return locator.first.get_attribute(attr)

with sync_playwright() as p:
    # Open a new page in a controlled headful browser instance
    browser = p.chromium.launch(headless=False)  # Headful mode: Booking.com hides some data from headless browsers
    context = browser.new_context()
    page = context.new_page()

    # Visit the target Booking.com page
    booking_url = "https://www.booking.com/searchresults.html?ss=Rome%2C+Lazio%2C+Italy&efdco=1&label=gen173nr-1FCAQoggJCC3NlYXJjaF9yb21lSDNYBGinAogBAZgBMbgBF8gBDNgBAegBAfgBA4gCAagCA7gCgJfpwwbAAgHSAiRjNDA1NjNhMS1kYzQ2LTQ3MjYtYjc0YS1kZGM0NjRjYTQyZWHYAgXgAgE&sid=02599cbc4cb59fd631f047d5e547b490&aid=304142&lang=en-us&sb=1&src_elem=sb&src=index&dest_id=-126693&dest_type=city&ac_position=0&ac_click_type=b&ac_langcode=en&ac_suggestion_list_length=5&search_selected=true&search_pageview_id=127e60d050da05c0&ac_meta=GhAxMjdlNjBkMDUwZGEwNWMwIAAoATICZW46BHJvbWVAAEoAUAA%3D&checkin=2025-08-01&checkout=2025-08-31&group_adults=2&no_rooms=1&group_children=0"
    page.goto(booking_url)

    # Where to store the scraped data
    properties = []

    # Wait for property listings to load
    page.locator('[data-testid="property-card"]').first.wait_for()

    # Iterate over each property card and scrape data from it
    property_card_html_elements = page.locator('[data-testid="property-card"]')
    for i in range(property_card_html_elements.count()):
        # Retrieve the current card HTML element
        property_card_html_element = property_card_html_elements.nth(i)

        url = safely_get_data(property_card_html_element.locator('a[data-testid="property-card-desktop-single-image"]'), method="attr", attr="href")
        image = safely_get_data(property_card_html_element.locator('img[data-testid="image"]'), method="attr", attr="src")
        title = safely_get_data(property_card_html_element.locator('[data-testid="title"]'), method="text")
        address = safely_get_data(property_card_html_element.locator('[data-testid="address"]'), method="text")
        distance = safely_get_data(property_card_html_element.locator('[data-testid="distance"]'), method="text")

        # Handle the review scraping logic
        review_score = None
        review_count = None
        review_text = safely_get_data(property_card_html_element.locator('[data-testid="review-score"]'), method="text")
        # Extract the review score and count from the raw review text
        if review_text:
            parts = review_text.split("\n")
            for part in parts:
                part = part.strip()
                if part.replace(".", "", 1).isdigit():
                    review_score = float(part)
                elif "reviews" in part:
                    try:
                        review_count = int(part.split(" ")[0].replace(",", ""))
                    except ValueError:
                        pass

        description = safely_get_data(property_card_html_element.locator('[data-testid="recommended-units"]'), method="text")

        # Handle the price scraping logic
        original_price = None
        price = None
        price_block_html_element = property_card_html_element.locator('[data-testid="availability-rate-information"]')
        if price_block_html_element.count() > 0:
            original_price = safely_get_data(
                price_block_html_element.locator('[aria-hidden="true"]:not([data-testid])'), method="text"
            )
            if original_price:
                original_price = original_price.replace(",", "")
            price = safely_get_data(
                price_block_html_element.locator('[data-testid="price-and-discounted-price"]'), method="text"
            )
            if price:
                price = price.replace(",", "")

        # Define a new property object with the scraped data and append it to the list
        property_data = {
            "url": url,
            "image": image,
            "title": title,
            "address": address,
            "distance": distance,
            "review_score": review_score,
            "review_count": review_count,
            "description": description,
            "original_price": original_price,
            "price": price
        }
        properties.append(property_data)

    # Export the scraped data to CSV
    if properties:  # properties[0] below would raise an IndexError on an empty list
        with open("properties.csv", mode="w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=properties[0].keys())
            writer.writeheader()
            writer.writerows(properties)

    # Close the browser and release its resources
    browser.close()

Execute it, and it’ll produce a properties.csv file. In this case, the output CSV file contains:

Et voilà! Booking.com scraper complete.

Next Steps

To take your scraper to the next level, consider the following improvements:

Scrape all listings on a search results page by handling infinite scrolling to load more results.
Accept the target URL as a command-line argument to make the script reusable for different searches.
Integrate the script with Camoufox or another anti-detection browser to reduce the chance of being blocked.
Use proxy services for IP rotation to avoid rate limits and blocking when scraping Booking.com at scale.
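For the first improvement, one way to handle infinite scrolling is to keep scrolling until the number of property cards stops growing. Here's a sketch with a hypothetical scroll_until_stable() helper that takes the counting and scrolling steps as callables, so the stopping logic can be tested without a browser. The 5,000-pixel scroll and 1.5-second wait in the commented wiring are arbitrary assumptions to tune; page.mouse.wheel() and page.wait_for_timeout() are standard Playwright methods. This only covers scroll-triggered loading: if results stop appearing for another reason (such as a button that has to be clicked), the scroll callable would need to handle that too.

```python
def scroll_until_stable(get_count, scroll, max_rounds=20, patience=2):
    """Scroll repeatedly until the number of loaded cards stops growing.

    get_count: callable returning how many property cards are loaded right now
    scroll: callable that scrolls the page and waits for new content to load
    max_rounds: hard cap on scroll attempts
    patience: consecutive rounds without new cards before giving up
    Returns the highest card count observed.
    """
    previous = get_count()
    stale_rounds = 0
    for _ in range(max_rounds):
        scroll()
        current = get_count()
        if current > previous:
            previous = current
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break
    return previous


# Possible wiring with the Playwright page from the script above (not run here):
#
# def scroll_page():
#     page.mouse.wheel(0, 5000)     # scroll down by 5000 pixels
#     page.wait_for_timeout(1500)   # give new cards some time to render
#
# scroll_until_stable(
#     get_count=lambda: page.locator('[data-testid="property-card"]').count(),
#     scroll=scroll_page,
# )
```

You would call it after the initial wait_for() and before the for loop, so that the loop iterates over every card loaded so far.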

Why Scrape Booking.com When It Offers Free APIs?

As of this writing, Booking.com’s APIs are only accessible to approved official partners, limiting who can use them. Even if you gain access to them, the data these APIs provide can change over time, restricting your control over what data you receive and in which format.

On the other hand, web scraping puts you in full control. You decide exactly which data to extract from the website, offering greater flexibility and independence. That's why I generally recommend web scraping over APIs.

Conclusion

The goal of this post was to demonstrate how to scrape Booking.com. I showed how to do it with Playwright in Python, since a browser automation tool is required to handle Booking.com's highly dynamic and interactive pages.

All instructions shared here are for educational purposes only. Please use them responsibly and respect Booking.com’s terms of service and robots.txt rules.

I hope you found this scraping guide helpful. Feel free to share your thoughts, questions, or experiences in the comments—until next time!

Tamas Deak

Sep 1

It is interesting that plain Playwright works in headful mode against their anti-bot system, while headless mode does not. I assume that if they can detect headless mode, they should be able to detect Playwright as well. I strongly recommend an anti-detect browser as a long-term solution. It can easily be integrated with Playwright while ensuring your browser stays under the radar.


The Web Scraping Club
