THE LAB #89: Camoufox as a Docker image

Step by step guide to scale your Camoufox web scraping infrastructure

Pierluigi Vinciguerra
Jul 17, 2025

If you follow these pages, you should be pretty familiar with Camoufox, since we have written many articles about it. In case you missed them, Camoufox is an open-source, stealthy custom build of Firefox that offers robust fingerprint spoofing and anti-bot features to help scrapers evade detection.

But like every other automated browser, Camoufox has CPU and memory requirements that make it hard to scale on a single machine. For this reason, a container-based architecture is what we need to scale our operations.

The problem is that, compared to Playwright or more mainstream tools, running Camoufox in Docker can be tricky because of its dependency chain and the lack of step-by-step guides.


I’ve worked extensively on this topic over the past few weeks and wanted to share with you the process that enabled me to create a working Camoufox Docker image and deploy it on AWS.

But before diving into the code, let’s understand why we should do this.

What Are the Advantages of Using Camoufox in a Container

Running Camoufox inside a Docker container brings the same key benefits to web scraping deployments that containers bring to any other workload:

  • Reproducibility & Consistency: Containers bundle Camoufox with all its dependencies (Firefox, libraries, etc.), ensuring it runs the same everywhere. This eliminates the "works on my machine" problem – the environment is identical across development, testing, and production. You can reliably reproduce browser behavior on any server because the Docker image encapsulates exact versions of Firefox and Camoufox’s requirements.

  • Scalability: Containerizing Camoufox makes it much easier to scale out your scrapers. Need more browsing capacity? Just launch more container instances. Containers start up faster and use fewer resources than full VMs, enabling rapid horizontal scaling to handle spikes in load. In an AWS Auto Scaling group, new Camoufox containers can spin up on demand, letting you scrape hundreds of pages in parallel.

  • Portability: With Docker, you can “write once, run anywhere.” The Camoufox container can run on any host with Docker – your laptop, an EC2 VM, or a Kubernetes cluster – without requiring system configuration changes. This portability means you can develop locally and deploy to the cloud seamlessly. It also facilitates migration across AWS regions or different cloud providers, if needed, as the container encapsulates everything.

  • Isolation: Each Camoufox instance runs in its own isolated container, sandboxed from the host and from other containers. This isolation improves security and stability – one instance crashing or leaking resources won’t impact others or your host OS. It also avoids dependency conflicts (no more Firefox or library version clashes on your host). Essentially, Docker gives Camoufox its own clean playground to operate in.

In our case, we’re building an endpoint that all our scrapers connect to in order to use a pool of Camoufox instances. The more scrapers are running, the more EC2 instances are created automatically to meet the demand for CPU and memory.



Solution Outline

Our scalable scraping solution involves running Camoufox inside Docker containers on AWS EC2 instances, fronted by an AWS Application Load Balancer (ALB).

The basic architecture is: a scraping client connects to a load balancer, which distributes incoming WebSocket connections to an auto-scaling group of EC2 instances. Each EC2 instance runs a Docker container with Camoufox serving a remote Firefox browser. This setup allows us to handle many concurrent browser sessions by automatically adding or removing container instances based on load.

Camoufox in a container solution chart

How it works: The ALB acts as a single entry point (ws://<LoadBalancerAddress>) for your scraping script.

It forwards the WebSocket traffic to one of the Camoufox containers running in the EC2 Auto Scaling Group.

The Auto Scaling Group can have multiple instances (each running one Camoufox Docker container) and can scale up or down according to demand. For example, if you need to fetch hundreds of pages simultaneously, the group might scale out to 5–10 instances, all behind the load balancer. When load decreases, it can scale back in to save costs.
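
As a back-of-the-envelope illustration of that scaling range, here is a small sketch of the arithmetic. The number of browser sessions a single instance can sustain (sessions_per_instance) is an assumption you would have to measure on your own instance type, and the function name desired_instances is ours. In practice the Auto Scaling Group decides this for you from its scaling policy; the sketch only shows the reasoning.

```python
import math


def desired_instances(
    concurrent_sessions: int,
    sessions_per_instance: int,  # assumed capacity, measure it on your instance type
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Estimate how many instances are needed, clamped to the group's limits."""
    if sessions_per_instance <= 0:
        raise ValueError("sessions_per_instance must be positive")
    # Round up: a partially used instance is still a whole instance.
    needed = math.ceil(concurrent_sessions / sessions_per_instance)
    # Never go below the minimum or above the maximum size of the group.
    return max(min_instances, min(needed, max_instances))
```

With an assumed capacity of 40 sessions per instance, 300 concurrent sessions would call for 8 instances, which falls inside the 5–10 range mentioned above.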

This design provides both load distribution (via the ALB) and elastic scalability (via the Auto Scaling Group), so your web scraper can handle bursts of traffic without manual intervention.

On the client side, you don’t need to worry about which instance you’re hitting – you just connect to the load balancer’s address, and it will route your WebSocket connection to an available Camoufox container.
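
The client side can be sketched as follows. This is a minimal example, assuming the Playwright Python package is installed where the scraper runs and that the load balancer forwards traffic to the container's "hello" path. The host my-alb.example.com and the helper names build_ws_endpoint and fetch_title are placeholders of ours, not part of Camoufox or Playwright.

```python
from typing import Optional


def build_ws_endpoint(host: str, ws_path: str = "hello", port: Optional[int] = None) -> str:
    """Build the WebSocket URL the scraper connects to (the load balancer address)."""
    netloc = host if port is None else f"{host}:{port}"
    return f"ws://{netloc}/{ws_path.lstrip('/')}"


def fetch_title(ws_endpoint: str, url: str) -> str:
    """Connect to a remote Camoufox browser, open a page, and return its title."""
    # Imported lazily so the URL helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Camoufox is a Firefox build, so we connect through Playwright's Firefox type.
        browser = p.firefox.connect(ws_endpoint)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

A scraper would then call something like fetch_title(build_ws_endpoint("my-alb.example.com"), "https://example.com") and never needs to know which EC2 instance served the session.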

In the next sections, we’ll dive into how to implement this architecture step by step: first by containerizing Camoufox, then setting up the AWS infrastructure, and finally connecting your scraping script to the cloud.



Technical Implementation

Let’s start with the technical groundwork: packaging Camoufox and its environment into a Docker container. We’ll create a Dockerfile to build a Camoufox image, write a simple server launcher script, and configure a Docker entrypoint. After that, we’ll build the image and push it to AWS.

Note: Building a Docker image for Camoufox can be tricky due to various dependencies (I’m calling it Docker dependency hell). We spent hours debugging library issues and display errors. Below is the working solution we arrived at, which you can use as a template.

Camoufox Server Script (launch_server.py)

The launch_server.py script is straightforward. It imports Camoufox and starts the Camoufox server with our desired settings. Here’s the content:

from camoufox.server import launch_server

# Launch the Camoufox server with specified options
launch_server(
    headless=True,        # Run in headless mode (no visible UI)
    geoip=True,           # Enable GeoIP for timezone/locale spoofing
    port=59001,           # Port for the WebSocket server
    ws_path="hello",      # WebSocket path (ws://host:59001/hello)
    main_world_eval=True, # Allow main-world JS evaluation (use carefully)
    proxy={
        "server": "http://proxyserver:port",
        "username": "PROXY_USER",
        "password": "PROXY_PWD"
    }
)

A quick breakdown of the key parameters we used:

  • headless=True: Ensures Firefox runs in headless mode (no GUI). This is necessary for running on servers. We also set up Xvfb in the entry point, which is an extra safety net for headless operation.

  • geoip=True: Camoufox can use MaxMind GeoIP data to automatically set the browser’s timezone, locale, and other geo-dependent settings to match the IP’s location. This helps avoid bot detection based on mismatched timezone or language, especially if you’re using proxies.

  • port=59001 and ws_path="hello": Together, these mean the Camoufox server will accept WebSocket connections at ws://<server>:59001/hello. We chose “hello” as an arbitrary path; you can pick another, but make sure your client uses the same path. A non-root path can help tell services apart when several share one domain, but here its main purpose is to give us a fixed, predictable endpoint for the load balancer and the clients. As far as we can tell, Camoufox falls back to a random port and path when these are not set, which would not work behind a load balancer.

  • main_world_eval=True: This allows evaluating JavaScript in the page’s main context (Playwright’s main world). Camoufox defaults to isolating Playwright’s scripts to avoid detection, so enabling main-world JS evaluation can be dangerous on heavily protected sites. We included it here for completeness, but if you don’t explicitly need it, you might leave this as False for more stealth.

  • The proxy dictionary: This shows how you could pass proxy settings (if you want the browser to use a proxy). We left dummy values (proxyserver, PROXY_USER, etc.). If you’re not using a proxy, you can omit this parameter.
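
To avoid hardcoding these values in the image, the options can be assembled from environment variables. The sketch below is one possible way to do it, not the setup used in this article: the variable names (CAMOUFOX_PORT, CAMOUFOX_WS_PATH, PROXY_SERVER, PROXY_USER, PROXY_PWD) and the function build_launch_options are our own invention. It leaves out the proxy entry when no proxy is configured and leaves main_world_eval at its default.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_launch_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for launch_server from environment variables."""
    env = os.environ if env is None else env
    options: Dict[str, Any] = {
        "headless": True,
        "geoip": True,
        "port": int(env.get("CAMOUFOX_PORT", "59001")),
        "ws_path": env.get("CAMOUFOX_WS_PATH", "hello"),
    }
    proxy_server = env.get("PROXY_SERVER")
    if proxy_server:
        # Only pass a proxy dictionary when a proxy is actually configured.
        proxy: Dict[str, str] = {"server": proxy_server}
        if env.get("PROXY_USER"):
            proxy["username"] = env["PROXY_USER"]
            proxy["password"] = env.get("PROXY_PWD", "")
        options["proxy"] = proxy
    return options
```

In launch_server.py, the call would then become launch_server(**build_launch_options()), and the proxy credentials can be injected at container start instead of being baked into the image.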



When this script runs inside the container, it launches the Camoufox headless browser and begins listening for incoming connections. On startup, the Camoufox server prints the WebSocket URL (e.g., ws://0.0.0.0:59001/hello) to the console. That’s helpful for debugging, but in our case, we already know we’ll expose port 59001 through the load balancer.
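
If you want to confirm from a script that the server is really accepting connections, rather than reading the console output, a plain TCP check is usually enough. The following is a small sketch using only the Python standard library. The function name wait_for_port and the default timings are our own choices, and the check only proves that the port is open, not that the browser behind it is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait a bit and retry.
            time.sleep(interval)
    return False
```

For a container started locally with port 59001 published, wait_for_port("localhost", 59001) should return True shortly after the server has started.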

Entrypoint Script (entrypoint.sh)

The entrypoint script is executed when the container starts. Its job is to set up any required environment (like the Xvfb display) and then kick off the Camoufox server script. Here’s the script:

#!/bin/bash
set -e

# Start X virtual framebuffer for headless display
Xvfb :99 -screen 0 1920x1080x16 &
export DISPLAY=:99

# Launch the Camoufox server (runs indefinitely until container stops)
exec python3 /app/launch_server.py

What this does:

  • set -e makes the shell exit on any error. This is good practice for entrypoint scripts so that if any command fails, the container doesn’t silently keep running in a bad state.

  • Xvfb :99 -screen 0 1920x1080x16 & starts an Xvfb display server on display number :99 in the background. We allocate a virtual screen with 1920x1080 resolution and 16-bit color – more than enough for typical web pages. Even though we run Firefox in headless mode, having Xvfb can prevent potential issues with Firefox’s GPU or font rendering in some headless scenarios. It essentially fakes a screen so Firefox “thinks” it’s rendering to a monitor.

  • export DISPLAY=:99 tells Firefox (and any X11 applications) to use the Xvfb display we just started. This environment variable is critical; without it, Firefox might not find a display and could error out.

  • Finally, exec python3 /app/launch_server.py runs our Python server script. We use exec so that the Python process replaces the shell process (this makes signal handling cleaner, so e.g. a Ctrl-C or Docker stop will terminate the Python server properly). This process will keep running, serving the WebSocket for Camoufox.

At this point, we can have a look at the full Dockerfile that ties everything together.


The scripts mentioned in this article are in the GitHub repository's folder 88.CAMOUFOXDOCKER, available only to paying readers of The Web Scraping Club.

If you’re one of them and cannot access it, please use the following form to request access.


Dockerfile for Camoufox

Our Dockerfile uses Ubuntu 22.04 as the base image and installs all the necessary system packages for running Firefox in headless mode (Camoufox is essentially a modified Firefox). We also manually install the latest Firefox and then install Camoufox via pip. The Dockerfile ends by copying in our startup scripts and setting the container to launch Camoufox on port 59001.

Here’s the file:
