5 Comments

I'm also using headful Playwright, along with uBlock, which seems to block many tracking requests and makes the browser look more like a "normal" one.

Here's how I start my Chromium crawler. I hadn't had any issues with it until today:

```javascript
const path = require("path");
// Assumes the standard `playwright` package; adjust if you use another build.
const { chromium } = require("playwright");

// Path to the unpacked uBlock extension directory.
const uBlock = path.join(
  __dirname,
  "chromium-extensions/cjpalhdlnbpafiamejdnhcphjbkeiagm/1.41.8_4"
);

// Note: launchPersistentContext returns a BrowserContext, not a Browser.
const browser = await chromium.launchPersistentContext("persistent-data-dir", {
  headless: false,
  devtools: false,
  ignoreDefaultArgs: ["--enable-automation", "--hide-scrollbars"],
  args: [
    "--start-maximized",
    `--disable-extensions-except=${uBlock}`,
    "--no-default-browser-check",
  ],
  userAgent:
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
  viewport: null,
});

context = browser;

// newPage() on a context takes no options; viewport: null is already set above.
page = await context.newPage();

// enableExtensions is a helper defined elsewhere (not shown here).
await enableExtensions(page);

// Pass the Webdriver Test.
await page.addInitScript(() => {
  const np = navigator.__proto__;
  delete np.webdriver;
  navigator.__proto__ = np;
});
```
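For anyone wondering why the init script deletes `webdriver` from `navigator.__proto__` rather than from `navigator` itself, here is a rough sketch that runs in plain Node without a browser. It assumes the property is an accessor on the prototype, as I believe it is in Chromium; `fakeNavigatorProto`, `fakeNavigator` and `hideWebdriver` are made-up names for illustration, not part of Playwright or of the snippet above.

```javascript
// Hypothetical stand-in for Navigator.prototype: `webdriver` is modelled as
// an accessor on the prototype, not as an own property of the object.
const fakeNavigatorProto = {};
Object.defineProperty(fakeNavigatorProto, "webdriver", {
  get: () => true, // what an automated browser would report
  configurable: true, // `delete` only succeeds on configurable properties
  enumerable: true,
});

// Hypothetical stand-in for `window.navigator`.
const fakeNavigator = Object.create(fakeNavigatorProto);

// Same idea as the init script: remove the property from the prototype.
function hideWebdriver(nav) {
  const proto = Object.getPrototypeOf(nav);
  delete proto.webdriver;
  return nav;
}

const before = fakeNavigator.webdriver;
hideWebdriver(fakeNavigator);
const after = fakeNavigator.webdriver;
const stillPresent = "webdriver" in fakeNavigator;

console.log({ before, after, stillPresent });
```

In this mock, `before` is `true`, `after` is `undefined`, and `stillPresent` is `false`. A site that checks `"webdriver" in navigator` could possibly still notice the property is missing, so treat this as an illustration of the mechanism rather than a guaranteed bypass.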

Thanks for the 'useful websites to test our settings'.

author

Glad you've found it useful.

Information about web scraping is quite sparse on the internet, and I'm trying to collect what I can on this Substack.

Sep 1, 2022, liked by Pierluigi Vinciguerra

I think a good number of people will find your concise posts useful as references, considering the gazillion hours I've spent searching for various coding tips and solutions.

My scraping is only for personal stuff, e.g. weather data that I can format for the terminal, so I typically use BeautifulSoup, but I always feel as if I'm climbing back up the learning curve every time I scrape :)

author

Same for me when I started, and sometimes even now.
