Is web scraping becoming harder?

Aug 28, 2022

Rising costs, harder anti-bot software, and a faster world that changes continuously

5 Comments

Sep 7, 2022

I'm also using headful Playwright along with uBlock, which seems to block many tracking requests and is itself an indicator of a "normal" browser.

Here's how I start my Chromium crawler; I haven't had any issues with it to date:

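```javascript
// Assumes `npm install playwright` and an unpacked uBlock Origin copy
// under ./chromium-extensions/<extension-id>/<version>.
const path = require("path");
const { chromium } = require("playwright");

const uBlock = path.join(
  __dirname,
  "chromium-extensions/cjpalhdlnbpafiamejdnhcphjbkeiagm/1.41.8_4"
);

(async () => {
  // launchPersistentContext returns a BrowserContext (not a Browser); Playwright
  // only supports Chromium extensions in a persistent context.
  const context = await chromium.launchPersistentContext("persistent-data-dir", {
    headless: false,
    devtools: false,
    // Drop default flags that advertise automation and hide scrollbars.
    ignoreDefaultArgs: ["--enable-automation", "--hide-scrollbars"],
    args: [
      "--start-maximized",
      `--disable-extensions-except=${uBlock}`,
      `--load-extension=${uBlock}`, // actually loads the unpacked extension
      "--no-default-browser-check",
    ],
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    viewport: null, // use the real window size instead of a fixed viewport
  });

  // Pass the webdriver test: remove the `webdriver` getter from
  // Navigator.prototype before any page script runs, in every page of the context.
  await context.addInitScript(() => {
    delete Object.getPrototypeOf(navigator).webdriver;
  });

  const page = await context.newPage();

  // The author's own helper; its definition is not shown in this comment.
  await enableExtensions(page);
})();
```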
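The webdriver patch works because Chromium exposes `navigator.webdriver` as a getter on `Navigator.prototype`, not on the `navigator` object itself, so deleting it from the prototype leaves nothing for the lookup to find. Below is a minimal sketch of that mechanism that runs in plain Node; `FakeNavigator` is a hypothetical stand-in for the browser's `Navigator` interface, not a real API.

```javascript
// Hypothetical stand-in for the browser's Navigator interface: like Chromium,
// it defines `webdriver` as a configurable getter on the prototype.
class FakeNavigator {}
Object.defineProperty(FakeNavigator.prototype, "webdriver", {
  get() {
    return true; // what an automated browser would report
  },
  configurable: true,
  enumerable: true,
});

const fakeNavigator = new FakeNavigator();

// Before the patch, the lookup resolves through the prototype getter.
const before = fakeNavigator.webdriver;

// Same move as the init script: delete the accessor from the prototype.
delete Object.getPrototypeOf(fakeNavigator).webdriver;

// After the patch, nothing on the prototype chain defines `webdriver`.
const after = fakeNavigator.webdriver;
const stillPresent = "webdriver" in fakeNavigator;

console.log({ before, after, stillPresent });
```

One caveat worth checking on real targets: as far as we know, a recent non-automated Chrome reports `navigator.webdriver` as `false` with the property present, so a detection script that tests `"webdriver" in navigator` rather than the value's truthiness may still notice the deletion.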

nick3499

Sep 1, 2022

Thanks for the "useful websites to test our settings".

Pierluigi Vinciguerra

Sep 1, 2022

Glad you've found it useful.

Information about web scraping is scattered across the internet, and I'm trying to collect what I can on this Substack.

nick3499

Sep 1, 2022

I think a good number of people will find your concise posts useful as references, considering the gazillion hours I've spent searching for various coding tips and solutions.

My scraping is only for personal stuff, i.e. weather data that I format for the terminal, so I typically use BeautifulSoup, but every time I scrape I feel as if I'm climbing back up the learning curve :)

Pierluigi Vinciguerra

Sep 1, 2022

Same for me when I started, and sometimes even now.

The Web Scraping Club