I'm also using headful Playwright, along with uBlock, which seems to block many tracking requests and is itself an indicator of a "normal" browser.
Here's how I start my Chromium crawler; I hadn't had any issues with it until today:
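```javascript
const path = require("path");
const { chromium } = require("playwright");

// Unpacked uBlock Origin extension directory (extension ID / installed version).
const uBlock = path.join(
  __dirname,
  "chromium-extensions/cjpalhdlnbpafiamejdnhcphjbkeiagm/1.41.8_4"
);

(async () => {
  // launchPersistentContext returns a BrowserContext, not a Browser.
  const context = await chromium.launchPersistentContext("persistent-data-dir", {
    headless: false,
    devtools: false,
    ignoreDefaultArgs: ["--enable-automation", "--hide-scrollbars"],
    args: [
      "--start-maximized",
      `--disable-extensions-except=${uBlock}`,
      "--no-default-browser-check",
    ],
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    viewport: null,
  });
  // context.newPage() takes no options; the page inherits `viewport: null` from the context.
  const page = await context.newPage();
  // enableExtensions is a helper defined elsewhere (not shown here).
  await enableExtensions(page);
  // Pass the webdriver test: remove `webdriver` from navigator's prototype before page scripts run.
  await page.addInitScript(() => {
    const np = Object.getPrototypeOf(navigator);
    delete np.webdriver;
  });
})();
```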
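To see what that init script actually does without launching a browser, here is a minimal sketch that runs the same prototype deletion against a stand-in object. `FakeNavigator` and `hideWebdriver` are hypothetical names made up for this illustration; the only assumption is that the real `navigator` exposes `webdriver` as a getter on its prototype, which is what the snippet above relies on.

```javascript
// Sketch only: mimics the init script on a fake object instead of a real browser.

// Stand-in for the browser's navigator: `webdriver` is a getter on the prototype.
class FakeNavigator {
  get webdriver() {
    return true;
  }
}

// Same idea as the addInitScript callback: delete the property from the prototype.
function hideWebdriver(nav) {
  const proto = Object.getPrototypeOf(nav);
  delete proto.webdriver;
  return nav;
}

const fakeNavigator = new FakeNavigator();
const before = fakeNavigator.webdriver; // value reported before the patch
hideWebdriver(fakeNavigator);
const after = fakeNavigator.webdriver; // property lookup after the patch
const stillListed = "webdriver" in fakeNavigator; // is the property still visible?

console.log({ before, after, stillListed });
```

After the deletion the property is gone entirely, so a lookup gives `undefined` and `"webdriver" in navigator` is false. As far as I know, a regular non-automated Chrome reports `false` here rather than `undefined`, so a strict detector could in principle still tell the difference.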
Thanks for 'useful websites to test our settings'.
Glad you've found it useful.
Info about web scraping is quite sparse on the internet, and I'm trying to collect what I can on this Substack.
I think a good number of people will find your concise posts useful as references, considering the gazillion hours I've spent searching for various coding tips and solutions.
My scraping is only for personal stuff, e.g. weather data, which I can format for the terminal, so I typically use BeautifulSoup, but I always feel as if I'm climbing back up the learning curve every time I scrape :)
Same for me when I started, and sometimes even now.