The Web Scraping Club

Ensuring data quality in web scraping projects

An example of a modern web data quality control pipeline

Pierluigi Vinciguerra
Sep 16, 2023

Data quality is one of the critical pain points in web scraping: how do you know the fields in the output are correctly mapped to the information you’re looking for? Did you scrape the full scope you were interested in? Is the data formatted correctly?
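To make these three questions concrete, here is a minimal sketch of how they could be turned into automated checks. It is an illustration under stated assumptions, not the pipeline described in this post: the field names (`url`, `name`, `price`), the price pattern, the `QualityReport` class, the `check_records` function and the `expected_count` parameter are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rules for an illustrative product feed (not from the article).
REQUIRED_FIELDS = ("url", "name", "price")
# Assumes prices are scraped as plain decimal strings such as "59.90".
PRICE_PATTERN = re.compile(r"^\d+(\.\d{1,2})?$")


@dataclass
class QualityReport:
    total: int = 0
    missing_fields: int = 0
    bad_format: int = 0
    coverage: float = 0.0
    issues: list = field(default_factory=list)


def check_records(records, expected_count):
    """Run three basic checks: field presence, price format, scope coverage."""
    report = QualityReport(total=len(records))
    for index, record in enumerate(records):
        # Mapping check: every required field must be present and non-empty.
        missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
        if missing:
            report.missing_fields += 1
            report.issues.append((index, f"missing: {', '.join(missing)}"))
            continue
        # Format check: the price must match the expected pattern.
        if not PRICE_PATTERN.match(str(record["price"])):
            report.bad_format += 1
            report.issues.append((index, f"bad price: {record['price']!r}"))
    # Scope check: share of the expected items that were actually scraped.
    if expected_count > 0:
        report.coverage = len(records) / expected_count
    return report


sample = [
    {"url": "https://example.com/p/1", "name": "Shoe", "price": "59.90"},
    {"url": "https://example.com/p/2", "name": "", "price": "10"},
    {"url": "https://example.com/p/3", "name": "Bag", "price": "EUR 20"},
]
report = check_records(sample, expected_count=4)
print(report.missing_fields, report.bad_format, report.coverage)
```

In this sketch the expected item count has to come from somewhere outside the scraper, for example a count shown on the target website or the size of a previous run. Where that number comes from is an assumption left open here.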

As web scraping projects increase their scope, answering all these questions becomes more and more diff…
