Web Scraping News: November Monthly Recap
“It's not a faith in technology. It's faith in people.” - Steve Jobs
Hi, this is Pierluigi from The Web Scraping Club, a newsletter where you can find news, insights, and tutorials with real-world examples of web scraping.
Being a paying user gives:
Access to Paid Content, like the post series called “The LAB”, where we dive deep into real-world cases with code (see here for an example).
Access to the GitHub repository with the code seen on “The LAB”
Access to private channels on our Discord server
If you prefer to read this newsletter for free, you will still get one post per week about:
News about web scraping
Insights on anti-bot software and techniques
Interviews with key people in the industry
And you can always join the Web Scraping Club Discord server.
Today we’ll look at what happened this month in the web scraping industry.
Jeremy Singer-Vine Tracks the Government, Not the Midterms
original source: cjr.org
In September, Jeremy Singer-Vine, a data journalist and computer programmer in New York, started the Data Liberation Project, which aims to build datasets from public government data that is not easily accessible, clean them up, and publish them for the benefit of reporters. As he told cjr.org, this data “can be broadly useful regardless of what is happening right now in the news cycle.”
He files an average of five Freedom of Information Act (FOIA) requests per month and, at the same time, uses web scraping to generate datasets from other public sources where the data is not readily usable.
He got the idea for the Data Liberation Project while he was a data editor at BuzzFeed: journalists need data to complement their articles with quantitative insights, but accessing it can take time, especially when it comes from the government. Such data often requires FOIA requests, and by the time a request has been processed, the journalist may no longer need it. Not to mention data cleaning and quality issues. That’s why the Data Liberation Project aims to be a library of ready-to-use datasets for journalists, who can use them to track how the U.S. government acts and how it affects citizens’ lives. It’s also why Jeremy skipped the election data from the recent midterms in favor of broader, more in-depth projects.
The digitization of data and its broader availability enable new ways to act as a watchdog over politicians all over the world. Not long ago, as reported by the Financial Times, an independent team of Turkish researchers found that the inflation rate officially reported by the government was probably misleading. By scraping prices from online retailers every day, they measured that prices had risen by up to 40%, rather than the 19% claimed by the government.
Oxylabs AI-driven Web Scraping Solution for Illegal Content Detection Listed as Baltic Sustainability Awards Finalist
original source: accesswire.com
Oxylabs, a provider of premium proxies and public web data scraping solutions, has been selected as a finalist at the Baltic Sustainability Awards. Together with the other finalists, the company's representatives will take the stage to pitch their achievement at the Baltic Sustainability Forum & Awards Ceremony, which will take place on November 30 at the Splendid Palace in Riga.
Oxylabs has been recognized in the "Impact" category and is a finalist in the "Social Initiatives" subcategory for its work in fighting illegal content (incl. child sexual abuse and pornography) online. It created a unique AI-powered web scraping tool that automatically scans the Lithuanian IP address space to find harmful images, mainly related to child sexual abuse and pornography. If such content is found, it is automatically forwarded to the hotline of the Communications Regulatory Authority of Lithuania (CRA), where specialists review it.
"We are grateful for the recognition of our joint endeavors with CRA to make the internet a safer space," said Julius Černiauskas, CEO of Oxylabs. "When creating this tool, there was a lot of excitement in the office, as it was not only a great challenge but also benefited the greater good, protecting the most vulnerable groups of our society. We believe that such partnerships contribute to a world where justice and strong public institutions are the centerpieces of society."
Projects like this one, or those aimed at fighting human trafficking, show once again that there is no good or bad technology: what matters is how people decide to use it.
Are any of you working on something spectacular in web scraping that you'd like to share with us? Please write to pier@thewebscraping.club and you could be in the next interview!
The Lab - premium content with real-world cases
THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting
THE LAB #4: Scrapyd - how to manage and schedule a fleet of scrapers
THE LAB #2: scraping data from a website with Datadome and xsrf tokens
If you liked this post, please share it with your friends and colleagues and spread the word about The Web Scraping Club.