Web Scraping News: November Monthly Recap
“It's not a faith in technology. It's faith in people.” - Steve Jobs
Jeremy Singer-Vine Tracks the Government, Not the Midterms
In September, Jeremy Singer-Vine, a data journalist and computer programmer in New York, started the Data Liberation Project, which aims to build datasets from public government data that is not easily accessible, then clean them up and publish them for the benefit of reporters. As he told cjr.org, this data “can be broadly useful regardless of what is happening right now in the news cycle.”
He files an average of five Freedom of Information Act (FOIA) requests per month and, at the same time, uses web scraping to build datasets from other public sources where the data is not readily usable.
He got the idea for the Data Liberation Project while he was a data editor at BuzzFeed: journalists need data to complement their articles with quantitative insights, but obtaining it can take time, especially when it comes from the government. Government data often requires FOIA requests, and by the time these are processed, the journalist may no longer need it. On top of that come data cleaning and quality issues. That’s why the Data Liberation Project aims to be a library of ready-to-use datasets that journalists can use to track how the U.S. government acts and how it affects citizens’ lives. It is also why Jeremy skipped the election data from the recent midterms in favor of broader, more in-depth projects.
The digitalization of data and its broader availability enable a new way to act as a watchdog over politicians all over the world. Not long ago, as reported by the Financial Times, an independent team of Turkish researchers found that the inflation rate officially reported by the government was probably misleading. By scraping prices from online retailers every day, they estimated that prices had risen by as much as 40%, rather than the 19% claimed by the government.
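To make the arithmetic behind this kind of measurement concrete, here is a minimal sketch of how a price change could be estimated from two scraped snapshots. It is not the researchers' method, which the article does not describe. The function names `jevons_index` and `inflation_pct`, the product names, and the prices are all made up for illustration, and the unweighted geometric mean of price ratios (a Jevons-style index) is just one simple choice among many.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Unweighted geometric mean of price ratios (current / base).

    Both arguments map a product identifier to its price. Only products
    present in both snapshots with positive prices are compared.
    """
    common = [
        product
        for product in base_prices
        if product in current_prices
        and base_prices[product] > 0
        and current_prices[product] > 0
    ]
    if not common:
        raise ValueError("no comparable products in the two snapshots")
    # Average the log price ratios, then exponentiate: a geometric mean.
    mean_log_ratio = sum(
        log(current_prices[p] / base_prices[p]) for p in common
    ) / len(common)
    return exp(mean_log_ratio)


def inflation_pct(base_prices, current_prices):
    """Estimated price change between the two snapshots, in percent."""
    return (jevons_index(base_prices, current_prices) - 1) * 100


if __name__ == "__main__":
    # Hypothetical scraped snapshots, one year apart (invented numbers).
    last_year = {"bread": 10.0, "milk": 20.0, "rice": 8.0}
    this_year = {"bread": 12.5, "milk": 25.0}  # "rice" is no longer listed
    print(round(inflation_pct(last_year, this_year), 1))
```

A real pipeline would also have to deal with products that change size or disappear, and would usually weight items by their share of household spending, which this sketch deliberately leaves out.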
Oxylabs AI-driven Web Scraping Solution for Illegal Content Detection Listed as Baltic Sustainability Awards Finalist
Oxylabs, a provider of premium proxies and public web data scraping solutions, has been selected as a finalist at the Baltic Sustainability Awards. Together with the other finalists, the company's representatives will take the stage to pitch their achievement at the Baltic Sustainability Forum & Awards Ceremony, which will take place on November 30 at Splendid Palace, Riga.
Oxylabs has been recognized in the "Impact" category and is a finalist in the "Social Initiatives" subcategory for its work in fighting illegal content online (including child sexual abuse and pornography). It created a unique AI-powered web scraping tool that automatically scans the Lithuanian IP address space to find harmful images, mainly related to child sexual abuse and pornography. If such content is found, it is automatically forwarded for review to the specialists at the hotline of the Communications Regulatory Authority in Lithuania (CRA).
"We are grateful for the recognition of our joint endeavors with CRA to make the internet a safer space," said Julius Černiauskas, CEO of Oxylabs. "When creating this tool, there was a lot of excitement in the office, as it was not only a great challenge but also benefited the greater good, protecting the most vulnerable groups of our society. We believe that such partnerships contribute to a world where justice and strong public institutions are the centerpieces of society."
Projects like this one, or the ones aimed at fighting human trafficking, show once again that there is no good or bad technology: what matters is how humans decide to use it.