Web Scraping News: October Monthly Recap

Scraping personal data? Not a good idea

Oct 30, 2022

Meta settles a lawsuit against two companies scraping Facebook and Instagram data

original source: TechCrunch

Meta, Facebook's parent company, has settled a lawsuit against two companies that were scraping data from Facebook and Instagram users for marketing intelligence purposes.

The original complaint was filed in October 2020 against BrandTotal Ltd, which claims to offer its customers a real-time competitive intelligence platform to monitor their competitors' social media strategy and paid campaigns.

The second company named in the suit is Unimania, which offered apps for accessing social networks in different ways, such as viewing Instagram stories anonymously.

To evade the websites' protections against scraping, these companies exploited users' access to the services through a set of browser extensions, called "UpVoice" and "Ads Feed", designed to access and collect data. When people installed the extensions and visited the websites, the extensions used automated programs to scrape their name, user ID, gender, date of birth, relationship status, location information, and other information related to their accounts.

According to the filing that detailed the proposed settlement, both companies agreed, among other things, to stop scraping or assisting others in data collection practices, to delete their software and code, and to accept a ban on distributing or selling any data they collected through their operations. The filing also notes that they agreed to pay monetary damages in a confidential settlement.

This doesn't mean that web scraping is always illegal. It does mean that, once you log in to a website, you are accepting its terms and conditions, and Facebook's don't allow web scraping.

French Government Hits Clearview With The Maximum Fine For GDPR Violations

original source: Techdirt

Another day, another fine for Clearview, the US company that maintains a database of ten billion pictures of people's faces, from all over the world, scraped from several sources.

The company uses this biometric data to sell services to law enforcement agencies and retailers, in what seems like the worst possible use of web scraping. The French government has just fined the company 20 million USD for GDPR violations, following the example of Italy (a 21 million fine in March) and the UK (a 9.4 million fine).

Clearview's CEO, Hoan Ton-That, commented on the fine, saying:

"There is no way to determine if a person has French citizenship, purely from a public photo from the internet, and therefore it is impossible to delete data from French residents. Clearview AI only collects publicly available information from the internet, just like any other search engine like Google, Bing or DuckDuckGo."

If this is true, Clearview’s product is illegal everywhere in Europe. Clearview is admitting it cannot determine the origin of the images and data it scrapes, which also means it can’t comply with its own agreements/legal settlements where it has agreed to stop collecting in certain locales and delete data pertaining to these residents.

The issue here — especially in countries subject to the GDPR — is consent. While scraping data from the open web breaks no US laws, collecting data without consent does violate some state laws and clearly violates the GDPR. The only privacy standard Clearview appears to recognize is that whatever can be scraped without a login isn’t private and that it has a right to collect it, compile it, make it searchable, and sell it to government agencies and other customers.

Web Scraping Adoption in E-commerce keeps increasing

A new white paper from Oxylabs on web data adoption in e-commerce in the UK and the US is now available.

Findings indicate that web scraping has firmly entrenched itself within the e-commerce industry – more than three-quarters (75.7%) of companies employ it in their daily operations. Additionally, most have already seen impressive returns from the data collection method, as 32.4% state that web scraping has generated the most revenue.

You can download the full white paper at this page.

That's enough for today. Please be respectful of the target websites and of privacy laws when scraping: as we have seen, bad things can happen otherwise.

The Web Scraping Club
