Is it legal to scrape social networks like Facebook or Instagram?
Scraping social media websites like Facebook, Instagram, Twitter, or LinkedIn must be done with the utmost caution because of the sensitivity of the data they contain. Two topics help frame the potential issues: the platforms’ Terms of Service (ToS) and privacy concerns.
Scraping in line with the platform’s ToS
As with any website you decide to scrape, you must be well aware of a social media platform’s Terms and Conditions of Use. Since the data these platforms hold is quite sensitive, web scraping is generally not tolerated, especially after scandals like the Cambridge Analytica case involving Facebook.
Some platforms, like Twitter, give selected partners access to their API, so those partners can gather the data they need without breaking the platform’s rules, but this option is not always available.
Generally speaking, and this is not legal advice, if you can get the data without logging into the platform, the argument is that you can scrape it without breaking any ToS, because you never formally accepted the terms by creating an account or logging in.
Even when scraping is done legally, it takes time and money to prove it, as the hiQ v. LinkedIn case shows. Even though the courts recognized hiQ’s right to scrape publicly available information on LinkedIn, it took years of legal battles to get to a ruling.
Privacy concerns when scraping
Beyond the terms of service, there is another important aspect to consider when scraping social media: privacy.
You are dealing with people’s personal data, and this has a wide range of consequences.
Every country has its own privacy laws, and you must be aware of all those that apply, not only the ones of the country where you or your business are located.
For example, if you are scraping Facebook from India and you come across data about people in France, that data must be handled according to the European GDPR rules. The same reasoning applies to the privacy laws protecting every other person you encounter while scraping.
For this reason, all the data gathered should be anonymized and aggregated to avoid any leak of personal data, and only then sold or used for sentiment analysis or other purposes.
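To make the anonymization step a bit more concrete, here is a minimal Python sketch of one possible approach: dropping direct identifiers and replacing the username with a keyed hash before the data is stored. The field names (`username`, `full_name`, `email`, `profile_url`), the `pseudonymize` function, and the secret key are all assumptions made up for illustration, not part of any platform's API. Keep in mind that this is pseudonymization rather than full anonymization, so whether it is sufficient for your case is a question for your legal advisor.

```python
import hashlib
import hmac

# Hypothetical field names: adapt them to whatever your scraper collects.
DIRECT_IDENTIFIERS = {"full_name", "email", "profile_url"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers.

    The username is replaced by a keyed hash (HMAC-SHA256), so posts by the
    same user can still be grouped without storing the username itself.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "username"
    }
    digest = hmac.new(
        secret_key, record["username"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Keep a shortened token; the full digest is not needed for grouping.
    cleaned["user_token"] = digest[:16]
    return cleaned


if __name__ == "__main__":
    sample = {
        "username": "jane_doe",
        "full_name": "Jane Doe",
        "email": "jane@example.com",
        "profile_url": "https://example.com/jane_doe",
        "text": "Loving the new phone!",
    }
    # The key below is a placeholder: store a real secret outside the code.
    print(pseudonymize(sample, b"replace-with-a-real-secret"))
```

Using a keyed hash instead of a plain one means that someone who knows a username cannot simply recompute the token without also knowing the key. The free text of a post can still contain personal details, so it would need its own review before the data is shared.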
With this article, I wanted to give a brief introduction to some specific aspects of scraping social media platforms. Unlike product data, personal data is much riskier to handle because of the issues we have seen, so before starting any social media-related project, please consult a lawyer. Every month brings news of legal battles on this topic, so it can be a minefield to walk through without legal advice.
If you want to know more, you can have a look at this video from Zyte.
This post is written by Pierluigi Vinciguerra (firstname.lastname@example.org)
If you liked this post and want to receive in your inbox a weekly article about web scraping, please consider subscribing to The Web Scraping Club for free.