Web scraping and journalism: the Chiara Ferragni case
Validating ideas and news with web scraped data
Climate change? Political elections? Economic crisis? No, one of the most followed news streams in Italy is about the infamous “Pandoro gate” of Chiara Ferragni.
But let’s start from the basics (I promise I’ll keep this part short, but enough to understand the context).
Who is Chiara Ferragni?
Chiara Ferragni is an Italian blogger, businesswoman, fashion designer, and model who has gained global fame through her blog The Blonde Salad, and as of today, she has 29 million followers on Instagram.
Her business ventures grossed about $8 million in 2014, mostly from her shoe line. As of 2024, her net worth is estimated at over $30 million. She has collaborated with many major fashion and beauty brands.
Some years ago she married a famous singer in Italy and they were considered “the Italian royal family”, celebrated everywhere also because of their charity projects.
What is the Pandoro Gate then?
Long story short, one of these charity projects was linked to the sales of the traditional Christmas sweet bread called Pandoro.
The claim of the campaign was quite ambiguous and seemed to link the sales of this overpriced Pandoro branded by Chiara Ferragni to the amount of money given to charity: the more Pandoro sold, the more money for the good cause. After a long investigation by the journalist Selvaggia Lucarelli (edit: she just opened her Substack, in Italian, in case you’re interested), the real scheme of the campaign was uncovered. The model was paid by the producer to promote the Pandoro, and a fixed fraction of the money was sent to the charity initiative, while the sales were completely unlinked from it. On top of that, it seems that the scheme was repeated on other occasions, generating a violent reaction from both businesses and the general public.
Most of the collaborations and advertising campaigns centered on Ferragni have been put on hold or canceled (Pantene, Safilo, and Tod’s, just to mention some), while the comments under the celebrity’s Instagram pictures were locked for several weeks.
You can imagine the economic damage such a famous influencer is facing after going, all of a sudden, from a queen to an “untouchable” person, with no one willing to be associated with her.
OK, you can thank me later for giving you such an interesting topic of discussion with your friends.
Jokes aside, what does this story have to do with web scraping?
Measuring brand crisis with web scraping
One of the businesses owned by Chiara Ferragni is her own clothing brand, which in 2022 made 14 million EUR in revenue, mostly through third-party channels (wholesalers).
After the reputational crisis, many articles reported that shops were no longer able to sell her products and needed to discount them heavily, but without showing any numbers.
If only they had a place to get some web-scraped data for their analysis, with ready-made datasets and their history…
Oh, it exists, and it is called Databoutique.com! And all this story is a shameless plug for showing you the cool analysis you can make when you have history on web-scraped data.
Let’s see together some of the analyses that can be created having this data available.
Are there more discounts than before?
Before diving into the discount theme, we should distinguish the sales channels we can analyze.
In the fashion industry, we have websites specializing in the brands’ current-season offer (so-called in-season websites), and others that offer past seasons of multiple brands (off-season websites).
Since the Chiara Ferragni brand offers affordable items, we should avoid websites specialized in high-end luxury like Net-a-porter and Mytheresa, but focus on e-commerce with a more varied offer, like Farfetch.
Farfetch is also a great magnifying glass for the fashion industry since its supply side consists of hundreds of physical stores all over the world that share their inventory with the platform. Prices are not (at least totally) controlled by the website but decided by the single boutique, giving a more realistic picture of what’s happening on the market.
So, is it true that after the scandal, more Chiara Ferragni items were discounted?
Discounts on these platforms follow a seasonality and, more or less every six months, we have periods with the strongest offering of discounted items.
In the chart, we can see that during these discount periods, even before the “Pandoro Gate”, the percentage of items on sale at Farfetch Italy was more than double that of all the other brands.
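To make the metric behind this chart concrete, here is a minimal sketch of how the share of discounted items could be computed for one brand against all the others. The row layout (snapshot date, brand, full price, current price) and the values are made up for illustration; they are an assumption, not the actual Databoutique schema or real data.

```python
from collections import defaultdict

# Hypothetical rows: (snapshot_date, brand, full_price, current_price).
# Values are invented for illustration only.
rows = [
    ("2023-12-01", "Chiara Ferragni", 100.0, 70.0),
    ("2023-12-01", "Chiara Ferragni", 80.0, 80.0),
    ("2023-12-01", "Other Brand", 200.0, 200.0),
    ("2023-12-01", "Other Brand", 150.0, 150.0),
    ("2023-12-01", "Other Brand", 120.0, 90.0),
    ("2023-12-01", "Other Brand", 60.0, 60.0),
]


def discounted_share(rows, brand):
    """Return {date: (share on sale for brand, share on sale for all others)}."""
    # For each date and group we keep [items seen, items on sale].
    counts = defaultdict(lambda: {"brand": [0, 0], "others": [0, 0]})
    for date, row_brand, full_price, price in rows:
        group = "brand" if row_brand == brand else "others"
        counts[date][group][0] += 1
        if price < full_price:  # an item is "on sale" when priced below full price
            counts[date][group][1] += 1

    result = {}
    for date in sorted(counts):
        shares = []
        for group in ("brand", "others"):
            total, on_sale = counts[date][group]
            shares.append(on_sale / total if total else None)
        result[date] = tuple(shares)
    return result


shares = discounted_share(rows, "Chiara Ferragni")
for date, (brand_share, others_share) in shares.items():
    print(date, f"brand: {brand_share:.0%}", f"others: {others_share:.0%}")
```

Running the same function on every snapshot date gives the two lines you would plot against each other.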
Unfortunately, the brand re-entered the Farfetch website only in August 2023, so we don’t have a longer history for this case, but we can see that boutiques were probably already struggling to sell the influencer’s clothing line even before the crisis.
How many items are available today on Farfetch?
Depending on the website, we can be lucky enough to find in the page code the exact number of items available online. Farfetch is one of these cases, so we can see how the crisis affected the availability of the influencer’s clothing line.
From the Inventory Dataset for Farfetch, available on Databoutique, we can see that the stock available for Chiara Ferragni products has been slowly but steadily declining since November 2023.
The same is happening to the number of different items available for sale on Farfetch for the same brand.
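Both trends can be derived from the same inventory snapshots. The sketch below assumes a hypothetical layout of (snapshot date, product id, units in stock) with invented values; the real dataset fields may differ.

```python
from collections import defaultdict

# Hypothetical inventory snapshots: (snapshot_date, product_id, units_in_stock).
# Values are invented for illustration only.
snapshots = [
    ("2023-11-01", "A", 5),
    ("2023-11-01", "B", 3),
    ("2023-11-01", "C", 2),
    ("2023-12-01", "A", 4),
    ("2023-12-01", "B", 1),
    ("2024-01-01", "A", 2),
]


def inventory_trend(snapshots):
    """Return a date-sorted list of (date, total_units, distinct_items)."""
    units = defaultdict(int)
    items = defaultdict(set)
    for date, product_id, stock in snapshots:
        if stock > 0:  # count only products that are actually available
            units[date] += stock
            items[date].add(product_id)
    return [(date, units[date], len(items[date])) for date in sorted(units)]


trend = inventory_trend(snapshots)
for date, total_units, distinct_items in trend:
    print(date, total_units, distinct_items)
```

The first number per date is the depth of the stock, the second is the breadth of the assortment; a decline in both is what the two charts describe.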
Given the high number of discounts seen before, it’s unlikely that all these items were sold. It’s more likely that fewer of the boutiques that make up the sell side of the Farfetch marketplace are carrying the brand. Some of them could have handed their stock over to off-season websites like Yoox, to free up space in their inventory and, at the same time, monetize their clearance (or at least reduce the losses).
Items dumped to off-season websites?
Let’s see if we can find another clue that confirms our theory by looking at the Yoox prices dataset, checking whether more items are available and how they are discounted.
Unfortunately, Yoox doesn’t expose inventory levels on the website, so we cannot measure the depth of the stock.
What we can see from the chart is that we don’t have an increase in the number of items for the Chiara Ferragni brand, but, starting from April, more and more items are getting discounted.
So it seems that the items from the boutiques didn’t land on Yoox, but the website itself is struggling to sell what’s on the shelf.
The latest chart seems to support this thesis. Discounts on the brand are at their highest level since the beginning of 2023. Even in the latest holiday season, the discounts were not as high. It looks like another clue that the reputational crisis is heavily affecting the influencer’s businesses, as the latest news is reporting.
This case is only one of many examples of how web-scraped data can add context and numbers to the news we read every day. Making this data more accessible, even for non-tech-savvy people, means they can make more informed decisions and support (or refute) their ideas and theories.
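One possible way to measure the discount level is the average markdown per month, that is, the mean of (full price - current price) / full price over all the brand's items. The sketch below uses a hypothetical (month, full price, current price) layout and invented values, so the output says nothing about the real Yoox data.

```python
from collections import defaultdict

# Hypothetical rows: (month, full_price, current_price).
# Values are invented for illustration only.
prices = [
    ("2023-12", 100.0, 80.0),
    ("2023-12", 50.0, 50.0),
    ("2024-04", 100.0, 60.0),
    ("2024-04", 200.0, 100.0),
]


def average_discount(prices):
    """Return {month: mean markdown as a fraction of the full price}."""
    per_month = defaultdict(list)
    for month, full_price, current_price in prices:
        if full_price > 0:  # skip rows without a usable full price
            per_month[month].append((full_price - current_price) / full_price)
    return {
        month: sum(values) / len(values)
        for month, values in sorted(per_month.items())
    }


discounts = average_discount(prices)
for month, value in discounts.items():
    print(month, f"{value:.0%}")
```

Full-price items count as a 0% markdown here, so the figure reflects both how many items are discounted and how deeply.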