8 Comments

Thank you, Andrea; this is very well written.

Thanks, Alex. I'd love to chat sometime about how the industry is approaching this. I've heard of several efforts to rebuild CPIs bottom-up, so I'm curious.

I believe people are trying to use a combination of different sources, not just web-scraped data. See, for example, Yipit's US Inflation Index (mostly built from email receipts, as far as I can tell); some other alternative-data companies do this too.

As you explained well in your article, web scraping can't give us historical data, so we have to build and maintain bespoke scrapers that _consistently_ collect data over a long period of time. That probably makes web scraping one of the hardest ways to get to the signal.

Yes, integrating different sources is indeed a widely pursued strategy. What I appreciate about web scraping—despite its significant limitations—is the public availability and non-proprietary nature of the datasets it generates.

I like to draw a parallel to the rise of GenAI: true, mind-blowing advancements only became possible when someone decided to "scrape them all" and train the largest language models ever seen at that time. It might not be perfect, but online prices can fill the gaps left by other sources. Combined, they could offer a very powerful signal—not just to anticipate CPIs, but to actually build better ones.

Of course, I’m a bit biased since I work with web data, but it’s exciting to think about the possibilities! :)

Hi, and thank you very much for your answer. I'm not an economist, but I've always heard that "inflation runs through basic products", and yes, this approach would let you collect data on many more products than those.

But I think the goal of data, in many cases, is to help us make decisions. Here, it would also help build the inflation index, which is published once a month. Suppose 20 products each rise in price by between 0.5% and 1%: that is very hard to detect with traditional methods, and by the end of the month it just adds noise to the inflation index. People don't lose money on a single product; they lose the same amount or more through a little on each one. I still think that fast, timely data in the right hands will help people more than anything else in this system.

Your new friend, Cristian

This reminds me of an old saying: "You can’t control what you don’t measure" (or, more informally, "If you want to lose weight, buy a scale"). This principle applies perfectly here. A more precise, detailed, and frequent measurement of prices could provide insights, as prices are ultimately the foundation of all economic interactions.

Dear Mr. Andrea, thank you very much; this is the best use of web scraping I've heard of.

A system like this could help a lot of people, and an entire country.

And the most important thing, I think, is this: knowing the inflation variables in "near real time" opens the door to correcting some situations in "near real time" too. A country could then "manage" and correct the situation (where applicable), because knowing and correcting as things happen is not the same as only crunching the numbers once a month. Thanks again, and regards!

Thanks, Cristian. I completely agree that the speed of information is a significant advantage. However, I believe the real game-changer lies in the level of detail such a system can provide. By offering transparency on which specific goods have increased in price and when—rather than speaking about "inflation" in general terms—we can gain a much deeper understanding of the economic dynamics at play.