Web data and the automotive industry
How public web data can help understand the current state of the industry and see trends.
The automotive industry, especially in Europe, is facing tumultuous times. Factories are closing to raise margins, and the complete transition to EVs is going slower than expected. These vehicles are still too expensive for the masses, and the infrastructure is not homogeneous across the continent. R&D expenses for EVs and stricter regulations on ICE (internal combustion engine) vehicles are pushing up prices, making sales plummet and raising used car prices. In addition to all this, new players, especially from China, are coming to the European market with good products and affordable prices.
Given this complexity, web data can add more pieces to the puzzle analysts and managers have to solve, allowing them to make more informed decisions.
In this post, we’ll examine several use cases. Some examples and websites will be specific to the Italian market, but the approach applies to every country.
In particular, we’ll see the following use cases:
Monitoring Manufacturer Websites for Vehicle Availability and Dealers’ network
Scraping Car Listing Websites for Pricing and Demand Analysis
Analyzing EV Charger Data from Piattaforma Unica Nazionale (Italy’s EV charging platform)
Extracting Market Trends from UNRAE (Italian automotive sales statistics)
Scraping prices of new cars for market analysis
Each section highlights the use case, key data to collect, and tools/methodologies to implement the scraping and analysis. Let’s explore how data analysts in automotive can put web data to work.
1. Monitoring Manufacturer Websites for Vehicle Availability and Dealers’ network
During the first year of The Web Scraping Club, I wrote an article about scraping the Tesla website to create a dataset for alternative data, which is still valuable today.
The website has remained the same, and the internal API we used is still available. With some basic requests, we can see not only how many cars are available for sale in every zip code in the country but also the price of used models.
Tracking this API often enough can show us whether the availability of new cars ready to be sold is contracting or expanding. Combined with used car prices, this data can show the depreciation over time for every model.
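To make the idea concrete, here is a minimal sketch of what we could compute once the snapshots are stored. The field names and all the figures are invented for illustration; they are not the actual response of the Tesla API.

```python
from datetime import date

# Hypothetical snapshots, one row per scrape date and model.
# Field names and numbers are made up for illustration.
snapshots = [
    {"day": date(2025, 1, 1), "model": "Model 3",
     "new_in_stock": 120, "new_price": 42000, "used_price": 31000},
    {"day": date(2025, 2, 1), "model": "Model 3",
     "new_in_stock": 150, "new_price": 42000, "used_price": 30000},
]


def inventory_change(rows):
    """Percentage change in new-car stock between the first and last snapshot."""
    rows = sorted(rows, key=lambda r: r["day"])
    first, last = rows[0]["new_in_stock"], rows[-1]["new_in_stock"]
    return (last - first) / first * 100


def used_to_new_ratio(row):
    """Used price as a fraction of the new price: a rough depreciation proxy."""
    return row["used_price"] / row["new_price"]


print(inventory_change(snapshots))
print(round(used_to_new_ratio(snapshots[-1]), 3))
```

A growing stock of ready-to-sell cars together with a falling used-to-new ratio would be a signal worth investigating, but a real analysis should compare used cars of the same age and mileage.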
Another interesting data point we can scrape from manufacturer websites is the price in different countries. Although some countries, like the USA, typically have different car configurations than Europe, it could be interesting to monitor prices around the world for the same model, especially in this period when tariffs seem to be the new buzzword.
A third interesting use case is monitoring a manufacturer's dealer network per country. We mentioned the Chinese brands that are now entering the European market. BYD is one of them. Until some years ago, it was almost unknown in Italy. Today, its dealer network covers nearly every major city.
Monitoring dealer networks can help determine whether a brand is gaining traction in a country or losing its appeal.
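Tracking a dealer network over time can be as simple as comparing two scrapes. The sketch below assumes each dealer is stored as a (brand, city) pair; the dealer lists are invented and do not reflect BYD's real network.

```python
def diff_dealers(previous, current):
    """Return (opened, closed) dealers between two scrapes, as sorted lists."""
    opened = sorted(current - previous)
    closed = sorted(previous - current)
    return opened, closed


# Invented example data: two scrapes of the same brand's dealer locator.
january = {("BYD", "Milano"), ("BYD", "Roma")}
june = {("BYD", "Milano"), ("BYD", "Roma"), ("BYD", "Torino"), ("BYD", "Napoli")}

opened, closed = diff_dealers(january, june)
print(f"opened: {opened}, closed: {closed}, net change: {len(opened) - len(closed)}")
```

In practice a stable dealer ID from the website, if one exists, is a better key than the city name, since a city can host more than one dealer.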
Scraping every manufacturer’s website can be time-consuming, given their number and the fact that manufacturers have websites in different countries. This means creating many custom scrapers, but as we have seen, there’s a goldmine of information we can extract from them.
2. Scraping Car Listing Websites for Pricing and Demand Analysis
Luckily, there are listing websites specializing in used and new cars in almost every country. In Italy, for example, we have AutoScout24.
By scraping both the used and new vehicles available, we can gather information about the prices of new models and how their value decays over time. The number of cars on sale for a specific model after a few years, relative to the number of vehicles sold for the same model, can also indicate how much people liked the car they bought.
Scraping these websites gives a great opportunity for understanding how the market is performing and the pricing dynamics for every brand, but it also has some cons.
The data quality of the listings is not uniform, and some additional work is required to get a structured and valuable dataset. For each car model, there may be several versions that have succeeded one another over the years, making it difficult to compare apples with apples.
By scraping both new and used car listings, the key attributes to get are:
Make and Model: The brand and specific model of the vehicle (e.g. Volkswagen Golf).
Year of manufacture: Model year or registration year, which affects value.
Price: The asking price listed. This is critical for price trend analysis and comparison.
Mileage: The number of kilometers the car has been driven (odometer reading) is another key factor in used car valuation.
Fuel type: Gasoline, Diesel, Hybrid, Electric, etc. (fuel type popularity can indicate trends in consumer preference for eco-friendly cars, etc.).
Engine/Specs: Engine size, horsepower, or other specs if listed (e.g. transmission type, features like navigation or sunroof).
Condition and age: Whether the car is used, certified pre-owned, or nearly new (km 0); also, the year gives the age.
Location: Where the car is being sold (city/region). Regional data helps analyze demand and price differences across areas.
Seller type: Dealer or private seller, which might impact pricing (dealers often give warranty, etc.).
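The attributes above map naturally to a small record type. What follows is a minimal sketch, not the schema of any specific website: the field names are my own, and the parser assumes prices and mileage are shown as whole numbers with a dot as thousands separator (e.g. "€ 18.900"), which would need adjusting if decimals appear.

```python
import re
from dataclasses import dataclass
from statistics import median


@dataclass
class Listing:
    make: str
    model: str
    year: int
    price_eur: int
    mileage_km: int
    fuel: str
    seller_type: str
    region: str


def parse_int(text):
    """Keep only the digits: '€ 18.900' -> 18900, '45.000 km' -> 45000."""
    return int(re.sub(r"\D", "", text))


def median_price_by_year(listings, make, model):
    """Median asking price per registration year for one make and model."""
    by_year = {}
    for item in listings:
        if (item.make, item.model) == (make, model):
            by_year.setdefault(item.year, []).append(item.price_eur)
    return {year: median(prices) for year, prices in sorted(by_year.items())}


# Invented listings, only to show the shape of the data.
sample = [
    Listing("Volkswagen", "Golf", 2019, parse_int("€ 18.900"), parse_int("62.000 km"),
            "Diesel", "dealer", "Lombardia"),
    Listing("Volkswagen", "Golf", 2019, parse_int("€ 17.500"), parse_int("80.000 km"),
            "Gasoline", "private", "Lazio"),
    Listing("Volkswagen", "Golf", 2022, parse_int("€ 24.500"), parse_int("20.000 km"),
            "Hybrid", "dealer", "Veneto"),
]
print(median_price_by_year(sample, "Volkswagen", "Golf"))
```

Using the median instead of the mean keeps a few overpriced or mistyped listings from distorting the curve.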
3. Analyzing EV Charger Data from Piattaforma Unica Nazionale (PUN)
As electric vehicle adoption accelerates, the supporting charging infrastructure becomes a critical piece of the automotive ecosystem. Italy’s Piattaforma Unica Nazionale (PUN) for EV charging is a national platform that maps all public charging points in the country. According to news at launch, the platform already mapped over 32,000 charging points in Italy, out of roughly 44,000 total existing public points at that time.
For automotive analysts (especially those focusing on EV strategy) and energy providers, scraping data from PUN can reveal how the charging network is expanding over time and where gaps exist. This data helps answer questions like: Are there enough chargers to support the growing EV fleet? Which regions are investing heavily in chargers, and which are lagging? Such insights inform decisions on EV rollout plans, infrastructure investments, and even sales strategies for electric cars.
The map displays each charger’s location along with key attributes like the type of power supply (AC or DC), the maximum power output (in kW), the operator managing the station (Charging Point Operator), and the status of the charging point (available, in use, or out of service). In short, PUN consolidates EV charger data that was previously scattered among various operators.
In general, from this kind of website, we could get the following data for our analysis:
Charger Location: The geographic coordinates or address of each charging station. This allows mapping and spatial analysis (e.g., number of chargers per square km or per city).
Charger ID and network: This field identifies the charger and the network/operator it belongs to (e.g., Enel X, Tesla Supercharger, Ionity). The operator field can show each company's share of the infrastructure.
Power Details: The power supply type (AC, DC) and max power output (e.g., 22 kW AC, 50 kW DC, 150 kW DC) indicate whether a station is a slow charger or a fast charger, which is important for assessing capability.
Number of Points at the station: Some locations have multiple charging points (e.g., a station with four plugs). If available, note the number of connectors or points and their types.
Status/Availability: This indicates whether the charger is operational and possibly in use. At a minimum, PUN marks whether a point is active in the network; future real-time integration might show live availability, just like it happens for other websites.
Install Date (if available): I'm not sure whether PUN provides this, but if one tracks data over time, the first appearance of a new charger can serve as an “install date.”
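The "first appearance as install date" idea from the last point can be sketched in a few lines. It assumes that each scrape is stored as a set of charger IDs keyed by scrape date; the IDs below are placeholders, not real PUN identifiers.

```python
from datetime import date


def first_seen(snapshots):
    """Map each charger ID to the earliest scrape date in which it appears.

    snapshots: dict mapping a scrape date to the set of charger IDs seen that day.
    """
    seen = {}
    for day in sorted(snapshots):
        for charger_id in snapshots[day]:
            # setdefault keeps the first (earliest) date and ignores later ones.
            seen.setdefault(charger_id, day)
    return seen


# Placeholder IDs for illustration.
scrapes = {
    date(2025, 1, 1): {"IT-001", "IT-002"},
    date(2025, 2, 1): {"IT-001", "IT-002", "IT-003"},
}
print(first_seen(scrapes))
```

The result is only as precise as the scraping frequency: with monthly scrapes, a charger's proxy date can be up to a month later than its real installation.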
The real value comes from analyzing trends over time. By scraping the PUN data (or similar websites) regularly (say monthly), one can track how the number of charging points grows and where new stations appear:
Growth Metrics: Calculate the total number of charging points over time. Italy’s public chargers count has been growing rapidly. In 2024 alone, over 13,700 new charging points were installed, bringing the total to more than 64,000 by year’s end.
This was about a 27% increase year over year. Tracking such growth is essential for forecasting when infrastructure will meet specific targets (such as EU requirements or support for a projected number of EVs).
Geographical Distribution: Map the chargers by region or province. Analysts can produce choropleth maps (e.g., chargers per 100,000 inhabitants by region) to identify infrastructure-rich areas versus underserved areas. If Lombardy has the highest number of chargers (which it does, being a populous region) and some southern regions have far fewer, that disparity is important for policymakers and businesses to address.
Urban vs. Rural: By overlaying chargers on population density maps or major highway routes, one can see if the infrastructure is concentrated mainly in cities and along highways (which is likely the case). Equal distribution is a goal of a nationwide EV strategy, so highlighting gaps (e.g., few highway fast chargers in specific stretches or few chargers in small towns) can drive targeted installation efforts.
Charger Capacity Trends: It’s not just the count of chargers but also their capacity. Over time, are new installations trending toward higher-power fast chargers? For example, you might find that in 2023, most new installations were 50kW DC, but in 2024, more ultra-fast 150kW+ stations were added. This indicates a technological shift to support faster charging.
Operator Market Share: Analysts can use the operator field to determine which companies are leading the rollout. Perhaps Enel X (the utility) operates 40% of all public chargers, while Tesla’s network accounts for 5%, etc. Changes in these shares over time might indicate new players entering or aggressive expansion by certain operators.
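Most of the metrics in this list are simple ratios. Here is a minimal sketch: the growth example reuses the rounded 2024 figures quoted above, while the population and operator data are invented.

```python
from collections import Counter


def yoy_growth(total_end, added):
    """Year-over-year growth in %, given the year-end total and the points added."""
    start = total_end - added
    return added / start * 100


def per_100k(chargers, inhabitants):
    """Charging points per 100,000 inhabitants."""
    return chargers * 100_000 / inhabitants


def operator_share(operators):
    """Share of charging points per operator in %, largest first."""
    counts = Counter(operators)
    total = sum(counts.values())
    return {op: n / total * 100 for op, n in counts.most_common()}


# 64,000 points at the end of 2024, 13,700 of them added during the year.
print(round(yoy_growth(64_000, 13_700), 1))

# Invented region: 500 points for 1,000,000 inhabitants.
print(per_100k(500, 1_000_000))

# Invented operator column, one entry per charging point.
print(operator_share(["Operator A", "Operator A", "Operator B", "Operator C"]))
```

The growth figure is computed against the start-of-year base (64,000 minus 13,700), which is where the roughly 27% comes from.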
The insights gleaned from EV charger data are actionable for several stakeholders:
Automotive OEMs (especially EV manufacturers): They can identify if the charging infrastructure growth is keeping pace with their EV sales. If not, they might delay certain regional launches or push for infrastructure development. Also, knowing that certain areas have many chargers can guide marketing efforts for EVs in those regions (consumers there may feel more confident buying an EV).
Energy and Utility Companies: They can use the data to plan where to install the next stations. If a city has very few public chargers, that’s an opportunity for investment. Trends showing high utilization (if one can scrape status/availability repeatedly to gauge usage) would indicate where to add capacity.
Government and Policy Makers: They can monitor progress towards policy goals (e.g., “X number of chargers by 2025”) using the data. If some regions are behind, they can allocate funding or incentives accordingly. Data on the network’s growth can also be communicated to the public to build confidence in EV adoption.
Urban Planners: Knowing locations of all chargers helps integrate EV infrastructure planning with other urban development (e.g., ensuring new parking lots or malls include charging, and avoiding clustering too many stations in one area while others have none).
4. Extracting Market Trends from UNRAE Data (Sales Statistics)
The automotive market is rapidly evolving, with shifts from diesel to electric, changes in consumer preferences for vehicle types (e.g., SUVs vs sedans), and the rise and fall of brands. In Italy, UNRAE (Unione Nazionale Rappresentanti Autoveicoli Esteri) publishes detailed vehicle registration statistics that are a goldmine for analysts. By scraping these sales statistics (often released monthly), analysts can quantify trends in new vehicle sales and identify shifts in the market. This helps manufacturers and dealers align their strategies with consumer demand and helps investors or policymakers understand the market trajectory.
Let’s take UNRAE’s data portal as an example, even if I suppose there are similar websites in every country. It provides granular reports on vehicle registrations. For each month (and year aggregates), you can find data broken down by brand, automotive group, model rankings, fuel type, segment (size class), vehicle body type, geographical area, and more. For example, they publish a “Top 50 models” report for each month (which shows the 50 best-selling models in that month), a report on the market structure by fuel type (how many petrol, diesel, hybrid, electric, etc.), and even by user type (private, rental, corporate fleets). This comprehensive coverage means a scraper can collect a very rich dataset to analyze market trends over time.
The only problem is that all these reports are PDFs. Ouch! Luckily for us, at least the pages containing the link to the reports have the same URL structure, so using the website's sitemap, it’s easy to find all the PDFs for a certain report created so far.
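Finding the report pages from a sitemap could look like the sketch below. It uses only the standard library and a tiny inline sitemap with example.com URLs: the real UNRAE sitemap location and URL structure are not shown here, so the path fragment is a placeholder.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def report_pages(sitemap_xml, path_fragment):
    """Return every URL in the sitemap that contains the given path fragment."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return [url for url in urls if path_fragment in url]


# Placeholder sitemap: in a real run this would be the downloaded sitemap body.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/reports/top-50-models/2025-01</loc></url>"
    "<url><loc>https://example.com/reports/top-50-models/2025-02</loc></url>"
    "<url><loc>https://example.com/reports/fuel-structure/2025-02</loc></url>"
    "</urlset>"
)

for url in report_pages(SAMPLE_SITEMAP, "/top-50-models/"):
    print(url)
```

Each page found this way still has to be fetched and parsed to get the link to the PDF itself.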
To programmatically extract tables from these PDFs, we can use a Python library like tabula-py (a wrapper around Tabula) or Camelot, or an LLM such as GPT.
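As a rough sketch of the Camelot route: the function below returns every table found on the requested pages as a pandas DataFrame. Whether the "lattice" flavor (tables with ruled lines) or the "stream" flavor (whitespace-separated columns) works better depends on the layout of each report, so treat that choice as an assumption. The number parser assumes Italian formatting (dot for thousands, comma for decimals), which should be checked against the actual PDFs.

```python
def parse_it_number(text):
    """Parse Italian-formatted numbers: '12.345' -> 12345.0, '38,5%' -> 38.5."""
    cleaned = text.strip().rstrip("%").replace(".", "").replace(",", ".")
    return float(cleaned)


def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return the tables found in the PDF as a list of pandas DataFrames."""
    # Third-party dependency, imported lazily: pip install camelot-py
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]


print(parse_it_number("12.345"))
print(parse_it_number("-38,5%"))
```

Every cell comes out of the PDF as a string, so a cleaning step like `parse_it_number` is needed before any calculation.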
By analyzing the scraped data across months and years, an analyst can spot critical trends:
Shift in Fuel Types: Perhaps the most talked-about trend is the decline of diesel and the rise of electrified powertrains. For instance, data from early 2025 showed gasoline and diesel car sales plummeting year-over-year (diesel was down 38.5% in Feb 2025, capturing only 9.7% of the market), while hybrid and electric sales climbed sharply (battery EVs up 38.2% year-over-year, reaching a 5.0% market share in Feb 2025). Hybrid electric vehicles (non-plug-in) even made up nearly 45% of sales, indicating a major consumer shift toward cleaner options. Tracking this trend over each month or quarter shows how quickly the transition is happening. An analyst might plot the market share of each fuel type over time, likely seeing petrol and diesel lines decline and hybrid/EV lines rise. This informs manufacturers how soon they need to pivot their lineups.
Segment and Body Style Trends: The data might show that SUVs continue to gain market share at the expense of smaller cars or sedans. UNRAE’s segment breakdown would let you see if, say, segment C (compact cars) is shrinking while crossovers are growing. Over a few years, these shifts are pronounced and can influence product planning (e.g., more crossover models and fewer sedans introduced).
Top Models and Brands: By compiling the monthly top 10 or top 50 models, one can see which new models are hits and which formerly popular models are fading. For example, the Fiat Panda often tops Italy’s sales, but the data might reveal new entrants climbing the ranks (in Feb 2025, the Jeep Avenger, a recently launched compact SUV that is also sold as an EV, appeared among the top 5 best-selling cars, signaling strong acceptance of a new model). Brand-wise, tracking market share over time can show whether certain automakers are on the rise (e.g., Tesla’s share of the Italian market creeping up yearly from near-zero to a noticeable percentage, or Chinese EV brands emerging).
Regional Variations: UNRAE also sometimes provides sales by region. An analyst could find, for instance, that Northern areas account for a disproportionate share of EV sales relative to the South or that certain car types are more prevalent in rural vs. urban areas. Such insights can tailor regional marketing strategies for automakers.
Total Market Health: Basic metrics like monthly new registrations (seasonally adjusted) indicate the overall market trend. Is the market rebounding or declining? The data show that total sales slightly dipped in early 2025 compared to 2024, possibly due to expiring incentives or economic factors. This information is important for forecasting and helps stakeholders like dealerships anticipate business volume.
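Once the tables are extracted, most of these trends reduce to two calculations: market share within a month and year-over-year change between the same month of two years. The registration counts in this sketch are invented and are not UNRAE figures.

```python
def market_share(registrations):
    """Share of each category in %, rounded to one decimal."""
    total = sum(registrations.values())
    return {key: round(n / total * 100, 1) for key, n in registrations.items()}


def yoy_change(current, previous):
    """Year-over-year change in %, rounded to one decimal."""
    return round((current - previous) / previous * 100, 1)


# Invented monthly registrations by fuel type for the same month of two years.
last_year = {"petrol": 500, "diesel": 200, "hybrid": 250, "bev": 50}
this_year = {"petrol": 430, "diesel": 123, "hybrid": 380, "bev": 67}

print(market_share(last_year))
print({fuel: yoy_change(this_year[fuel], last_year[fuel]) for fuel in last_year})
```

Running both functions for every scraped month gives the time series needed to plot the fuel type lines described above.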
Conclusion
In this article, I wanted to emphasize the role that web data could play in the automotive sector. By using online sources, from real-time inventory on manufacturer websites to expansive listings on used car marketplaces, and from EV charging station maps to official sales statistics portals, data analysts can gain actionable insights across the value chain.