Editor’s note: We’re hearing today from Auto Trader UK, the UK and Ireland’s largest online automotive marketplace, about how BigQuery’s robust performance has made it the data engine powering real-time inventory and pricing information across the entire organization.
Auto Trader UK has spent nearly 40 years perfecting our craft of connecting buyers and sellers of new and used vehicles. We host the largest pool of sellers, with more than 430,000 cars listed every day, and we attract an average of over 63 million cross-platform visits each month. For the more than 13,000 retailers who advertise their cars on our platform, it’s important that they (and their customers) can quickly see the most accurate, up-to-date information about which cars are available and how they are priced.
BigQuery is the engine feeding our data infrastructure
Like many organizations, we started developing our data analytics environment with an on-premises solution and then migrated to a cloud-based data platform, which we used to build a data lake. But as the volume and variety of data we collected continued to increase, we started to run into challenges that slowed us down.
We had built a fairly complex pipeline to manage data ingestion, relying on Apache Spark to ingest data from a variety of sources across our online traffic and channels. However, ingesting data from multiple sources in a consistent, fast, and reliable way is never a straightforward task.
Our initial interest in BigQuery came after we discovered that it integrated with a more robust event management tool for handling data updates. We had also started using Looker for analytics, which already connected to BigQuery, and the two worked well together. As a result, it made sense to replace many parts of our existing cloud-based platform with Google Cloud Storage and BigQuery.
Originally, we had only anticipated using BigQuery for the final stage of our data pipeline, but we quickly discovered that many of our data management jobs could run entirely within BigQuery. For example, we use the command-line tool DBT, which supports BigQuery, to transform our data. Because they can work directly in SQL, our developers and analysts find it much easier to work with than Apache Spark. BigQuery also allowed us to further simplify our data ingestion: today, we mainly use Kafka Connect to sync data sources with BigQuery.
Looker + BigQuery puts the power of data in the hands of everyone
When our data lived in the previous data lake architecture, it wasn’t easy to consume. The complexity of managing the data pipeline and running Spark jobs made it nearly impossible to expose the data to users effectively. With BigQuery, not only is ingesting data easier, but we also have multiple ways to consume it through easy-to-use languages and interfaces. Ultimately, this makes our data useful to a much wider audience.
Now that our BigQuery environment is in place, our analysts can query the warehouse directly using the SQL interface. In addition, Looker provides an even easier way for business users to interact with our data. Today, we have over 500 active users on Looker, more than half the company. Data modeled in BigQuery gets pushed out to our customer-facing applications, so that dealers can log into a tool and manage stock or see how their inventory is performing.
Striking a balance between optimization and experimentation
Performance in BigQuery can be almost too robust: it will power through even very unoptimized queries. When we were starting out, we had a number of dashboards running very complex queries against data that was not well modeled for the purpose, meaning every tile demanded a lot of resources. Over time, we have learned to model data more appropriately before making it available for end-user analytics. With Looker, we use aggregate awareness, which lets common query patterns run against pre-aggregated versions of large data sets. As a result, the number of queries run interactively against the full data is relatively small.
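The routing idea behind aggregate awareness can be illustrated with a minimal Python sketch. This is not Looker's implementation (in Looker, aggregate tables are declared in LookML) and not Auto Trader's data model; the field names and rows below are hypothetical. The sketch assumes a single additive measure (a sum of views) and shows the core rule: when a query groups only by dimensions that a pre-aggregated table already covers, it can be answered from that smaller table instead of the raw rows.

```python
from collections import defaultdict

# Hypothetical raw fact rows: one per listing per day (illustrative data only).
RAW_ROWS = [
    {"dealer": "d1", "day": "2024-01-01", "listing": "a", "views": 5},
    {"dealer": "d1", "day": "2024-01-01", "listing": "b", "views": 3},
    {"dealer": "d1", "day": "2024-01-02", "listing": "a", "views": 4},
    {"dealer": "d2", "day": "2024-01-01", "listing": "c", "views": 7},
]


def aggregate(rows, dims, measure):
    """Group rows by the given dimension names and sum the measure."""
    out = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        out[key] += row[measure]
    return dict(out)


# Hypothetical pre-aggregated table at (dealer, day) grain, built ahead of time.
ROLLUP_DIMS = ("dealer", "day")
ROLLUP_ROWS = [
    {"dealer": key[0], "day": key[1], "views": total}
    for key, total in aggregate(RAW_ROWS, ROLLUP_DIMS, "views").items()
]


def run_query(dims):
    """Answer SUM(views) grouped by dims, using the smallest table that can serve it.

    Because SUM is additive, any grouping over a subset of the rollup's
    dimensions gives the same result from the rollup as from the raw rows.
    Returns a (source_table, result) pair.
    """
    if set(dims) <= set(ROLLUP_DIMS):
        return "rollup", aggregate(ROLLUP_ROWS, dims, "views")
    return "raw", aggregate(RAW_ROWS, dims, "views")


if __name__ == "__main__":
    print(run_query(("dealer",)))   # served from the rollup
    print(run_query(("listing",)))  # falls back to the raw rows
```

Here `run_query(("dealer",))` returns `("rollup", {("d1",): 12, ("d2",): 7})`, while a query by `listing` cannot be served by the (dealer, day) rollup and falls back to the raw rows. Non-additive measures such as distinct counts would need more care than this sketch takes.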
The overall system comes together to create a very effective analytics environment: we have the flexibility and freedom to experiment with new queries and get them out to end users even before we fully understand the best way to model the data. For more established use cases, we can continue optimizing to save resources for new innovations. BigQuery’s slot reservation system also protects us from unanticipated cost overruns when we are experimenting.
One example of this was when we rolled out new analytics capabilities to our sales teams. They wanted to use analytics to drive conversations with customers in real time, demonstrating how advertisements were performing on our platform and showing customers the return on their investment. When we initially released those dashboards, we saw a huge jump in usage of the slot pool. However, we were able to reshape the data quickly and make the needed queries more efficient by matching our optimizations to the usage patterns we were seeing.
Enabling decentralized data management
Another change we experienced with BigQuery is that business units are increasingly empowered to manage their own data and derive value from it. Historically, we had a centralized data team doing everything from ingesting data to modeling it to building out reports. As more people adopt BigQuery across Auto Trader, distributed teams build up their own analytics and create new data products. Recent examples include stock inventory reporting, trade marketing and financial reporting.
Going forward, we are focused on expanding BigQuery into a self-service platform that enables analysts within the business to directly build what they need. Our central data team will then evolve into a shared service, focused on maintaining the data infrastructure and adding abstraction layers where needed so it is easier for those teams to perform their tasks and get the answers they need.
BigQuery kicks our data efforts into overdrive
At Auto Trader UK, we initially planned for BigQuery to play a specific part in our data management solution, but it has become the center of our data ingestion and access ecosystem. The robust performance of BigQuery allows us to get prototypes out to business users rapidly, which we can then optimize once we fully understand what types of queries will be run in the real world.
The ease of working with BigQuery through a well-established and familiar SQL interface has also enabled analysts across our entire organization to build their own dashboards and find innovative uses for our data without relying on our core team. That, in turn, frees our core team to focus on building an even richer toolset and data pipeline for the future.