Editor’s note: This post is part of a series showcasing our partners, and their solutions, that are Built with BigQuery.
Data collaboration, the act of gathering and connecting data from various sources to unlock combined data insights, is the key to reaching, understanding, and expanding your audience. Enabling global businesses to accurately connect, unify, control, and activate data across different channels and devices ultimately optimizes customer experiences and drives better results.
LiveRamp is a leader in data collaboration, with clients that span every major industry. One of LiveRamp’s enterprise platforms, Safe Haven, helps enterprises do more with their data, and is especially valuable for brands constructing clean room environments for their partners and retail media networks. It enables four universal use cases to facilitate better customer experiences:
The core challenge: Creating accurate cross-channel marketing analytics
As brand marketers accelerate their move to the cloud, they struggle to execute media campaigns guided by data and insights. This is due to challenges in building effective cross-channel marketing analytics, which need to overcome the following hurdles:
Lack of a common key to accurately consolidate and connect data elements and behavioral reports from different data sources that should be tied to the same consumer identity. Such a “join key” cannot be a constructed internal record ID, as it must be semantically rich enough to work for both individual and household-level data across all the brand’s own prospects and customers, and across all of the brand’s partner data (e.g., data from publishers, data providers, co-marketing partners, supply-chain providers, agency teams).
Reduced data availability from rising consumer authentication requirements makes it difficult to reach a sufficient sample volume for accurately driving recommendation and personalization engines, creating lookalikes, or creating unbiased data inputs for machine learning training.
Brand restrictions on analytic operations to guard against data leaks of sensitive consumer personally-identifiable information (PII). By reducing operational access to consumer data, brands increase their data protections but decrease their data science team’s ability to discover key insights and perform cross-partner audience measurements.
Decreased partner collaboration from using weak individual record identifiers, such as hashed emails, as the basis for record matching. Hashed emails are often produced by default by many customer data platforms (CDPs), but these identifiers are insecure because they are easily reversed, and they have limited capacity to connect the same individual and household across partners.
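To make the last point concrete, here is a minimal sketch of why a plain hashed email is easy to reverse. It assumes the identifier is an unsalted SHA-256 digest of a normalized address, which is a common convention but not something this post specifies; the addresses and function names are hypothetical.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize an email address and return its unsalted SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reverse_hashes(hashed_emails, candidate_emails):
    """Recover plaintext emails by hashing known candidates and matching digests."""
    lookup = {hash_email(c): c.strip().lower() for c in candidate_emails}
    return {h: lookup[h] for h in hashed_emails if h in lookup}


# Hypothetical data for illustration only.
shared_hashes = [hash_email("Jane.Doe@example.com")]
candidates = ["john.smith@example.com", "jane.doe@example.com"]
recovered = reverse_hashes(shared_hashes, candidates)
```

Anyone holding a list of plausible addresses can rebuild the mapping this way, which is why the hash alone offers little protection.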
LiveRamp solves these challenges for marketers by building a suite of dedicated data connectivity tools and services centered on Google’s BigQuery ecosystem — with the objective of creating the ultimate open data-science environment for marketing analytics on Google Cloud.
The resulting LiveRamp Safe Haven environment is deployed, configured and customized for each client’s Google Cloud instance. The solution is scalable, secure and future-proof by being deployed alongside Google’s BigQuery ecosystem. As Google Cloud technology innovation continues, users of Safe Haven are able to naturally adopt new analytic tools and libraries enabled by BigQuery.
All personally-identifiable data in the environment is automatically processed and replaced with LiveRamp’s brand-encoded pseudonymized identifiers, known as RampIDs:
These identifiers are derived from LiveRamp’s decades of dedicated work on consumer knowledge and device-identity knowledge graphs. LiveRamp’s identifiers represent secure people-based individual and household-level IDs that let data scientists connect audience records, transaction records, and media behavior records across publishers and platforms.
Because these RampIDs are based on actual demographic knowledge, these identifiers can connect data sets with real person-centered accuracy and higher connectivity than solutions that rely on string matching alone.
RampIDs are supported in Google Ads, Google’s Ads Data Hub, and hundreds of additional leading destinations including TV and Connected TV, Walled Gardens, ecommerce platforms, and all leading social and programmatic channels.
Because it is pseudonymized, the Safe Haven data presents a much safer profile for analysts working in the environment. Exposure to insider threats is low thanks to PII removal, a lockdown of data exports, transparent activity logging, and Google Cloud’s powerful encryption and role-based permissioning.
LiveRamp’s Safe Haven solutions on Google Cloud have been deployed by many leading brands globally, especially brands in retail, CPG, pharma, travel, and entertainment. Success for all of these brands is due in large part to the combination of the secure BigQuery environment and the ability to increase data connectivity with LiveRamp’s RampID ecosystem partners.
One powerful example in the CPG space is the success achieved by a large CPG client who needed to enrich their understanding of consumer product preferences, and piloted a focused effort to assess the impact of digital advertising on audience segments and their path to purchase at one large retailer.
Using Safe Haven running on BigQuery, they were able to develop powerful person-level insights, create new optimized audience segments based on actual in-store product affinities, and greatly increase their direct addressability to over a third of their regional purchasers. The net result was a remarkable 24.7% incremental lift over their previous campaigns running on Google and on Facebook.
Built with BigQuery: How Safe Haven empowers analysts and marketers
Whether you’re a marketer activating media audience segments, or a data scientist using analytics across the pseudonymized and connected data sets, LiveRamp Safe Haven delivers the power of BigQuery to either end of the marketing function.
Delivering BigQuery to data scientists and analysts
Creating and configuring an ideal environment for marketing analysts is a matter of selecting and integrating from the wealth of powerful Google and partner applications, and uniting them with common data pipelines, data schemas, and processing pipelines. An example configuration LiveRamp has used for retail analysts combines Jupyter, Tableau, Dataproc and BigQuery as shown below:
Data scientists and analysts need to work iteratively and interactively to analyze and model the LiveRamp-connected data. To do this, they can use the SQL interface through the standard BigQuery console, or, for more complex tasks, they can write Python Spark jobs inside a custom JupyterLab environment hosted on the same VM that utilizes a Dataproc cluster for scale.
They also need to be able to automate, schedule and monitor jobs to provide insights throughout the organization. This is solved by a combination of BigQuery scheduling (for SQL jobs) and Google Cloud Scheduler (for Python Spark jobs), both standard features of Google Cloud Platform.
Performing marketing analytics at scale utilizes the power of Google Cloud’s elasticity. LiveRamp Safe Haven is currently running on over 300 tenant workspaces deployed across multiple regions. In total, these BigQuery instances contain more than 350,000 tables, and over 200,000 load jobs and 400,000 SQL jobs execute per month, all configured via job management within BigQuery.
Delivering BigQuery to marketers
SQL is a barrier for most marketers, and LiveRamp faced the challenge of unlocking the power of BigQuery for this key persona. The Advanced Audience Builder is one of the custom applications that LiveRamp created to address this need. It generates queries automatically and auto-executes them on a continuous schedule to help marketers examine key attributes and correlations of their most important marketing segments.
Queries are created visually off of the customers’ preferred product schema:
Location qualification, purchase criteria, time windows and many other factors can be easily selected through a series of purpose-built screens that marketers, not technical analysts, find easy to navigate and which quickly unlock the value of scalable BigQuery processing to all team members.
By involving business and marketing experts to work and contribute insights alongside the dedicated analysts, team collaboration is enhanced and project goals and handoffs are much more easily communicated across team members.
What’s next for LiveRamp and Safe Haven?
We’re excited to announce that LiveRamp was recently named Cloud Partner of the Year at Google Cloud Next 2023. This award celebrates the achievements of top partners working with Google Cloud to solve some of today’s biggest challenges.
Safe Haven is the first version of LiveRamp’s identity-based platform. Version 2 of the platform, currently in development, is designed to have even more cloud-native integrations within Google Cloud. There will be more updates on the next version soon.
The Built with BigQuery advantage for ISVs and data providers
Built with BigQuery helps companies like LiveRamp build innovative applications with Google Data and AI Cloud. Participating companies can:
Accelerate product design and architecture through access to designated experts who can provide insight into key use cases, architectural patterns, and best practices.
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.
Stop, visualize and listen, our Looker hackathon is back with a brand new edition.
This December 5th, we are kicking off Looker Hackathon 2023, a virtual two-day event for developers, innovators and data scientists to collaborate, build, learn and inspire each other as you design innovative new applications, tools and data experiences on Looker and Looker Studio. The best and most exciting entry will be awarded the title of “Best Hack”.
At the event, you can expect to:
Meet and team up with your developer community and Google Cloud Innovators
Gain hands-on experience and learn about the latest Looker capabilities
Meet and talk to Google Cloud engineers and staff, and possibly play some trivia too
Turn your idea into reality and have fun along the way
In this post, we’ll show how to manage BigQuery costs with budgets and custom quotas. Keep reading, or jump directly into the tutorials for creating budgets or setting custom quotas!
Early in your journey to build or modernize on the cloud, you’ll learn that cloud services are often pay-as-you-go; and running analytics on BigQuery is no exception. While BigQuery does offer several pricing models, the default on-demand pricing model (the one most new users start with) charges for queries by the number of bytes processed.
This pricing structure has some major benefits: you only pay for the services you use, and avoid termination charges and up-front fees. However, the elastic nature of BigQuery means that it’s important to understand and take advantage of the tools available to help you stay on top of your spending and prevent surprises on your cloud bill.
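As a rough illustration of bytes-processed billing, the sketch below converts a query's processed bytes into an estimated charge. The rate is a placeholder assumption, not a quoted price; check the current BigQuery pricing page for your region before relying on any number.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate the on-demand charge for a query from the bytes it processes."""
    if bytes_processed < 0 or price_per_tib < 0:
        raise ValueError("bytes_processed and price_per_tib must be non-negative")
    return bytes_processed / TIB * price_per_tib


# Placeholder rate for illustration only; this is an assumption, not a real price.
HYPOTHETICAL_PRICE_PER_TIB = 5.00

# A query that scans 1.5 TiB at the placeholder rate.
cost = estimate_on_demand_cost(3 * TIB // 2, HYPOTHETICAL_PRICE_PER_TIB)
```

The same arithmetic works with the byte estimate that a dry run reports before a query executes.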
Budgets and custom quotas are two powerful tools provided by Google Cloud that you can (and I’d argue you should!) use to manage BigQuery costs. So let’s dive into how each of these works and help you get started.
As your cloud usage grows and changes over time, your costs will change too. Budgets allow you to monitor all of your Google Cloud charges in one place, including BigQuery. They can track both your actual and forecasted spend, and alert you when you’re exceeding your defined budgets, which helps you to both avoid unexpected expenses and plan for growth.
Budgets can be configured for a Cloud Billing account (that can include more than one project linked to the billing account), or for individual projects. To manage budgets for a Cloud Billing account, you need the Billing Account Administrator or Billing Account Costs Manager role on the Cloud Billing account. To manage budgets for an individual project, you need the Project Owner or Project Editor role on the project.
Budgets can be created within the Billing Console in the Budgets & alerts page. At a high-level, you will define the following areas when creating a budget:
Scope: Budgets are a tool that spans Google Cloud, and you can scope budgets to apply to the spend in an entire Cloud Billing account, or narrow the scope by filtering on projects, services, or labels. To create a budget focused on BigQuery spend, you can scope it to the BigQuery service. Note that this scope includes both BigQuery on-demand query usage and BigQuery storage.
Budget amount: The budget amount can be a total that you specify, or you can base the budget amount on the previous calendar period’s spend.
Actions: After setting the budget amount, you can set multiple threshold rules to trigger email notifications. Each threshold can be customized as a percentage of the total budget amount, and can be based on actual costs (as you’re charged for using services) or forecasted costs (as Google Cloud forecasts that you’re going to be spending a certain amount). Using forecasted costs lets you get alerted before you actually spend, so you can stay ahead of any issues!
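The threshold rules described above can be pictured with a small local simulation. This is only a sketch of the idea (percentage of budget, evaluated against actual or forecasted spend); the class and function names are hypothetical and it does not call the Cloud Billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    percent: float  # fraction of the budget, e.g. 0.5 for 50%
    basis: str      # "actual" or "forecasted"


def triggered_rules(budget_amount, actual_spend, forecasted_spend, rules):
    """Return the rules whose threshold has been reached for their spend basis."""
    spend_by_basis = {"actual": actual_spend, "forecasted": forecasted_spend}
    return [
        rule for rule in rules
        if spend_by_basis[rule.basis] >= rule.percent * budget_amount
    ]


rules = [
    ThresholdRule(0.5, "actual"),
    ThresholdRule(0.9, "actual"),
    ThresholdRule(1.0, "forecasted"),
]

# Budget of 1000 with 620 spent so far and 1150 forecasted for the period.
alerts = triggered_rules(1000.0, 620.0, 1150.0, rules)
```

Here the 50% actual rule and the 100% forecasted rule fire, while the 90% actual rule does not, which shows how a forecasted threshold can warn you before the money is spent.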
Screen capture of creating a budget in the Billing Console
You have several options for creating a budget: you can use the Cloud Console (as shown in the above screenshot), the gcloud command-line tool, or the Cloud Billing API.
Once your budget is in place, email alert notifications will be sent when you hit (or are forecasted to hit) your budget!
Monitoring spending with budgets
In addition to any of the alert actions you choose when setting up a budget, you can monitor your spending against your budgets using the Cloud Billing dashboard or the Budget API. You can see how much of your budget has been consumed, which resources are contributing the most to your costs, and where you might be able to optimize your usage.
Sample screen capture of a budget report
Tips for using budgets
Budget email alerts are sent to users who have Billing Account Administrator and Billing Account User roles by default. While this is a great first step, you can also configure budgets to notify users through Cloud Monitoring, or you can have regular budget updates sent to Pub/Sub for full customization over how you want to respond to budget updates such as sending messages to Slack.
With the Cloud Billing Budget API, you can view, create, and manage budgets programmatically at scale. This is especially useful if you’re creating a large number of budgets across your organization.
While this blog post focuses on using budgets for BigQuery usage, budgets are a tool that can be used across Google Cloud, so you can use this tool to manage Cloud spend as a whole or target budgets for particular services or projects.
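For the Pub/Sub route, a handler typically decodes the message and turns it into whatever text you want to post. The sketch below assumes the notification arrives as base64-encoded JSON with the fields budgetDisplayName, costAmount, budgetAmount and currencyCode; treat those names as an assumption to verify against the budget notification documentation. Posting to Slack is left out, and the function name is hypothetical.

```python
import base64
import json


def format_budget_alert(pubsub_data_b64: str) -> str:
    """Decode a base64 Pub/Sub payload and build a one-line budget summary."""
    payload = json.loads(base64.b64decode(pubsub_data_b64).decode("utf-8"))
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    percent_used = cost / budget * 100 if budget else 0.0
    return (
        f"Budget '{payload['budgetDisplayName']}': "
        f"{cost:.2f} of {budget:.2f} {payload['currencyCode']} used "
        f"({percent_used:.0f}%)"
    )


# Hypothetical notification body, encoded the way Pub/Sub delivers message data.
sample = base64.b64encode(json.dumps({
    "budgetDisplayName": "bigquery-monthly",
    "costAmount": 750.0,
    "budgetAmount": 1000.0,
    "currencyCode": "USD",
}).encode("utf-8")).decode("ascii")

message = format_budget_alert(sample)
```

The resulting string could then be sent to a chat webhook or any other channel your team watches.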
Custom quotas are a powerful feature that allow you to set hard limits on specific resource usage. In the case of BigQuery, quotas allow you to control query usage (number of bytes processed) at a project- or user-level. Project-level custom quotas limit the aggregate usage of all users in that project, while user-level custom quotas are separately applied to each user or service account within a project.
Custom quotas are relevant when you are using BigQuery’s on-demand pricing model, which charges for the number of bytes processed by each query. When you are using the capacity pricing model, you are charged for compute capacity (measured in slots) used to run queries, so limiting the number of bytes processed is less useful.
By setting custom quotas, you can control the amount of query usage by different teams, applications, or users within your organization, preventing unexpected spikes in usage and costs.
Note that quotas are set within a project, and you must have the Owner, Editor, or Quota Administrator role on that project in order to set quotas.
Custom quotas can be set by heading to the IAM & Admin page of the Cloud console, and then choosing Quotas. This page contains hundreds of quotas, so use the filter functionality with Metric: bigquery.googleapis.com/quota/query/usage to help you zero in on the two quota options for BigQuery query usage:
Query usage per day <- this is the project-level quota
Query usage per user per day <- this is the user-level quota
Screen capture of the BigQuery usage quotas in the Cloud Console
After selecting one or both quotas, click Edit Quotas. Here you will define your daily limits for each quota in tebibytes (TiB), so be sure to make any necessary conversions.
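Since the limits are entered in TiB, a small helper like the following can handle the conversion from decimal or binary units. The function name is hypothetical; the unit definitions are the standard ones (1 TB is 10^12 bytes, 1 TiB is 2^40 bytes).

```python
BYTES_PER_UNIT = {
    "GB": 10 ** 9,
    "TB": 10 ** 12,
    "GiB": 2 ** 30,
    "TiB": 2 ** 40,
}


def to_tib(amount: float, unit: str) -> float:
    """Convert an amount in GB, TB, GiB or TiB to tebibytes."""
    if unit not in BYTES_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit}")
    return amount * BYTES_PER_UNIT[unit] / BYTES_PER_UNIT["TiB"]


ten_tb_in_tib = to_tib(10, "TB")  # a 10 TB allowance is a bit over 9 TiB
```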
Screen capture of setting new custom quota amounts in the Cloud Console
To set custom quotas for BigQuery, you can use the Cloud Console (as described above), the gcloud command-line tool, or the Service Usage API. You can also monitor your quotas and usage within the Quotas page or using the Service Usage API.
Screen capture of monitoring quota usage within the Quota page of the Cloud Console
Tips for using custom quotas
You may use either the project-level or the user-level quota, or both in tandem. Used in tandem, usage will count against both quotas, and adhere to the stricter of the two limits.
Once quota is exceeded, the user will receive a usageQuotaExceeded error and the query will not execute. Quotas are proactive, meaning, for example, you can’t run an 11 TB query if you have a 10 TB quota.
Daily quotas reset at midnight Pacific Time.
Separate from setting a custom quota, you can also set a maximum bytes billed for a specific query (say, one you run on a schedule) to limit query costs.
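The way the two quotas interact can be sketched with a small in-memory model. This is an illustration of the behavior described in the tips above, not how BigQuery implements it; the class names are hypothetical and the exception merely stands in for the usageQuotaExceeded error.

```python
class QuotaExceeded(Exception):
    """Stand-in for the usageQuotaExceeded error described above."""


class DailyQueryQuota:
    """Tracks daily query usage against a project-level and a user-level limit."""

    def __init__(self, project_limit_tib: float, user_limit_tib: float):
        self.project_limit = project_limit_tib
        self.user_limit = user_limit_tib
        self.project_used = 0.0
        self.user_used = {}

    def run_query(self, user: str, query_tib: float) -> None:
        """Accept the query only if it fits within both quotas."""
        user_total = self.user_used.get(user, 0.0) + query_tib
        project_total = self.project_used + query_tib
        if user_total > self.user_limit:
            raise QuotaExceeded(f"user quota exceeded for {user}")
        if project_total > self.project_limit:
            raise QuotaExceeded("project quota exceeded")
        # Usage is recorded only after the proactive check passes.
        self.user_used[user] = user_total
        self.project_used = project_total


quota = DailyQueryQuota(project_limit_tib=10.0, user_limit_tib=4.0)
quota.run_query("alice", 3.0)      # accepted: alice at 3 of 4, project at 3 of 10
try:
    quota.run_query("alice", 2.0)  # rejected: alice would reach 5 of 4
except QuotaExceeded as err:
    rejected_reason = str(err)
```

Because the check happens before usage is recorded, a rejected query consumes nothing, which mirrors the proactive behavior described above.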
Differences between budgets and custom quotas
Now that you’ve learned more about budgets and custom quota, let’s look at them side-by-side and note some of their differences:
Their scope: Budgets are tied to a billing account (which can be shared across projects), while quotas are set for individual projects.
What they track: Budgets are set for a specific cost amount, while quotas are set for specific resource or service usage.
How they are enforced: Budgets track your costs and alert you when you’re exceeding your budget, while quotas enforce a hard limit on the amount of resources that can be used in a project and will return an error when a user/service tries to exceed the limit.
Tracking and analyzing your BigQuery costs will make you feel more at ease when running queries within the on-demand pricing model, and it can help you make informed decisions, optimize your costs, and maximize the value of your cloud spend.
As you scale your BigQuery environment, you may want to move your workloads to the BigQuery editions pricing model, which charges by the amount of capacity allocated to your workload, measured in slots (a unit of measure for BigQuery compute power) rather than per each query. This model also can provide discounted capacity for long term commitments. One of BigQuery’s unique features is the ability to combine the two different pricing models (on-demand and capacity) to optimize your costs.
Google Cloud BigQuery is a key service for building a data warehouse that offers scale and easy querying of large data sets. Let’s say that you have standardized on BigQuery and have set up data pipelines to maintain the datasets. The next question is how best to make this data available to applications. APIs are often the way forward, so I wanted to experiment with a service that helps me create an API around my data sources (BigQuery in this case), and do it easily.
In this blog post, we shall see how to use Hasura, an open-source solution, that helped me create an API around my BigQuery dataset.
The reason to go with Hasura is the ease with which you can expose your domain data via an API. Hasura supports a variety of data sources including BigQuery, Google Cloud SQL and AlloyDB. You control the model, relationships, validation and authorization logic through metadata configuration. Hasura consumes this metadata to generate your GraphQL and REST APIs. It’s a low-code data to API experience, without compromising any of the flexibility, performance or security you need in your data API.
While Hasura is open-source, it also has fully managed offerings on various cloud providers including Google Cloud.
You need to have a Google Cloud Project. Do note down the Project Id of the project since we will need to use that later in the configuration in Hasura.
BigQuery dataset – Google Trends dataset
Our final goal is to have a GraphQL API around our BigQuery dataset, so the first thing we need in place is a BigQuery dataset. I have chosen the Google Trends dataset that is made available in the Public Datasets program in BigQuery. This is an interesting dataset that makes available, for both the US and international markets, the top 25 overall or top 25 rising queries from Google Trends from the past 30 days.
I have created a sample dataset in BigQuery in my Google Cloud project named ‘google_trends’ and have copied the dataset and the tables from the bigquery-public-data dataset. The tables are shown below:
Google Trends dataset
What we are interested in is the international_top_terms table, which lets me see the trends across the countries that are supported in the Google Trends dataset.
The schema for the international_top_terms table is shown below:
International Top Terms table schema
A sample BigQuery query (Search terms from the previous day in India) that we eventually would like to expose over the GraphQL API is shown below:
If I run this query in the BigQuery workspace, I get the following result (screenshot below):
International Trends sample data
Great! This is all we need for now from a BigQuery point of view. Remember you are free to use your own dataset if you’d like.
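For readers following along without the screenshots, a query of the kind described above (search terms from the previous day for one country) might be assembled like this. It is a sketch, not the exact query from this post: the column names term, rank, country_name and refresh_date are assumed from the public Google Trends tables, the project ID is a placeholder, and the country is passed as a BigQuery named query parameter (@country_name) that the caller would supply at run time.

```python
import re

# Conservative check for project and dataset names used inside the table path.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_\-]+$")


def build_top_terms_query(project_id: str, dataset: str) -> str:
    """Build SQL for yesterday's top terms in one country (country is a parameter)."""
    for name in (project_id, dataset):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    return (
        "SELECT term, MIN(rank) AS best_rank\n"
        f"FROM `{project_id}.{dataset}.international_top_terms`\n"
        "WHERE country_name = @country_name\n"
        "  AND refresh_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)\n"
        "GROUP BY term\n"
        "ORDER BY best_rank\n"
        "LIMIT 25"
    )


# "my-project" is a placeholder; google_trends is the dataset name used in this post.
sql = build_top_terms_query("my-project", "google_trends")
```

Grouping by term collapses the repeated rows the table holds for each term, which is one reasonable way to get a single ranked list.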
We will come to the Hasura configuration in a while, but before that, do note that the integration between Hasura and Google Cloud will require that we generate a service account with the right permissions. We will provide that service account to Hasura, so that it can invoke the correct operations on BigQuery to configure and retrieve the results.
Service account creation in Google Cloud is straightforward and you can do that from the Google Cloud Console → IAM and Admin menu option.
Create a Service account with a name and description.
Service Account Creation
In the permissions for the service account, ensure that you have the following Google Cloud permissions, specific to BigQuery:
Service Account Permissions
Once the service account is created, you will need to export this account via its credentials (JSON) file. Keep that file safely as we will need that in the next section.
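Before pasting the key into another tool, it can help to sanity-check the file. The sketch below assumes the standard layout of a service account key file (a JSON object with type, project_id, private_key and client_email fields); the function name and the sample values are hypothetical.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json: str, expected_project_id: str) -> list:
    """Return a list of problems found in a service account key; empty means OK."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(key, dict):
        return ["key file must contain a JSON object"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not key.get(f)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    if key.get("project_id") and key["project_id"] != expected_project_id:
        problems.append(f"key belongs to project {key['project_id']}")
    return problems


# Hypothetical key contents with the secret redacted.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n(redacted)\n-----END PRIVATE KEY-----\n",
    "client_email": "hasura-bq@my-project.iam.gserviceaccount.com",
})
problems = check_service_account_key(sample_key, "my-project")
```

An empty list means the file has the expected shape and belongs to the project whose ID you noted down earlier.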
This completes the Google Cloud part of the configuration.
You need to sign up with Hasura as a first step. Once you have signed up, click on New Project and then choose the Free Tier and Google Cloud to host the Hasura API Layer, as shown below. You will also need to select the Google Cloud region to host the Hasura service in and then click on the Create Project button.
Hasura Project Creation
Setting up the data connection
Once the project is created, you need to establish the connectivity between Hasura and Google Cloud and specifically in this case, set up the Data Source that Hasura needs to configure and talk to.
For this, visit the Data section as shown below. This will show that currently there are no databases configured i.e. Databases(0). Click on the Connect Database button.
Hasura Data Source creation
From the list of options available, select BigQuery and then click on Connect Existing Database.
Hasura BigQuery Data Source creation
This will bring up a configuration screen (not shown here), where you will need to enter the service account, Google Project Id and BigQuery Dataset name.
Create an environment variable in the Hasura Settings that contains your Service Account Key (JSON file contents). A sample screenshot from my Hasura Project Settings is shown below. Note that the SERVICE_ACCOUNT_KEY variable below has the value of the JSON Key contents.
Hasura Project Settings
Coming back to the Database Connection configuration, you will see a screen as shown below. Fill out the Project Id and Dataset value accordingly.
Hasura BigQuery Datasource configuration
Once the data connection is successfully set up, you can now mark which tables need to be tracked. Go to the Datasource settings and you will see that Hasura queried the metadata to find the tables in the dataset. You will see the tables listed as shown below:
Hasura BigQuery Datasource tables
Select the table that you are interested in tracking and then click on the Track button.
Hasura BigQuery Datasource table tracking
This will mark the table as tracked and we will now be able to go to the GraphQL Test UI to test out the queries.
The API tab provides us with a nice Explorer UI where you can build out the GraphQL query in an intuitive manner.
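Outside the Explorer UI, the same kind of query can be sent as a plain HTTP request. The sketch below only builds the request and does not send it. Several details are assumptions to check against your own project: the root field name google_trends_international_top_terms (dataset and table joined with an underscore), the column names, the String variable type, and the placeholder endpoint and secret. The /v1/graphql path and the x-hasura-admin-secret header are the usual Hasura conventions.

```python
import json
import urllib.request

# Assumed field and column names; confirm them in the Explorer UI.
TOP_TERMS_QUERY = """
query TopTerms($country: String!) {
  google_trends_international_top_terms(
    where: {country_name: {_eq: $country}}
    order_by: {rank: asc}
    limit: 10
  ) {
    term
    rank
    refresh_date
  }
}
"""


def build_graphql_request(endpoint: str, admin_secret: str, country: str):
    """Build (but do not send) a POST request for the top-terms GraphQL query."""
    body = json.dumps({
        "query": TOP_TERMS_QUERY,
        "variables": {"country": country},
    }).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": admin_secret,
        },
        method="POST",
    )


# Placeholder endpoint and secret for illustration only.
request = build_graphql_request(
    "https://example.hasura.app/v1/graphql", "my-admin-secret", "India"
)
```

Passing the request to urllib.request.urlopen would execute it against a real project and return the JSON response.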