Built with BigQuery: Zeotap uses Google BigQuery to build highly customized audiences at scale

Zeotap’s mission is to help brands monetise customer data in a privacy-first Europe. Today, Zeotap offers three data solutions. Zeotap CDP is a next-generation Customer Data Platform that empowers brands to collect, unify, segment, and activate customer data. It puts privacy and security first while enabling marketers to unlock business value in their customer data through a powerful, marketer-friendly user interface. Zeotap Data delivers quality targeting at scale by enabling the activation of 2,500 tried-and-tested Champion Segments across 100+ programmatic advertising and social platforms. ID+ is a universal marketing ID initiative that paves the way for addressability in the cookieless future. Zeotap CDP is a SaaS application hosted on Google Cloud. Clients use the Zeotap CDP SaaS product suite to onboard their first-party data, create audiences with the provided tools, and activate them on marketing channels and advertising platforms.

Zeotap partnered with Google Cloud to provide a customer data platform that is differentiated in the market by its focus on privacy, security, and compliance. Zeotap CDP, built with BigQuery, provides tools that democratize AI/ML models for predicting customer behavior and personalizing the customer experience, enabling the next generation of digital marketing experts to drive higher conversion rates and return on advertising spend while reducing customer acquisition cost.

The capability to create highly customized, actionable audiences on the first attempt, improve speed to market to capture demand, and drive customer loyalty is a differentiating factor. However, as audiences get more specific, it becomes more difficult to estimate and tune the size of the audience segment. Identifying the right customer attributes is critical for building audiences at scale.

Consider the following example: a fast fashion retailer has a broken size run and is at risk of taking a large markdown because of an excess of XXS and XS sizes. What if you could instantly build an audience of customers who have a high propensity for this brand or style, tend to purchase at full price, and match the size profile of the remaining inventory, driving full-price sales and avoiding costly markdowns?

Most CDPs provide size information only after a segment is created and its data processed. If the segment sizes are not relevant and quantifiable, the target audience list has to be recreated, impacting speed to market and the ability to capture customer demand. Estimating and tuning the size of the audience segment is often referred to as the segment size estimation problem. The segment size needs to be estimated, and segments should be available for exploration and processing, with sub-second latency to provide a near real-time user experience.

Traditional approaches to this problem rely on pre-aggregation database models that involve sophisticated data ingestion and failure management, wasting compute hours and requiring extensive pipeline orchestration. This traditional approach has a number of disadvantages:

Higher cost and maintenance as multiple Extract, Transform and Load (ETL) processes are involved

Higher failure rates, with reprocessing from scratch required after a failure

Hours or days to ingest data at large scale

Zeotap CDP relies on the power of Google Cloud to tackle this segment size estimation problem, using BigQuery for processing and estimation, BI Engine to provide the sub-second latency required for online predictions, and the Vertex AI ecosystem with BigQuery ML to provide no-code AI segmentation and lookalike audiences. Zeotap CDP’s strength is offering this estimation at the beginning of segment creation, before any data processing, using pre-calculated metrics. Any correction to segment parameters can be made in near real time, saving users a significant amount of time.

The data cloud, with BigQuery at its core, functions as a data lake at scale and as the analytical compute engine that calculates the pre-aggregated metrics. BI Engine is used as a caching and acceleration layer to make these metrics available with sub-second latency. Compared to the traditional approach, this setup does not require a heavy data processing framework like Spark or Hadoop, nor sophisticated pipeline management. Microservices deployed on GKE handle orchestration using BigQuery’s SQL ETL capabilities. No separate data ingestion into the caching layer is required, because BI Engine works seamlessly in tandem with BigQuery and is enabled with a single setting.

The diagram below depicts how Zeotap manages first-party data and solves the segment size estimation problem.

The API layer, powered by Apigee, provides secure client access to Zeotap’s API infrastructure to read and ingest first-party data in real time. The UI services layer, backed by GKE and Firebase, provides access to Zeotap’s platform, front-ending audience segmentation, real-time workflow orchestration and management, and analytics and dashboards. The stream and batch processing layer manages core data ingestion using Pub/Sub, Dataflow, and Cloud Run. BigQuery, Cloud SQL, Bigtable, and Cloud Storage make up the storage layer.

The destination platform allows clients to activate their data across various marketing channels, data management platforms, and ad management platforms such as Google DDP, TapTap, and The Trade Desk (more than 150 such integrations). BigQuery is at the heart of the audience platform, allowing clients to slice and dice their first-party assets, enhance them with Zeotap’s universal ID graph or third-party data assets, and push them to downstream destinations for activation and funnel analysis. The predictive analytics layer allows clients to create and activate machine-learned segments (for example, CLV and RFM models) with just a few clicks. Cloud IAM, the Cloud Operations suite, and collaboration tools deliver the cross-cutting needs of security, logging, and collaboration.

For segment/audience size estimation, the core data, the client’s first-party data, resides in its own Google Cloud project. The first step is to identify low-cardinality columns using BigQuery’s APPROX_COUNT_DISTINCT capabilities. At this time, Zeotap supports sub-second estimation only on low-cardinality dimensions (cardinality is the number of unique values), such as gender with Male/Female/M/N values and age with a limited set of age buckets. A sample query looks like this:
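
The original query is not reproduced here; the following is a minimal sketch of how such a cardinality check could look, assuming a hypothetical first-party table client_project.crm.users with gender, age_bucket, and user_id columns:

SELECT
  APPROX_COUNT_DISTINCT(gender)     AS gender_cardinality,
  APPROX_COUNT_DISTINCT(age_bucket) AS age_bucket_cardinality,
  -- user_id is expected to be high cardinality and is therefore excluded from pre-aggregation
  APPROX_COUNT_DISTINCT(user_id)    AS user_id_cardinality
FROM `client_project.crm.users`;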

Once pivoted by columns, the results look like this

Now that the cardinality numbers are available for all columns, they are divided into two groups: one below the threshold (low cardinality) and one above the threshold (high cardinality). The next step is to run a reverse ETL query to create aggregates on the low-cardinality dimensions and corresponding HLL sketches for the user-count (measure) dimensions.

A sample query looks like this:
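
The original query is likewise not reproduced here; a rough sketch of this reverse ETL pre-aggregation, using the same hypothetical table as above and BigQuery's HLL_COUNT functions, could look like:

CREATE OR REPLACE TABLE `estimator_project.metadata.user_aggregates` AS
SELECT
  gender,
  age_bucket,
  -- HLL sketch of distinct users for each combination of low-cardinality dimensions
  HLL_COUNT.INIT(user_id) AS user_sketch
FROM `client_project.crm.users`
GROUP BY gender, age_bucket;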

The resultant data is loaded into a separate estimator Google Cloud project for further processing and analysis. This project contains a metadata store with the datasets required for processing client requests and is fronted by BI Engine to accelerate estimation queries. With this process, the segment size is calculated from pre-aggregated metrics without processing the entire first-party dataset, enabling end users to create and experiment with many segments without incurring the delays of the traditional approach.
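
To illustrate how the pre-aggregated metrics support estimation, a sketch of a segment size query (again using the hypothetical tables above) merges the HLL sketches for the selected dimension values rather than scanning the raw first-party data:

SELECT
  -- Estimated number of distinct users matching the segment definition
  HLL_COUNT.MERGE(user_sketch) AS estimated_segment_size
FROM `estimator_project.metadata.user_aggregates`
WHERE gender = 'Female'
  AND age_bucket IN ('18-24', '25-34');

Queries like this touch only a small, pre-aggregated table, which is why BI Engine can serve them with sub-second latency.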

This approach eliminates the ETL steps otherwise required for this use case, reducing segment size estimation time by over 90% and cost by 66%. Enabling BI Engine on top of BigQuery also boosts query speeds by more than 60%, optimizes resource utilization, and improves query response compared to native BigQuery queries. The ability to experiment with audience segmentation is one of the many capabilities that Zeotap CDP provides its customers. The cookieless future will drive experimentation with concepts like Topics for interest-based advertising (IBA) and the development of models that support a wide range of possibilities in predicting customer behavior.

There is an ever-increasing demand for shared data, with customers requesting access to finished data in the form of datasets they can share both within and across organizations through external channels. These datasets unlock more opportunities: the curated data can be used as-is or combined with other datasets to create business-centric insights, fuel innovation across an ecosystem, or power visualizations. To meet this need, Zeotap is leveraging Analytics Hub to create a rich data ecosystem of analytics-ready datasets.

Analytics Hub is powered by BigQuery and provides a self-service approach to securely sharing data by publishing and subscribing to trusted datasets as listings in private and public exchanges. It allows Zeotap to share data in place while retaining full control, and end customers get access to fresh data without having to move data at scale.

Click here to learn more about Zeotap’s CDP capabilities or to request a demo.

The Built with BigQuery advantage for ISVs 

Google is helping tech companies like Zeotap build innovative applications on Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs through the Built with BigQuery initiative, launched in April as part of the Google Data Cloud Summit. Participating companies can: 

Get started fast with a Google-funded, pre-configured sandbox. 

Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence who can provide insight into key use cases, architectural patterns, and best practices. 

Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

BigQuery gives ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge partner ecosystem and support for multi-cloud, open source tools and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in. 

Click here to learn more about Built with BigQuery.

We thank the Google Cloud and Zeotap team members who co-authored the blog:
Zeotap: Shubham Patil, Engineering Manager; Google: Bala Desikan, Principal Architect and Sujit Khasnis, Cloud Partner Engineering

BigQuery Geospatial Functions – ST_IsClosed and ST_IsRing

Geospatial data analytics lets you use location data (latitude and longitude) to get business insights. It’s used for a wide variety of applications in industry, such as package delivery logistics services, ride-sharing services, autonomous control of vehicles, real estate analytics, and weather mapping. 

BigQuery, Google Cloud’s large-scale data warehouse, provides support for analyzing large amounts of geospatial data. This blog post discusses two geography functions we’ve recently added in order to expand the capabilities of geospatial analysis in BigQuery: ST_IsClosed and ST_IsRing.

BigQuery geospatial functions

In BigQuery, you can use the GEOGRAPHY data type to represent geospatial objects like points, lines, and polygons on the Earth’s surface. In BigQuery, geographies are based on the Google S2 Library, which uses Hilbert space-filling curves to perform spatial indexing to make the queries run efficiently. BigQuery comes with a set of geography functions that let you process spatial data using standard ANSI-compliant SQL. (If you’re new to using BigQuery geospatial analytics, start with Get started with geospatial analytics, a tutorial that uses BigQuery to analyze and visualize the popular NYC Bikes Trip dataset.) 
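
As a quick illustration (not from the original post), a GEOGRAPHY value can be constructed and passed to geography functions in ordinary SQL; the coordinates below are approximate values for London and Paris:

SELECT
  ST_Distance(
    ST_GeogPoint(-0.1276, 51.5072),  -- London (longitude, latitude)
    ST_GeogPoint(2.3522, 48.8566)    -- Paris
  ) AS distance_in_meters;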

The new ST_IsClosed and ST_IsRing functions are boolean accessor functions that help determine whether a geographical object (a point, a line, a polygon, or a collection of these objects) is closed or is a ring. Both of these functions accept a GEOGRAPHY column as input and return a boolean value. 

The following diagram provides a visual summary of the types of geometric objects.

For more information about these geometric objects, see Well-known text representation of geometry in Wikipedia.

Is the object closed? (ST_IsClosed)

The ST_IsClosed function examines a GEOGRAPHY object and determines whether each of the elements of the object has an empty boundary. The boundary for each element is defined formally in the ST_Boundary function. The following rules are used to determine whether a GEOGRAPHY object is closed:

A point is always closed.

A linestring is closed if the start point and end point of the linestring are the same.

A polygon is closed only if it’s a full polygon.

A collection is closed if every element in the collection is closed. 

An empty GEOGRAPHY object is not closed. 

Is the object a ring? (ST_IsRing)

The other new BigQuery geography function is ST_IsRing. This function determines whether a GEOGRAPHY object is a linestring and whether the linestring is both closed and simple. A linestring is considered closed as defined by the ST_IsClosed function. The linestring is considered simple if it doesn’t pass through the same point twice, with one exception: if the start point and end point are the same, the linestring forms a ring. In that case, the linestring is considered simple.

Seeing the new functions in action

The following query shows what the ST_IsClosed and ST_IsRing functions return for a variety of geometric objects. The query creates a series of ad hoc geography objects and uses the UNION ALL statement to create a set of inputs. The query then calls the ST_IsClosed and ST_IsRing functions to determine whether each input is closed or is a ring. You can run this query in the BigQuery SQL workspace page in the Google Cloud console.

WITH example AS (
  SELECT ST_GeogFromText('POINT(1 2)') AS geography
  UNION ALL
  SELECT ST_GeogFromText('LINESTRING(2 2, 4 2, 4 4, 2 4, 2 2)') AS geography
  UNION ALL
  SELECT ST_GeogFromText('LINESTRING(1 2, 4 2, 4 4)') AS geography
  UNION ALL
  SELECT ST_GeogFromText('POLYGON((0 0, 2 2, 4 2, 4 4, 0 0))') AS geography
  UNION ALL
  SELECT ST_GeogFromText('MULTIPOINT(5 0, 8 8, 9 6)') AS geography
  UNION ALL
  SELECT ST_GeogFromText('MULTILINESTRING((0 0, 2 0, 2 2, 0 0), (4 4, 7 4, 7 7, 4 4))') AS geography
  UNION ALL
  SELECT ST_GeogFromText('GEOMETRYCOLLECTION EMPTY') AS geography
  UNION ALL
  SELECT ST_GeogFromText('GEOMETRYCOLLECTION(POINT(1 2), LINESTRING(2 2, 4 2, 4 4, 2 4, 2 2))') AS geography
)
SELECT
  geography,
  ST_IsClosed(geography) AS is_closed,
  ST_IsRing(geography) AS is_ring
FROM example;

The console shows the following results. You can see in the is_closed and is_ring columns what each function returns for the various input geography objects.

The new functions with real-world geography objects

In this section, we show queries using linestring objects that represent line segments that connect some of the cities in Europe. We show the various geography objects on maps and then discuss the results that you get when you call ST_IsClosed and ST_IsRing for these geography objects. 

You can run the queries by using the BigQuery Geo Viz tool. The maps are the output of the tool. In the tool you can click the Show results button to see the values that the functions return for the query.

Start point and end point are the same, no intersection

In the first example, the query creates a linestring object that has three segments. The segments are defined by using four sets of coordinates: the longitude and latitude for London, Paris, Amsterdam, and then London again, as shown in the following map created by the Geo Viz tool:

The query looks like the following:

WITH example AS (
  SELECT ST_GeogFromText('LINESTRING(-0.2420221 51.5287714, 2.2768243 48.8589465, 4.763537 52.3547921, -0.2420221 51.5287714)') AS geography
)
SELECT
  geography,
  ST_IsClosed(geography) AS is_closed,
  ST_IsRing(geography) AS is_ring
FROM example;

In the example table that’s created by the query, the columns with the function values show the following:

ST_IsClosed returns true. The start point and end point of the linestring are the same.

ST_IsRing returns true. The geography is closed, and it’s also simple because there are no self-intersections.

Start point and end point are different, no intersection

Another scenario is when the start and end points are different. For example, imagine two segments that connect London to Paris and then Paris to Amsterdam, as in this map:

The following query represents this set of coordinates:

WITH example AS (
  SELECT ST_GeogFromText('LINESTRING(-0.2420221 51.5287714, 2.2768243 48.8589465, 4.763537 52.3547921)') AS geography
)
SELECT
  geography,
  ST_IsClosed(geography) AS is_closed,
  ST_IsRing(geography) AS is_ring
FROM example;

This time, the ST_IsClosed and ST_IsRing functions return the following values:

ST_IsClosed returns false. The start point and end point of the linestring are different.

ST_IsRing returns false. The linestring is not closed. It’s simple because there are no self-intersections, but ST_IsRing returns true only when the geometry is both closed and simple.

Start point and end point are the same, with intersection

The third example is a query that creates a more complex geography. In the linestring, the start point and end point are the same. However, unlike the earlier example, the line segments of the linestring intersect. A map of the segments shows connections that go from London to Zürich, then to Paris, then to Amsterdam, and finally back to London:

In the following query, the linestring object has five sets of coordinates that define the four segments:

WITH example AS (
  SELECT ST_GeogFromText('LINESTRING(-0.2420221 51.5287714, 8.393389 47.3774686, 2.2768243 48.8589465, 4.763537 52.3547921, -0.2420221 51.5287714)') AS geography
)
SELECT
  geography,
  ST_IsClosed(geography) AS is_closed,
  ST_IsRing(geography) AS is_ring
FROM example;

In the query, ST_IsClosed and ST_IsRing return the following values:

ST_IsClosed returns true. The start point and end point are the same, and the linestring is closed despite the self-intersection.

ST_IsRing returns false. The linestring is closed, but it’s not simple because of the intersection.

Start point and end point are different, with intersection

In the last example, the query creates a linestring that has three segments that connect four points: London, Zürich, Paris, and Amsterdam. On a map, the segments look like the following:

The query is as follows:

WITH example AS (
  SELECT ST_GeogFromText('LINESTRING(-0.2420221 51.5287714, 8.393389 47.3774686, 2.2768243 48.8589465, 4.763537 52.3547921)') AS geography
)
SELECT
  geography,
  ST_IsClosed(geography) AS is_closed,
  ST_IsRing(geography) AS is_ring
FROM example;

The new functions return the following values:

ST_IsClosed returns false. The start point and end point are not the same.  

ST_IsRing returns false. The linestring is not closed and it’s not simple.

Try it yourself

Now that you’ve got an idea of what you can do with the new ST_IsClosed and ST_IsRing functions, you can explore more on your own. For details about the individual functions, read the ST_IsClosed and ST_IsRing entries in the BigQuery documentation. To learn more about the rest of the geography functions available in BigQuery Geospatial, take a look at the BigQuery geography functions page.

Thanks to Chad Jennings, Eric Engle, and Jing Jing Long for their valuable support in adding more functions to BigQuery Geospatial. Thank you to Mike Pope for helping review this article.

Secure data exchanges with Analytics Hub, now generally available

In today’s world, organizations view data sharing as a critical component of their overall data strategy. Businesses are striving to unlock new insights and make more informed decisions by sharing and consuming data from partners, customers, and other sources. Many organizations are also looking to generate new revenue streams by monetizing their data assets. However, existing technologies used to exchange data pose many challenges for customers. Traditional data sharing techniques such as FTP, email, and APIs are expensive to maintain and often result in multiple copies of stale data, especially when sharing at scale. Organizations are looking for ways to make data sharing more reliable and consistent.

We recently announced the general availability of Analytics Hub. This fully-managed service enables organizations to securely exchange data and analytics assets within or across organizational boundaries. Backed by the unique architecture of BigQuery, customers can now share real-time data at scale without moving the data, leading to tremendous cost savings for their data management. As part of this launch, we have added functionality for both data providers and subscribers to realize the full potential of shared data, including:

Regional support: Analytics Hub is now available in all supported BigQuery regions.

Subscription Management: Data providers can now easily view and manage subscriptions for all their shared datasets in a single view.

Governance & Access: Administrators can now monitor the usage of Analytics Hub through Audit Logging and Information Schema, while enforcing VPC Service Controls to securely share data.

Search & Discovery: We have revamped the search experience with filter facets to help subscribers quickly find relevant listings.

Data Ecosystem: We added hundreds of new public and commercial listings in Analytics Hub across industries such as finance, geospatial, climate, retail, and more to help organizations consume data from third-party sources. We have also added first-party data from Google including Google Trends, Google’s Diversity Annual Report, Google Cloud Release Notes, Carbon-Free Energy Data for GCP Data Centers, COVID-19 Open Data: Vaccination Search Insights.

Publish-and-Subscribe model to securely share data

Analytics Hub uses a publish-and-subscribe model to distribute data at scale. As a data provider, you can create secure data exchanges and publish listings that contain the datasets you want to share. Exchanges enable you to control the users or groups that can view or subscribe to the listings. By default, exchanges are private in Analytics Hub. However, if you have public or commercial datasets that you want to make available for all Google Cloud customers, you can also request to make an exchange public. Organizations can create hundreds of exchanges to meet their data sharing needs.

Analytics Hub also provides a seamless experience for browsing and searching listings across all exchanges. As a data subscriber, you can easily find a dataset of interest (1) and request access or subscribe to listings that you have access to (2). When you subscribe to a listing, Analytics Hub creates a read-only linked dataset within your project that you can query (3). A linked dataset is not a copy of the data; it is a symbolic link to the shared dataset that stays in sync with any changes made to the source.
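
As a simple illustration (the project, dataset, and table names below are hypothetical), a subscriber queries a linked dataset exactly like any other BigQuery dataset, while the data itself stays with the provider:

SELECT
  country,
  SUM(sales_amount) AS total_sales
FROM `my_analytics_project.provider_linked_dataset.daily_sales`
GROUP BY country
ORDER BY total_sales DESC;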

Data sharing use cases for Analytics Hub

Over a one-week period in September 2022, BigQuery saw more than 6,000 organizations sharing over 275 petabytes of data across organizational boundaries. Many of these customers also used Analytics Hub in preview to share data at scale in various scenarios. Some of these use cases include:

Internal data sharing – Customers can create exchanges for various business functions or geographies to share data internally within an organization. For example, an organization can set up a marketing exchange to publish the latest channel performance, customer profiles, product performance, and so on.

Collaboration across organizations – When sharing data across organizational boundaries, customers can create private exchanges with each partner or business (B2B). A common example is a retailer sharing sales data with each of their suppliers.

Monetizing data assets – Data providers can also monetize their datasets and distribute the data through commercial exchanges. Today, commercial providers use an offline entitlement and billing process and provision access to the data using Analytics Hub.

Enriching insights with third-party data – Customers can discover new insights or gain a competitive advantage by leveraging external or third-party data. Analytics Hub and its rich data ecosystem provide easy access to analytics-ready public and commercial datasets. An example of a popular dataset on the platform has been Google Trends.

Here is what some of our customers and partners had to say:

“Analytics Hub allows data scientists to discover and subscribe to new data assets in the cloud with ease,” said Kimberly Bloomston, SVP of Product at LiveRamp. “With the addition of this offering, LiveRamp now fully supports GCP with a complete suite of native solutions that unlock greater accuracy, partner connectivity and audience activation for marketing and advertising. This expanded partnership provides a must-have analytic infrastructure that excels at unlocking more value from data while respecting strict global privacy regulations.”

“Securely sharing data with partners and clients is always a challenge. The questions of ownership, billing and security are not straightforward for any organization. Analytics Hub, with its publish/subscribe model, provides answers to these questions baked right into the platform,” said Jono MacDougall, Principal Software Engineer at Ravelin.

“One of our key driving factors for BigQuery adoption is availability of Analytics Hub (AH). In a prior model sharing and receiving data as flat files was laborious, inefficient and expensive. We changed that significantly with an early adoption of Analytics Hub, introducing its capabilities to our customers and partners who are also primarily on GCP, enabling multi-way data exchange between these entities and are on our way to monetizing the valuable insights we learn along the way.” said Raj Chandrasekaran, CTO at True Fit.

Next steps

Get started with Analytics Hub today by using this guide, starting a free trial with BigQuery, or contacting the Google Cloud sales team. Stay tuned for updates to our product with features such as usage metrics for providers, approval workflows, privacy-safe queries though data clean rooms, commercialization workflows, and much more.

BigQuery’s performance powers AutoTrader UK’s real-time analytics

Editor’s note: We’re hearing today from Auto Trader UK, the UK and Ireland’s largest online automotive marketplace, about how BigQuery’s robust performance has become the data engine powering real-time inventory and pricing information across the entire organization. 

Auto Trader UK has spent nearly 40 years perfecting our craft of connecting buyers and sellers of new and used vehicles. We host the largest pool of sellers, listing more than 430,000 cars every day, and attract an average of over 63 million cross-platform visits each month. For the more than 13,000 retailers who advertise their cars on our platform, it’s important for them (and their customers) to be able to quickly see the most accurate, up-to-date information about which cars are available and their pricing.

BigQuery is the engine feeding our data infrastructure 

Like many organizations, we started developing our data analytics environment with an on-premises solution and then migrated to a cloud-based data platform, which we used to build a data lake. But as the volume and variety of data we collected continued to increase, we started to run into challenges that slowed us down.

We had built a fairly complex pipeline to manage our data ingestion, which relied on Apache Spark to ingest data from a variety of data sources from our online traffic and channels. However, ingesting data from multiple data sources in a consistent, fast, and reliable way is never a straightforward task. 

Our initial interest in BigQuery came after we discovered it integrated with a more robust event management tool for handling data updates. We had also started using Looker for analytics, which already connected to BigQuery and worked well together. As a result, it made sense to replace many parts of our existing cloud-based platform with Google Cloud Storage and BigQuery.

Originally, we had only anticipated using BigQuery for the final stage of our data pipeline, but we quickly discovered that many of our data management jobs could take place entirely within a BigQuery environment. For example, we use the command-line tool dbt, which offers support for BigQuery, to transform our data. It’s much easier for our developers and analysts to work with than Apache Spark since they can work directly in SQL. In addition, BigQuery allowed us to further simplify our data ingestion. Today, we mainly use Kafka Connect to sync data sources with BigQuery.
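
As a rough sketch of that workflow (the model, source, and column names here are hypothetical rather than Auto Trader's), a dbt model targeting BigQuery is just a SQL file that dbt compiles and materializes as a table in the warehouse:

-- models/daily_listing_stats.sql (hypothetical dbt model)
-- Assumes a 'marketplace' source is configured, pointing at raw tables synced into BigQuery (for example, via Kafka Connect).
{{ config(materialized='table') }}

SELECT
  listing_date,
  make,
  COUNT(*)          AS listings,
  AVG(asking_price) AS avg_asking_price
FROM {{ source('marketplace', 'listings') }}
GROUP BY listing_date, make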

Looker + BigQuery puts the power of data in the hands of everyone

When our data was in the previous data lake architecture, it wasn’t easy to consume. The complexity of managing the data pipeline and running Spark jobs made it nearly impossible to expose it to users effectively. With BigQuery, ingesting data is not only easier, we also have multiple ways we can consume it through easy-to-use languages and interfaces. Ultimately, this makes our data more useful to a much wider audience.

Now that our BigQuery environment is in place, our analysts can query the warehouse directly using the SQL interface. In addition, Looker provides an even easier way for business users to interact with our data. Today, we have over 500 active users on Looker—more than half the company. Data modeled in BigQuery gets pushed out to our customer-facing applications, so that the dealers can log into a tool and manage stock or see how their inventory is performing. 

Striking a balance between optimization and experimentation

Performance in BigQuery can be almost too robust: it will power through even very unoptimized queries. When we were starting out, we had a number of dashboards running very complex queries against data that was not well-modeled for the purpose, meaning every tile was demanding a lot of resources. Over time, we have learned to model data more appropriately before making it available to end-user analytics. With Looker, we use aggregate awareness, which allows users to run common query patterns across large data sets that have been pre-aggregated. The result is that the number of interactively run queries is relatively small.

The overall system comes together to create a very effective analytics environment — we have the flexibility and freedom to experiment with new queries and get them out to end users even before we fully understand the best way to model. For more established use cases, we can continue optimizing to save our resources for the new innovations. BigQuery’s slot reservation system also protects us from unanticipated cost overruns when we are experimenting.

One of the examples where this played out was when we rolled new analytic capabilities out to our sales teams. They wanted to use analytics to drive conversations with customers in real-time to demonstrate how advertisements were performing on our platform and show the customer’s return on their investment. When we initially released those dashboards, we saw a huge jump in usage of the slot pool. However, we were able to reshape the data quickly and make it more efficient to run the needed queries by matching our optimizations to the pattern of usage we were seeing.

Enabling decentralized data management

Another change we experienced with BigQuery is that business units are increasingly empowered to manage their own data and derive value from it. Historically, we had a centralized data team doing everything from ingesting data to modeling it to building out reports. As more people adopt BigQuery across Auto Trader, distributed teams build up their own analytics and create new data products. Recent examples include stock inventory reporting, trade marketing and financial reporting. 

Going forward, we are focused on expanding BigQuery out into a self-service platform that enables analysts within the business to directly build what they need. Our central data team will then evolve into a shared service, focused on maintaining the data infrastructure and adding abstraction layers where needed so it is easier for those teams to perform their tasks and get the answers they need.

BigQuery kicks our data efforts into overdrive

At Auto Trader UK, we initially planned for BigQuery to play a specific part in our data management solution, but it has become the center of our data ingestion and access ecosystem. The robust performance of BigQuery allows us to get prototypes out to business users rapidly, which we can then optimize once we fully understand what types of queries will be run in the real world. 

The ease of working with BigQuery through a well-established and familiar SQL interface has also enabled analysts across our entire organization to build their own dashboards and find innovative uses for our data without relying on our core team. Instead, they are free to focus on building an even richer toolset and data pipeline for the future.

Seer Interactive gets the best marketing results for their clients using Looker

Marketing strategies based on complex and dynamic data get results. However, it’s no small task to extract easy-to-act-on insights from increasing volumes and ever-evolving sources of data including search engines, social media platforms, third-party services, and internal systems. That’s why organizations turn to us at Seer Interactive. We provide every client with differentiating analysis and analytics, SEO, paid media, and other channels and services that are based on fresh and reliable data, not stale data or just hunches. 

More data, more ways

As digital commerce and footprints have become foundational for success over the past five years, we’ve experienced exponential growth in clientele. Keeping up with the unique analytics requirements of each client has required a fair amount of IT agility on our part. After outgrowing spreadsheets as our core BI tool, we adopted a well-known data visualization app only to find that it couldn’t scale with our growth and increasingly complex requirements either. We needed a solution that would allow us to pull hundreds of millions of data signals into one centralized system to give our clients as much strategic information as possible, while increasing our efficiency. After outlining our short- and long-term solution goals, we weighed the trade-offs of different designs. It was clear that the data replication required by our existing BI solution design was unsustainable. 

Previously, all our customer-facing teams created their own insights. More than 200 consultants were spending hours each week pulling and compiling data for our clients, and then creating their own custom reports and dashboards. As data sets grew larger and larger, our desktop solutions simply didn’t have the processing power required to keep up, and we had to invest significant money in training any new employees in these complex BI processes. Our ability to best serve our customers was being jeopardized because we were having trouble serving basic needs, let alone advanced use cases.

We selected Looker, Google Cloud’s business intelligence solution, as our BI platform. As the direct query leader, Looker gives us the best available capabilities for real-time analytics and time to value. Instead of lifting and shifting, we designed a new, consolidated data analytics foundation with Looker that uses our existing BigQuery platform, which can scale with any amount and type of data. We then identified and tackled quick-win use cases that delivered immediate business value for our team and clients.  

Meet users where they are in skills, requirements, and preferences

One of our first Looker projects involved redesigning our BI workflows. We built dashboards in Looker that automatically serve up the data our employees need, along with filters they use to customize insights and set up custom alerts. Users can now explore information on their own to answer new questions, knowing insights are reliable because they’re based on consistent data and definitions. More technical staff create ad hoc insights with governed datasets in BigQuery and use their preferred visualization tools like Looker Studio, Power BI, and Tableau. We’ve also duplicated some of our data lakes to give teams a sandbox that they can experiment in using Looker embedded analytics. This enables them to quickly see more data and uncover new opportunities that provide value to our clients. Our product development team is also able to build and test prototypes more quickly, letting us validate hypotheses for a subsection of clients before making them available across the company. And because Looker is cloud based, all our users can analyze as much data as they want without exceeding the computing power of their laptops.

Seamless security and faster development

We leverage BigQuery’s access and permissioning capabilities. Looker can inherit data permissions directly from BigQuery and multiple third-party CRMs, so we’ve also been able to add granular governance strategies within our Looker user groups. This powerful combination ensures that data is accessed only by users who have the right permissions. And Looker’s unique “in-database” architecture means that we aren’t replicating and storing any data on local devices, which reduces both our time and costs spent on data management while bolstering our security posture. 

Better services and hundreds of thousands of dollars in savings

Time spent on repetitive tasks adds up over months and years. With Looker, we automate reports and alerts that people frequently create. Not only does this free up teams to discover insights that they previously wouldn’t have time to pinpoint, but they have fresh reports whenever they are needed. For instance, we automated the creation of multiple internal dashboards and external client analyses that utilize cross-channel data. In the past, before we had automation capabilities, we used to only generate these analyses up to four times a year. With Looker, we can scale and automate refreshed analyses instantly—and we can add alerts that flag trends as they emerge. We also use Looker dashboards and alerts to improve project management by identifying external issues such as teams who are nearing their allocated client budgets too quickly or internal retention concerns like employees who aren’t taking enough vacation time.

“Using back-of-the-napkin math, let’s say every week 50 different people spend at least one hour looking up how team members are tracking their time. By building a dashboard that provides time-tracking insights at a glance, we save our collective team 2,500 hours a year. And if we assume the hourly billable rate is $200 an hour, we’re talking $500,000 in savings—just from one dashboard,” said Drew Meyer, Director of Product at Seer Interactive.

The insights and new offerings to stay ahead of trends 

Looker enables us to deliver better experiences for our team members and clients that weren’t possible even two years ago, including faster development of analytics that improve our services and processes. For example, when off-the-shelf tools could not deliver the keyword-tracking insights and controls we required to deliver differentiating SEO strategies for clients, we created our own keyword rank tracking application using Looker embedded analytics. Our application provides deep-dive SEO data-exploration capabilities and gives teams unique flexibility in analyzing data while ensuring accurate, consistent insights. Going forward, we’ll continue adding new insights, data sources, and automations with Looker to create even better-informed marketing strategies that fuel our clients’ success.

Migrating your Oracle and SQL Server databases to Google Cloud

For several decades, before the rise of cloud computing upended the way we think about databases and applications, Oracle and Microsoft SQL Server databases were a mainstay of business application architectures. But today, as you map out your cloud journey, you’re probably reevaluating your technology choices in light of the cloud’s vast possibilities and current industry trends.

In the database realm, these trends include a shift to open source technologies (especially to MySQL, PostgreSQL, and their derivatives), adoption of non-relational databases, multi-cloud and hybrid-cloud strategies, and the need to support global, always-on applications. Each application may require a different cloud journey, whether it’s a quick lift-and-shift migration, a larger application modernization effort, or a complete transformation with a cloud-first database.

Google Cloud offers a suite of managed database services that support open source, third-party, and cloud-first database engines. At Next 2022, we published five new videos specifically for Oracle and SQL Server customers looking to either lift-and-shift to the cloud or fully free themselves from licensing and other restrictions. We hope you’ll find the videos useful in thinking through your options, whether you’re leaning towards a homogeneous migration (using the same database you have today) or a heterogeneous migration (switching to a different database engine).

Let’s dive into our five new videos.

#1 Running Oracle-based applications on Google Cloud

By Jagdeep Singh & Andy Colvin

Moving to the cloud may be difficult if your business depends on applications running on an Oracle Database. Some applications may have dependencies on Oracle for reasons such as compatibility, licensing, and management. Learn about several solutions from Google Cloud, including Bare Metal Solution for Oracle, a hardware solution certified and optimized for Oracle workloads, and solutions from cloud partners such as VMware and Equinix. See how you can run legacy workloads on Oracle while adopting modern cloud technologies for newer workloads.

#2 Running SQL Server-based applications on Google Cloud

By Isabella Lubin

Microsoft SQL Server remains a popular commercial database engine. Learn how to run SQL Server reliably and securely with Cloud SQL, a fully-managed database service for running MySQL, PostgreSQL and SQL Server workloads. In fact, Cloud SQL is trusted by some of the world’s largest enterprises with more than 90% of the top 100 Google Cloud customers using Cloud SQL. We’ll explore how to select the right database instance, how to migrate your database, how to work with standard SQL Server tools, and how to monitor your database and keep it up to date.

#3 Choosing a PostgreSQL database on Google Cloud

By Mohsin Imam

PostgreSQL is an industry-leading relational database widely admired for its permissive open source licensing, rich functionality, proven track record in the enterprise, and strong community of developers and tools. Google Cloud offers three fully-managed databases for PostgreSQL users: Cloud SQL, an easy-to-use fully-managed database service for open source PostgreSQL; AlloyDB, a PostgreSQL-compatible database service for applications that require an additional level of scalability, availability, and performance; and Cloud Spanner, a cloud-first database with unlimited global scale, 99.999% availability and a PostgreSQL interface. Learn which one is right for your application, how to migrate your database to the cloud, and how to get started.

#4 How to migrate and modernize your applications with Google Cloud databases

By Sandeep Brahmarouthu

Migrating your applications and databases to the cloud isn’t always easy. While simple workloads may just require a simple database lift-and-shift, custom enterprise applications may benefit from more complete modernization and transformation efforts. Learn about the managed database services available from Google Cloud, our approach to phased modernization, the database migration framework and programs that we offer, and how we can help you get started with a risk-free assessment.

#5 Getting started with Database Migration Service

By Shachar Guz & Inna Weiner

Migrating your databases to the cloud becomes very attractive as the cost of maintaining legacy databases increases. Google Cloud can help with your journey, whether it’s a simple lift-and-shift, a database modernization to a modern, open source-based alternative, or a complete application transformation. Learn how Database Migration Service simplifies your migration with a serverless, secure platform that utilizes native replication for higher fidelity and greater reliability. See how database migration can be made less complex, time-consuming, and risky, and how you can often start your migration in less than an hour.

We can’t wait to partner with you

Whichever path you take in your cloud journey, you’ll find that Google Cloud databases are scalable, reliable, secure and open. We’re looking forward to creating a new home for your Oracle- and SQL Server-based applications.

Start your journey with a Cloud SQL or Spanner free trial, and accelerate your move to Google Cloud with the Database Migration Program.

Built with BigQuery: How Connected-Stories leverages Google Data Cloud and AI/ML for creating personalized Ad Experiences

Editor’s note: This post is part of a series highlighting our awesome partners, and their solutions, that are Built with BigQuery.

When producing engaging video content such as ads, many marketers ignore the power of data to improve their creative efforts and meet consumers’ need for personalized messages. The demand for creative tech that personalizes efficiently is real: marketers need personalized video ads to reach their audience with the right message at the right time. Data, insights, and technology are the main ingredients to deliver this value while ensuring security and privacy requirements are met. The Connected-Stories team partnered with Google Cloud to build a platform for ad personalization. Google Data Cloud and BigQuery are at the forefront, assimilating data, leveraging ML models, creating personalized ads, and capitalizing on real-time intelligence as the core features of the Connected-Stories NEXT platform.

Connected-Stories NEXT is an end-to-end creative management platform to develop, serve, and optimize interactive video and display ads that scale across any channel. The platform ingests first-party data to create custom ML models and measures numerous third-party data points to help brands develop unique customer journeys and create videos driven by their data signals. An intelligent feedback loop passes real-time data back, enabling brands to make data-driven, actionable video ads that take their campaigns to the next level.

The core use case of the NEXT platform revolves around collecting users’ interaction data and optimizing for precision and speed to create an actionable ad experience that is personalized for each user. The platform processes complex data points to create interactive data visualizations that allow for accurate analysis. The platform uses Vertex AI to access managed tools, workflows, and infrastructure to build, deploy, and scale ML models, which have improved the accuracy of identifying segments for further analysis.

The platform ingests 200M data events, with peaks and valleys of activity. These events are processed to generate dashboards that enable users to visualize metrics based on filters in real time. These dashboards have high performance requirements: the user interface must remain responsive under constantly changing data dimensions.

Google Cloud’s serverless stack, coupled with limitless data cloud infrastructure, has been at the core of the NEXT platform’s data-driven innovation. The growing volume of data ingested, streamed, and processed scaled uniformly across the compute, storage, and analytical layers of the solution. A lean development team at Connected-Stories was able to focus fully on the solution, while the serverless stack scaled automatically, lowered the attack surface, and optimized the cost footprint through pay-as-you-go pricing.

BigQuery has been the backbone supporting vast amounts of data spread over multiple geographies, with workloads running at petabyte scale. BigQuery’s fully managed serverless architecture, real-time streaming, built-in machine learning, and rich business intelligence capabilities distinguish it from a typical cloud data warehouse. It is the foundation needed to approach data and serve users in an unlimited number of ways. For an application with zero tolerance for failure, BigQuery’s fully managed nature means it handles replication, recovery, distributed data optimization, and management.

The platform’s requirements include low maintenance, constant data ingestion and refresh, and smart tuning of aggregated data. These capabilities can be implemented with BigQuery’s materialized views. Materialized views are precomputed views that periodically cache query results for better performance. They read only the delta change from base tables and compute up-to-date aggregations, delivering faster results while consuming fewer resources and reducing the cost footprint.
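
To illustrate (the dataset, table, and column names below are hypothetical, not Connected-Stories’), a materialized view that keeps per-campaign event counts fresh might look like this:

CREATE MATERIALIZED VIEW `analytics.campaign_event_counts` AS
SELECT
  campaign_id,
  event_type,
  COUNT(*) AS event_count
FROM `analytics.ad_events`
GROUP BY campaign_id, event_type;

BigQuery refreshes such a view incrementally from the base table, so dashboards that query it read pre-aggregated, up-to-date results instead of rescanning the raw events.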

Some key considerations in using Google Cloud’s serverless stack include quick onboarding to development, prototyping in short sprints, and ease of preparing data in a rapidly changing environment. Typical low-code/no-code considerations include data transformation, aggregation, and reduced deployment time. These considerations are fulfilled by serverless Google Cloud services such as Pub/Sub, Cloud Storage, Cloud Run, Cloud Composer, Dataflow, and BigQuery, as described in the architecture diagram below. The use of each of these components and services is described below.

Input/Ingest: At a high level, microservices hosted in Cloud Run collect and aggregate incoming ad events.

Enrichment: The output of this stage is a Pub/Sub message enriched with additional attributes based on a pre-configured campaign.

Store: A Dataflow streaming job writes the enriched events as text files to Cloud Storage buckets.

Trigger: Cloud Composer triggers Spark jobs on the text files to process and group them, producing one record per impression (a logical group of events).

Deploy: Cloud Build is then used to automate all deployments. 

Thus far, these Google Cloud managed services work together to ingest, store, and trigger the orchestration, and all of them scale based on configuration, including autoscaling.

Visualization: A visualization tool reads data from BigQuery to compute pre-aggregations required for each dashboard. 

Data model evolution considerations: Though the solution served the purpose of creating pre-aggregations, evolving the data model, such as adding a column or creating a new table, meant recreating the pre-aggregations and querying the data again. Creating aggregate tables as an extra output of the current ETLs seemed like a viable option, but it would increase the cost and complexity of the jobs. A similar reprocessing or update problem for the aggregated tables would occur whenever data is updated.

Precomputed, periodically cached views of the data are critical to reaching the audience with the right message at the right time.

Performance: To increase the performance of the platform, regularly precomputed views of the data need to be kept cached.

Materialized views: Consumers of these views needed faster response times, lower resource consumption, and computation of only the changes relative to the base table. BigQuery materialized views were used to solve exactly this requirement. They have been leveraged heavily to optimize the design, resulting in less maintenance and access to fresh data with high performance, at a relatively low technical investment in creating and maintaining SQL code.

Dashboards: Application dashboards pointing to the materialized views are highly performant and provide a view into fresh data.

Custom reports with Vertex AI notebooks: Vertex AI notebooks read data directly from BigQuery to produce custom reports for a subset of customers. Vertex AI has been hugely beneficial to data analysts, as an environment with pre-installed libraries is ready to use immediately. Vertex AI Workbench notebooks are used to share these reports within the team, allowing analysts to work entirely in the cloud without ever needing to download data, and they increase the velocity of developing and testing ML models.

The NEXT platform has yielded benefits such as the ability for customers to create unique consumer journeys powered by AI/ML personalization triggers, and to use first-party data and business intelligence tools to capitalize on real-time creative intelligence: a dashboard that measures campaign performance so cross-functional teams can analyze the impact of the ad content experience at a granular level. All of this happens while ensuring controlled access to data and enriching data without moving it across clouds. The NEXT platform keeps up with increasing demands for agility, scalability, and reliability through its underlying use of Google Cloud.

Partnering with Google through the Built with BigQuery program has surfaced differentiated value in creating interactive, personalized ads using real-time data. In addition, by sharing this data across organizations as assets, ML models have fueled higher levels of innovation. Connected-Stories plans to go deeper into the entire spectrum of Google Cloud AI/ML services to enhance core functionality and bring new capabilities to the platform.

Click here to learn more about Connected-Stories NEXT Platform capabilities.

The Built with BigQuery Advantage for ISVs 

Through Built with BigQuery, launched in April ‘22 as part of the Google Data Cloud Summit, Google is helping tech companies like Connected-Stories co-innovate in building applications that leverage Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs. Participating companies can:

Get started fast with a Google-funded, pre-configured sandbox. 

Accelerate product design and architecture through access to designated technical experts from the ISV Center of Excellence who can share insights from key use cases, architectural patterns, and best practices encountered in the field. 

Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

The Google Data Cloud spectrum of products and specifically BigQuery give ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge and expanding partner ecosystem and support for multi-cloud, open source tools and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in and exercise choice. 

We thank the Google Cloud and Connected-Stories team members who co-authored this blog. Connected-Stories: Luna Catini, Marketing Director; Google: Sujit Khasnis, Cloud Partner Engineering.

How The FA is moving the goal posts with a data cloud approach in Qatar

We’re moments away from the kick-off of another historic tournament. After the England men’s football team reached the Euro 2020 final in last year’s pandemic-delayed competition, there is genuine confidence in a successful run in Qatar.

The Football Association (The FA) is the governing body of association football in England, and it has left no stone unturned in its preparations; the organization has increasingly looked to physical performance data as a way to help support players on the pitch. Maintaining accurate and insightful information on fitness, conditioning, and nutrition also helps ensure player welfare, something that becomes more important with every fixture in a tournament environment.

The need for improved understanding of how players are faring was the reason The FA set up the Performance Insights strand of its Physical Performance, Medicine, and Nutrition department during lockdown in 2020. And they used Google Cloud to help them revolutionize the way they capture, store, and process information.

A single 90-minute squad training session can generate millions of rows of data. In football, things change so quickly that this data begins to lose relevance as soon as the players are back in the dressing room. That’s why The FA needed a solution which could turn raw data into valuable, easy-to-understand insights. This led the team to BigQuery, Google Cloud’s data warehouse solution.

BigQuery enables The FA’s Performance Insights team to automate previously labor-intensive tasks, and for all the information to be stored in a single, centralized platform for the first time. By collating different data sources across The FA’s squads, there can be greater clarity and fewer siloes – everyone is working towards the same goals. 

A unique solution for a unique tournament

Access to insights is vital in any tournament situation, but this year there is a need for speed like never before. 

Unlike previous tournaments, Qatar will start in the middle of domestic league seasons throughout the world. Traditionally, international sides are able to meet up for nearly a month between the end of the league season and the start of the tournament – a critical time to work on all aspects of team preparation, including tactics and conditioning. By contrast, this year the England players will have less than a week to train together before the first kick-off.

BigQuery allows The FA’s data scientists to combine data on many aspects of a player’s physical performance captured during a training camp, from intensity to recovery. This can enable more useful conversations on the ground and can help create more individualized player management. And by using BigQuery’s customizable user-defined functions, the same data can be tweaked and tailored to fit needs across departments.

This customizability provides a foundation for a truly ‘interdisciplinary’ team in which doctors, strength and conditioning coaches, physios, psychologists, and nutritionists share a common understanding of the support a player needs.
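
As a purely illustrative sketch of what such a shared user-defined function might look like (the dataset, table, metrics, and thresholds below are invented, not The FA’s actual model), a routine like this can be created once and then reused in each department’s own queries:

from google.cloud import bigquery

# Illustrative only: a shared SQL UDF that buckets a training session into a load category.
client = bigquery.Client()

ddl = """
CREATE OR REPLACE FUNCTION `my-project.performance.load_category`(distance_m FLOAT64, hsr_m FLOAT64)
RETURNS STRING AS (
  CASE
    WHEN hsr_m > 800 OR distance_m > 9000 THEN 'high'
    WHEN hsr_m > 400 OR distance_m > 6000 THEN 'moderate'
    ELSE 'low'
  END
);
"""
client.query(ddl).result()

# Any department can then apply the same definition to the same underlying data.
rows = client.query("""
SELECT player_id, session_date,
       `my-project.performance.load_category`(total_distance_m, high_speed_running_m) AS load_band
FROM `my-project.performance.training_sessions`
""").result()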

Every minute will count during such a compressed training window, so automation is key. While BigQuery is the core product The FA uses to store and manipulate data, it’s just one part of a suite of Google Cloud products and APIs that help them easily turn data into insights. 

In-game and training performance data, along with data on players’ sleep, nutrition, recovery, and mental health, can be captured and processed with Python and streamed into BigQuery via Pub/Sub. BigQuery’s native connectors then stream insights to visual dashboards that present them in a meaningful, tangible format.
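
A minimal sketch of that capture step, assuming a Pub/Sub topic whose BigQuery subscription writes messages into a table (the topic, project, and field names are placeholders), might look like this:

import json
from google.cloud import pubsub_v1

# Illustrative only: publish one session record to a topic that feeds BigQuery.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "training-metrics")

record = {
    "player_id": "P123",
    "session_date": "2022-11-14",
    "total_distance_m": 7250,
    "high_speed_running_m": 430,
    "sleep_hours": 7.5,
}

future = publisher.publish(topic_path, json.dumps(record).encode("utf-8"))
print("Published message", future.result())  # returns the message ID once accepted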

Before leveraging the power of Google Cloud, this work could take several hours each day. Now, it can take a minute from data capture to the coaches having access to clear and actionable information. 

Predicting a bright future for the Beautiful Game

We won’t have long to wait to see how England will perform in Qatar. But the benefits of The FA’s cloud-enabled approach to data science will continue long after the final whistle has blown.

The short preparation window has posed challenges for The FA, but it has also given the organization a unique opportunity to discover how predictive analytics and machine learning on Google Cloud could further enhance its player performance strategy. 

The Physical Performance, Medicine, and Nutrition department has collected performance data from players throughout this year’s league season, taking into account fixture density and expected physical demand. They hope to use this to support the players’ physical preparation and recovery during the tournament based on individual physical performance profiles.

This ML work is still in the early stages. But the Performance Insights team is confident that by developing even closer relationships with Google Cloud and even greater familiarity with its technology, they will be able to unlock an even greater level of insight into player performance.

Learn more about how Google Cloud can turn raw data into actionable insights, fast.

What can you build with the new Google Cloud developer subscription?

To help you grow and build faster – and take advantage of the 123 product announcements from Next ‘22 – last month we launched the Google Cloud Skills Boost annual subscription with new Innovators Plus benefits. We’re already hearing rave reviews from subscribers from England to Indonesia, and want to share what others are learning and doing to help inspire your next wave of Google Cloud learning and creativity.

First, here’s a summary of what the Google Cloud Skills Boost annual subscription (1) with Innovators Plus benefits includes:

Access to 700+ hands-on labs, skill badges, and courses

$500 Google Cloud credits

A Google Cloud certification exam voucher

Bonus $500 Google Cloud credits after the first certification earned each year

Live learning events led by Google Cloud experts

Quarterly technical briefings hosted by Google Cloud executives

Celebrating learning achievements

Subscribers get access to everything needed to prepare for a Google Cloud certification exam; Google Cloud certifications are among the top-paying IT certifications in 2022 (2). Subscribers also receive a certification exam voucher to redeem when booking the exam.

Jochen Kirstätter, a Google Developer Expert and Innovator Champion, is using the subscription to prepare for his next Google Cloud Professional certification exam, and has found that the labs and courses on Google Cloud Skills Boost have helped him feel ready to get #GoogleCloudCertified.

“‘The only frontiers are in your mind’ – with the benefits of #InnovatorsPlus I can explore more services and practice real-life scenarios intensively for another Google Cloud Professional certification.”

Martin Coombes, a web developer from PageHub Design, is a new subscriber and has already become certified as a Cloud Digital Leader. That means he’s been able to unlock the bonus $500 of Google Cloud credit benefit to use on his next project. 

“For me, purchasing the annual subscription was a no brainer. The #InnovatorsPlus benefits more than pay back the investment and I’ve managed to get my first Google Cloud certification within a week using the amazing Google Cloud Skills Boost learning resources. I’m looking forward to further progressing my knowledge of Google Cloud products.”

Experimenting and building with $500 of Google Cloud credits 

We know how important it is to learn by doing. And isn’t hands-on more fun? Another great benefit of the annual subscription is $500 of Google Cloud credits every year you are a subscriber. And even better, once you complete a Google Cloud certification, you will unlock a bonus $500 of credits to help build your next project just like Martin and Jeff did. 

Rendy Junior, Head of Data at Ruangguru and a Google Cloud Innovator Champion, has already been able to apply the credits to an interesting data analysis project he’s working on. 

“I used the Google Cloud credits to explore new features and data technology in DataPlex. I tried features such as governance federation and data governance whilst data is located in multiple places, even in different clouds. I also tried DataPlex data cataloging; I ran a DLP (Data Loss Prevention) inspection and fed the tag where data is sensitive into the DataPlex catalog. The credits enable me to do real world hands-on testing which is definitely helpful towards preparing for certification too.”

Jeff Zemerick recently discovered the subscription and has been able to achieve his Professional Cloud Database certification, using the voucher and Google Cloud credits to prepare.

“I was preparing for the Google Cloud Certified Professional Cloud Database exam and the exam voucher was almost worth it by itself. I used some of the $500 cloud credits to prepare for the exam by learning about some of the Google Cloud services where I felt I might need more hands-on experience. I will be using the rest of the credits and the additional $500 I received from passing the exam to help further the development of our software to identify and redact sensitive information in the Google Cloud environment. I’m looking forward to using the materials available in Google Cloud Skills Boost to continue growing my Google Cloud skills!”

Grow your cloud skills with live learning events 

Subscribers gain access to live learning events, where a Google Cloud trainer teaches popular topics in a virtual classroom environment. Live learning events cover topics like BigQuery, Kubernetes, Cloud Run, Cloud Storage, networking, and security. We’ve set these up to go deep: mini live-learning courses consist of two highly efficient hours of interactive instruction, and gamified live learning events are three hours of challenges and fun. We’ve already had over 400 annual subscribers reserve a spot for upcoming live learning events. Seats are filling up fast for the November and December events, so claim yours before it’s too late.

Shape the future of Google Cloud products through the quarterly technical briefings  

As a subscriber, you are invited to join quarterly technical briefings, getting insight into the latest product developments and new features, with the opportunity for subscribers to engage and shape future product development for Google Cloud. Coming up this quarter, get face time with Matt Thompson, Google Cloud’s Director of Developer Adoption, who will demonstrate some of the best replicable uses of Google Cloud he’s seen from leading developers. 

Start your subscription today 

Take charge of your cloud career today by visiting cloudskillsboost.google to get started with your annual subscription. Make sure to activate your Innovators Plus badge once you do and enjoy your new benefits. 

1. Subject to eligibility limitations. 
2. Based on responses from the Global Knowledge 2022 IT Skills and Salary Survey.

BigQuery helps Soundtrack Your Brand hit the high notes without breaking a sweat

Editor’s note: Soundtrack Your Brand is an award-winning streaming service with the world’s largest  licensed music catalog built just for businesses, backed by Spotify. Today, we hear how BigQuery has been a foundational component in helping them transform big data into music. 

Soundtrack Your Brand is a music company at its heart, but big data is our soul. Playing the right music at the right time has a huge influence on the emotions a brand inspires, the overall customer experience, and sales.  We have a catalog of over 58 million songs and their associated metadata from our music providers and a vast amount of user data that helps us deliver personalized recommendations, curate playlists and stations, and even generate listening schedules. As an example, through our Schedules feature our customers can set up what to play during the week.  Taking that one step further, we provide suggestions on what to use in different time slots and recommend entire schedules.

Using BigQuery, we built a data lake to empower our employees to access all this content and metadata in a structured way. Ensuring that our data is easily discoverable and accessible allows us to build any type of analytics or machine learning (ML) use case and run queries reliably and consistently across the complete data set. Today, our users benefit from these advanced analytics through the personalized recommendations we offer across our core features: Home, Search, Playlists, Stations, and Schedules.

Fine-tuning developer productivity

The biggest business value that comes from BigQuery is how much it speeds up our development capabilities and allows us to ship features faster. In the past 3 years, we have built more than 150 pipelines and more than 30 new APIs within our ML and data teams, which total about 10 people. That is an impressive rate of a new pipeline every week and a new API every month. With everything in BigQuery, it’s easy to simply write SQL and have it orchestrated within a CI/CD toolchain to automate our data processing pipelines. An in-house tool built as a GitHub template, in many ways very similar to Dataform, helps us build very complex ETL processes in minutes, significantly reducing the time spent on data wrangling.
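
The shape of such a pipeline step is simple enough to sketch: a parameterized SQL transform run by the BigQuery client and written to a destination table, which a CI/CD toolchain could then schedule. The project, tables, and parameter below are illustrative, not the in-house tool itself.

from google.cloud import bigquery

# Illustrative only: one templated ETL step that materializes daily play counts.
client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    destination="my-project.analytics.daily_plays",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    query_parameters=[bigquery.ScalarQueryParameter("run_date", "DATE", "2022-11-14")],
)

sql = """
SELECT track_id, COUNT(*) AS plays
FROM `my-project.raw.playback_events`
WHERE DATE(played_at) = @run_date
GROUP BY track_id
"""

client.query(sql, job_config=job_config).result()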

BigQuery acts as a cornerstone for our entire data ecosystem, a place to anchor all our data and be our single source of truth. This single source of truth has expanded the limits of what we can do with our data. Most of our pipelines start from the data lake or end at it, increasing the reusability of data and collaboration. For example, one of our interns built an entire churn prediction pipeline in a couple of days on top of existing tables that are produced daily. Nearly a year later, this pipeline is still running without failure, largely due to its simplicity. The pipeline is a set of BigQuery queries chained together into a BigQuery ML model, running on a schedule with Kubeflow Pipelines.
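
In the same spirit, a churn-prediction step can be expressed almost entirely as SQL against BigQuery ML; the sketch below is a generic example with invented feature and table names, not the intern’s actual pipeline.

from google.cloud import bigquery

# Illustrative only: train a logistic regression churn model in BigQuery ML, then score customers.
client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my-project.ml.churn_model`
OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
SELECT days_since_last_play, avg_daily_hours, active_zones, churned
FROM `my-project.analytics.customer_features`
""").result()

scores = client.query("""
SELECT customer_id, predicted_churned
FROM ML.PREDICT(MODEL `my-project.ml.churn_model`,
                TABLE `my-project.analytics.customer_features`)
""").to_dataframe()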

Once we made BigQuery the anchor for our data operations, we discovered we could apply it to use cases that you might not expect, such as maintaining our configurations or supporting our content management system. For instance, we created a Google Sheet where our music experts can correct genre classification mistakes for songs by simply adding a row. Instead of taking hours or days to create a bespoke tool, we were able to set everything up in a few minutes.
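
One way to wire that up, shown here only as a hedged sketch with placeholder names (and assuming the client’s credentials carry the Google Drive scope), is to register the sheet as an external table that queries can join against:

from google.cloud import bigquery

# Illustrative only: expose a Google Sheet of genre corrections as a BigQuery external table.
client = bigquery.Client()

external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/EXAMPLE_SHEET_ID"]
external_config.schema = [
    bigquery.SchemaField("track_id", "STRING"),
    bigquery.SchemaField("corrected_genre", "STRING"),
]
external_config.options.skip_leading_rows = 1  # the sheet has a header row

table = bigquery.Table("my-project.curation.genre_corrections")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)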

BigQuery’s ability to consume Google Sheets allows business users who play key roles in improving our recommendations engine and curating our music, such as our content managers and DJs, to contribute to the data pipeline.

Another example is our use of BigQuery as an index for some of our large Cloud Storage buckets. By using Cloud Functions to subscribe to read/write events for a bucket, and writing those events to partitioned tables, our pipelines can quickly and naturally search for and access files, for example to download and process the audio of new track releases. We also use log events fired when a table is added to a dataset to trigger pipelines that process data on demand, such as JSON/CSV files from some of our data providers that are newly imported into BigQuery. As the hub for all file integration and processing, BigQuery makes new data quickly available to our entire data ecosystem in a timely and cost-effective manner, while allowing for data retention, ETL, access control, and easy introspection.
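
A minimal sketch of that indexing function, with placeholder project, dataset, and field names and assuming a target table partitioned on the event timestamp, could look like this:

import functions_framework
from google.cloud import bigquery

# Illustrative only: a Cloud Function that records Cloud Storage object events in BigQuery.
client = bigquery.Client()
TABLE_ID = "my-project.storage_index.bucket_events"

@functions_framework.cloud_event
def index_object(cloud_event):
    data = cloud_event.data  # Cloud Storage object metadata from the event payload
    row = {
        "bucket": data["bucket"],
        "object_name": data["name"],
        "size_bytes": int(data.get("size", 0)),
        "event_time": data["timeCreated"],  # used as the partitioning column
    }
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")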

BigQuery makes everything simple. We can make a quick partitioned table and run queries that use thousands of CPU hours to sift through a massive volume of data in seconds — and only pay a few dollars for the service. The result? Very quick, cost-effective ETL pipelines. 

In addition, centralizing all of our data in BigQuery makes it possible to easily establish connections between pipelines, giving developers a clear understanding of what specific type of data a pipeline will produce. If a developer wants a different outcome, she can copy the GitHub template and change some settings to create a new, independent pipeline.

Another benefit is that developers don’t have to coordinate schedules or sync with each other’s pipelines: they just need to know that a table that is updated daily exists and can be relied on as a data source for an application. Each developer can progress their work independently without worrying about interfering with other developers’ use of the platform.

Making iteration our forte

Out of the box, BigQuery met and exceeded our performance expectations, but ML performance was the area that really took us by surprise. Suddenly, we found ourselves going through millions of rows in a few seconds, where the previous method might have taken an hour.  This performance boost ultimately led to us improving our artist clustering workload from more than 24 hours on a job running 100 CPU workers to 10 minutes on a BigQuery pipeline running inference queries in a loop until convergence.  This more than 140x performance improvement also came at 3% of the cost. 
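
For readers who want a feel for how compact such a clustering workload can be, the sketch below uses BigQuery ML’s built-in k-means as a simplified stand-in for the loop of inference queries described above; the dataset, feature table, and cluster count are invented.

from google.cloud import bigquery

# Illustrative only: cluster artists with BigQuery ML k-means instead of an external CPU job.
client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my-project.ml.artist_clusters`
OPTIONS (model_type = 'KMEANS', num_clusters = 50) AS
SELECT * EXCEPT(artist_id)
FROM `my-project.ml.artist_features`
""").result()

assignments = client.query("""
SELECT artist_id, CENTROID_ID
FROM ML.PREDICT(MODEL `my-project.ml.artist_clusters`,
                TABLE `my-project.ml.artist_features`)
""").to_dataframe()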

Currently we have more than 100 neural network ML models being trained and run regularly in batch in BQML. This setup has become our favorite method for both fast prototyping and creating production-ready models. Not only is it fast and easy to run hyperparameter tuning in BQML, but our benchmarks show performance metrics comparable to our own TensorFlow code, so we now use TensorFlow sparingly. Differences in input data can have an even greater impact on the end-user experience than individual tweaks to the models.

BigQuery’s performance makes it easy to iterate with the domain experts who help shape our recommendations engine or who are concerned about churn, because we can show them the effect of changes to input data on our recommendations in real time. One of our favorite things to do is to build a Data Studio report that has an ML.PREDICT query as part of its data source query. The report shows examples of good and bad predictions along with bias/variance summaries and a series of drop-downs, thresholds, and toggles to control the input features and the output threshold. We give that report to our team of domain experts to help manually tune the models, putting model tuning right in their hands. Having humans in the loop has become trivial for our team. In addition to fast iteration, the BigQuery ML approach is also very low maintenance: you don’t need to write a lot of Python or Scala code or maintain and update multiple frameworks; everything can be written as SQL queries run against the data store.
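
The data source query behind such a report can be as small as the hedged sketch below, which scores customers with ML.PREDICT and exposes an adjustable probability threshold; the model, table, and the assumption that the label is a BOOL column named churned are all illustrative.

from google.cloud import bigquery

# Illustrative only: an ML.PREDICT query with a tunable threshold, suitable as a report data source.
client = bigquery.Client()

sql = """
SELECT
  customer_id,
  predicted_churned,
  (SELECT p.prob FROM UNNEST(predicted_churned_probs) AS p WHERE p.label) AS churn_probability,
  (SELECT p.prob FROM UNNEST(predicted_churned_probs) AS p WHERE p.label) >= @threshold AS flagged
FROM ML.PREDICT(MODEL `my-project.ml.churn_model`,
                TABLE `my-project.analytics.customer_features`)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("threshold", "FLOAT64", 0.7)]
)
preview = client.query(sql, job_config=job_config).to_dataframe()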

Helping brands to beat the band—and the competition 

BigQuery has allowed us to establish a single source of truth for our company that our developers and domain experts can build on to create new and innovative applications that help our customers find the sound that fits their brand. 

Instead of cobbling together data from arbitrary sources, our developers now always start with a data set from BigQuery and build forward.  This guarantees the stability of our data pipeline and makes it possible to build outward into new applications with confidence. Moreover, the performance of BigQuery means domain experts can interact with the analytics and applications that developers create more easily and see the results of their recommended improvements to ML models or data inputs quickly. This rapid iteration drives better business results, keeps our developers and domain experts aligned, and ensures Soundtrack Your Brand keeps delivering sound that stands out from the crowd.
