How Livesport activates data and saves engineering resources with BigQuery and Dataddo

Today, organizations can build a robust, end-to-end data infrastructure with a fraction of the in-house engineering resources. Livesport — a provider of real-time data, results, and statistics for virtually all sporting events around the globe — recognized this early on when building their data team.

Livesport was already using BigQuery as a data warehouse for all layers of their infrastructure, but the company didn’t want to build a data team based on syncing data to BigQuery from other databases and non-Google sources. Instead, Livesport sought a solution from Dataddo, a recognized Google Cloud Ready data integration partner for BigQuery.

Offloading data integration tasks to a dedicated tool

Livesport’s data team has strong SQL skills, so it was important to give the team space to focus on data activation and analytics while outsourcing the engineering workloads associated with data automation and ingestion.

The data team was already syncing data from Google services like GA4 and Google Ads to BigQuery, which cost them nothing to maintain thanks to native integrations. However, they needed a tool that could sync a mountain of data from sources outside the Google ecosystem, such as Livesport’s internal databases (e.g., sports data and data from their app), ERP system, third-party services, social media accounts, and affiliate partner APIs.

At the same time, Livesport was looking for a solution that offered some other nice-to-haves to further reduce the burden on the data team: fixed pricing (since the amount of data synced fluctuates heavily from month to month), close support for connecting data from unique internal solutions, and a willingness to build out new connectors quickly if needed.

Flexible, customizable, end-to-end connectivity with Dataddo

Livesport first evaluated another popular data integration tool, but eventually chose Dataddo because it met all their essential criteria and more.

With Dataddo, Livesport can:

Connect data from all of its sources to BigQuery, including internal databases (via CDC data replication), third-party services, custom sources, and affiliate partner APIs.
Gain real-time support from dedicated specialists, with a Slack channel where specialists from both sides can interact and collaborate to implement custom integrations.
Build new connectors within 10 days, free of charge.
Ensure no surprises at the end of billing periods with fixed pricing.

Livesport is also taking advantage of Dataddo capabilities that go beyond their initial requirements. For example, Dataddo makes it easy to connect online services with business intelligence (BI) tools. The no-code user interface enables Livesport’s business teams, such as marketing, to flexibly sync data from apps like Facebook with BI systems to gain ad-hoc insights — without intervention from the data team. Dataddo also allows Livesport to import offline conversion data from BigQuery directly into Google Ads and even provides an added layer of security with reverse SSH tunneling.

Less engineering, bigger BI team

By outsourcing data engineering tasks to Dataddo, Livesport’s data team is now free to fully capitalize on the analytics capabilities of BigQuery. They can also spend more time using other Google Cloud services like Vertex AI, BigQuery ML, and Cloud Functions to enrich data and then send it downstream to end users.

“We save about 70% of the time it would otherwise take to ingest all our data, or 3-4 full-time equivalents, and spend this much more time on data analytics and activation. We only have one full-time data engineer, who does more than just collect data, while our BI team consists of 11 members,” said Zdeněk Hejnak, Data development Team Leader at Livesport.

Livesport is also testing Dataddo’s reverse ETL capabilities to automate the import of offline conversion data from BigQuery to Google Ads — a cutting-edge way to optimize ad spend and precisely target qualified prospects.

“We’re constantly looking for new opportunities to get more from our data, so reverse ETL to Google Ads is a promising direction,” Hejnak said.

To learn more about Dataddo, visit the Google Cloud partner directory or Dataddo’s Marketplace offerings. If you’re interested in using Dataddo for BigQuery, check out Dataddo’s BigQuery Connector and learn more about Google Cloud Ready – BigQuery.

Standardize your cloud billing data with the new FOCUS BigQuery view

Businesses today often rely on multiple cloud providers, making it crucial to have a unified view of their cloud spend. This is where the FinOps Open Cost and Usage Specification (FOCUS) comes in. And today, we’re excited to announce a new BigQuery view that leverages the recent FOCUS v1.0 Preview to help simplify cloud cost management across clouds.

What is FOCUS?

The FinOps Open Cost and Usage Specification (FOCUS) aims to deliver consistency and standardization across cloud billing data by unifying cloud cost and usage data into one common data schema. Before FOCUS, there was no industry-standard way to normalize key cloud cost and usage measures across multiple cloud service providers (CSPs), making it challenging to understand how billing costs, credits, usage, and metrics map from one cloud provider to another (see FinOps FAQs for more details).

FOCUS helps FinOps practitioners perform fundamental FinOps capabilities using a generic set of instructions and unified schema, regardless of the origin of the dataset. FOCUS is a living, breathing specification that is constantly being iterated on and improved by the Working Group, which consists of FinOps practitioners, CSP leaders, Software as a Service (SaaS) providers, and more. The FOCUS specification v1.0 Preview was launched in November 2023, paving the way for more efficient and transparent cloud cost management. If you’d like to read more or join the Working Group, here is a link to the FOCUS website.

Introducing a BigQuery view for FOCUS v1.0 Preview

Historically, we’ve offered three ways to export cost and usage-related Cloud Billing data to BigQuery: Standard Billing Export, Detailed Billing Export (resource-level data and price fields to join with Price Export table), and Price Export. Today, we are introducing a new BigQuery view that transforms this data so that it aligns with the data attributes and metrics defined in the FOCUS v1.0 Preview.

A BigQuery view is a virtual table defined by a SQL query. This view is built from a base query (see below for how to get access) that maps Google Cloud billing data into the display names, format, and behavior of the FOCUS Preview dimensions and metrics. BigQuery views are useful because the queryable virtual table only contains data from the tables and fields specified in the base query that defines the view. And because views are virtual tables, they incur no additional data storage charges if you are already using Billing Export to BigQuery.
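
To make the mechanics concrete, here is a minimal sketch of defining such a view with the google-cloud-bigquery Python client. The project, dataset, table names, and the column mapping below are illustrative assumptions, not the official base query, which you obtain through the sign-up guide linked later in this post.

```python
# Illustrative sketch only: the official FOCUS base query comes from the guide
# referenced below. Project, dataset, table names, and column mappings are
# placeholders chosen for readability.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

view = bigquery.Table("my-project.billing.focus_v1_preview_view")
view.view_query = """
SELECT
  'Google Cloud'       AS ProviderName,        -- FOCUS-style display names (illustrative)
  service.description  AS ServiceName,
  cost                 AS BilledCost,
  currency             AS BillingCurrency,
  usage_start_time     AS ChargePeriodStart,
  usage_end_time       AS ChargePeriodEnd
FROM `my-project.billing.gcp_billing_export_resource_v1_XXXXXX`  -- Detailed Billing Export table (placeholder ID)
"""
client.create_table(view, exists_ok=True)
```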

You should spend time optimizing costs, not mapping billing terminology across cloud providers. With the FOCUS BigQuery view, you can now:

View and query Google Cloud billing data that is adapted to the FOCUS specification
Use the BigQuery view as a data source for visualization tools like Looker Studio
Analyze your Google Cloud costs alongside data from other providers using the common FOCUS format

How it works

The FOCUS BigQuery view acts as a virtual table that sits on top of your existing Cloud Billing data. To use this feature, you will need Detailed Billing Export and Price Exports enabled. Follow these instructions to set up your billing exports to BigQuery. The FOCUS BigQuery view uses a base SQL query to map your Cloud Billing data into the FOCUS schema, presenting it in the specified format. This allows you to query and analyze your data as if it were native to FOCUS, making it easier to compare costs across different cloud providers.
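
Once the view is in place, it can be queried like any other table. Here is a hedged example that reuses the illustrative view and column names from the sketch above:

```python
# Query the FOCUS-style view like any other BigQuery table. The view and column
# names follow the illustrative sketch above rather than the official guide.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
sql = """
SELECT
  ProviderName,
  ServiceName,
  SUM(BilledCost) AS total_billed_cost
FROM `my-project.billing.focus_v1_preview_view`
WHERE ChargePeriodStart >= TIMESTAMP('2024-01-01')
GROUP BY ProviderName, ServiceName
ORDER BY total_billed_cost DESC
"""
for row in client.query(sql).result():
    print(row.ProviderName, row.ServiceName, row.total_billed_cost)
```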

We’ve made it easy to leverage the power of FOCUS with a step-by-step guide. To view this sample SQL query and follow the step-by-step guide, sign up here.

Looking ahead: A commitment to open standards and collaboration

At Google Cloud, open standards are part of our DNA. We were a founding member of the FinOps Foundation, the first CSP to join the Open Billing Standards Working Group, and a core contributor to the v0.5 and v1.0 specifications. As a strong advocate for open billing standards, we believe customers deserve a glimpse of what’s possible with Google Cloud Billing data aligned with the latest FOCUS specification.

We look forward to shaping the future of open billing standards alongside our customers, FinOps practitioners in the industry, the FinOps Foundation, CSPs, SaaS providers, and more. Get a unified view of your cloud costs today with the FOCUS BigQuery view. Sign up here to learn more and get started.

Unify customer and partner data with the new entity resolution framework in BigQuery

Announcing the BigQuery entity resolution framework
In today’s data-driven world, fragmented information can paint a blurry picture of your users and customers. Connecting the dots between disparate records to reveal a unified identity is a common challenge. Manual data matching is error-prone, time-consuming, and does not scale.

That’s where entity resolution can provide critical value. Whether it’s stitching together a customer’s purchase history across platforms or identifying fraudulent activity hidden within duplicate accounts, entity resolution unlocks the true potential of your data to give you a unified view of who and what matters most.

Match records without moving or copying data
The BigQuery entity resolution framework allows you to integrate with the identity provider of your choice using standard SQL queries. BigQuery customers can now resolve entities in place without invoking data transfer fees or managing ETL jobs. Identity providers can provide their identity graphs as a service on Google Cloud Marketplace without revealing their matching logic or identity graphs to end users.

The BigQuery entity resolution framework uses remote function calls to match your data in an identity provider’s environment. Your data does not need to be copied or moved during this process as shown here:

The end user grants the identity provider’s service account read access to their input dataset and write access to their output dataset.

The user calls the remote function that matches their input data with the provider’s identity graph data. Matching parameters are passed to the provider with the remote function.

The provider’s service account reads the input dataset and processes it.

The provider’s service account writes the entity resolution results to the user’s output dataset.
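
As a rough illustration of step 2, the call might look something like the following from the end user’s side. The dataset, function name, and parameter shape are hypothetical; each identity provider defines its own remote-function interface.

```python
# Hypothetical sketch of invoking a provider's remote function from BigQuery.
# The provider project, function name, and parameters are placeholders;
# consult your provider's documentation for the real interface.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT `provider-project.er.resolve_entities`(
  'my-project.crm.customer_input',                 -- input table the provider reads
  'my-project.crm.resolved_output',                -- output table the provider writes
  JSON '{"match_keys": ["email", "postal_code"]}'  -- matching parameters
) AS job_status
"""
print(list(client.query(sql).result()))
```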

Why use entity resolution?
The BigQuery entity resolution framework benefits a wide range of industries and use cases, including:

Marketing: Enhance customer segmentation and targeting by clustering customer profiles across channels.
Financial services: Identify fraudulent transactions and customer churn by accurately linking financial records.
Retail: Gain a holistic view of customer behavior by deduplicating purchase records across platforms.
Healthcare: Improve patient care by unifying medical records from disparate sources.
Data sharing: Prepare data for use in BigQuery data clean rooms, which allows organizations to share data in low-trust environments.

Entity resolution pricing
The BigQuery entity resolution framework does not incur additional storage or compute costs beyond any fees charged by the identity provider for use of their service. Identity providers pay no additional costs beyond the storage and compute required to implement and run their entity resolution service. The framework is available in all BigQuery compute models and its use is not restricted by edition.

What our partners say about the BigQuery entity resolution framework
We’ve worked closely with entity resolution providers to design our framework. Here’s what they have to say:

“Entity Resolution on BigQuery is truly a game changer that greatly enhances data connectivity while minimizing data movement. Now Google Cloud clients can access an extensible identity framework that spans data warehouses, clean rooms and AI; and marketers can extend their custom data pipelines with a consistent enterprise identity across LiveRamp’s Data Collaboration Platform services. The result: better customer understanding and measurement, and enriched marketing signals to guide brand success.” – Erin Boelkens, VP of Product, LiveRamp

“TransUnion’s identity resolution unifies customer data and improves its hygiene through deduplication, verification, and correction. With Entity Resolution on Google Cloud and TransUnion’s integration, data engineering teams can reduce setup and ongoing management tasks while making consumer identity ready for insights, audience building, and activation.” – Ryan Engle, VP of Identity Solutions, Credit Marketing, and Platform Integrations, TransUnion

Take the next step
If you are an identity provider and want to offer your identity resolution service to Google Cloud customers, you can get started today using the BigQuery entity resolution guide. For additional help, ask your Google Cloud account manager to reach out to the Built with BigQuery team.

The Built with BigQuery team helps Independent Software Vendors (ISVs) and data providers build innovative applications with Google Data Cloud. Participating companies can: 

Accelerate product design and architecture through access to designated experts who can provide insight into key use cases, architectural patterns, and best practices
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption

Dividends from data: Building a lean data stack for a Series C Fintech

It is often said that a journey of a thousand miles begins with a single step.

Ten years ago, building a data technology stack felt a lot more like a thousand miles than it does today; technology, automation, and business understanding of the value of data have all significantly improved. The problem today is instead knowing how to take the first step.

Figure: PrimaryBid Overview

PrimaryBid is a regulated capital markets technology platform connecting public companies to their communities during fundraisings. But choosing data technologies presented a challenge as our requirements had several layers:

PrimaryBid facilitates novel access for retail investors to short-dated fundraising deals in the public equity and debt markets. As such, we need a platform that can be elastic to market conditions.
PrimaryBid operates in a heavily regulated environment, so our data stack must comply with all applicable requirements.
PrimaryBid handles many types of sensitive data, making information security a critical requirement.
PrimaryBid’s data assets are highly proprietary; to make the most of this competitive advantage, we needed a scalable, collaborative AI environment.
As a business with international ambitions, the technologies we pick have to scale exponentially, and be globally available.
And, perhaps the biggest cliche, we needed all of the above for as low a cost as possible.

Over the last 12 or so months, we built a lean, secure, low-cost solution to the challenges above, partnering with vendors that are a great fit for us; we have been hugely impressed by the quality of tech available to data teams now, compared with only a few years ago. We built an end-to-end unified Data and AI Platform. In this blog, we will describe some of the decision-making mechanisms together with some of our architectural choices.

The 30,000 foot view

The 30,000 foot view of PrimaryBid’s environment will not surprise any data professional. We gather data from various sources, structure it into something useful, surface it in a variety of ways, and combine it together into models. Throughout this process, we monitor data quality, ensure data privacy, and send alerts to our team when things break.

Figure: High Level summary of our data stack

Data gathering and transformation

For getting raw data into our data platform, we wanted technology partners whose solutions were low-code, fast, and scalable. For this purpose, we chose a combination of Fivetran and dbt to meet our needs.

Fivetran supports a huge range of pre-built data connectors, which allow data teams to land new feeds in a matter of minutes. The cost model we have adopted is based on monthly ‘active’ rows, i.e., we only pay for what we use.

Fivetran also takes care of connector maintenance, freeing up massive amounts of engineering time by outsourcing the perpetual cycle of updating API integrations.

Once the data is extracted, dbt turns raw data into a usable structure for downstream tools, a process known as analytics engineering. dbt and Fivetran make a synergistic partnership, with many Fivetran connectors having dbt templates available off the shelf. dbt is hugely popular with data engineers, and contains many best practices from software development that ensure analytics transformations are robust.

Both platforms have their own orchestration tools for pipeline scheduling and monitoring, but we deploy Apache Airflow 2.0, managed via Google Cloud’s Cloud Composer, for finer-grained control.
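
For illustration, a minimal Airflow 2 DAG for this kind of orchestration might look like the sketch below; the dbt command, project, dataset, and schedule are placeholders rather than PrimaryBid’s actual configuration.

```python
# Minimal Cloud Composer / Airflow 2 sketch: run a dbt build after Fivetran has
# landed data, then refresh a downstream BigQuery mart. All names and commands
# are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_analytics",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /home/airflow/gcs/dags/dbt",
    )

    refresh_mart = BigQueryInsertJobOperator(
        task_id="refresh_reporting_mart",
        configuration={
            "query": {
                "query": "CALL `my-project.analytics.refresh_reporting_mart`();",
                "useLegacySql": False,
            }
        },
    )

    dbt_build >> refresh_mart
```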

Data storage, governance, and privacy

This is the point in our data stack where Google Cloud starts to solve a whole variety of our needs.

We start with Google Cloud’s BigQuery. BigQuery is highly scalable, serverless, and separates compute costs from storage costs, allowing us only to pay for exactly what we need at any given time.

Beyond that though, what sold us on the BigQuery ecosystem was the integration of data and model privacy, governance, and lineage throughout. Leveraging Google Cloud’s Dataplex, we set security policies in one place, on the raw data itself. As the data is transformed and passed between services, these same security policies are adhered to throughout.

One example is PII, which is locked away from every employee bar a necessary few. We tag data once with a ‘has_PII’ flag, and no matter which tool you use to access the data, if you do not have permission to view PII in the raw data, you will never be able to see it anywhere.
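
As a minimal sketch, tagging a column once with the BigQuery Python client can look like the code below; the policy-tag resource name and column names are placeholders (taxonomies themselves are managed through Dataplex and Data Catalog).

```python
# Sketch: attach a policy tag to PII columns once, so column-level security
# follows the data into every downstream tool. The policy-tag resource name and
# column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table = client.get_table("my-project.raw.customers")

pii_tag = bigquery.PolicyTagList(
    names=["projects/my-project/locations/eu/taxonomies/123/policyTags/456"]
)

new_schema = []
for field in table.schema:
    if field.name in ("email", "phone_number"):  # columns flagged as has_PII
        field = bigquery.SchemaField(
            field.name, field.field_type, mode=field.mode, policy_tags=pii_tag
        )
    new_schema.append(field)

table.schema = new_schema
client.update_table(table, ["schema"])
```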

Figure: Unified governance using Dataplex

Data analytics

We chose Looker for our self-service and business intelligence (BI) platform based on three key factors:

Instead of storing data itself, Looker writes SQL queries directly against your data warehouse. To ensure it writes the right query, engineers and analysts build Looker analytics models using ‘LookML’. LookML is for the most part low-code, but for complex transformations, SQL can be written directly into the model, which plays to our team’s strong experience with SQL. In this instance, we store the data in BigQuery and access it through Looker.
Being able to extend Looker into our platforms was a core decision factor. With the LookML models in place, transformed, clean data can be passed to any downstream service.
Finally, the interplay between Looker and Dataplex is particularly powerful. Behind the scenes, Looker is writing queries against BigQuery. As it does so, all rules around data security and privacy are preserved.

There is much more to say about the benefits we found using Looker; we look forward to discussing these in a future blog post.

AI and machine learning

The last step in our data pipelines is our AI/ML environment. Here, we have leaned even further into Google Cloud’s offerings, and decided to use Vertex AI for model development, deployment, and monitoring.

To make model building as flexible as possible, we use the open-source Kubeflow framework within Vertex AI Pipeline environment for pipeline orchestration; this framework decomposes each step of the model building process into components, each of which performs a fully self-contained task, and then passes metadata and model artifacts to the next component in the pipeline. The result is highly adaptable and visible ML pipelines, where individual elements can be upgraded or debugged independently without affecting the rest of the code base.
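
A toy Kubeflow Pipelines (KFP v2) sketch of this component pattern, submitted to Vertex AI Pipelines, is shown below; the component bodies, bucket, project, and region are placeholders rather than our production code.

```python
# Toy KFP v2 pipeline: each component is a self-contained step that passes an
# artifact URI to the next one. Component logic, bucket, project, and region
# are placeholders.
from kfp import compiler, dsl
from google.cloud import aiplatform


@dsl.component(base_image="python:3.10")
def preprocess(raw_table: str) -> str:
    # Read raw data, write features, hand a URI downstream (placeholder logic).
    return f"gs://my-bucket/features/{raw_table}"


@dsl.component(base_image="python:3.10")
def train(features_uri: str) -> str:
    # Train a model from the features produced upstream (placeholder logic).
    return f"gs://my-bucket/models/{features_uri.split('/')[-1]}"


@dsl.pipeline(name="lean-training-pipeline")
def pipeline(raw_table: str = "transactions"):
    features = preprocess(raw_table=raw_table)
    train(features_uri=features.output)


if __name__ == "__main__":
    compiler.Compiler().compile(pipeline, "pipeline.json")
    aiplatform.init(project="my-project", location="europe-west2")
    aiplatform.PipelineJob(
        display_name="lean-training-pipeline",
        template_path="pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",
    ).run()
```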

Figure: Vertex AI Platform

Finishing touches

With this key functionality set up, we’ve added a few more components to add even more functionality and resilience to the stack:

Real-time pipelines: Running alongside our Fivetran ingestion, we added a lightweight pipeline that brings in core transactional data in real time. It leverages a combination of managed Google Cloud services, namely Pub/Sub and Dataflow, and adds both speed and resilience to our most important data feeds (see the sketch after this list).
Reverse ETL: Leveraging a CDP partner, we write analytics attributes about our customers back into our customer relationship management tools, to ensure we can build relevant audiences for marketing and service communications.
Generative AI: Following the huge increase in available gen AI technologies, we’ve built several internal applications that leverage Google’s PaLM 2. We are working to build an external application too — watch this space!
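
For the real-time path, a stripped-down Apache Beam pipeline of the kind described above might look like this; the topic, table, schema, and bucket are placeholders, not our production pipeline.

```python
# Sketch of a streaming Dataflow pipeline: read transactional events from
# Pub/Sub and append them to BigQuery. Topic, table, schema, and bucket are
# placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",
    region="europe-west2",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadTransactions" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/transactions")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:realtime.transactions",
            schema="order_id:STRING,amount:NUMERIC,created_at:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```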

So there you have it, a whistle-stop tour of a data stack. We’re thrilled with the choices we’ve made, and are getting great feedback from the business. We hope you found this useful and look forward to covering how we use Looker for BI in our organization.

Special thanks to the PrimaryBid Data Team, as well as Stathis Onasoglou and Dave Elliott for their input prior to publication.

Confluent brings real-time capabilities to Google Cloud generative AI

In 2023, the spotlight was on generative AI (gen AI) and how it is paving the way for a new category of AI that can create and co-innovate with humans to produce new content, such as text, code, images, and music. Gen AI capabilities are not only promising but extremely powerful, given that large language models (LLMs) can be trained and tuned on vast amounts of data.

Still, the freshness of the data can limit a gen AI model’s out-of-box capabilities and potential. Gen AI models sometimes need to be extended to other systems, such as to access new information when the use case requires real-time context. As a result, many organizations find themselves looking to solve issues related to real-time data access, integrating data from multiple sources in different formats, and the complexity associated with training and leveraging models.

Data streaming platforms like Confluent, powered by Apache Kafka, can help overcome these data challenges by providing the latest data and information. With Confluent, businesses can easily connect, process, integrate, and scale the data needed to support their gen AI use cases, enabling them to solve highly-specific, contextual issues in real time.

For instance, a customer service team could use Confluent to stream real-time customer requests and responses to a chatbot built with Google Cloud technologies. Leveraging AI, the chatbot can access the information used in responses to create more personalized and relevant recommendations while considering real-time context, such as weather conditions, demographics, and purchase history.

Improving personalization is just one example of ways businesses can benefit from using real-time data and gen AI to deliver better customer experiences. Overall, it can also help boost sales with better recommendations, reduce churn with more satisfying customer experiences, and even reduce costs by helping to automate more support tasks with the help of AI chatbots.

Creating a central nervous system for data movement

Through its many years of research and product development, Google has become a recognized leader in the AI space. Already, Google’s LLMs are providing a strong foundation for a rich set of gen AI capabilities that customers and partners can leverage to build new innovations.

Now, Confluent is making these models even easier to use by helping integrate structured and unstructured data from various sources directly into Google Cloud. You can use connectors and clients to stream reliable, real-time data from Confluent to Google Cloud AI products and services at scale.

The diagram above illustrates common architecture patterns for using Confluent to stream real-time data that can support gen AI workflows.

Knowledge workflows: Confluent gathers data from various sources across internal and external data systems, pre-processes it into a specific format, and stores it in an appropriate location, which can be used as a knowledge base to build context for gen AI.
Inference workflows: Confluent streams data to the systems and tools that help create gen AI-powered interactions between human users and machines, such as text, voice, conversation, and more.
Central nervous system: Confluent orchestrates the data exchange between processes and services seamlessly, abstracting events as data streams, processing them, and connecting them directly to models and neural networks. With stream processing, the results can then be communicated to a human through various machine interfaces.

To demonstrate how this comes to life, Confluent built a gen AI-powered, personalized shopping assistant that leverages Confluent and Google Cloud generative AI. The application flow allows a customer to have a conversation with an AI chatbot, which connects to Vertex AI and interacts in real time. Here is an example of the dialog:

Behind the scenes, Confluent takes the request, sends it to Vertex AI, and then provides a response. With Confluent, Apache Kafka provides the framework for the business to quickly process the data and provide a generative AI response. This delivers an enriched customer experience and allows the customer to receive precise details on product availability and purchasing locations.
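
A hedged sketch of that request/response loop: consume a chat request from a Kafka topic, call a Vertex AI model, and produce the reply back to another topic. Broker addresses, topic names, and the model identifier are assumptions, and the Vertex AI SDK surface may differ slightly between versions.

```python
# Sketch of the chatbot loop described above. Broker, topics, and model name
# are placeholders; the Vertex AI SDK module path may vary by version.
from confluent_kafka import Consumer, Producer
import vertexai
from vertexai.generative_models import GenerativeModel

consumer = Consumer({
    "bootstrap.servers": "BROKER:9092",
    "group.id": "shopping-assistant",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "BROKER:9092"})
consumer.subscribe(["chat.requests"])

vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.0-pro")  # assumed model name

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    question = msg.value().decode("utf-8")
    answer = model.generate_content(
        f"You are a retail shopping assistant. Answer: {question}"
    ).text
    producer.produce("chat.responses", key=msg.key(), value=answer.encode("utf-8"))
    producer.flush()
```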

Confluent improves gen AI chatbots in a number of ways, including:

Creating a unified view for gen AI models. Confluent can combine data from multiple sources, including cross-cloud and on-premises, to create a unified view for gen AI models. This view can be used for a variety of applications, such as generating targeted content based on the user, no matter where the data lives.
Reducing the cost of training and deploying gen AI models. Confluent helps reduce the amount of data that needs to be processed by gen AI models, leading to lower training and deployment costs.
Improving the accuracy and performance of gen AI models. By providing gen AI models with real-time access and the ability to process data from various sources, Confluent improves the accuracy and performance of gen AI models.
Making gen AI models more accessible to everyone. Confluent is easy to use and manage, making it easier for developers and organizations to build and deploy gen AI applications that require real-time information.

In addition to these benefits, there are a number of ways that Confluent enables real-time data movement to Google Cloud AI services, empowering organizations to build more sophisticated gen AI experiences with text, voice, images, and video. For example, streaming your most recent customer data to Vertex AI Search and Conversation can enable teams to harness that data to provide more personalized, relevant chatbot responses and improve recommendations based on customer preferences.

Confluent and Google Cloud: Better gen AI, together

Overall, Confluent allows organizations to get more out of Google Cloud generative AI, helping them build, deploy, and scale gen AI applications faster without worrying about whether they have the data they need.

Confluent delivers real-time data streaming, data integration, scalability, and reliability in one industry-leading platform, allowing organizations from every industry to tap into gen AI to provide better experiences or solve customer problems. Confluent also recently launched Data Streaming for AI, an initiative leveraging Google gen AI partnerships to accelerate organizations’ development of real-time AI applications. And that’s just the beginning — Confluent continues to work on delivering top data streaming innovation to help companies meet real-time AI demands with trustworthy, relevant data served up in the moment.

Learn more about Google Cloud’s open and innovative generative AI partner ecosystem. To get started with Confluent, join the Data Streaming Startup Challenge and begin experimenting with Confluent Cloud on the Google Cloud Marketplace today!

Unleash the power of generative AI with BigQuery and Vertex AI

Organizations dream of unlocking new insights and efficiencies with AI. To do this, they need a data and AI platform that makes it easy and seamless to access all enterprise data, both structured and unstructured, in a secure and governed way.

To help customers accomplish this, we are announcing innovations that further connect data and AI with increased scale and efficiency using BigQuery and Vertex AI, allowing you to:

Simplify multimodal generative AI for enterprise data by making Gemini models available through BigQuery ML
Unlock value from unstructured data by expanding BigQuery integration with Vertex AI’s document processing and speech-to-text APIs
Build and unleash AI-powered search of your business data with vector search in BigQuery

Bringing AI directly to your data using first-party model integration with BigQuery and Vertex AI democratizes the power of generative AI to all data teams and allows you to seamlessly activate your enterprise data with large language models. This makes building AI-driven analytics simpler, faster and more secure, while taking advantage of BigQuery’s unique serverless architecture for scale and efficiency.

Join our upcoming Data Cloud Innovation Live webcast on March 7th: register at https://cloudonair.withgoogle.com/events/data-cloud-innovation-live

Simplify generative AI use cases with Gemini models

BigQuery ML lets you create, train and execute machine learning models in BigQuery using familiar SQL. With customers running hundreds of millions of prediction and training queries every year, usage of built-in ML in BigQuery grew 250% YoY1.

Today, we are taking BigQuery one step further with Gemini 1.0 Pro integration via Vertex AI. The Gemini 1.0 Pro model is designed for higher input/output scale and better result quality across a wide range of tasks like text summarization and sentiment analysis. You can now access it using simple SQL statements or BigQuery’s embedded DataFrame API from right inside the BigQuery console.

This enables you to build data pipelines that blend structured data, unstructured data and generative AI models together to create a new class of analytical applications. For example, you can analyze customer reviews in real-time and combine them with purchase history and current product availability to generate personalized messages and offers, all right inside BigQuery. You can learn more about BigQuery and Gemini models integration here.
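
As a hedged illustration, calling the Gemini model from BigQuery ML looks roughly like the SQL below, run here through the Python client; the connection name, dataset, table, and option values are placeholders, so check the ML.GENERATE_TEXT documentation for the exact syntax in your project.

```python
# Rough sketch of using ML.GENERATE_TEXT with a remote Gemini model. The
# BigQuery connection, dataset, table, and options are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

client.query("""
CREATE OR REPLACE MODEL `my-project.demo.gemini_pro`
REMOTE WITH CONNECTION `us.vertex-ai-connection`   -- assumed connection name
OPTIONS (ENDPOINT = 'gemini-pro')
""").result()

rows = client.query("""
SELECT ml_generate_text_llm_result AS reply
FROM ML.GENERATE_TEXT(
  MODEL `my-project.demo.gemini_pro`,
  (SELECT CONCAT('Summarize this review and rate the sentiment: ', review_text) AS prompt
   FROM `my-project.demo.customer_reviews` LIMIT 10),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output))
""").result()

for row in rows:
    print(row.reply)
```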

In the coming months, we plan on helping customers unlock multimodal generative AI use cases by expanding the support for Gemini 1.0 Pro Vision model. This provides you the ability to analyze images, videos, and other complex data using familiar SQL queries. For example, if you are working with a large image dataset in BigQuery, you will be able to leverage the Gemini 1.0 Pro Vision model to generate image descriptions, categorize them for better search, annotate key features, colors, aesthetics, and much more.

Unlocking value from unstructured data with AI

Unstructured data such as images, documents, and videos represent a large portion of untapped enterprise data. However, unstructured data can be challenging to interpret, making it difficult to extract meaningful insights from it.

BigLake unifies data lakes and warehouses under a single management framework, enabling you to analyze, search, secure, govern and share unstructured data. With increasing data volumes, customer use of BigLake has grown to hundreds of petabytes. Leveraging the power of BigLake, customers are already analyzing images using a broad range of AI models including Vertex AI’s vision APIs, open-source TensorFlow Hub models, or their own custom models.

We are now expanding these capabilities to help you easily extract insights from documents and audio files using Vertex AI’s document processing and speech-to-text APIs. With these new capabilities, you can create generative AI applications for content generation, classification, sentiment analysis, entity extraction, summarization, embeddings generation, and more.

For example, you can perform deeper financial performance analysis by deriving information like revenue, profit, and assets from financial reports and combining it with a BigQuery dataset that contains historical stock performance. Similarly, you can improve customer service by analyzing customer support call recordings for sentiment, identifying common issues, and correlating the call insights with purchase history.

Improve vector search with your unstructured data

Earlier this month, we announced the preview of BigQuery vector search integrated with Vertex AI to enable vector similarity search on your BigQuery data. This functionality, also commonly referred to as approximate nearest-neighbor search, is key to empowering numerous new data and AI use cases such as semantic search, similarity detection, and retrieval-augmented generation (RAG) with a large language model (LLM). Vector search can also enhance the quality of your AI models by improving context understanding, reducing ambiguity, ensuring factual accuracy, and allowing adaptability to different tasks and domains.

For example, vector search can help retailers improve product recommendations to customers. Imagine a shopper looking at a picture of a red dress on the retailer’s e-commerce website. With vector search, the shopper can search by stylistic preferences such as color, cut, or even occasion, and the retailer can automatically suggest other dresses that are similar, even if they don’t have identical descriptions. This way, shoppers find what they’re looking for more easily, and retailers can show things shoppers are more likely to buy.

Built on our text embeddings capabilities, and adhering to your AI governance policies and access controls, BigQuery vector search unlocks new data and AI use cases such as:

Retrieval-augmented generation (RAG): Retrieve data relevant to a question or task and provide it as context to an LLM. For example, use a support ticket to find ten closely-related previous cases, and pass them to an LLM as context to summarize and suggest a resolution (sketched after this list).
Semantic search: Find semantically similar documents to a given query, even if the documents do not contain the exact same words. This is useful for tasks such as finding related articles, similar products, or answers to questions.
Text clustering: Cluster documents into groups of similar documents. This is useful for tasks such as organizing documents, finding duplicate documents, or identifying trends in a corpus of documents.
Summarization: Summarize documents by finding the most similar documents to the original document and extracting the main points. This is useful for tasks such as generating executive summaries, creating abstracts, or summarizing news articles.
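
To make the RAG retrieval step concrete, here is a hedged sketch against the preview syntax; because the feature was in preview at the time of writing, the exact function signature may differ, and the table, model, and column names are placeholders.

```python
# Sketch of the retrieval step for RAG: embed the question, then find the ten
# nearest support tickets with VECTOR_SEARCH (preview syntax; names are
# placeholders).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

sql = """
SELECT base.ticket_id, base.resolution, distance
FROM VECTOR_SEARCH(
  TABLE `my-project.support.ticket_embeddings`, 'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `my-project.support.embedding_model`,
      (SELECT 'customer cannot reset password' AS content))
  ),
  top_k => 10)
"""
for row in client.query(sql).result():
    print(row.ticket_id, row.distance)
```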

Join us for the future of data and generative AI

When it comes to augmenting your business data with generative AI, we’re just getting started. To learn more, sign up for the upcoming Data Cloud Innovation Live webcast on March 7, 2024, 9 – 10 AM PST. And be sure to join us at Next ’24 to get the inside track on all the latest product news and innovations to accelerate your transformation journey this year.

1. Usage of built-in ML in BigQuery grew 250% YoY between July 2022 and 2023.

Google Cloud databases stand ready to power your gen AI apps with new capabilities

At Google Cloud, we help our customers unify their data and connect it with groundbreaking AI to build transformative experiences. Data, whether it’s structured data in an operational database or unstructured data in a data lake, helps make AI more effective. For businesses to truly take advantage of generative AI, they need to access, manage, and activate structured and unstructured data across their operational and analytical systems.

At Next ‘23, we laid out a vision to help developers build enterprise gen AI applications, including delivering world-class vector capabilities, building strong integration with the developer ecosystem, and making it easy to connect to AI inferencing services. We’ve been hard at work delivering on that promise and today, we’re announcing the general availability (GA) of AlloyDB AI, an integrated set of capabilities in AlloyDB to easily build enterprise gen AI apps.

We’re also announcing vector search capabilities across more of our databases including Spanner, MySQL, and Redis to help developers build gen AI apps with their favorite databases, and we are adding integrations with LangChain, a popular framework for developing applications powered by language models. 

All these capabilities join our existing integrations with Vertex AI to provide an integrated platform for developers. Spanner and AlloyDB integrate natively with Vertex AI for model serving and inferencing with the familiarity of SQL, while Firestore and Bigtable integrate with Vertex AI Vector Search to deliver semantic search capabilities for gen AI apps.

We believe the real value of generative AI is unlocked when operational data is integrated with gen AI to deliver real-time, accurate, and contextually-relevant experiences across enterprise applications. Operational databases with vector support help bridge the gap between foundation models and enterprise gen AI apps. And because operational databases typically store a majority of application data, they play a critical role in how developers build new, AI-assisted user experiences. That’s why 71% of organizations plan to use databases with integrated gen AI capabilities. Successful databases will evolve to be AI-first, and deeply integrate technologies such as vector search, with seamless connectivity to AI models, and tight integrations with AI tooling and frameworks. All these will be natively built into or around operational databases as table stakes. 

AlloyDB: A Modern PostgreSQL database for generative AI workloads

AlloyDB is Google Cloud’s fully managed PostgreSQL-compatible database designed for superior performance, scale, and availability. Today, we’re announcing that AlloyDB AI is generally available in both AlloyDB and AlloyDB Omni. Built for the future, AlloyDB:

is optimized for enterprise gen AI apps that need real-time and accurate responses
delivers superior performance for transactional, analytical, and vector workloads
runs anywhere, including on-premises and on other clouds, enabling customers to modernize and innovate wherever they are.

Customers such as Character AI, FLUIDEFI, B4A, and Regnology are using AlloyDB to power their applications. For example, Regnology’s regulatory reporting chatbot leverages natural language processing to understand complex regulatory terminology and queries.

“AlloyDB acts as a dynamic vector store, indexing repositories of regulatory guidelines, compliance documents, and historical reporting data to ground the chatbot. Compliance analysts and reporting specialists interact with the chatbot in a conversational manner, saving time and addressing diverse regulatory reporting questions.” – Antoine Moreau, CIO, Regnology

Vector search across all Google Cloud databases 

Vector search has emerged as a critical capability for building useful and accurate gen AI-powered applications, making it easier to find similar search results of unstructured data such as text and images from a product catalog using a nearest neighbor algorithm. Today, we’re announcing vector search across several Google Cloud databases, including Cloud SQL for MySQL, Memorystore for Redis, and Spanner, all in preview. 

Cloud SQL for MySQL now supports both approximate and exact nearest neighbor vector searches, adding to the pgvector capabilities we launched last year in Cloud SQL for PostgreSQL and AlloyDB. Developers can now store millions of vectors in the same MySQL instances they are already using. By utilizing Cloud SQL for vector searches — whether on MySQL or PostgreSQL — you can store and perform vector searches directly in the same operational database you’re already using without having to learn or set up a new system. 
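
For the PostgreSQL flavor mentioned above, the pgvector pattern looks roughly like the sketch below (the MySQL syntax differs); connection details and the three-dimensional toy vectors are placeholders.

```python
# pgvector sketch for Cloud SQL for PostgreSQL or AlloyDB. Connection details
# and toy 3-dimensional vectors are placeholders; real embeddings are much wider.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.3", dbname="app", user="app_user", password="CHANGE_ME"
)
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
CREATE TABLE IF NOT EXISTS products (
  id BIGSERIAL PRIMARY KEY,
  name TEXT,
  embedding vector(3)
);
""")
cur.execute(
    "INSERT INTO products (name, embedding) VALUES (%s, %s::vector)",
    ("red dress", "[0.12, 0.98, 0.33]"),
)

# Nearest-neighbor search: order by distance to a query embedding.
cur.execute(
    "SELECT name FROM products ORDER BY embedding <-> %s::vector LIMIT 5",
    ("[0.10, 0.95, 0.30]",),
)
print(cur.fetchall())
conn.commit()
cur.close()
conn.close()
```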

In addition, to provide ultra-fast performance for your gen AI applications, we’re launching built-in support for vector storage and search for Memorystore for Redis. Each Redis instance will be capable of storing tens of millions of vectors and can perform vector search at single digit millisecond latency. This provides an ultra-low-latency data store for a variety of use cases such as LLM semantic caching and recommendation systems. 

Spanner can scale vector searches for highly partitionable workloads. Large-scale vector workloads that involve billions of vectors and millions of queries per second can be challenging for many systems. These workloads are a great fit for Spanner’s exact nearest neighbor search because Spanner can efficiently reduce the search space to provide accurate, real-time results with low latency.

Accelerating ecosystem support for LangChain

LangChain has grown to be one of the most popular open-source LLM orchestration frameworks. In our efforts to provide application developers with tools to help them quickly build gen AI apps, we are open-sourcing LangChain integrations for all of our Google Cloud databases. We will support three LangChain Integrations that include Vector stores, Document loaders, and Chat Messages Memory.

By leveraging the power of LangChain with our databases, developers can now easily create context-aware gen AI applications, faster. The LangChain integration provides them built-in Retrieval Augmented Generation (RAG) workflows across their preferred data source, using their choice of enterprise-grade Google Cloud database. Example use cases include personalized product recommendations, question answering, document search and synthesis, and customer service automation. 

Integration with specific LangChain components simplifies the process of incorporating Google databases into applications. Supported components include:

Vector stores, to support vector similarity queries. The LangChain Vector stores integration is available for AlloyDB, Cloud SQL for PostgreSQL, Cloud SQL for MySQL, Memorystore for Redis, and Spanner.
Document loaders, which allow for seamless data loading from various external sources such as web page content, or a transcript of a YouTube video. 
Chat Messages Memory, allowing storage of chat history for future reference by providing deeper context from past conversations.

Both the Document loaders and Memory integrations are available for all Google Cloud databases including AlloyDB, Firestore, Bigtable, Memorystore for Redis, Spanner, and Cloud SQL for MySQL, PostgreSQL, and SQL Server.  

These packages are now available on GitHub.

Embrace an AI-driven future

There’s a wealth of data in operational databases just waiting to power the next transformative gen AI models and applications. By enhancing AlloyDB AI for enterprise grade production workloads, adding extensive vector search capabilities across our database portfolio, and embracing generative AI frameworks from the community, developers have the tools they need to start adding intelligent, accurate, and helpful gen AI capabilities to their applications that are grounded on the wealth of data in their enterprise databases. 

To learn more about how to get started, join our live Data Cloud Innovation Live webinar on March 7, 2024, 9 – 10 AM PST, where you’ll hear from product engineering leaders about these latest innovations.

Serverless data architecture for trade surveillance at Deutsche Bank

Ensuring compliance with regulatory requirements is crucial for every bank’s business. While financial regulation is a broad area, detecting and preventing market manipulation and abuse is absolutely mission-critical for an investment bank of Deutsche Bank’s size. This is called trade surveillance.

At Deutsche Bank, the Compliance Technology division is responsible for the technical implementation of this control function. To do this, the Compliance Technology team retrieves data from various operational systems in the front office and performs scenario calculations to monitor the trades executed by all of the bank’s business lines. If any suspicious patterns are detected, a compliance officer receives an internal alert to investigate the issue for resolution.

The input data comes from a broad range of systems, but the most relevant are market, trade, and reference data. Historically, provisioning data for compliance technology applications from front-office systems required the team to copy data between, and often even within, many different analytical systems, leading to data quality and lineage issues as well as increased architectural complexity. At the same time, executing trade surveillance scenarios includes processing large volumes of data, which requires a solution that can store and process all the data using distributed compute frameworks like Apache Spark.

A new architectural approach

Google Cloud can help solve the complex issues of processing and sharing data at scale across a large organization with its comprehensive data analytics ecosystem of products and services. BigQuery, Google Cloud’s serverless data warehouse, and Dataproc, a managed service for running Apache Spark workloads, are well positioned to support data-heavy business use cases, such as trade surveillance.

The Compliance Technology team decided to leverage these managed services from Google Cloud in their new architecture for trade surveillance. In the new architecture, the operational front-office systems act as publishers that present their data in BigQuery tables. This includes trade, market and reference data that is now available in BigQuery to various data consumers, including the Trade Surveillance application. As the Compliance Technology team doesn’t need all the data that is published from the front-office systems, they can create multiple views derived from only the input data that includes the required information needed to execute trade surveillance scenarios.

Scenario execution involves running trade surveillance business logic in the form of various different data transformations in BigQuery, Spark in Dataproc, and other applications. This business logic is where suspicious trading patterns, indicating market abuse or market manipulation, can be detected. Suspicious cases are written to output BigQuery tables and then processed through research and investigation workflows, where compliance officers perform investigations, detect potential false positives, or file a Suspicious Activity Report to the regulator if the suspicious case indicates a compliance violation.

Surveillance alerts are also retained and persistently stored to measure how effective detection is and to improve the rate at which false positives are identified. These calculations are run in Dataproc using Spark and in BigQuery using SQL. They are performed periodically and fed back into the trade surveillance scenario execution to further improve the surveillance mechanisms. Orchestrating the execution of ETL processes to derive data for executing trade surveillance scenarios and effectiveness calibrations is done through Cloud Composer, a managed service for workflow orchestration using Apache Airflow.

Here is a simplified view of what the new architecture looks like:

This is how the Compliance Technology team at Deutsche Bank describes the new architecture: 

“This new architecture approach gives us agility and elasticity to roll out new changes and behaviors much faster based on market trends and new emerging risks as e.g. cross product market manipulation is a hot topic our industry is trying to address in line with regulator’s expectations.”
– Asis Mohanty, Global Head, Trade Surveillance, Unauthorized Principal Trading Activity Technology, Deutsche Bank AG

“The serverless BigQuery based architecture enabled Compliance Technology to simplify the sharing of data between the front- and back-office whilst having a zero-data copy approach and aligning with the strategic data architecture.” 
– Puspendra Kumar, Domain Architect, Compliance Technology, Deutsche Bank AG

The benefits of a serverless data architecture

As the architecture above shows, trade surveillance requires various input sources of data. A major benefit of leveraging BigQuery for sourcing this data is that there is no need to copy data to make it available to data consumers at Deutsche Bank. A more simplified architecture improves data quality and lowers cost by minimizing the number of hops the data needs to take.

The main reason data doesn’t have to be copied is that BigQuery does not have separate instances or clusters. Instead, every table is accessible by a data consumer as long as the consumer app has the right permissions and references the table URI in its queries (i.e., the Google Cloud project ID, the dataset name, and the table name). Thus, various consumers can access the data directly from their own Google Cloud projects without having to copy it and physically persist it there.

For the Compliance Technology team to get the required input data to execute trade surveillance scenarios, they simply need to query the BigQuery views with the input data and the tables containing the derived data from the compliance-specific ETLs. This eliminates the need for copying the data, making the data more reliable and the architecture more resilient due to fewer data hops. Above all, this zero-copy approach enables data consumers on other teams across the bank, beyond trade surveillance, to use market, trade, and reference data by following the same pattern in BigQuery.

In addition, BigQuery offers another advantage: it is closely integrated with other Google Cloud services, such as Dataproc and Cloud Composer, so orchestrating ETLs is seamless, leveraging Apache Airflow’s out-of-the-box operators for BigQuery. There is also no need to copy data in order to process it from BigQuery with Spark. Instead, an out-of-the-box connector allows data to be read via the BigQuery Storage API, which is optimized for streaming large volumes of data directly to Dataproc workers in parallel, ensuring fast processing speeds.
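
A PySpark sketch of that pattern: Dataproc reads a BigQuery view through the spark-bigquery connector and writes alerts back. The view, table, column names, and the toy surveillance rule are placeholders, not Deutsche Bank’s actual logic.

```python
# Sketch: read a BigQuery view into Spark on Dataproc via the spark-bigquery
# connector and write alerts back. Names, columns, and the rule are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("trade-surveillance").getOrCreate()

trades = (
    spark.read.format("bigquery")
    .option("table", "frontoffice-project.trading.trades_view")
    .load()
)

# Toy rule: unusually large orders executed late in the day.
suspicious = trades.filter(
    (F.col("order_size") > 1_000_000) & (F.hour("executed_at") >= 20)
)

(
    suspicious.write.format("bigquery")
    .option("table", "compliance-project.surveillance.alerts")
    .option("writeMethod", "direct")  # write via the Storage Write API
    .mode("append")
    .save()
)
```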

Finally, storing data in BigQuery enables data producers to leverage Google Cloud’s native, out-of-the-box tooling for ensuring data quality, such as Dataplex automatic data quality. With this service, it’s possible to configure rules for data freshness, accuracy, uniqueness, completeness, timeliness, and various other dimensions and then simply execute them against the data stored in BigQuery. This happens in a fully serverless and automated fashion, without the need to provision any infrastructure for rules execution and data quality enforcement. As a result, the Compliance Technology team can ensure that the data they receive from front-office systems complies with the required data quality standards, adding to the value of the new architecture.

Given the fact that the new architecture leverages integrated and serverless data analytics products and managed services from Google Cloud, the Compliance Technology team can now fully focus on the business logic of their Trade Surveillance application. BigQuery stands out here because it doesn’t require any maintenance windows, version upgrades, upfront sizing or hardware replacements, as opposed to running a large-scale, on-premises Hadoop cluster. 

This brings us to the final advantage, namely the cost-effectiveness of the new architecture. In addition to allowing team members to now focus on business-relevant features instead of dealing with infrastructure, the architecture makes use of services which are charged based on a pay-as-you-go model. Instead of running the underlying machines in 24/7 mode, compute power is only brought up when needed to perform compliance-specific ETLs, execute the trade surveillance scenarios, or perform effectiveness calibration, which are all batch processes. This again helps further reduce the cost compared to an always-on, on-prem solution. 

Here’s the view from Deutsche Bank’s Compliance Technology team about the associated benefits: 

“Our estimations show that we can potentially save up to 30% in IT Infrastructure cost and achieve better risk coverage and Time to Market when it comes to rolling out additional risk and behaviors with this new serverless architecture using BigQuery.” 
– Sanjay-Kumar Tripathi, Managing Director, Global Head of Communication Surveillance Technology & Compliance Cloud Transformation Lead, Deutsche Bank AG

Looker Hackathon 2023 results: Best hacks and more

In December, the Looker team invited our developer and data community to collaborate, learn, and inspire each other at our annual Looker Hackathon. More than 400 participants from 93 countries joined together, hacked away for 48 hours and created 52 applications, tools, and data experiences. The hacks use Looker and Looker Studio’s developer features, data modeling, visualizations and other Google Cloud services like BigQuery and Cloud Functions.

For the first time in Looker Hackathon history, we had two hacks tie for the award of the Best Hack. See the winners below and learn about the other finalists from the event. In every possible case, we have included links to code repositories or examples to enable you to reproduce these hacks.

Best Hack winners

DashNotes: Persistent dashboard annotations

By Ryan J, Bartosz G, Tristan F

Have you ever wanted to take note of a juicy data point you found after cycling through multiple filterings of your data? You could write your notes in an external notes application, but then you might lose the dashboard and filter context important to your discovery. This Best Hack allows you to take notes right from within your Looker dashboard. Using the Looker Custom Visualization API, it creates a dashboard tile for you to create and edit text notes. Each note records the context around its creation, including the original dashboard and filter context. The hack stores the notes in BigQuery to persist the notes across sessions. Check out the GitHub repository for more details.

Document repository sync automation

By Mehul S, Moksh Akash M, Rutuja G, Akash

Does your organization struggle to maintain documentation on an increasing number of ever-changing dashboards? This Best Hack helps your organization automatically generate current detailed documentation on all your dashboards, for simplified administration. The automation uses the Looker SDK, the Looker API, and serverless Cloud Functions to parse your LookML for useful metadata, and stores it in BigQuery. Then the hack uses LookML to model and display the metadata inside a Looker dashboard. Check out the GitHub repository for the backend service and the GitHub repository for the LookML for more details.
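
The hack’s own code is in the linked repositories; as a rough sketch of the same pattern, the Looker Python SDK can list dashboard metadata and the BigQuery client can persist it. The dataset, table, and field selection below are assumptions, not the winning submission.

```python
# Not the hack's code: a minimal sketch of harvesting dashboard metadata with
# the Looker Python SDK and storing it in BigQuery. Assumes credentials in
# looker.ini and an existing BigQuery table; names are placeholders.
import datetime

import looker_sdk
from google.cloud import bigquery

sdk = looker_sdk.init40()  # reads looker.ini or environment variables
bq = bigquery.Client(project="my-project")

rows = []
for dash in sdk.all_dashboards(fields="id,title,folder"):
    rows.append({
        "dashboard_id": dash.id,
        "title": dash.title,
        "folder": dash.folder.name if dash.folder else None,
        "captured_at": datetime.datetime.utcnow().isoformat(),
    })

errors = bq.insert_rows_json("my-project.looker_docs.dashboard_metadata", rows)
if errors:
    raise RuntimeError(f"BigQuery insert failed: {errors}")
```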

Nearly Best Hack winner

Querying Python services from a Looker dashboard

By Jacob B, Illya M

If your Looker dashboard had the power to query any external service, what would you build? This Nearly Best Hack explores how your Looker dashboard can communicate with external Python services. It sets up a Python service that mimics a SQL server and exposes it as a Looker database connection for your dashboard to query. Clever LookML then lets your dashboard buttons send data to the external Python service, creating a more interactive dashboard. This opens up a wide array of possibilities to enhance your Looker data experience. For example, with this hack, you can deploy a trained ML model from Google Cloud’s Vertex AI in your external service to deliver keen insights about your data. Check out the GitHub repository for more details.

Finalists

What do I watch?

By Hamsa N, Shilpa D

We’ve all had an evening when we didn’t know what movie to watch. You can now tap into a Looker dashboard that recommends ten movies you might like, based on your favorite title from IMDb’s top 1,000 movies. The hack analyzes a combination of genre, director, stars, and movie descriptions using natural language processing techniques. The resulting processed data resides in BigQuery, with LookML modeling the data. Check out the GitHub repository for more details.
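
The repository has the full pipeline; as a simplified, self-contained sketch of the underlying idea, here is one way to rank movies by text similarity using TF-IDF and cosine similarity. The toy data and column names are made up for illustration and are not the hack's actual dataset.

```python
# Toy sketch of content-based recommendations: vectorize combined text fields with
# TF-IDF and rank titles by cosine similarity to a liked movie. Data is illustrative.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = pd.DataFrame({
    "title": ["The Matrix", "Inception", "Toy Story"],
    "genre": ["Sci-Fi", "Sci-Fi Thriller", "Animation"],
    "overview": [
        "A hacker discovers reality is a simulation.",
        "A thief steals corporate secrets through dreams.",
        "Toys come to life when their owner is away.",
    ],
})
movies["text"] = movies["genre"] + " " + movies["overview"]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(movies["text"])

liked = movies.index[movies["title"] == "Inception"][0]
scores = cosine_similarity(matrix[liked], matrix).ravel()

# Rank the other titles by similarity to the liked movie (top 10 in a real dataset).
recommendations = movies.assign(score=scores).drop(index=liked).nlargest(10, "score")
print(recommendations[["title", "score"]])
```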

Template analytics

By Ehsan S

If you need to determine which customer segment will be most effective to market to, check out this hack, which performs Recency, Frequency, Monetary (RFM) analysis on data from a Google Sheet to help you segment customers based on how recently they last purchased, how often they purchase, and how much they’ve spent over time. You point the hack’s custom Looker Studio Community Connector at a Google Sheet, and the connector performs RFM analysis on the sheet’s data. The hack’s Looker Studio report visualizes the results to give an overview of your customer segments and behavior. Check out the Google Apps Script code for more details.
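
The connector itself is Apps Script (linked above). As a language-agnostic illustration of the RFM scoring idea, here is a small pandas sketch on made-up transactions, where each customer receives recency, frequency, and monetary scores; the column names are assumptions.

```python
# Illustrative RFM scoring on made-up transactions; column names are assumptions.
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2023-11-01", "2023-12-20", "2023-06-15",
        "2023-12-01", "2023-12-10", "2023-12-28",
    ]),
    "amount": [120, 80, 45, 300, 150, 90],
})
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)  # "today" for recency

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    frequency=("order_date", "count"),                            # number of purchases
    monetary=("amount", "sum"),                                   # total spend
)
# Score each dimension 1-3 (3 = best); real datasets typically use quintiles (1-5).
rfm["r_score"] = pd.qcut(rfm["recency"], 3, labels=[3, 2, 1])
rfm["f_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 3, labels=[1, 2, 3])
rfm["m_score"] = pd.qcut(rfm["monetary"], 3, labels=[1, 2, 3])
print(rfm)
```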

LOV filter app

By Markus B

This hack implements a List of Values (LOV) filter that lets the values of one dimension drive the filter for a second, related dimension. For example, take two related dimensions, “id” and “name”: the “name” values may change over time, while the “id” values always stay constant.

This hack uses Looker’s Extension Framework and Looker Components to show “name” values in the LOV filter that translate to “id” values in an embedded dashboard’s filter. This helps your stakeholders filter on values they’re familiar with and keeps your data model flexible and robust. Check out the GitLab repository for more details.

Looker accelerator

By Dmitri S, Joy S, Oleksandr K

This collection of open-source LookML dashboard templates provides insight into Looker project performance and usage. The dashboards use Looker’s System Activity data and are a great example of using LookML to create reusable dashboards. In addition, you can conveniently install the Looker Block of seven dashboards through the Looker Marketplace (pending approval) to help your Looker developers and admins optimize your Looker usage. Check out the GitHub repository for more details.

The SuperViz Earth Explorer

By Ralph S

With this hack, you can visually explore the population and locations of cities across the world on an interactive 3D globe, and filter the size of the cities in real time as the globe spins. This custom visualization uses the Looker Studio Community Visualization framework, combining three.js, a 3D JavaScript library, with clever graphics hacks to create the visual experience. Check out the GitHub repository for more details.

dbt exposure generator

By Dana H.

Are you using dbt models with Looker? This hack automatically generates dbt exposures to help you debug and identify how your dbt models are used by Looker dashboards. This hack serves as a great example of how our Looker SDK and Looker API can help solve a common pain point for developers. Check out the GitHub repository for more details.
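
The linked repository contains the real implementation. As a rough sketch of the general shape of the output, the following Python snippet uses the Looker SDK to list dashboards and writes them out as dbt exposure entries; the instance URL, owner, and model dependency below are placeholders, whereas the actual hack derives the dependencies from Looker metadata.

```python
# Sketch: turn Looker dashboards into dbt exposure entries. Assumes Looker API
# credentials (e.g., looker.ini); URL, owner, and depends_on values are placeholders.
import looker_sdk
import yaml

sdk = looker_sdk.init40()
LOOKER_HOST = "https://mycompany.looker.com"  # hypothetical Looker instance URL

exposures = []
for dash in sdk.all_dashboards(fields="id,title"):
    exposures.append({
        "name": (dash.title or f"dashboard_{dash.id}").lower().replace(" ", "_"),
        "type": "dashboard",
        "url": f"{LOOKER_HOST}/dashboards/{dash.id}",
        "owner": {"name": "Analytics Team"},    # placeholder owner
        "depends_on": ["ref('my_dbt_model')"],  # the real hack derives this from Looker metadata
    })

with open("exposures.yml", "w") as f:
    yaml.safe_dump({"version": 2, "exposures": exposures}, f, sort_keys=False)
```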

Hacking Looker for fun and community

At Looker Hackathon 2023, our developer community once again showed us how talented, creative, and collaborative they are. We saw how developer features like Looker Studio Community Visualizations, LookML, and the Looker API, in combination with Google Cloud services like Cloud Functions and BigQuery, enable our developer community to build powerful, useful, and sometimes even entertaining tools and data experiences.

We hope these hackathon projects inspire you to build something fun, innovative, or useful for you. Tap into our linked documentation and code in this post to get started, and we will see you at the next hackathon!

Source: Data Analytics

Unlock Web3 data with BigQuery and Subsquid


Editor’s note: This post is part of a series showcasing partner solutions that are Built with BigQuery.

Blockchains generate a lot of data with every transaction. The beauty of Web3 is that all of that data is publicly available. But the multichain and modular expansion of the space has increased the complexity of accessing it: any project looking to build cross-chain decentralized apps (dApps) has to figure out how to tap into on-chain data stored in varying locations and formats.

Meanwhile, running indexers to extract the data and make it readable is a time-consuming, resource-intensive endeavor that is often beyond small Web3 teams’ capabilities, since coding smart contracts and building indexers are entirely different skills.

Having recognized the challenges for builders to leverage one of the most valuable pieces of Web3 (its data!), the Subsquid team set out to build a fully decentralized solution that drastically increases access to data in a permissionless manner.

Subsquid explained

One way to think about the Subsquid Network is as Web3’s largest decentralized data lake, built to ingest, normalize, and structure data from over 100 Ethereum Virtual Machine (EVM) and non-EVM chains. It allows developers to quickly access (“query”) data more granularly, and vastly more efficiently, than via legacy RPC node infrastructure.

Subsquid Network is horizontally scalable, meaning it can grow alongside archival blockchain data storage. Its query engine is optimized to extract large amounts of data and is a perfect fit for both dApp development (indexing) and analytics. In fact, over $11 billion in decentralized application and L1/L2 value depends on Subsquid indexing.

Since September, Subsquid has been shifting from its initial architecture to a permissionless and decentralized format. So far during the testnet, 30,000 participants — including tens of thousands of developers — have built and deployed over 40,000 indexers. Now, the Subsquid team is determined to bring this user base and its data to Google BigQuery.

BigQuery and blockchain

BigQuery is a powerful enterprise data warehouse solution that allows companies and individuals to store and analyze petabytes of data. Designed for large-scale data analytics, BigQuery supports multi-cloud deployments and offers built-in machine learning capabilities, enabling data scientists to create ML models with simple SQL.

BigQuery is also fully integrated with Google’s own suite of business intelligence and external tools, empowering users to work with BigQuery data from their own code in Jupyter Notebooks or Apache Zeppelin.

Since 2018, Google has added support for blockchains like Ethereum and Bitcoin to BigQuery. Then, earlier this year, the on-chain data of 11 additional layer-1 blockchain architectures was integrated into BigQuery, including Avalanche, Fantom, NEAR, Polkadot, and Tron.
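
These public datasets can be queried like any other BigQuery table. The short Python sketch below counts recent transactions in the public Ethereum dataset; the dataset and column names follow the publicly documented crypto_ethereum tables, but verify them in your own project before relying on this.

```python
# Sketch: query Google's public Ethereum dataset in BigQuery with the Python client.
# Dataset and column names follow the public crypto_ethereum tables; verify before use.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT DATE(block_timestamp) AS day, COUNT(*) AS tx_count
    FROM `bigquery-public-data.crypto_ethereum.transactions`
    WHERE DATE(block_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY day
    ORDER BY day
"""
for row in client.query(query).result():
    print(row.day, row.tx_count)
```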

But while it’s great to be able to run analytics on public blockchain data, this might not always offer exactly the data a particular developer needs for their app. This is where Subsquid comes in.

Data superpowers for Web3 devs and analysts

Saving custom-curated data to BigQuery lets developers leverage Google’s analytics tools to gain insights into how their product is used, beyond the context of one chain or platform.

Multi-chain projects can leverage Subsquid in combination with BigQuery to quickly analyze their usage on different chains and gain insights into fees, operating costs, and trends. With BigQuery, they aren’t limited to on-chain data either. After all, Google is the company behind Google Analytics, an advanced analytics suite for website traffic.

Web3 Data Unlocked: Indexing Web3 Data with Subsquid & Google BigQuery

Subsquid developer relations engineer Daria A. demonstrates how to store data indexed with Subsquid in BigQuery and other destinations.

Cross-domain analysis that combines on-chain activity with social media data and website traffic can help projects understand retention and conversion, identify the points where users drop off, and improve their workflows accordingly.

“BigQuery is quickly becoming an essential product in Web3, as it enables builders to query and analyze one’s own data, as well as to leverage a rich collection of datasets already compiled by others. Since it’s SQL based, it’s extremely easy to explore any data and then run more and more complex queries. With a rich API and complete developer toolkit, it can be connected to virtually anything,” writes Dmitry Zhelezov, Subsquid CEO and co-founder.

“Now, with the addition of Subsquid indexing, Web3 developers literally have data superpowers. They can build a squid indexer from scratch or use an existing one to get exactly the data they need extremely efficiently. We can’t wait to see what this unlocks for builders.”

Get started with Subsquid on BigQuery today

Subsquid’s support for BigQuery is already feature-complete. Are you interested in incorporating this tool into your Web3 projects? Find out more in the documentation. You can also view an example project demoed on YouTube and open-sourced on GitHub.

The Built with BigQuery advantage for Data Providers and ISVs

Built with BigQuery helps companies like Subsquid build innovative applications with Google Data and AI Cloud. Participating companies can:

Accelerate product design and architecture through access to designated experts who can provide insight into key use cases, architectural patterns, and best practices.
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

BigQuery gives Data Providers and ISVs the advantage of a powerful, highly scalable unified AI lakehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. Click here to learn more about Built with BigQuery.

Source: Data Analytics