Introducing multimodal and structured data embedding support in BigQuery

Embeddings represent real-world objects, such as entities, text, images, or videos, as arrays of numbers (a.k.a. vectors) that machine learning models can easily process. Embeddings are the building blocks of many ML applications such as semantic search, recommendations, clustering, outlier detection, named entity extraction, and more. Last year, we introduced support for text embeddings in BigQuery, allowing machine learning models to understand real-world data domains more effectively. Earlier this year, we introduced vector search, which lets you index and work with billions of embeddings and build generative AI applications on BigQuery.

At Next ’24, we announced further enhancements to the embedding generation capabilities in BigQuery, with support for:

Multimodal embeddings generation in BigQuery via Vertex AI’s multimodalembedding model, which lets you embed text and image data in the same semantic space

Embedding generation for structured data using PCA, Autoencoder or Matrix Factorization models that you train on your data in BigQuery

Multimodal embeddings

Multimodal embedding generates embedding vectors for text and image data in the same semantic space (vectors of items similar in meaning are closer together) and the generated embeddings have the same dimensionality (text and image embeddings are the same size). This enables a rich array of use cases such as embedding and indexing your images and then searching for them via text. 

You can start using multimodal embedding in BigQuery using the following simple flow. If you like, you can take a look at our overview video which walks through a similar example.

Step 0: Create an object table which points to your unstructured data
You can work with unstructured data in BigQuery via object tables. For example, if you have your images stored in a Google Cloud Storage bucket on which you want to generate embeddings, you can create a BigQuery object table that points to this data without needing to move it. 

To follow along with the steps in this blog, you will need to reuse an existing BigQuery CONNECTION or create a new one by following the instructions here. Ensure that the principal of the connection used has the ‘Vertex AI User’ role and that the Vertex AI API is enabled for your project. Once the connection is created, you can create an object table as follows:

```sql
CREATE OR REPLACE EXTERNAL TABLE
  `bqml_tutorial.met_images`
WITH CONNECTION `Location.ConnectionID`
OPTIONS
( object_metadata = 'SIMPLE',
  uris = ['gs://gcs-public-data--met/*']
);
```

In this example, we are creating an object table that contains public domain art images from The Metropolitan Museum of Art (a.k.a. “The Met”) using a public Cloud Storage bucket that contains this data. The resulting object table has the following schema:

Let’s look at a sample of these images. You can do this using a BigQuery Studio Colab notebook by following instructions in this tutorial. As you can see, the images represent a wide range of objects and art pieces.

Image source: The Metropolitan Museum of Art

Now that we have the object table with images, let’s create embeddings for them.

Step 1: Create model
To generate embeddings, first create a BigQuery model that uses the Vertex AI hosted ‘multimodalembedding@001’ endpoint.

```sql
CREATE OR REPLACE MODEL
  bqml_tutorial.multimodal_embedding_model REMOTE
WITH CONNECTION `LOCATION.CONNECTION_ID`
OPTIONS (endpoint = 'multimodalembedding@001')
```

Note that while the multimodalembedding model supports embedding generation for text, it is specifically designed for cross-modal semantic search scenarios, for example, searching images given text. For text-only use cases, we recommend using the textembedding-gecko@ model instead.

Step 2: Generate embeddings 
You can generate multimodal embeddings in BigQuery via the ML.GENERATE_EMBEDDING function. This function also works for generating text embeddings (via textembedding-gecko model) and structured data embeddings (via PCA, AutoEncoder and Matrix Factorization models). To generate embeddings, simply pass in the embedding model and the object table you created in previous steps to the ML.GENERATE_EMBEDDING function.

```sql
CREATE OR REPLACE TABLE `bqml_tutorial.met_image_embeddings`
AS
SELECT * FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_tutorial.multimodal_embedding_model`,
  TABLE `bqml_tutorial.met_images`)
WHERE content_type = 'image/jpeg'
LIMIT 10000
```

To reduce the tutorial’s runtime, we limit embedding generation to 10,000 images. This query will take 30 minutes to 2 hours to run. Once this step is completed you can see a preview of the output in BigQuery Studio. The generated embeddings have a dimension of 1408.

Step 3 (optional): Create a vector index on generated embeddings
While the embeddings generated in the previous step can be persisted and used directly in downstream models and applications, we recommend creating a vector index for improving embedding search performance and enabling the nearest-neighbor query pattern. You can learn more about vector search in BigQuery here.

```sql
-- Create a vector index on the embeddings

CREATE OR REPLACE VECTOR INDEX `met_images_index`
ON bqml_tutorial.met_image_embeddings(ml_generate_embedding_result)
OPTIONS(index_type = 'IVF',
  distance_type = 'COSINE')
```

Step 4: Use embeddings for text-to-image (cross-modality) search
You can now use these embeddings in your applications. For example, to search for “pictures of white or cream colored dress from victorian era” you first embed the search string like so:

```sql
-- embed search string

CREATE OR REPLACE TABLE `bqml_tutorial.search_embedding`
AS
SELECT * FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_tutorial.multimodal_embedding_model`,
  (
    SELECT "pictures of white or cream colored dress from victorian era" AS content
  )
)
```

You can now use the embedded search string to find similar (nearest) image embeddings as follows:

```sql
-- use the embedded search string to search for images

CREATE OR REPLACE TABLE
  `bqml_tutorial.vector_search_results` AS
SELECT
  base.uri AS gcs_uri,
  distance
FROM
  VECTOR_SEARCH( TABLE `bqml_tutorial.met_image_embeddings`,
    "ml_generate_embedding_result",
    TABLE `bqml_tutorial.search_embedding`,
    "ml_generate_embedding_result",
    top_k => 5)
```
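To build intuition for what the nearest-neighbor search returns, here is a minimal Python sketch of an exact (brute-force) top-k search by cosine distance. It is only an illustration, not how BigQuery implements VECTOR_SEARCH: an IVF index may return approximate results, and the URIs and 3-dimensional vectors below are made-up stand-ins for real 1408-dimensional embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_nearest(base_rows, query_embedding, k=5):
    """Rank (uri, embedding) rows by distance to the query and keep the k closest."""
    scored = [(uri, cosine_distance(emb, query_embedding)) for uri, emb in base_rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy data: these URIs and vectors are assumptions for illustration only.
images = [
    ("gs://example-bucket/dress.jpg", [0.9, 0.1, 0.0]),
    ("gs://example-bucket/armor.jpg", [0.0, 0.2, 0.9]),
    ("gs://example-bucket/gown.jpg", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedded search string

results = top_k_nearest(images, query, k=2)
for uri, distance in results:
    print(uri, round(distance, 4))
```

The two rows whose vectors point in nearly the same direction as the query rank first, which mirrors the `base.uri` and `distance` columns selected in the query above.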

Step 5: Visualize results
Now let’s visualize the results along with the computed distance and see how we performed on the search query “pictures of white or cream colored dress from victorian era”. Refer to the accompanying tutorial to see how to render this output using a BigQuery notebook.

Image source: The Metropolitan Museum of Art

The results look quite good!

Wrapping up

In this blog, we demonstrated a common vector search usage pattern, but there are many other use cases for embeddings. For example, with multimodal embeddings you can perform zero-shot classification of images by converting a table of images and a separate table containing sentence-like labels to embeddings. You can then classify images by computing the distance between each image’s embedding and each descriptive label’s embedding. You can also use these embeddings as input for training other ML models, such as clustering models in BigQuery, to help you discover hidden groupings in your data. Embeddings are also useful wherever you have free text input as a feature. For example, embeddings of user reviews or call transcripts can be used in a churn prediction model, and embeddings of images of a house can be used as input features in a price prediction model. You can even use embeddings instead of categorical text data when such categories have semantic meaning, for example, product categories in a deep-learning recommendation model.
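The zero-shot classification idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a BigQuery recipe: the label sentences and 3-dimensional vectors are invented, and in practice both sides would be embeddings produced by the same multimodal model.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the image embedding."""
    image_unit = normalize(image_embedding)

    def similarity(label):
        label_unit = normalize(label_embeddings[label])
        return sum(x * y for x, y in zip(image_unit, label_unit))

    return max(label_embeddings, key=similarity)


# Hypothetical label embeddings; real ones would come from embedding each sentence.
label_embeddings = {
    "a photo of a painting": [0.9, 0.1, 0.1],
    "a photo of a sculpture": [0.1, 0.9, 0.2],
    "a photo of a garment": [0.1, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.3]  # hypothetical embedding of one image

predicted = zero_shot_classify(image_embedding, label_embeddings)
print(predicted)
```

No classifier is trained here: the label set can be changed at any time by embedding new sentences, which is what makes the approach "zero-shot".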

In addition to multimodal and text embeddings, BigQuery also supports generating embeddings on structured data using PCA, AUTOENCODER and Matrix Factorization models that have been trained on your data in BigQuery. These embeddings have a wide range of use cases. For example, embeddings from PCA and AUTOENCODER models can be used for anomaly detection (embeddings further away from other embeddings are deemed anomalies) and as input features to other models, for example, a sentiment classification model trained on embeddings from an autoencoder. Matrix Factorization models are classically used for recommendation problems, and you can use them to generate user and item embeddings. Then, given a user embedding you can find the nearest item embeddings and recommend these items, or cluster users so that they can be targeted with specific promotions.
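As a rough illustration of the anomaly detection idea (embeddings far from the others are deemed anomalies), the Python sketch below flags points whose distance from the centroid is well above the typical distance. The centroid-plus-median rule and the `factor` parameter are assumptions chosen for simplicity, not the method BigQuery ML uses, and the 2-dimensional vectors are made up.

```python
import math
import statistics


def find_anomalies(embeddings, factor=2.0):
    """Return indexes of embeddings unusually far from the centroid.

    A point is flagged when its Euclidean distance to the centroid exceeds
    `factor` times the median distance (an illustrative rule of thumb).
    """
    dims = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dims)]
    distances = [math.dist(e, centroid) for e in embeddings]
    threshold = factor * statistics.median(distances)
    return [i for i, dist in enumerate(distances) if dist > threshold]


# Hypothetical embeddings: four clustered points and one far-away point.
embeddings = [
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.1],
    [1.0, 1.2],
    [5.0, 5.0],
]

anomalies = find_anomalies(embeddings)
print(anomalies)
```

Using the median rather than the mean keeps the threshold from being dragged upward by the very outliers it is meant to catch.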

To generate such embeddings, first use the CREATE MODEL statement to create a PCA, Autoencoder or Matrix Factorization model, passing in your data as input. Then call the ML.GENERATE_EMBEDDING function with the model and an input table to generate embeddings for that data.

Getting started

Support for multimodal embeddings and for embeddings on structured data in BigQuery is now available in preview. Get started by following our documentation and tutorials. Have feedback? Let us know what you think at bqml-feedback@google.com.

Source : Data Analytics Read More

Announcing Delta Lake support for BigQuery

Delta Lake is an open-source optimized storage layer that provides a foundation for tables in lakehouses and brings reliability and performance improvements to existing data lakes. It sits on top of your data lake storage (like cloud object stores) and provides a performant and scalable metadata layer on top of data stored in the Parquet format.

Organizations use BigQuery to manage and analyze all data types, structured and unstructured, with fine-grained access controls. In the past year, customer use of BigQuery to process multiformat, multicloud, and multimodal data using BigLake has grown over 60x. Support for open table formats gives you the flexibility to use existing open source and legacy tools while getting the benefits of an integrated data platform. This is enabled via BigLake — a storage engine that allows you to store data in open file formats on cloud object stores such as Google Cloud Storage, and run Google-Cloud-native and open-source query engines on it in a secure, governed, and performant manner. BigLake unifies data warehouses and lakes by providing an advanced, uniform data governance model. 

This week at Google Cloud Next ’24, we announced that this support now extends to the Delta Lake format, enabling you to query Delta Lake tables stored in Cloud Storage or Amazon Web Services S3 directly from BigQuery, without having to export or copy the data, or use manifest files, to query it.

Why is this important? 

If you have existing dependencies on Delta Lake and prefer to keep using it, you can now take advantage of BigQuery’s native support. Google Cloud provides an integrated and price-performant experience for Delta Lake workloads, encompassing unified data management, centralized security, and robust governance. Many customers already use Dataproc or Serverless Spark to manage Delta Lake tables on Cloud Storage. Now, BigQuery’s native Delta Lake support enables seamless delivery of data for downstream applications such as business intelligence and reporting, as well as integration with Vertex AI. This lets you do a number of things, including:

Build a secure and governed lakehouse with BigLake’s fine-grained security model

Securely exchange Delta Lake data using Analytics Hub 

Run data science workloads on Delta Lake using BigQuery ML and Vertex AI 

How to use Delta Lake with BigQuery

Delta Lake tables follow the same table creation process as BigLake tables. 

Required roles

To create a BigLake table, you need the following BigQuery Identity and Access Management (IAM) permissions:

bigquery.tables.create 

bigquery.connections.delegate

Prerequisites

Before you create a BigLake table, you need to have a dataset and a Cloud resource connection that can access Cloud Storage.

Table creation using DDL

Here is the DDL statement to create a Delta Lake table:

```sql
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET.DELTALAKE_TABLE_NAME`
WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS (
  format ="DELTA_LAKE",
  uris=['DELTA_TABLE_GCS_BASE_PATH']);
```

Querying Delta Lake tables

After creating a Delta Lake BigLake table, you can query it using GoogleSQL syntax, the same as you would a standard BigQuery table. For example:

```sql
SELECT FIELD1, FIELD2 FROM `PROJECT_ID.DATASET.DELTALAKE_TABLE_NAME`
```

You can also enforce fine-grained security at the table level, including row-level and column-level security. For Delta Lake tables based on Cloud Storage, you can also use dynamic data masking.

Conclusion

We believe that BigQuery’s support for Delta Lake is a major step forward for customers building lakehouses using Delta Lake. This integration will make it easier for you to get insights from your data and make data-driven decisions. We are excited to see how you use Delta Lake and BigQuery together to solve your business challenges. For more information on how to use Delta Lake with BigQuery, please refer to the documentation.

Acknowledgments: Mahesh Bogadi, Garrett Casto, Yuri Volobuev, Justin Levandoski, Gaurav Saxena, Manoj Gunti, Sami Akbay, Nic Smith and the rest of the BigQuery Engineering team.

BigQuery is now your single, unified AI-ready data platform

Eighty percent of data leaders believe that the lines between data and AI are blurring. Using large language models (LLMs) with your business data can give you a competitive advantage, but to realize this advantage, how you structure, prepare, govern, model, and scale your data matters. 

Tens of thousands of organizations already choose BigQuery and its integrated AI capabilities to power their data clouds. But in a data-driven AI era, organizations need a simple way to manage all of their data workloads. Today, we’re going a step further and unifying key Google Cloud data analytics capabilities under BigQuery, which is now the single, AI-ready data analytics platform. BigQuery incorporates key capabilities from multiple Google Cloud analytics services into a single product experience that offers the simplicity and scale you need to manage structured data in BigQuery tables, unstructured data like images, audio and documents, and streaming workloads, all with the best price-performance.

BigQuery helps you:

Scale your data and AI foundation with support for all data types and open formats

Eliminate the need for upfront sizing and simply bring your data, at any scale, with a fully managed, serverless workload management model and universal metastore

Increase flexibility and agility for data teams to collaborate by bringing multiple languages and engines (SQL, Spark, Python) to a single copy of data 

Support the end-to-end data to AI lifecycle with built-in high availability, data governance, and enterprise security features

Simplify analytics with a unified product experience designed for all data users and AI-powered assistive and collaboration features

With your data in BigQuery, you can quickly and efficiently bring gen AI to your data and take advantage of LLMs. BigQuery simplifies multimodal generative AI for the enterprise by making Gemini models available through BigQuery ML and BigQuery DataFrames. It helps you unlock value from your unstructured data, with its expanded integration with Vertex AI’s document processing and speech-to-text APIs, and its vector capabilities to enable AI-powered search for your business data. The insights from combining your structured and unstructured data can be used to further fine-tune your LLMs.

Support for all data types and open formats

Customers use BigQuery to manage all data types, structured and unstructured, with fine-grained access controls and integrated governance. BigLake, BigQuery’s unified storage engine, supports open table formats, which let you use existing open-source and legacy tools to access structured and unstructured data while benefiting from an integrated data platform. BigLake supports all major open table formats, including Apache Iceberg, Apache Hudi, and now Delta Lake, natively integrated with BigQuery. It provides a fully managed experience for Iceberg, including DDL, DML and streaming support.

Your data teams need access to a universal definition of data, whether in structured, unstructured or open formats. To support this, we are launching BigQuery metastore, a managed, scalable runtime metadata service that provides universal table definitions and enforces fine-grained access control policies for analytics and AI runtimes. Supported runtimes include Google Cloud, open-source engines (through connectors), and third-party partner engines.

Use multiple languages and serverless engines on a single copy of data

Customers increasingly want to run multiple languages and engines on a single copy of their data, but the fragmented nature of today’s analytics and AI systems makes this challenging. You can now bring the programmatic power of Python and PySpark right to your data without having to leave BigQuery. 

BigQuery DataFrames brings the power of Python together with the scale and ease of BigQuery with a minimal learning curve. It implements over 400 common APIs from pandas and scikit-learn by transparently and optimally converting methods to BigQuery SQL and BigQuery ML SQL. This breaks the barriers of client-side capabilities, allowing data scientists to explore, transform and train on terabytes of data with the processing horsepower of BigQuery.

Apache Spark has become a popular data processing runtime, especially for data engineering tasks. In fact, customers’ use of serverless Apache Spark in Google Cloud increased by over 500% in the past year.1 BigQuery’s newly integrated Spark engine lets you process data using PySpark as you do with SQL. Like the rest of BigQuery, the Spark engine is completely serverless — no need to manage compute infrastructure. You can even create stored procedures using PySpark and call them from your SQL-based pipelines. 

Make decisions and feed ML models in near real-time

Data teams are also increasingly being asked to deliver real-time analytics and AI solutions, reducing the time between signal, insight, and action. BigQuery now helps make real-time streaming data processing easy with new support for continuous queries: unbounded SQL queries that process data the moment it arrives. BigQuery continuous queries amplify downstream SaaS applications, like Salesforce, with the real-time enterprise knowledge of your data and AI platform. In addition, to support open-source streaming workloads, we are announcing a preview of Apache Kafka for BigQuery. Customers can use Apache Kafka to manage streaming data workloads and feed ML models without the need to worry about version upgrades, rebalancing, monitoring and other operational headaches.

Scale analytics and AI with governance and enterprise features 

To make it easier for you to manage, discover, and govern data, last year we brought data governance capabilities like data quality, lineage and profiling from Dataplex directly into BigQuery. We will be expanding BigQuery to include Dataplex’s enhanced search capabilities, powered by a unified metadata catalog, to help data users discover data and AI assets, including models and datasets from Vertex AI. Column-level lineage tracking in BigQuery is now available in preview, which will be followed by a preview for lineage for Vertex AI pipelines. Governance rules for fine-grained access control are also in preview, allowing businesses to define governance policies based on metadata.

For customers looking for enhanced redundancy across geographic regions, we are introducing managed disaster recovery for BigQuery. This feature, now in preview, offers automated failover of compute and storage and will offer a new cross-regional service level agreement (SLA) tailored for business-critical workloads. The managed disaster recovery feature provides standby compute capacity in the secondary region included in the price of BigQuery’s Enterprise Plus edition.

A unified experience for all data users

As Google Cloud’s single integrated platform for data analytics, BigQuery unifies how data teams work together with BigQuery Studio. Now generally available, BigQuery Studio gives data teams a collaborative data workspace that all data practitioners can use to accelerate their data-to-AI workflows. BigQuery Studio lets you use SQL, Python, PySpark, and natural language in a single unified analytics workspace, regardless of the data’s scale, format or location. All development assets in BigQuery Studio are enabled with full lifecycle capabilities, including team collaboration and version control. Since BigQuery Studio’s launch at Next ‘23, hundreds of thousands of users have been actively using the new interface.2

Gemini in BigQuery for AI assistive and collaborative experiences

We announced several new innovations for Gemini in BigQuery that help data teams with AI-powered experiences for data preparation, analysis and engineering as well as intelligent recommendations to enhance user productivity and optimize costs. BigQuery data canvas, an AI-centric experience with natural language input, makes data discovery, exploration, and analysis faster and more intuitive. AI augmented data preparation in BigQuery helps users to cleanse and wrangle their data and build low-code visual data pipelines, or rebuild legacy pipelines. Gemini in BigQuery also helps you write and edit SQL or Python code using simple natural language prompts, referencing relevant schemas and metadata.

How Deutsche Telekom is innovating with the BigQuery platform

“Deutsche Telekom built a horizontally scalable data platform in an innovative way that was designed to meet our current and future business needs. With BigQuery at the center of our enterprise’s One Data Ecosystem, we created a unified approach to maintain a single source of truth while fostering de-centralized usage of data across all of our data teams. With BigQuery and Vertex AI, we built a governed and scalable space for data scientists to experiment and productionize AI models while maintaining data sovereignty and federated access controls. This has allowed us to quickly deploy practical usage of LLMs to turbocharge our data engineering life cycle and unleash new business opportunities.” – Ashutosh Mishra, VP of Data Architecture, Deutsche Telekom

Start building your AI-ready data platform

To learn more and start building your AI-ready data platform, start exploring the next generation of BigQuery today. Read more about the latest innovations for Gemini in BigQuery and an overview of what’s next for data analytics at Google Cloud.

1. Google internal data – YoY growth of data processed using Apache Spark on Google Cloud compared with Feb ‘23. 
2. Since the August 2023 announcement of BigQuery Studio, monthly active users have continued to grow.

Celebrating 20 years of Bigtable with exciting announcements at Next

How do you store the entire internet? That’s the problem our engineering team set out to solve in 2004 when it launched Bigtable, one of Google’s longest-serving and largest data storage systems. As the internet, and Google, grew, we needed a new breed of storage solution to reliably handle millions of requests a second to store the ever-changing internet. And when we revealed its design to the world in a 2006 research paper, Bigtable kicked off the Big Data revolution, inspiring the database architectures for NoSQL systems such as Apache HBase and Cassandra.

Twenty years later, Bigtable doesn’t just support Google Search, but also latency-sensitive workloads across Google such as Ads, Drive, Analytics, Maps, and YouTube. On Google Cloud, big names like Snap, Spotify and Shopify rely on Bigtable, now serving a whopping 7 billion queries per second at peak. On any given day, it is nearly impossible to use the internet without interacting with a Bigtable database. 

Bigtable isn’t just for Big Tech, though. This year, our goal is to bring Bigtable to a much broader developer audience and range of use cases, starting with a number of capabilities that we announced this week at Google Cloud Next.

Introducing Bigtable Data Boost and Authorized Views

For one, Bigtable now supports Data Boost, a serverless way for users to perform analytical queries on their transactional data without impacting their operational workloads. Currently in preview, Data Boost makes managing multiple copies of data for serving and analytics a thing of the past. Further, Data Boost supports a requester-pays model, billing data consumers directly for their usage — a unique capability for an operational database. 

Then, new Bigtable Authorized Views enable many data sharing and collaboration scenarios. For example, retailers can securely share sales or inventory data with each of their vendors, so they can more accurately forecast demand and resupply the shelves, without worrying about how much server capacity to provision. This type of use case is quite common for organizations with multiple business units, but sharing this level of data has traditionally required keeping copies of data in multiple databases and building custom application layers and billing components. Instead, with Bigtable Authorized Views and Data Boost, each vendor will get their own bill for the amount of data they process, with no negative impact on the retailer’s operations. Bigtable Authorized Views make it easier to serve data from a single source of truth, with improved data governance and quality.

These features, along with the existing Request Priorities, stand to transform Bigtable into an all-purpose data fabric, or a Digital Integration Hub. Many Google Cloud customers already use Bigtable for their data fabrics, where its strong write performance, horizontal scalability and flexible schema make it an ideal platform for projects that ingest large amounts of data in batch from multiple sources or collate real-time streaming events. But businesses and their data evolve over time. New data sources are added through acquisitions, partnerships, new product launches, additional business metrics and ML features. To get the value out of data, you need to combine all the pieces and see the big picture — and do it in real-time. Bigtable has already solved the latency and database scaling problems, but features like Authorized Views and Data Boost help to solve data and resource governance issues. 

During the preview, Data Boost is offered at no cost. 

Boosting Bigtable performance for next-gen workloads

At Next, we also announced several Bigtable price-performance improvements. Bigtable now offers a new aggregate data type optimized for increment operations, which delivers significantly higher throughput and can be used to implement distributed counters and simplify Lambda architectures. You can also choose large nodes that offer more performance stability at higher server utilization rates, to better support spiky workloads. This is the first of the workload-optimized node shapes that Bigtable will offer. All of these changes come on the heels of an increase in point-read throughput from 10K to 14K reads per second per node just a few months ago. Overall, these improvements mean lower TCO for a database already known for its price-performance.
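To make the distributed-counter idea concrete, here is a small conceptual sketch in Python. It does not use or model the Bigtable API; the `ShardedCounter` class and its method names are hypothetical. It only illustrates why increment-style writes scale well: each writer adds to its own tally without first reading the current total, and a read merges the tallies by summing them.

```python
from collections import Counter


class ShardedCounter:
    """Toy distributed counter: writers add to their own shard, readers sum all shards."""

    def __init__(self, num_shards):
        # One independent tally per shard (standing in for separate nodes or writers).
        self.shards = [Counter() for _ in range(num_shards)]

    def increment(self, shard_id, key, amount=1):
        # An increment only adds; it never needs to read the current total first.
        self.shards[shard_id][key] += amount

    def read(self, key):
        # Addition is commutative, so shards can be merged in any order.
        return sum(shard[key] for shard in self.shards)


counter = ShardedCounter(num_shards=3)
counter.increment(0, "page_views")
counter.increment(1, "page_views", amount=4)
counter.increment(2, "page_views")
counter.increment(2, "clicks", amount=2)

print(counter.read("page_views"))
print(counter.read("clicks"))
```

Because the merge is a plain sum, the same total comes out regardless of which shard received which increment or in what order they arrived.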

These improvements could help power your modern analytics and machine learning (ML) workloads: ML is going real-time, and models are getting larger, with more and more variables that require flexible schemas and wide data structures. Analytics workloads are also moving towards wide-table designs with the so-called one big table (OBT) data model. Whether you need its flexible data model for very wide, gradually evolving tables; its scalable counters’ ability to provide real-time metrics at scale; or features like Data Boost and Request Priorities that allow seamless backfills and frequent model training (thereby combining real-time serving and batch processing into a single database), Bigtable simplifies the ML stack and reduces concept and data drift, uplifting ML model performance.

With 20 years of learnings from running one of the world’s largest cloud databases, Bigtable is ready to tackle even more demanding workloads. If you’re at Google Cloud Next, stop by the following sessions to learn how Ford uses Bigtable for its vehicle telemetry platform, how Snap uses it for its latency-sensitive workloads, how Shopify uses Bigtable to power its recommendation system, and about Palo Alto Networks’ journey from Apache Cassandra to Bigtable.

Further resources

Unlock your data with Authorized Views

Make every click count with distributed counters

Discover new insights in your operational data with Data Boost’s isolated, serverless analytical processing

Get to know BigQuery data canvas: an AI-centric experience to reimagine data analytics

Navigating the complexities of the data-to-insights journey can be frustrating. Data professionals spend valuable time sifting through data sources, reinventing the wheel with each new question that comes their way. They juggle multiple tools, hop between coding languages, and collaborate with a wide array of teams across their organizations. This fragmented approach is riddled with bottlenecks, preventing analysts from generating insights and doing high-impact work as quickly as they should.

Yesterday at Google Cloud Next ‘24, we introduced BigQuery data canvas, which reimagines how data professionals work with data. This novel user experience helps customers create graphical data workflows that map to their mental model while AI innovations accelerate finding, preparing, analyzing, visualizing and sharing data and insights.

Watch this video for a quick overview of BigQuery data canvas.

BigQuery data canvas: a NL-driven analytics experience

BigQuery data canvas makes data analytics faster and easier with a unified, natural language-driven experience that centralizes data discovery, preparation, querying, and visualization. Rather than toggling between multiple tools, you can now use data canvas to focus on the insights that matter most to your business. Data canvas addresses the challenges of traditional data analysis workflows in two areas:

Natural language-centric experience: Instead of writing code, you can now speak directly to your data. Ask questions, direct tasks, and let the AI guide you through various analytics tasks.

Reimagined user experience: Data canvas rethinks the notebook concept. Its expansive canvas workspace fosters iteration and easy collaboration, allowing you to refine your work, chain results, and share workspaces with colleagues.

For example, to analyze a recent marketing campaign with BigQuery data canvas, you could use natural language prompts to discover campaign data sources, integrate them with existing customer data, derive insights, collaborate with teammates and share visual reports with executives — all within a single canvas experience.

Natural language-based visual workflow with BigQuery data canvas

Do more with BigQuery data canvas 

BigQuery data canvas provides a variety of features that can help analysts accelerate their analytics tasks:

Search and discover: Easily find the specific data asset, visualization, table, or view that you need to work with, or search for the most relevant data assets. Data canvas works with all data that can be managed with BigQuery, including BigQuery managed storage, BigLake, Google Cloud Storage objects, and BigQuery Omni tables. For example, you could use either of the following inputs to pull data with data canvas:

Specific table: project_name.dataset_name.table_name

Search: “customer transaction data” or “projectid:my-project-name winter jacket sales Atlanta”

Explore data assets: Review table schemas and details, or preview the data and compare tables side by side.

Generate SQL queries: Iterate with NL inputs to generate the exact SQL query you need to accomplish the analytics task at hand. You can also edit the SQL before executing it. 

Combine results: Define joins with plain language instructions and refine the generated SQL as needed. Use query results as a starting point for further analysis with prompts like “Join this data with our customer demographics on order id.”

Visualize: Use natural language prompts to easily create and customize charts and graphs to visualize your data, e.g., “create a bar chart with gradient.” Then, seamlessly share your findings by exporting your results to Looker Studio or Google Sheets.

Automated insights: Data canvas can interpret query results and chart data and generate automated insights from them. For example, it can look at the query results of sales deal sizes and automatically provide the insight “the median deal size is $73,500.”

Share to collaborate: Data analytics projects are often a team effort. You can simply save your canvas and share it with others using a link.

Popular use cases

While BigQuery data canvas can accelerate many analytics tasks, it’s particularly helpful for:

Ad hoc analysis: When working on a tight deadline, data canvas makes it easy to pull data from various sources.

Exploratory data analysis (EDA): This critical early step in the data analysis process focuses on summarizing the main characteristics of a dataset, often visually. Data canvas helps find data sources and then presents the results visually.

Collaboration: Data canvas makes it easy to share an analytics project with multiple people.

What our customers are saying 

Companies large and small have been experimenting with BigQuery data canvas for their day-to-day analytics tasks and their feedback has been very positive. 

Wunderkind, a performance marketing channel that powers one-to-one customer interactions, has been using BigQuery data canvas across their analytics team for several weeks and is experiencing significant time savings. 

“For any sort of investigation or exploratory exercise resulting in multiple queries there really is no replacement [for data canvas]. [It] Saves us so much time and mental capacity!” – Scott Schaen, VP of Data & Analytics, Wunderkind

How Wunderkind accelerates time to insights with BigQuery data canvas

Veo, a micro mobility company that operates in 50+ locations across the USA, is seeing immediate benefits from the AI capabilities in data canvas. 

“I think it’s been great in terms of being able to turn ideas in the form of NL to SQL to derive insights. And the best part is that I can review and edit the query before running it – that’s a very smart and responsible design. It gives me the space to confirm it and ensure accuracy as well as reliability!” – Tim Velasquez, Head of Analytics, Veo

Give BigQuery data canvas a try

To learn more, watch this video and check out the documentation. BigQuery data canvas is launching in preview and will be rolled out to all users starting on April 15th. Submit this form to get early access. 

For any bugs and feedback, please reach out to the product and engineering team at datacanvas-feedback@google.com. We’re looking forward to hearing how you use the new data canvas!

Introducing Gemini in Looker to bring intelligent AI-powered BI to everyone

We are at a pivotal moment for business intelligence (BI). There’s more data than ever impacting all aspects of business. Organizations are faced with increasing user demands for that data, with a wide range of access requirements. Then there’s AI, which is radically transforming how you create and think about every project. The delivery and adoption of generative AI is poised to bring the full benefits of BI to users who find a conversational experience more appealing than traditional methods. This week at Google Cloud Next, we introduced Conversational Analytics as part of Gemini in Looker, rethinking how we give users easy access to insights and transforming the way we engage with data in BI through natural language. In addition, we announced the preview of an array of capabilities for Looker that leverage Google’s work in generative AI and speed up your organization’s ability to dive deeper into the data that matters most, so you can rapidly create and share insights.

With Gemini in Looker, your relationship with your data and reporting goes from a slow-moving and high-friction process, limited by gatekeepers, to a collaborative and intelligent conversation – powered by AI. The deep integration of Gemini models in Looker brings insights to the major user flows that power your business, and establishes a single source of truth for your data with consistent metrics.

Conversational Analytics brings your data to life

Conversational Analytics is a dedicated space in Looker for you to chat with and engage with your business data, as simply as you would ask a colleague a question on chat. In combination with LookML semantic models available from Looker, we establish a single source of truth for your data, providing consistent metrics across your organization. Now, your entire company, including business and analyst teams, can chat with your data and obtain insights in seconds, fully enabling your data-driven decision-making culture.

You can leverage Conversational Analytics, using Gemini in Looker, to find top products, sales details, and dive deeper into the answers with follow-up questions.

With Conversational Analytics, everyone can uncover patterns and trends in data, as if you were speaking to your in-house data expert – and while the answers come in seconds, Looker shows you the data behind the insights, so you know the foundation is accurate and the method is true.

Smart and simple modeling on a trusted foundation

In the generative AI era, ensuring data authenticity and standardizing operational metrics is more than a nice-to-have; it’s critical for keeping measures and comparisons across apps and teams reliable and consistent. Looker’s semantic layer is at the heart of our modeling capabilities, powering the centrally defined metrics and data relationships that keep results trustworthy and accurate throughout your workflows. With LookML, your analysts can work together seamlessly to create universal data and metrics definitions.

Gemini in Looker features LookML Assistant, which we hope will enable everyone to leverage and improve the power of their semantic models quickly using natural language. Simply tell Gemini in Looker what you are looking to build, and the LookML code will be automatically created for you, setting the stage for governed data, powered by generative AI, easier than ever before. 

Expanding intelligence for all Looker customers — and beyond

As the world of BI has evolved, so have our customers’ needs. They demand powerful and complete BI tools that are intuitive to use, with self-service exploration, seamless ad-hoc analysis, and high-quality visualizations all in a single platform, augmented by generative AI. 

We are now offering Looker Studio Pro to licensed Looker users (excluding Embed), at no additional cost, making getting started with BI easier than ever. 

Our vision is that Looker is the single source of truth for both modeled data and metrics that can be consumed anywhere — in our products, through partner BI tools or through our open SQL APIs. Looker’s modeling layer provides a single place to curate and govern the metrics most important to your business, meaning that customers can see consistent results no matter where they interact with their data.

Thanks to deep integration with Google Workspace, you can ask questions of your data with Gemini in Looker, helping you create reports easily and bring your creations to Slides.

Traditionally, BI tools take users out of the flow of their work. We believe we can improve on this by helping users collaborate on their data where they already work. With this in mind, we have extended our connections to Google Workspace across Slides, Sheets and Chat. You will be able to automatically create Looker Studio reports from Google Sheets to rapidly visualize and share insights on your data, while Slide Generation from Gemini in Looker spares you from starting with a blank deck: it begins with your visuals and reports and adds AI-generated summaries to kick off your presentation.

Business data insights as easy as asking Google

Gemini in Looker offers an array of new capabilities that speed up and simplify analytics tasks and workflows, including data modeling, chart creation, slide presentation generation and more. As Google has done for decades in applications like Chrome, Gmail, and Google Maps, Gemini in Looker offers a customer experience that is intuitive and efficient. Conversational Analytics in Looker and LookML Assistant are joined by a set of capabilities that we first showcased at Next 2023, namely:

Report generation: Build an entire report, including multiple visualizations, a title, theme and layout, in seconds, by providing a one- to two-sentence prompt. Gemini in Looker is an AI analyst that can create entire reports, giving you a starting point that you can adjust by using natural language.

Advanced visualization assistant: Customize your visualizations using natural language. Gemini in Looker helps create JSON code configs, which you can modify as necessary, and generate a custom visualization.

Automatic slide generation: Create impactful presentations with insightful text summaries of your data. Gemini in Looker automatically exports a report into Google Slides, with text narratives that explain the data in charts and highlight key insights.

Formula assistant: Create calculated fields on-the-fly to extend and transform the information flowing from your data sources. Gemini in Looker removes the need for you to remember complicated formulas, and creates your formula for you, for ad-hoc analysis.

Each of these capabilities is now available in preview.

Reliable intelligence for the generative AI era

Looker plays a critical role in Google Cloud’s intelligence platform, unifying your data ecosystem. Bringing even more intelligence into Looker with Gemini makes it easier for our customers to understand and access their business data, for analysts to create dashboards and reports, and for developers to build new semantic models. Join us as we create new experiences with data and analytics, defined by AI-powered conversational interfaces. It all starts with a simple chat box.

How Gemini in BigQuery accelerates data and analytics workflows with AI

The journey of going from data to insights can be fragmented, complex and time consuming. Data teams spend time on repetitive and routine tasks such as ingesting structured and unstructured data, wrangling data in preparation for analysis, and optimizing and maintaining pipelines. They would rather spend that time on higher-value analysis and insights-led decision making.

At Next ’23, we introduced Duet AI in BigQuery. This year at Next ’24, Duet AI in BigQuery becomes Gemini in BigQuery, which provides AI-powered experiences for data preparation, analysis and engineering, as well as intelligent recommendations to enhance user productivity and optimize costs.

“With the new AI-powered assistive features in BigQuery and ease of integrating with other Google Workspace products, our teams can extract valuable insights from data. The natural language-based experiences, low-code data preparation tools, and automatic code generation features streamline high-priority analytics workflows, enhancing the productivity of data practitioners and providing the space to focus on high impact initiatives. Moreover, users with varying skill sets, including our business users, can leverage more accessible data insights to effect beneficial changes, fostering an inclusive data-driven culture within our organization,” said Tim Velasquez, Head of Analytics, Veo.

Let’s take a closer look at the new features of Gemini in BigQuery.

Accelerate data preparation with AI

Your business insights are only as good as your data. When you work with large datasets that come from a variety of sources, there are often inconsistent formats, errors, and missing data. As such, cleaning, transforming, and structuring that data can be a major hurdle.

To simplify data preparation, validation, and enrichment, BigQuery now includes AI-augmented data preparation that helps users cleanse and wrangle their data. Additionally, we are enabling users to build low-code visual data pipelines, or rebuild legacy pipelines in BigQuery.

Once the pipelines are running in production, AI assists with finding and resolving issues such as schema or data drift, significantly reducing the toil associated with maintaining a data pipeline. Because the resulting pipelines run in BigQuery, users also benefit from integrated metadata management, automatic end-to-end data lineage, and capacity management.

Gemini in BigQuery provides AI-driven assistance for users to clean and wrangle data

Kickstart the data-to-insights journey

Most data analysis starts with exploration — finding the right dataset, understanding the data’s structure, identifying key patterns, and identifying the most valuable insights you want to extract. This step can be cumbersome and time-consuming, especially if you are working with a new dataset or if you are new to the team. 

To address this problem, Gemini in BigQuery provides new semantic search capabilities to help you pinpoint the most relevant tables for your tasks. Leveraging the metadata and profiling information of these tables from Dataplex, Gemini in BigQuery surfaces relevant, executable queries that you can run with just one click. You can learn more about BigQuery data insights here.

Gemini in BigQuery suggests executable queries for tables that you can run with a single click

Reimagine analytics workflows with natural language

To boost user productivity, we’re also rethinking the end-to-end user experience. The new BigQuery data canvas provides a reimagined natural language-based experience for data exploration, curation, wrangling, analysis, and visualization, allowing you to explore and scaffold your data journeys in a graphical workflow that mirrors your mental model. 

For example, to analyze a recent marketing campaign, you can use simple natural language prompts to discover campaign data sources, integrate with existing customer data, derive insights, and share visual reports with executives — all within a single experience. Watch this video for a quick overview of BigQuery data canvas.

BigQuery data canvas allows you to explore and analyze datasets, and create a customized visualization, all using natural language prompts within the same interface

Enhance productivity with SQL and Python code assistance 

Even advanced users sometimes struggle to remember all the details of SQL or Python syntax, and navigating through numerous tables, columns, and relationships can be daunting. 

Gemini in BigQuery helps you write and edit SQL or Python code using simple natural language prompts, referencing relevant schemas and metadata. You can also leverage BigQuery’s in-console chat interface to explore tutorials, documentation and best practices for specific tasks using simple prompts such as: “How can I use BigQuery materialized views?” “How do I ingest JSON data?” and “How can I improve query performance?”

Optimize analytics for performance and speed 

With growing data volumes, analytics practitioners, including data administrators, find it increasingly challenging to effectively manage capacity and enhance query performance. We are introducing recommendations that can help continuously improve query performance, minimize errors and optimize your platform costs.

With these recommendations, you can identify materialized views that can be created or deleted based on your query patterns, and decide how to partition or cluster your tables. Additionally, you can autotune Spark pipelines and troubleshoot failures and performance issues.

Get started

To learn more about Gemini in BigQuery, watch this short overview video, refer to the documentation, and sign up to get early access to the preview features. If you’re at Next ’24, join our data and analytics breakout sessions and stop by the demo stations to explore further and see these capabilities in action. Pricing details for Gemini in BigQuery will be shared when it becomes generally available to all customers.

Analyze images and videos in BigQuery using Gemini 1.0 Pro Vision

With the proliferation of digital devices and platforms including social media, mobile devices and IoT sensors, organizations are increasingly generating unstructured data in the form of images, audio files, videos, and documents. Over the last few months, we launched BigQuery integrations with Vertex AI to leverage Gemini 1.0 Pro, PaLM, Vision AI, Speech AI, Doc AI, Natural Language AI and more to help you interpret and extract meaningful insights from unstructured data.

While Vision AI provides image classification and object recognition capabilities, large language models (LLMs) unlock new visual use cases. To that end, we are expanding BigQuery and Vertex AI integrations to support multimodal generative AI use cases with Gemini 1.0 Pro Vision. Using familiar SQL statements, you can take advantage of Gemini 1.0 Pro Vision directly in BigQuery to analyze both images and videos by combining them with your own text prompts.

A bird’s-eye view of Vertex AI integration capabilities for analyzing unstructured data in BigQuery

Within a data warehouse setting, multimodal capabilities can help enhance your unstructured data analysis across a variety of use cases: 

Object recognition: Answer questions related to fine-grained identification of the objects in images and videos.

Info seeking: Combine world knowledge with information extracted from the images and videos.

Captioning/description: Generate descriptions of images and videos with varying levels of detail.

Digital content understanding: Answer questions by extracting information from content like infographics, charts, figures, tables, and web pages.

Structured content generation: Generate responses in formats like HTML and JSON based on provided prompt instructions.

Turning unstructured data into structured data

With minimal prompt adjustments, Gemini 1.0 Pro Vision can produce structured responses in convenient formats like HTML or JSON, making them easy to consume in downstream tasks. In a data warehouse such as BigQuery, having structured data means you can use the results in SQL operations and combine them with other structured datasets for deeper analysis.

For example, imagine you have a large dataset that contains images of cars. You want to understand a few basic details about the car in each image. This is a use case that Gemini 1.0 Pro Vision can help with!

Combining text and image into a prompt for Gemini 1.0 Pro Vision, with a sample response.

Dataset from: 3D Object Representations for Fine-Grained Categorization Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.

As you can see, Gemini’s response is very thorough! But while the format and extra information are great if you’re a person, they’re not so great if you’re a data warehouse. Rather than turning unstructured data into more unstructured data, you can make changes to the prompt to direct the model on how to return a structured response.

Adjusting the text portion of the prompt to indicate a structured response from Gemini 1.0 Pro Vision, with a sample result.

You can see how this response would be much more useful in an environment like BigQuery.
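The prompt adjustment described above can be made repeatable with a small helper. The following is a minimal Python sketch: the function name `build_structured_prompt` and the exact wording of the formatting instruction are illustrative assumptions, not part of BigQuery or Gemini, and the wording may need tuning for your own data.

```python
def build_structured_prompt(question, fields):
    """Append JSON formatting instructions to a free-text question.

    `fields` maps each desired JSON key to a plain-language type name.
    This helper is hypothetical; it only assembles a prompt string.
    """
    described = ", ".join(f"{name} ({kind})" for name, kind in fields.items())
    return (
        f"{question} Answer in JSON format with exactly these keys: "
        f"{described}. Return only the JSON object."
    )


# Example: the car question used in this post.
prompt = build_structured_prompt(
    "What is the brand, model, and year of this car?",
    {"brand": "string", "model": "string", "year": "integer"},
)
```

Keeping the field list in one place makes it easier to keep the prompt and any downstream parsing in sync.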

Now let’s see how to prompt Gemini 1.0 Pro Vision directly in BigQuery to perform this analysis over thousands of images!

Accessing Gemini 1.0 Pro Vision from BigQuery ML

Gemini 1.0 Pro Vision is integrated with BigQuery through the ML.GENERATE_TEXT() function. To unlock this function in your BigQuery project, you will need to create a remote model that represents a hosted Vertex AI large language model. Fortunately, it’s just a few lines of SQL:

    CREATE MODEL `mydataset.gemini_pro_vision_model`
    REMOTE WITH CONNECTION `us.bqml_llm_connection`
    OPTIONS(endpoint = 'gemini-pro-vision');

Once the model is created, you can combine your data with the ML.GENERATE_TEXT() function in your SQL queries to generate text. 

A few notes on the ML.GENERATE_TEXT() function syntax when it is pointing to a gemini-pro-vision model endpoint, as is the case in this example:

TABLE: takes an object table as input, which can contain different types of unstructured objects (e.g. images, videos).

PROMPT: takes a single text string that is passed as part of the options STRUCT (unlike when using the gemini-pro model) and applies this prompt to each object in the object TABLE, row by row.

    SELECT
      uri,
      ml_generate_text_llm_result AS brand_model_year
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.gemini_pro_vision_model`,
        TABLE `mydataset.car_images_object_table`,
        STRUCT(
          'What is the brand, model, and year of this car? Answer in JSON format with three keys: brand, model, year. brand and model should be string, year should be integer.' AS prompt,
          TRUE AS flatten_json_output));

Let’s take a peek at the results.

We can add some SQL to this query to extract each of the values for brand, model, and year into new fields for use downstream.

    WITH raw_json_result AS (
      SELECT
        uri,
        ml_generate_text_llm_result AS brand_model_year
      FROM
        ML.GENERATE_TEXT(
          MODEL `mydataset.gemini_pro_vision_model`,
          TABLE `mydataset.car_images_object_table`,
          STRUCT(
            'What is the brand, model, and year of this car? Answer in JSON format with three keys: brand, model, year. brand and model should be string, year should be integer.' AS prompt,
            TRUE AS flatten_json_output)))
    SELECT
      uri,
      JSON_QUERY(RTRIM(LTRIM(raw_json_result.brand_model_year, " ```json"), "```"), "$.brand") AS brand,
      JSON_QUERY(RTRIM(LTRIM(raw_json_result.brand_model_year, " ```json"), "```"), "$.model") AS model,
      JSON_QUERY(RTRIM(LTRIM(raw_json_result.brand_model_year, " ```json"), "```"), "$.year") AS year
    FROM raw_json_result

Now the responses have been parsed into new, structured columns.
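If you post-process the responses outside of SQL, the same idea (strip the Markdown fence the model may wrap around its answer, then read the three keys) can be sketched in Python. This is a minimal illustration using only the standard library; the function name `parse_car_response` is hypothetical, and the sketch assumes the response is a single JSON object, optionally wrapped in a fence.

```python
import json
import re

# Three backticks: the Markdown fence marker the model may wrap its JSON in.
TICKS = "`" * 3
# Matches an opening fence (optionally tagged "json") at the start of the
# string, or a closing fence at the end of the string.
_FENCE = re.compile(r"^\s*" + TICKS + r"(?:json)?\s*|\s*" + TICKS + r"\s*$")


def parse_car_response(raw):
    """Strip an optional Markdown fence and return brand, model, and year."""
    cleaned = _FENCE.sub("", raw)
    record = json.loads(cleaned)
    # Missing keys become None rather than raising, mirroring a NULL column.
    return {key: record.get(key) for key in ("brand", "model", "year")}
```

Unlike trimming a set of characters from each end, the pattern only removes a fence when one is actually present, so unfenced JSON passes through unchanged.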

And there you have it. We’ve just turned a collection of unlabeled, raw images into structured data, fit for analysis in a data warehouse. Imagine joining this new table with other relevant enterprise data. With a dataset of historical car sales, for example, you could determine the average or median sale price for similar cars in a recent time period. This is just a taste of the possibilities that are uncovered by bringing unstructured data into your data workflows!
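To make the car sales idea concrete, here is a minimal Python sketch of that join: it matches each parsed image to historical sales on brand, model, and year, and reports the median price. The record layout (`uri`, `brand`, `model`, `year`, `price`) and the function name are assumptions made for illustration; in practice you would more likely express this as a SQL join in BigQuery.

```python
from collections import defaultdict
from statistics import median


def median_price_by_car(parsed_cars, sales):
    """Return {image uri: median historical sale price or None}.

    parsed_cars: dicts with uri, brand, model, year (hypothetical layout).
    sales: dicts with brand, model, year, price (hypothetical layout).
    """
    # Group historical prices by the (brand, model, year) join key.
    prices = defaultdict(list)
    for sale in sales:
        prices[(sale["brand"], sale["model"], sale["year"])].append(sale["price"])

    result = {}
    for car in parsed_cars:
        key = (car["brand"], car["model"], car["year"])
        # Cars without any matching sale get None instead of a price.
        result[car["uri"]] = median(prices[key]) if key in prices else None
    return result
```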

When getting started with Gemini 1.0 Pro Vision in BigQuery, there are a few important items to note:

You need an Enterprise or Enterprise Plus reservation to run Gemini 1.0 Pro Vision model inference over an object table. For reference see the BigQuery editions documentation.

Limits apply to functions that use Vertex AI large language models (LLMs) and Cloud AI services, so review the current quota in place for the Gemini 1.0 Pro Vision model.

Next steps

Bringing generative AI directly into BigQuery has enormous benefits. Instead of writing custom Python code and building data pipelines between BigQuery and the generative AI model APIs, you can now just write a few lines of SQL! BigQuery manages the infrastructure and helps you scale from one prompt to thousands. Check out the overview and demo video, and the documentation to see more example queries using ML.GENERATE_TEXT() with Gemini 1.0 Pro Vision.

Coming to Next ’24? Check out the session Power data analytics with generative AI using BigQuery and Gemini, where you can see Gemini 1.0 Pro Vision and BigQuery in action.

Grounding generative AI in enterprise truth

Generative AI continues to amaze users at organizations around the world. From helping marketers workshop ideas and creative campaigns, to recommending coding advice to developers, to assisting analysts with market research, the technology has captivated users with its ability to synthesize information and generate answers to questions.

But the arrival of generative AI hasn’t been without its challenges. 

Though foundation models that power generative AI develop a vast “world knowledge” during training, they’re only as up-to-date as their training data, and they can lack access to all the data sources pertinent to enterprise use cases. To adopt generative AI at full speed, businesses need to ground foundation model responses in enterprise systems and fresh data, to ensure the most accurate and complete responses.

For instance, by grounding foundation model responses in ERP systems, businesses can create AI agents that provide accurate shipping predictions, and by grounding in documentation and manuals, they can deliver more helpful answers to product questions and for troubleshooting.

Similarly, research can be accelerated by grounding in analyst reports and studies, compliance can be strengthened by connecting foundation models to contracts, and employee training and onboarding can be improved by rooting agents in internal documents, knowledge bases, and HR systems.

Essentially, the more easily businesses can ground foundation models in their data, the more powerful their use cases can become.

At Google Cloud, we call this “enterprise truth” — the approach to grounding a foundation model in web information; enterprise data like databases and data warehouses; enterprise applications like ERP, CRM, and HR systems; and other sources of relevant information. Grounding in enterprise truth significantly improves the completeness and accuracy of responses, unlocking unique use cases across the business and laying the groundwork for the next generation of AI agents. 

Let’s explore how we do it! 

Grounding generative AI with Google Search and enterprise data 

Generative AI models know the most probable response, which isn’t the same as being able to cite facts. This is why we’ve built — and continue to build — a variety of ways to help ensure each organization is able to ground its foundation models in the truth relevant to its use case.

Google Search is one of the world’s most trusted sources of factual and up-to-date information. Ground with Google Search expands and enhances the model’s access to fresh, high-quality information, significantly improving the completeness and accuracy of responses.

Today, we are announcing the preview of Ground with Google Search in Vertex AI. Businesses can now augment Gemini models with Google Search grounding, and can easily integrate the enhanced model into their AI agents.

When it comes to enterprise data, we offer multiple ways for businesses to ground model responses in enterprise data sources by leveraging retrieval augmented generation, or RAG. RAG helps improve the accuracy of model outputs by using vectors and embeddings to gather facts from relevant data sources.
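As a rough illustration of the retrieval step in RAG, the Python sketch below ranks documents by cosine similarity between embedding vectors and places the best matches into the prompt. It is a toy under stated assumptions: the vectors would normally come from an embedding model, the function names are hypothetical, and a production system would use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vector, documents, top_k=2):
    """Return the texts of the top_k documents most similar to the query.

    documents: list of (text, embedding_vector) pairs.
    """
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_grounded_prompt(question, facts):
    """Put the retrieved facts in front of the question for the model."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
```

The model then answers from the retrieved facts rather than from its training data alone, which is what lets responses reflect current enterprise data.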

Vertex AI includes not only a Google-quality, out-of-box RAG solution, but also a variety of component APIs for building bespoke retrieval, ranking, and document processing systems that enable enterprises to easily ground foundation models in their own data. 

For organizations that need embeddings-based information retrieval, Vertex AI offers powerful vector search capabilities. Today, to enhance vector search, we are excited to announce the preview of our hybrid search feature, which integrates vector-based and keyword-based search techniques to ensure relevant and accurate responses for users.
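The announcement does not describe how the hybrid search feature merges vector-based and keyword-based results. Purely as an illustration of the general idea, the Python sketch below uses reciprocal rank fusion (RRF), one common way to combine two ranked lists; it is a swapped-in technique, not a description of Vertex AI's implementation, and the function name and inputs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is the conventional default for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top, while a document found by only one method is still kept in the fused ranking.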

Besides these, customers can connect models to Google databases like AlloyDB and BigQuery for the contextual retrieval of operational data and analytics like purchase preferences, rewards, basket analysis, interaction history, and more. To enable actions and transactions, we provide a host of data connectors to help businesses connect their models to enterprise applications like Workday, Salesforce, ServiceNow, Hadoop, Confluence, and JIRA to access the latest data on customer interactions and internal knowledge updates like issue tracking, program management, and employee records. 

With a comprehensive approach to grounding that covers web search, enterprise data, and third-party enterprise applications, businesses can ensure that their models will deliver enterprise truth – wherever it is hosted.

How more sources of truth create more value 

Let’s walk through an example to show how grounding lets organizations integrate sources of enterprise truth and create more helpful AI agents. 

Suppose an athletic brand wants to create an AI agent to help customers find and purchase shoes.

If the company just puts an interface atop a foundation model API, they won’t accomplish much. The resulting app would be able to discuss shoes generally, based on its training knowledge, but it wouldn’t have particular expertise in the brand’s shoes or any awareness of new footwear trends that emerged after its training cutoff date. 

With grounding in Google Search, the shoe brand’s app can become much more functional, able to search the web for fresh information. However, it wouldn’t have insight into the brand’s internal data such as product information, inventory levels, and manufacturing timelines, nor would it be able to call functions for transactions — so the shoe company would still be dealing with a basic and rather limited agent. 

To cross this chasm, the company also needs to connect its gen AI models to enterprise data sources via RAG mechanisms, so the agent can ground its advice in the specificity and factuality of internal documents and databases. 
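The RAG pattern described above can be sketched in a few lines: retrieve the most relevant internal records, then place them in the prompt so the model answers from them. This is a toy illustration under stated assumptions: the documents are invented, retrieval uses simple word overlap rather than embeddings, and no model is actually called.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share the most words with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


def build_grounded_prompt(question, documents):
    """Assemble a prompt that instructs the model to answer from retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical internal records for the shoe brand.
DOCS = [
    "Trail Runner X is in stock in sizes 8 to 12.",
    "Court Classic ships within 3 days.",
    "Returns are accepted within 30 days.",
]

question = "Is Trail Runner X in stock?"
prompt = build_grounded_prompt(question, DOCS)
print(prompt)
```

In a real deployment the retrieval step would query enterprise data sources, and the assembled prompt would be sent to a foundation model.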

Imagine an advanced, proactive version of the shoe-recommending agent, with access to the full spectrum of search, databases, and analytics described above. It could notice patterns, such as the customer’s last several purchases all having green stripes. It would remember the earlier chat in which the customer said they dislike shoes that squeak on hardwood floors, and then scan customer reviews to remove squeaky shoes from its recommendations. It would also generate tables on the fly so the customer can more easily compare options, and it would know up-to-date inventory and shipping information to help execute transactions. With the right grounding in enterprise truth, the sky’s the limit, and so is the value the agent can create.

Enterprise truth: fueling gen AI innovation across businesses

Generative AI adoption isn’t just about access to capable models. It’s also about grounding foundation models in first-party data and high-quality external sources — and using these connections to steer model behavior, creating more accurate, relevant, and factual generative AI experiences for businesses to offer their customers, partners, and employees. 

With access to high-quality and relevant data, models can power experiences that move beyond traditional passive applications, giving rise to the next generation of AI agents grounded in enterprise truth. That’s the future we are rapidly moving towards. Backed by our commitment to this journey with our customers, we’re excited to help make the outputs of today’s agents factual, relevant, and actionable.

To learn more about Google Cloud’s AI news at Next, check out our Vertex AI Agent Builder announcement.


What’s next for data analytics at Google Cloud Next ’24

We’re entering a new era for data analytics, going from narrow insights to enterprise-wide transformation through a virtuous cycle of data, analytics, and AI. At the same time, analytics and AI are becoming widely accessible, providing insights and recommendations to anyone with a question. Ultimately, we’re going beyond our own human limitations to leverage AI-based data agents to find deeply hidden insights for us.

Organizations already recognize that data and AI can come together to unlock the value of AI for their business. Google’s 2024 Data and AI Trends Report found that 84% of data leaders believe generative AI will help their organization reduce time-to-insight, and 80% agree that the lines between data and AI are starting to blur.

Today at Google Cloud Next ’24, we’re announcing new innovations for BigQuery and Looker that will help activate all of your data with AI:

BigQuery is a unified AI-ready data platform with support for multimodal data, multiple serverless processing engines and built-in streaming and data governance to support the entire data-to-AI lifecycle. 

New BigQuery integrations with Gemini models in Vertex AI support multimodal analytics, vector embeddings, and fine-tuning of LLMs from within BigQuery, applied to your enterprise data.

Gemini in BigQuery provides AI-powered experiences for data preparation, analysis and engineering, as well as intelligent recommenders to optimize your data workloads.

Gemini in Looker enables business users to chat with their enterprise data and generate visualizations and reports—all powered by the Looker semantic data model that’s seamlessly integrated into Google Workspace.

Let’s take a deeper look at each of these developments.

BigQuery: the unified AI-ready data foundation

BigQuery is now Google Cloud’s single integrated platform for data-to-AI workloads. BigLake, BigQuery’s unified storage engine, provides a single interface across BigQuery native and open formats for analytics and AI workloads. It gives you the choice of where your data is stored and access to all of your data, whether structured or unstructured, along with a universal view of data supported by a single runtime metastore, built-in governance, and fine-grained access controls.

Today we’re expanding open format support with the preview of a fully managed experience for Iceberg, with DDL, DML and high throughput support. In addition to support for Iceberg and Hudi, we’re also extending BigLake capabilities with native support for the Delta file format, now in preview. 

“At HCA Healthcare we are committed to the care and improvement of human life. We are on a mission to redesign the way care is delivered, letting clinicians focus on patient care and using data and AI where it can best support doctors and nurses. We are building our unified data and AI foundation using Google Cloud’s lakehouse stack, where BigQuery and BigLake enable us to securely discover and manage all data types and formats in a single platform to build the best possible experiences for our patients, doctors, and nurses. With our data in Google Cloud’s lakehouse stack, we’ve built a multimodal data foundation that will enable our data scientists, engineers, and analysts to rapidly innovate with AI.” – Mangesh Patil, Chief Analytics Officer, HCA Healthcare

We’re also extending our cross-cloud capabilities of BigQuery Omni. Through partnerships with leading organizations like Salesforce and our recent launch of bidirectional data sharing between BigQuery and Salesforce Data Cloud, customers can securely combine data across platforms with zero copy and zero ops to build AI models and predictions on combined Salesforce and BigQuery data. Customers can also enrich customer 360 profiles in Salesforce Data Cloud with data from BigQuery, driving additional personalization opportunities powered by data and AI. 

“It is great to collaborate without boundaries to unlock trapped data and deliver amazing customer experiences. This integration will help our joint customers tap into Salesforce Data Cloud’s rich capabilities and use zero copy data sharing and Google AI connected to trusted enterprise data.” – Rahul Auradkar, EVP and General Manager of Unified Data Services & Einstein at Salesforce

Building on this unified AI-ready data foundation, we are now making BigQuery Studio generally available, which already has hundreds of thousands of active users. BigQuery Studio provides a collaborative data workspace across data and AI that all data teams and practitioners can use to accelerate their data-to-AI workflows. BigQuery Studio provides the choice of SQL, Python, Spark or natural language directly within BigQuery, as well as new integrations for real-time streaming and governance. 

Customers’ use of serverless Apache Spark for data processing increased by over 500% in the past year [1]. Today, we are excited to announce the preview of our serverless engine for Apache Spark integrated within BigQuery Studio to help data teams work with Python as easily as they do with SQL, without having to manage infrastructure.

The data team at Snap Inc. uses these new capabilities to converge toward a common data and AI platform with multiple engines that work across a single copy of data. This gives them the ability to enforce fine-grained governance and track lineage close to the data to easily expand analytics and AI use cases needed to drive transformation.

To make data processing on real-time streams directly accessible from BigQuery, we’re announcing the preview of BigQuery continuous queries providing continuous SQL processing over data streams, enabling real-time pipelines with AI operators or reverse ETL. We are also announcing the preview of Apache Kafka for BigQuery as a managed service to enable streaming data workloads based on open-source APIs.

We’re expanding our governance capabilities with Dataplex with new innovations for data-to-AI governance available in preview. You can now perform integrated search and drive gen AI-powered insights on your enterprise data, including data and models from Vertex AI, with a fully integrated catalog in BigQuery. We’re introducing column-level lineage in BigQuery and expanding lineage capabilities to support Vertex AI pipelines (available in preview soon) to help you better understand data-to-AI workloads. Finally, to facilitate governance for data-access at scale, we are launching governance rules in Dataplex. 

Multimodal analytics with new BigQuery and Vertex AI integrations

With BigQuery’s direct integration with Vertex AI, we are now announcing the ability to connect models in Vertex AI with your enterprise data, without having to copy or move your data out of BigQuery. This enables multimodal analytics using unstructured data, fine-tuning of LLMs, and the use of vector embeddings in BigQuery.

Priceline, for instance, is using business data stored in BigQuery for LLMs across a wide range of applications. 

“BigQuery gave us a solid data foundation for AI. Our data was exactly where we needed it. We were able to connect millions of customer data points from hotel information, marketing content, and customer service chat and use our business data to ground LLMs.” – Allie Surina Dixon, Director of Data, Priceline 

The direct integration between BigQuery and Vertex AI now enables seamless preparation and analysis of multimodal data such as documents, audio and video files. BigQuery features rich support for analyzing unstructured data using object tables and Vertex AI Vision, Document AI and Speech-to-Text APIs. We are now enabling BigQuery to analyze images and video using Gemini 1.0 Pro Vision, making it easier than ever to combine structured with unstructured data in data pipelines using the generative AI capabilities of the latest Gemini models. 

BigQuery makes it easier than ever to execute AI on enterprise data by letting you build prompts based on your BigQuery data and use LLMs for sentiment extraction, classification, topic detection, translation, data enrichment, and more.

BigQuery now also supports generating vector embeddings and indexing them at scale for vector and semantic search. This enables new use cases that require similarity search, recommendations, or retrieval of your BigQuery data, including documents, images, or videos. Customers can use semantic search in the BigQuery SQL interface or via our integration with gen AI frameworks such as LangChain, and apply retrieval-augmented generation (RAG) based on their enterprise data.
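As a rough illustration of what similarity search over embeddings does, the sketch below ranks a handful of items by cosine similarity to a query vector. It is a brute-force toy in plain Python with made-up three-dimensional vectors. It is not how BigQuery implements vector search or indexing, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, items, k=2):
    """Return the names of the k items whose embeddings are closest to query_vec."""
    ranked = sorted(
        items.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; in practice these would come from an embedding model.
embeddings = {
    "green striped runner": [0.9, 0.1, 0.0],
    "green trail shoe": [0.8, 0.3, 0.1],
    "black dress shoe": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

matches = top_k_similar(query, embeddings, k=2)
print(matches)
```

A vector index exists to avoid this exhaustive comparison when the table holds millions or billions of embeddings.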

Gemini in BigQuery and Gemini in Looker for AI-powered assistance

Gen AI is creating new opportunities for rich data-driven experiences that enable business users to ask questions, build custom visualizations and reports, and surface new insights using natural language. In addition to business users, gen AI assistive and agent capabilities can also accelerate the work of data teams, spanning data exploration, analysis, governance, and optimization. In fact, more than 90% of organizations believe business intelligence and data analytics will change significantly due to AI. 

Today, we are announcing the public preview of Gemini in BigQuery, which provides AI-powered features that enhance user productivity and optimize costs throughout the analytics lifecycle, from ingestion and pipeline creation to deriving valuable insights. What makes Gemini in BigQuery unique is its contextual awareness of your business through access to metadata, usage data, and semantics. Gemini in BigQuery also goes beyond chat assistance to include new visual experiences such as data canvas, a new natural language-based experience for data exploration, curation, wrangling, analysis, and visualization workflows.

Imagine you are a data analyst at a bikeshare company. You can use the new data canvas of Gemini in BigQuery to explore the datasets, identify the top trips, and create a customized visualization, all using natural language prompts within the same interface.

Gemini in BigQuery capabilities extend to query recommendations, semantic search capabilities, low-code visual data pipeline development tools, and AI-powered recommendations for query performance improvement, error minimization, and cost optimization. Additionally, it allows users to create SQL or Python code using natural language prompts and get real-time suggestions while composing queries.

Today, we are also announcing the private preview of Gemini in Looker to enable business users and analysts to chat with their business data. Gemini in Looker capabilities include conversational analytics, report and formula generation, LookML and visualization assistance, and automated Google slide generation. What’s more, these capabilities are being integrated with Workspace to enable users to easily access beautiful data visualizations and insights right where they work.

Imagine you run an ecommerce store. You can query Gemini in Looker to learn sales trends and market details and immediately explore the insights, with details on how the charts were created.

To learn more about our data analytics product innovations, hear customer stories, and gain hands-on knowledge from our developer experts, join our data analytics spotlights and breakout sessions at Google Cloud Next ’24, or watch them on-demand.

1. Google internal data: YoY growth of data processed using Apache Spark on Google Cloud compared with Feb ’23
