Introducing partitioning and clustering recommendations for optimizing BigQuery usage

Do you have a lot of BigQuery tables? Do you find it hard to keep track of which ones are partitioned and clustered, and which ones could be? If so, we have good news. We’re launching a partitioning and clustering recommender that will do the work for you! The recommender analyzes all your organization’s workloads and tables and identifies potential cost optimization opportunities. And the best part is, it’s completely free!

“The BigQuery partitioning and clustering recommendations are awesome! They have helped our customers identify areas where they can reduce costs, improve performance, and optimize our BigQuery usage.” – Sky, one of Europe’s leading media and communications companies

How does the recommender work?

Partitioning divides a table into segments, while clustering sorts the table based on user-defined columns. Both methods can improve the performance of certain types of queries, such as queries that use filter clauses and queries that aggregate data.
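To make the benefit concrete, here is a minimal sketch of creating a table that is both partitioned and clustered using the bq CLI; the dataset, table, and column names are hypothetical placeholders.

# Hypothetical dataset/table/columns: partition by order_date, cluster by customer_id and product_id.
bq mk --table \
  --time_partitioning_type=DAY \
  --time_partitioning_field=order_date \
  --clustering_fields=customer_id,product_id \
  mydataset.orders \
  order_date:DATE,customer_id:STRING,product_id:STRING,amount:NUMERIC

Queries that filter on order_date then scan only the matching partitions, and filters on the clustering columns can prune data within each partition.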

BigQuery’s partitioning and clustering recommender analyzes each project’s workload execution over the past 30 days to look for suboptimal scans of the table data. The recommender then uses machine learning to estimate the potential savings and generate final recommendations. The process has four key steps: Candidate Generation, Read Pattern Analyzer, Write Pattern Analyzer, and Generate Recommendations.

Candidate Generation is the first step in the process, where tables and columns are selected based on specific criteria. For partitioning, tables larger than 100 GB are chosen; for clustering, tables larger than 10 GB are chosen. Smaller tables are filtered out because the optimization benefit is smaller and less predictable. We then identify columns that meet BigQuery’s partitioning and clustering requirements.

In the Read Pattern Analyzer step, the recommender analyzes the logs of queries that filter on the selected columns to determine their potential for cost savings through partitioning or clustering. Several metrics, such as filter selectivity, potential file pruning, and runtime, are considered, and machine learning is used to estimate the potential slot time saved if partitioning or clustering is applied.

The Write Pattern Analyzer step is then used to estimate the cost that partitioning or clustering may introduce at write time. Write patterns and the table schema are analyzed to estimate this write-time overhead for each column.

Finally, in Generate Recommendations, the output from both the Read Pattern Analyzer and the Write Pattern Analyzer is used to determine the net savings from partitioning or clustering for each column. If the net savings are positive and meaningful, the recommendations are published to the Recommender API, where they are visible to users with the proper IAM permissions.

Discovering BigQuery partitioning and clustering recommendations

You can access these recommendations via a few different channels:

Via the lightbulb or idea icon in the top right of BigQuery’s UI page

On our console via the Recommendation Hub

Via our Recommender API

You can also export the recommendations to BigQuery using BigQuery Export.
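If you prefer to script this, here is a hedged sketch of listing the recommendations with the gcloud CLI; the project ID is a placeholder, and the recommender ID and location shown are assumptions, so confirm the exact values in the public documentation.

# Hypothetical project; the recommender ID and location are assumptions; check the documentation.
gcloud recommender recommendations list \
  --project=my-project \
  --location=us \
  --recommender=google.bigquery.table.PartitionClusterRecommender \
  --format=json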

To learn more about the recommender, please see the public documentation.

We hope you use BigQuery partitioning and clustering recommendations to optimize your BigQuery tables, and can’t wait to hear your feedback and thoughts about this feature. Please feel free to reach us at active-assist-feedback@google.com.


Streaming graph data with Confluent Cloud and Neo4j on Google Cloud

There are many ways to classify data. Data can be characterized as batch or streaming. Similarly, data can be characterized as tabular or connected. In this blog post, we’re going to explore an architecture focused on a particular kind of data: connected data that is streaming.

Neo4j is the leading graph database. It stores data as nodes and relationships between those nodes. This allows users to uncover insights from connections in their connected data. Neo4j offers Neo4j Aura, a managed service for Neo4j.

Apache Kafka is the de facto tool today for creating streaming data pipelines. Confluent offers Confluent Cloud, a managed service for Apache Kafka. In addition, Confluent provides the tools needed to bring together real-time data streams to connect the whole business. Its data streaming platform turns events into outcomes, enables intelligent, real-time apps, and empowers teams and systems to act on data instantly.

Both these products are available on Google Cloud, through Google Cloud Marketplace. Used together, Neo4j Aura and Confluent Cloud provide a streaming architecture that can extract value from connected data. Some examples include:

Retail: Confluent Cloud can stream real-time buying data to Neo4j Aura. With this connected data in Aura, graph algorithms can be leveraged to understand buying patterns, enabling real-time product recommendations and customer churn prediction. In supply chain management, use cases include finding alternate suppliers and demand forecasting.

Healthcare and Life Sciences: Streaming data into Neo4j Aura allows for real-time case prioritization and triaging of patients based on medical events and patterns. This architecture can capture patient journey data including medical events for individuals. This allows for cohort based analysis across events related to medical conditions patients experience, medical procedures they undergo and medication they take. This cohort journey can then be used to predict future outcomes or apply corrective actions.

Financial Services: Streaming transaction data with Confluent Cloud into Neo4j Aura allows for real time fraud detection. Previously unknown, benign-looking fraud-ring activities can be tracked in real-time and detected. This reduces the risk of financial losses and improves customer experience.

This post will take you through setting up a fully managed Kafka cluster running in Confluent Cloud and creating a streaming data pipeline that can ingest data into Neo4j Aura.

In this example we generate a message manually in Confluent Cloud. For production implementations, messages are typically generated by upstream systems. On Google Cloud, these include the myriad Google services that Confluent Cloud can connect to, such as Cloud Functions, Bigtable, and Cloud Run.

Prerequisites

So let’s start building this architecture. We’ll need to set up a few things:

Google Cloud account: You can create one for free if you don’t have one. You also get $300 in credits when you sign up.

Confluent Cloud: The easiest way to start with Confluent Cloud is to deploy through Google Cloud Marketplace. The relevant listing is here.

Neo4j Aura: To get started with Neo4j Aura, just deploy it via Google Cloud Marketplace here.

A VM: We need a terminal to execute Confluent CLI commands and run Docker. You can create a VM using Google Compute Engine (GCE).

Creating a Kafka topic

To start, we’re going to need to create a Kafka cluster in Confluent Cloud. Then we’ll create a Kafka topic in that cluster. The steps below can be done via the Confluent Cloud UI. However, let’s do it via the command line so that it is easier to automate the whole process.

First, open a bash terminal on your GCE VM. Then, let’s install the Confluent CLI tool.
curl -sL --http1.1 https://cnfl.io/cli | sh -s -- latest

Log in to your Confluent account:
confluent login --save

We have to create an environment and cluster to use. To create an environment:
confluent environment create test

To list down the environments available, run:
confluent environment list

This command will return a table of environment IDs and names. You will find the newly created `test` environment in the result. We’ll use its environment ID to create all the resources in the `test` environment. In my case, `env-3r2362` is the ID of the `test` environment.
confluent environment use env-3r2362

Using this environment, let’s create a Kafka cluster in the GCP `us-central1` region.
confluent kafka cluster create test --cloud gcp --region us-central1

You can choose some other region from the list of supported regions:
confluent kafka region list --cloud gcp

You can obtain the cluster ID by executing:
confluent kafka cluster list

Now, let’s use the environment and cluster created above.
confluent environment use test
confluent kafka cluster use lkc-2r1rz1

An API key/secret pair is required to create a topic on your cluster. You also need it to produce/consume messages in a topic. If you don’t have one, you can create it using:
confluent api-key create --resource lkc-2r1rz1

Now, let’s create a topic to produce to and consume from in this cluster:
confluent kafka topic create my-users

With these steps, our Kafka cluster is ready to produce and consume messages.

Creating a Connector instance

The Neo4j Connector for Apache Kafka can be run self-managed in a container, for example on Google Kubernetes Engine. For this walkthrough, let’s create a `docker-compose.yml` and run a Kafka Connect instance locally.

In the docker-compose file, we create and configure a Kafka Connect container. We use `confluentinc/cp-kafka-connect-base` as the base image. The connector will be running and exposed on port 8083.

version: '3'
services:
  kconnect-neo4j-confluent:
    image: confluentinc/cp-kafka-connect-base:7.3.1
    container_name: kconnect-neo4j-confluent
    ports:
      - 8083:8083

Upon container start, we install the Neo4j Sink Connector package via confluent-hub. Once the package is installed, we can create a Sink instance running within the container.

First, let’s set the environment variables that the base image expects. 

In the following snippet, replace your Kafka URL and port, which you can get from Confluent Cloud:
`<KAFKA_INSTANCE_URL>` with your Kafka URL
`<KAFKA_PORT>` with your Kafka port

We are creating topics specific to this connector for writing configuration, offset and status data. Since we are going to write JSON data, let’s use JsonConverter for `CONNECT_KEY_CONVERTER` and `CONNECT_VALUE_CONVERTER`.

Our Kafka cluster inside Confluent Cloud is protected and has to be accessed via an API key and secret.

The Kafka API key and secret created during setup have to be used to replace `<KAFKA_API_KEY>` and `<KAFKA_API_SECRET>` inside the CONNECT_SASL_JAAS_CONFIG and CONNECT_CONSUMER_SASL_JAAS_CONFIG variables.

environment:
  CONNECT_BOOTSTRAP_SERVERS: <KAFKA_INSTANCE_URL>:<KAFKA_PORT>
  CONNECT_REST_ADVERTISED_HOST_NAME: 'kconnect-neo4j-confluent'
  CONNECT_REST_PORT: 8083
  CONNECT_GROUP_ID: kconnect-neo4j-confluent
  CONNECT_CONFIG_STORAGE_TOPIC: _config-kconnect-neo4j-confluent
  CONNECT_OFFSET_STORAGE_TOPIC: _offsets-kconnect-neo4j-confluent
  CONNECT_STATUS_STORAGE_TOPIC: _status-kconnect-neo4j-confluent
  CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components/'
  CONNECT_REQUEST_TIMEOUT_MS: "20000"
  CONNECT_RETRY_BACKOFF_MS: "500"
  CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
  CONNECT_SASL_MECHANISM: "PLAIN"
  CONNECT_SECURITY_PROTOCOL: "SASL_SSL"
  CONNECT_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
  CONNECT_CONSUMER_SECURITY_PROTOCOL: "SASL_SSL"
  CONNECT_CONSUMER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
  CONNECT_CONSUMER_SASL_MECHANISM: "PLAIN"
  CONNECT_CONSUMER_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
  CONNECT_CONSUMER_REQUEST_TIMEOUT_MS: "20000"
  CONNECT_CONSUMER_RETRY_BACKOFF_MS: "500"

With all the Connector variables set, let’s focus on installing and configuring the Neo4j Sink connector. We have to install the binary via confluent-hub:
confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2

Sometimes, the above command might fail if there is a bandwidth or connection issue. Let’s keep retrying until the command succeeds.

while [ $? -eq 1 ]
do
  echo "Failed to download the connector, will sleep and retry again"
  sleep 10
  confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2
done

Once the package is installed, we have to use the RESTful API that the connector provides to install and configure a Neo4j Sink instance. Before that, let’s wait until the Connect worker is running:

echo "Start Self-managed Connect Worker..."
/etc/confluent/docker/run &
while : ; do
  curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
  echo -e $$(date) " Listener State : " $$curl_status " (waiting for 200)"
  if [ $$curl_status -eq 200 ] ; then
    break
  fi
  sleep 5
done

After the worker is up, we can use the REST API to create a new Neo4j Sink Connector instance that listens to our topic and writes the JSON data in Neo4j. 

In the config below, we listen to the `my-users` topic ("topics": "my-users") and ingest the data via the Cypher statement "MERGE (p:Person{name: event.name, surname: event.surname})", defined in the "neo4j.topic.cypher.my-users" property. Here, we use a simple statement that creates or updates a Person node for each message on the topic.

Replace the <NEO4J_URL>, <NEO4J_PORT>, <NEO4J_USER>, and <NEO4J_PASSWORD> placeholders with appropriate values.

curl -i -X PUT -H "Accept:application/json" \
  -H "Content-Type:application/json" \
  http://localhost:8083/connectors/neo4j-sink/config \
  -d '{
    "topics": "my-users",
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "errors.retry.timeout": "-1",
    "errors.retry.delay.max.ms": "1000",
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "neo4j.server.uri": "neo4j+s://<NEO4J_URL>:<NEO4J_PORT>",
    "neo4j.authentication.basic.username": "<NEO4J_USER>",
    "neo4j.authentication.basic.password": "<NEO4J_PASSWORD>",
    "neo4j.topic.cypher.my-users": "MERGE (p:Person{name: event.name, surname: event.surname})"
  }'

Finally, let’s wait until this connector worker is up.

while : ; do
  curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors/neo4j-sink/status)
  echo -e $$(date) " Neo4j Sink Connector State : " $$curl_status " (waiting for 200)"
  if [ $$curl_status -eq 200 ] ; then
    break
  fi
  sleep 5
done

This is the complete docker-compose.yml. Ensure that you replace all the placeholders mentioned above:

---
version: '3'
services:
  kconnect-neo4j-confluent:
    image: confluentinc/cp-kafka-connect-base:7.3.1
    container_name: kconnect-neo4j-confluent
    ports:
      - 8083:8083
    environment:
      CONNECT_BOOTSTRAP_SERVERS: <KAFKA_INSTANCE_URL>:<KAFKA_PORT>
      CONNECT_REST_ADVERTISED_HOST_NAME: 'kconnect-neo4j-confluent'
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kconnect-neo4j-confluent
      CONNECT_CONFIG_STORAGE_TOPIC: _config-kconnect-neo4j-confluent
      CONNECT_OFFSET_STORAGE_TOPIC: _offsets-kconnect-neo4j-confluent
      CONNECT_STATUS_STORAGE_TOPIC: _status-kconnect-neo4j-confluent
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components/'
      CONNECT_REQUEST_TIMEOUT_MS: "20000"
      CONNECT_RETRY_BACKOFF_MS: "500"
      CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
      CONNECT_SASL_MECHANISM: "PLAIN"
      CONNECT_SECURITY_PROTOCOL: "SASL_SSL"
      CONNECT_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
      CONNECT_CONSUMER_SECURITY_PROTOCOL: "SASL_SSL"
      CONNECT_CONSUMER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
      CONNECT_CONSUMER_SASL_MECHANISM: "PLAIN"
      CONNECT_CONSUMER_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
      CONNECT_CONSUMER_REQUEST_TIMEOUT_MS: "20000"
      CONNECT_CONSUMER_RETRY_BACKOFF_MS: "500"
    command:
      - bash
      - -c
      - |
        echo "Install Neo4j Sink Connector"
        confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2

        while [ $? -eq 1 ]
        do
          echo "Failed to download the connector, will sleep and retry again"
          sleep 10
          confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2
        done

        echo "Start Self-managed Connect Worker..."
        /etc/confluent/docker/run &
        while : ; do
          curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
          echo -e $$(date) " Listener State : " $$curl_status " (waiting for 200)"
          if [ $$curl_status -eq 200 ] ; then
            break
          fi
          sleep 5
        done

        echo -e "\n--\n+> Create Neo4j Sink Connector"
        curl -i -X PUT -H "Accept:application/json" \
          -H "Content-Type:application/json" \
          http://localhost:8083/connectors/neo4j-sink/config \
          -d '{
            "topics": "my-users",
            "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            "errors.retry.timeout": "-1",
            "errors.retry.delay.max.ms": "1000",
            "errors.tolerance": "all",
            "errors.log.enable": "true",
            "errors.log.include.messages": "true",
            "neo4j.server.uri": "neo4j+s://<NEO4J_URL>:<NEO4J_PORT>",
            "neo4j.authentication.basic.username": "<NEO4J_USER>",
            "neo4j.authentication.basic.password": "<NEO4J_PASSWORD>",
            "neo4j.topic.cypher.my-users": "MERGE (p:Person{name: event.name, surname: event.surname})"
          }'

        echo "Checking the Status of Neo4j Sink Connector..."
        while : ; do
          curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors/neo4j-sink/status)
          echo -e $$(date) " Neo4j Sink Connector State : " $$curl_status " (waiting for 200)"
          if [ $$curl_status -eq 200 ] ; then
            break
          fi
          sleep 5
        done
        sleep infinity

Once the placeholders are replaced, bring up the container:
docker-compose up

Sending a message

Let’s write some messages via the Confluent Cloud UI to test whether they get persisted in Neo4j. Go to your Confluent Cloud UI and click on your environment.

You will now see the clusters within the environment. Click the cluster you created previously.

From the sidebar on the left, click on the `Topics` section and the `my-users` topic we created previously.

From the messages tab, you can start producing messages to this topic by clicking on the `Produce a new message to this topic` button.

Click the `Produce` button once you are done.

Alternatively, you can write messages to the `my-users` topic via the command line.

The Confluent CLI provides a command to write and consume messages from topics. Before using this command, ensure that you are using an API key:
confluent api-key use <API_KEY> --resource lkc-2r1rz1

confluent kafka topic produce my-users --parse-key --delimiter ":"

Using the last command, we can add messages containing a key and a value separated by the ":" delimiter to the topic.
"event":{"name": "John", "surname": "Doe"}

Go to your Neo4j Browser and check for the new Person node created with name ‘John’ and surname ‘Doe’.
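If you prefer the terminal over Neo4j Browser, a quick check is sketched below; it assumes the cypher-shell CLI is installed and reuses the connection placeholders from earlier.

# Assumes cypher-shell is installed; replace the placeholders as before.
cypher-shell -a neo4j+s://<NEO4J_URL>:<NEO4J_PORT> -u <NEO4J_USER> -p '<NEO4J_PASSWORD>' \
  "MATCH (p:Person {name: 'John', surname: 'Doe'}) RETURN p;"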

Conclusion

In this blog post, we walked through setting up Confluent Cloud and Neo4j Aura on Google Cloud. We then used the Neo4j Connector for Apache Kafka to bridge between them. With that environment created, we tested sending a message through Confluent Cloud and capturing it in the Neo4j database. You can try this yourself with a Google Cloud account and the marketplace listings for Neo4j Aura and Confluent Cloud.

Confluent is a great data streaming platform for capturing high volumes of data in motion. Neo4j is a native graph platform that can sift through connected data to deliver highly contextual insights with low latency. In a highly connected world, real-time insights can add huge value to businesses. Customers across verticals are using Confluent Cloud and Neo4j to solve problems the moment they happen. Graph Data Science algorithms can be leveraged to understand seemingly random networks, derive hidden insights, and predict and prescribe the next course of action.

To know more about Neo4j and its use cases, reach out to ecosystem@neo4j.com.


Scaling reaction-based enumeration for next-gen drug discovery using Google Cloud

Discovering new drugs is at the heart of modern medicine, yet finding a “needle in the haystack” is immensely challenging due to the enormous number of possible drug-like compounds (estimated at 10^60 or more). To increase our chances of finding breakthrough medicines for patients with unmet medical needs, we need to explore the vast universe of chemical compounds and use predictive in silico methods to select the best compounds for lab-based experiments. Enter reaction-based enumeration, a powerful technique that generates novel, synthetically accessible molecules. Our team at Psivant has been pushing the boundaries of this process to an unprecedented scale, implementing reaction-based enumeration on Google Cloud. By tapping into Google Cloud’s robust infrastructure and scalability, we’re unlocking the potential of this technique to uncover new chemical entities, leading to groundbreaking advancements and life-altering therapeutics.

Our journey began with a Python-based prototype, leveraging RDKit for chemistry and Ray for distributed computing. Despite initial progress, we encountered a roadblock: our on-premises computing resources were limited, holding back our prototype’s potential. While we could explore millions of compounds, our ambition was to explore billions and beyond. To address this limitation, we sought a solution that offered greater flexibility and scalability, leading us to the powerful ecosystem provided by Google Cloud.

Leveraging Google Cloud infrastructure

Google Cloud’s technologies allowed us to supercharge our pipelines and conduct chemical compound exploration at scale. By integrating Dataflow, Google Workflows, and Compute Engine, we built a sophisticated, high-performance system that is both flexible and resilient. 

Dataflow is a managed batch and streaming system that provides real-time, fault-tolerant, and parallel processing capabilities to manage and manipulate massive datasets effectively. Google Workflows orchestrates the complex, multi-stage processes involved in enumeration, ensuring smooth transitions and error handling across various tasks. Finally, Compute Engine provides us with scalable, customizable infrastructure to run our demanding computational workloads, ensuring optimal performance and cost-effectiveness. Together, these technologies laid the foundation for our cutting-edge solution to explore the endless possibilities of reaction-based enumeration.

We built a cloud-native solution to achieve the scalability we sought, taking advantage of Dataflow, which relies on Apache Beam, a versatile programming model with its own data structures, such as the PCollection, a distributed dataset designed to support efficient parallel processing.

Enter Dataflow 

Balancing performance and cost-efficiency was crucial during pipeline development. That is where Dataflow came in, allowing us to optimize resource utilization without compromising performance, paving the way for optimal resource allocation and cost control.

Our pipeline required a deep understanding of the chemistry libraries and Google Cloud ecosystem. We built a simple, highly distributed enumeration pipeline, then added various chemistry operations while ensuring scalability and performance at every step. Google Cloud’s team played a pivotal role in our success, providing expert guidance and troubleshooting support.

To 100 billion and beyond

Our journey implementing reaction-based enumeration at scale on Google Cloud has been an inspiring testament to the collaborative spirit, relentless innovation, and unwavering pursuit of excellence. With smart cloud-native engineering and cutting-edge technologies, our workflow achieves rapid scalability, capable of deploying thousands of workers within minutes, enabling us to explore an astounding 100 billion compounds in under a day. Looking ahead, we’re excited to integrate Vertex AI into our workflow as our go-to MLOps solution, and to supercharge our high-throughput virtual screening experiments with the robust capabilities of Batch, further enhancing our capacity to innovate.

We’d like to extend our heartfelt thanks to Javier Tordable for his guidance in distributed computing, enriching our understanding of building a massively scalable pipeline.

As we persistently push the boundaries of computational chemistry and drug discovery, we are continuously motivated by the immense potential of reaction-based enumeration. This potential is driven by the powerful and flexible infrastructure of Google Cloud, combined with the comprehensive capabilities of Psivant’s QUAISAR platform. Together, they empower us to design the next generation of groundbreaking medicines to combat the most challenging diseases.


Transform your unstructured data with AI using BigQuery object tables, now GA

Today, the vast majority of data that gets generated in the world is unstructured (text, audio, images), but only a fraction of it ever gets analyzed. The AI pipelines required to unlock the value of this data are siloed from mainstream analytic systems, requiring engineers to build custom data infrastructure to integrate structured and unstructured data insights. 

Our goal is to help you realize the potential of all your data, whatever its type and format. To make this easier, we launched the preview of BigQuery object tables at Google Cloud Next 2022. Powered by BigLake, object tables provide BigQuery users a structured record interface for unstructured data stored in Cloud Storage. With it, you can use existing BigQuery frameworks to process and manage this data using object tables in a secure and governed manner. 

Since we launched the preview, we have seen customers use object tables for many use cases and are excited to announce that object tables are now generally available.

Analyzing unstructured data with BigQuery object tables

Object tables let you leverage the simplicity of SQL to run a wide range of AI models on your unstructured data. There are three key mechanisms for using AI models, all enabled through the BigQuery inference engine.

First, you can import your models and run queries on the object table to process the data within BigQuery. This approach works well for customers looking for an integrated BigQuery solution that allows them to utilize their existing BigQuery resources. Since the preview, we’ve expanded support beyond TensorFlow models with TF-Lite and ONNX models and introduced new scalar functions to pre-process images. We also added support for saving pre-processed tensors to allow for efficient multi-model use of tensors to help you reduce slot usage. 

Second, you can choose from various pre-trained Google models, such as the Cloud Vision API, Cloud Natural Language API, and Cloud Translation API, for which we have added pre-defined SQL table-valued functions that are invoked when querying an object table. The results of the inference are stored in a BigQuery table.

Third, you can integrate customer-hosted AI models or custom models built through Vertex AI using remote functions. You can call these remote functions from BigQuery SQL to serve objects to models, and the results are returned as BigQuery tables. This option is well suited if you run your own model infrastructure such as GPUs, or have externally maintained models. 
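Whichever mechanism you choose, the starting point is an object table over your Cloud Storage data. Here is a minimal sketch using the bq CLI; the dataset, connection, and bucket names are hypothetical placeholders.

# Hypothetical names: create an object table over images in a bucket using an existing BigLake connection.
bq query --nouse_legacy_sql '
CREATE EXTERNAL TABLE mydataset.product_images
WITH CONNECTION `us.my-biglake-connection`
OPTIONS (
  object_metadata = "SIMPLE",
  uris = ["gs://my-bucket/images/*"]
);'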

During the preview, customers used a mix of these integration mechanisms to unify their AI workloads with data already present in BigQuery. For example, Semios, an agro-tech company, uses imported and remote image processing models to serve precision agriculture use cases. 

“With the new imported model capability with object table, we are able to import state-of-the-art Pytorch vision models to process image data and improve in-orchard temperature prediction using BigQuery. And with the new remote model capability, we can greatly simplify our pipelines and improve maintainability.” – Semios

Storage insights, fine-grained security, sharing and more 

Beyond processing with AI models, customers are extending existing data management frameworks to unstructured data, resulting in several novel use cases, such as:

Cloud Storage insights – Object tables provide a SQL interface to Cloud Storage metadata (e.g., storage class), making it easy to build analytics on Cloud Storage usage, understand growth, optimize costs, and inform decisions to better manage data (see the sketch after this list).

Fine-grained access control at scale – Object tables are built on BigLake’s unified lakehouse infrastructure and support row- and column-level access controls. You can use it to secure specific objects with governed signed URLs. Fine-grained access control has broad applicability for augmenting unstructured data use cases, for example securing specific documents or images based on PII inferences returned by the AI model.  

Sharing with Analytics Hub – You can share object tables, similar to BigLake tables, via Analytics Hub, expanding the set of sharing use cases for unstructured data. Instead of sharing buckets, you now get finer control over the objects you wish to share with partners, customers, or suppliers.  
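As a sketch of the Cloud Storage insights use case mentioned above, an object table's metadata columns can be aggregated directly in SQL; the table name is the hypothetical one from the earlier example, and the column names assume the standard object table metadata schema.

# Hypothetical object table; summarizes Cloud Storage usage by content type from metadata columns.
bq query --nouse_legacy_sql '
SELECT
  content_type,
  COUNT(*) AS object_count,
  ROUND(SUM(size) / POW(1024, 3), 2) AS total_gib
FROM mydataset.product_images
GROUP BY content_type
ORDER BY total_gib DESC;'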

Run generative AI workloads using object tables (Preview)

Members of Google Cloud AI’s trusted tester program can use a wide range of generative AI models available in Model Garden to run on object tables. You can use Generative AI Studio to choose a foundation model or fine-tune one to deploy a custom API endpoint. You can then call this API from BigQuery using the remote function integration to pass prompts/inputs and return the text results from large language models (LLMs) in a BigQuery table. In the coming months, we will enable SQL functions through the BigQuery inference engine to call LLMs directly, further simplifying these workloads.

Getting started

To get started, follow along with a guided lab or tutorials to run your first unstructured data analysis in BigQuery. Learn more by referring to our documentation.

Acknowledgments: Abhinav Khushraj, Amir Hormati, Anoop Johnson, Bo Yang, Eric Hao, Gaurangi Saxena, Jeff Nelson, Jian Guo, Jiashang Liu, Justin Levandoski, Mingge Deng, Mujie Zhang, Oliver Zhuang, Yuri Volobuev and rest of the BigQuery engineering team who contributed to this launch.


How an open data cloud is enabling Airports of Thailand and EVme to reshape the future of travel

Aviation and accommodation play a big role in impacting the tourism economy, but analysis of recent data also highlights tourism’s impact on other sectors, from financial services to healthcare, to retail and transportation. 

With travel recovery in full swing post-pandemic, Google search queries related to “travel insurance” and “medical tourism” in Thailand have increased by more than 900% and 500%, respectively. Financial institutions and healthcare providers must therefore find ways to deliver tailored offerings to travelers who are seeking peace of mind from unexpected changes or visiting the country to receive specialized medical treatment.

Interest in visiting Thailand for “gastronomy tourism” is also growing, with online searches increasing by more than 110% year-on-year.  Players in the food and beverage industry should therefore be looking at ways to better engage tourists keen on authentic Thai cuisine.

Most importantly, digital services will play an integral role in travel recovery. More than one in two consumers in Thailand are already using online travel services, with this category expected to grow 22% year-on-year and contribute US$9 billion to Thailand’s digital economy by 2025. To seize growth opportunities amidst the country’s tourism rebound, businesses cannot afford to overlook the importance of offering always-on, simple, personalized, and secure digital services.

That is why Airports of Thailand (AOT), SKY ICT (SKY) and EVME PLUS (EVme) are adopting Google Cloud’s open data cloud to deliver sustainable, digital-first travel experiences.

Improving the passenger experience in the cloud

With Thailand reopening its borders, there has been an upturn in both inbound and outbound air travel. To accommodate these spikes in passenger traffic across its six international airports, AOT migrated its entire IT footprint to Google Cloud, which offers an open, scalable, and secure data platform, with implementation support from its partner SKY, an aviation technology solutions provider.

Tapping into Google Cloud’s dynamic autoscaling capabilities, the IT systems underpinning AOT’s ground aviation services and the SAWASDEE by AOT app can now accommodate up to 10 times their usual workloads. AOT can also automatically scale down its resources to reduce costs when they are no longer in use. Using Google Cloud’s database management services to eliminate data silos, the organization is able to enhance its capacity to deliver real-time airport and flight information to millions of passengers. As a result, travelers enjoy a smoother passenger experience, from check-in to baggage collection.

At the same time, SKY uses Google Kubernetes Engine (GKE) to transform SAWASDEE by AOT into an essential, all-in-one travel app that offers a full range of tourism-related services. GKE allows AOT to automate application deployment and upgrades without causing downtime. This frees up time for the tech team to accelerate the launch of new in-app features, such as a baggage tracker service, airport loyalty programs, curated travel recommendations, an e-payment system, and more.

EVme drives sustainable travel with data

Being able to travel more efficiently is only one part of the future of travel. More than ever, sustainability is becoming a priority for consumers when they plan their travel itineraries. For instance, search queries related to “sustainable tourism” in Thailand have increased by more than 200% in the past year, with close to four in 10 consumers sharing that they are willing to pay more for a sustainable product or service.

To meet this increasing demand and support Thailand’s national efforts to become a low-carbon society, EVme, a subsidiary of PTT Group, is building its electric vehicle lifestyle app on Google Cloud, the industry’s cleanest cloud. It has also deployed the advanced analytics and business intelligence tools of Google Cloud to offer its employees improved access to data-driven insights, which helps them better understand customer needs and deliver personalized interactions. These insights have helped EVme determine the range of electric vehicle models it offers for rental via its app, so as to cater to different preferences. At the same time, the app can also share crucial information, such as the availability of public electric vehicle charging stations, while providing timely support and 24-hour emergency assistance to customers.

As we empower organizations across industries with intelligent, data-driven capabilities to make smarter business decisions and be part of an integrated ecosystem that delivers world-class visitor experiences, our collaborations with AOT, SKY, and EVme will enhance their ability to serve travelers with personalized, digital-first offerings powered by our secure and scalable open data cloud.


Faster together: How Dun & Bradstreet datasets accelerate your real-time insights

At the third annual Google Data Cloud and AI Summit, we shared how data analytics and insights continue to be a key focus area for our customers and how we’re accelerating their data journeys through new product innovations and partner offerings. 

A big part of that is helping customers turn their data into insights faster using differentiated datasets from partners and integrating them into their AI/ML workflows. We recently announced our partnership with Crux to add over 1,000 new datasets on Analytics Hub to provide customers with access to a rich ecosystem of data to enrich first-party data assets and accelerate time to value and scalability with real-time insights. There will be an initial focus on Financial Services, ESG, and Supply Chain, but we plan to increase this to 2,000 datasets later this year. These datasets are critical to our customers who execute highly process-intensive analytics workloads for trading, planning, and risk calculations.

An industry leader, Dun & Bradstreet, will also make much of its catalog available on Analytics Hub and listed on the Google Cloud Marketplace. This will enable customers to achieve the same benefits they receive for SaaS purchases in the Marketplace, including simplified procurement, consolidated billing, and financial incentives. 

“We are excited to build upon our ten-year relationship with Google Cloud and both companies’ commitments to deliver innovative opportunities to our mutual customers,” said Ginny Gomez, President, Dun & Bradstreet North America. “By making D&B datasets and products available in the Google Cloud Marketplace, we are making it easier for our customers to access and utilize this critical data, while also helping to provide a frictionless procurement process for customers to use their committed Google Cloud spend.”

When you purchase and subscribe to a dataset in the Google Cloud Marketplace, the data is immediately accessible in your BigQuery environment via Analytics Hub, without ingress charges, storage charges, or wait times. This allows your project teams to leverage Google Cloud AI/ML, BigQuery, and other third-party innovations to get valuable insights from datasets with ease. This is a commercial expansion of the hundreds of public and free datasets already listed in the Google Cloud Marketplace.
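As a hedged illustration with entirely hypothetical dataset, table, and column names, a linked dataset created from a Marketplace subscription can be joined with first-party data like any other BigQuery dataset:

# Hypothetical linked dataset (dnb_linked) and first-party CRM table; adjust names to your environment.
bq query --nouse_legacy_sql '
SELECT a.account_id, d.industry_code, d.employee_count
FROM crm.accounts AS a
JOIN dnb_linked.firmographics AS d
  ON a.duns_number = d.duns_number
LIMIT 10;'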

Analytics Hub is built on a decade of data sharing in BigQuery. Since 2010, BigQuery has supported always-live, in-place data sharing within an organization’s security perimeter, as well as data sharing across boundaries to external organizations. Analytics Hub makes the administration of sharing assets across boundaries even easier and more scalable, while retaining access to key capabilities of BigQuery like its time-tested sharing infrastructure, and built-in ML, real-time and geospatial analytics. 

These datasets on Marketplace also benefit from BigQuery’s advantages:

Scale: BigQuery is an exabyte-scale data warehouse that can handle even the most demanding data sharing needs. It grows with your data needs including auto scaling capabilities.

Security: BigQuery is built on Google’s secure infrastructure and offers various security features to protect your data. Data is always encrypted and PII data discovery services can be directly used to improve the security of the data.

Freshness: BigQuery data can be shared without moving it. This means you can join shared data with your own data, with no need to implement expensive ETL pipelines to bring in data from providers.

Cost-effectiveness: BigQuery provides different billing models so each workload can make use of the data providing the best price/performance.

At Google Cloud, we believe data and AI have the power to transform businesses and unlock the next wave of innovation. We are excited to share that customers can now procure new data assets on the Google Cloud Marketplace to accelerate their business decisions and drive new innovations. Customers interested in these datasets can request a custom quote or more information by clicking Contact Sales on the Marketplace product page and completing the inquiry form.


Introducing Dataflow Cookbook: Practical solutions to common data processing problems

Organizations like Tyson Foods, Renault, and Air Asia use real-time intelligence solutions from Google Cloud to transform their data clouds and solve for new customer challenges in an always-on, digitally connected world. And as more companies move their data processing to the cloud, Google Cloud Dataflow has become a popular choice. 

Dataflow is a powerful and flexible data processing service that can be used to build streaming and batch data pipelines, from reading from messaging services like Pub/Sub to writing to a data warehouse like BigQuery. To help new users get started and master the many features Dataflow offers, we are thrilled to announce the Dataflow Cookbook.

This cookbook is designed to help developers and data engineers accelerate their productivity by providing a range of practical solutions to common data processing challenges. In addition to the recipes, the cookbook also includes best practices, tips, and tricks that will help developers optimize their pipelines and avoid common pitfalls.

The cookbook is available in Java, Python, and Scala (via Scio), and is organized into folders by use case. Every example is self-contained and as minimal as possible, using public resources where possible so that you can run the examples without any extra preparation. Some examples you will find:

Reading and writing data from various sources: Dataflow can read and write data from a wide variety of sources, including Google Cloud Storage, BigQuery, and Pub/Sub. The examples in the cookbook cover the most common approaches to reading, writing, and handling data.

Windowing and triggers: Many data processing tasks involve analyzing data over a certain period of time. Recipes cover how to use windowing functions in Dataflow to group streaming data into time-based intervals, as well as triggers.

Advanced topics: We have included more advanced pipeline patterns with StatefulDoFns and custom window implementations.

How can I get started? 

We believe that this cookbook will be a valuable resource for anyone working with Dataflow, whether you’re new to the platform and want to learn, or you are an experienced user who wants to speed up creating new pipelines by merging examples together. We’re excited to share our knowledge with the community and look forward to seeing how it helps developers and data engineers achieve their goals. The cookbook is available on GitHub. Get it there and let us know what you think!
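Once you have cloned the repository, a typical way to launch one of the Python examples on Dataflow looks like the sketch below; the example path is hypothetical, and the pipeline options are the standard Apache Beam ones, so adjust them to your project and bucket.

# Hypothetical example path; standard Beam pipeline options for running on Dataflow.
python Python/basics/read_from_pubsub.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp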


What is Cloud Scheduler?

If you want to get right into using Google Cloud Scheduler, check out this interactive tutorial!

So, there you are relishing in the fact that you’ve just set up a new database and pipeline to process large amounts of your organization’s data regularly. Feeling alive! 

Your manager congratulates you (great job, btw!) and then informs you that you’re now in charge of maintaining this moving forward. Something she didn’t mention when you started this project. 

You quickly begin to visualize your weekends taken up by the heavily manual process of backing up the database and making sure that it’s properly feeding into the pipeline before the start of each week. Long hours, using multiple interfaces to make sure nothing fails, and starting over when things do.

Well, you’re not alone: managing large-scale operations that involve many different tasks and dependencies is one of the biggest challenges that developers and businesses face.

Not only is it manual, time-consuming, and hard to keep track of everything going on in your cloud environment, it also creates bottlenecks that make it really difficult to scale your organization’s IT.

If only you could have a way to schedule all of these minute administrative tasks to be completed for you, so you could focus on  more strategic, impactful tasks.  

Well, from the title of this blog post you can probably guess that Google Cloud Scheduler (Cloud Scheduler) IS your fully managed, highly reliable scheduling service.

It allows you to schedule and automate tasks that perform routine maintenance or data processing, or that trigger workflows. Think scheduled data backups, updates, batch processing, application monitoring, automated testing, and report generation. Plus, it comes with an intelligent retry mechanism that you can configure to rerun failed jobs, up to the point that you specify.

It supports a variety of scheduling options, including a specific date and time, recurring intervals, and cron expressions.

Cron jobs are a commonly used scheduling tool in the software development industry (the Linux and Unix world), allowing developers to schedule tasks at specific intervals or on specific dates and times. These intervals are specified using a format based on unix-cron.
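For instance, here is a minimal sketch of creating an HTTP job with the gcloud CLI using a unix-cron schedule; the job name, target URI, and schedule are hypothetical.

# Hypothetical job: call an HTTP endpoint every Monday at 06:00.
gcloud scheduler jobs create http weekly-report-job \
  --location=us-central1 \
  --schedule="0 6 * * 1" \
  --uri="https://example.com/run-report" \
  --http-method=POST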

You can use Cloud Scheduler to accomplish tasks like:

easily scheduling recurring data backups, ensuring your data is safe and secure in the event of a system failure or data loss

processing large amounts of data in batches at specific intervals 

monitoring your applications for errors or performance issues and sending notifications to your team if issues are detected

Cloud Scheduler is designed to make it easier to manage your cloud environment by allowing you to define schedules, run tasks and manage results, all from a single, centralized location – this gives you better control and more visibility over your infrastructure. 

Plus, the fully managed part means you have the reliability of Google’s robust infrastructure, so you can be confident that:

your jobs will run when they are scheduled to do so

they’ll scale automatically to meet your needs, handling any volume of jobs, from a few tasks per day to thousands of tasks at a time, without any additional configuration (no worrying about capacity constraints!)

The automation of these processes reduces the risk of human error and frees up your time and mental energy to focus on more important tasks, making you more productive and organized.

So… how does it work?

Cloud Scheduler works by allowing developers to create jobs that run at specific intervals or when triggered by an event based on specific conditions. 

Jobs can be created using the Cloud Scheduler UI, a command line interface or via the API. 

Cloud Scheduler can trigger your jobs in a variety of ways and currently supports a wide range of job types, including HTTP/HTTPS requests and Pub/Sub messages. You can also define custom job types using Cloud Functions or Cloud Run, which gives you the flexibility to run any type of task in your cloud environment.

Once a job is created, you can specify the schedule for it to run on, and from there Cloud Scheduler takes care of the rest. It automatically triggers your task at the specified time or interval and ensures that it runs reliably and consistently.

As mentioned above, you can create schedules using unix-cron expressions to specify minutes, hours, days of the month, months, and days of the week.

The scheduler then triggers the job at  the target location specified. This can be any web service or application, and the request can include parameters and data. The target location can then process the request and perform the necessary tasks. 

For example, if you wanted to schedule a recurring export of data from BigQuery, you could create a job that specifies the source table and the target location (Cloud Storage in this case) for the exported data.
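A hedged sketch of that pattern with a Pub/Sub target is shown below; the topic, schedule, and message body are hypothetical, and a downstream subscriber (for example, a Cloud Function) would perform the actual export.

# Hypothetical topic and payload: publish a trigger message nightly at 02:00.
gcloud scheduler jobs create pubsub nightly-bq-export \
  --location=us-central1 \
  --schedule="0 2 * * *" \
  --topic=bq-export-trigger \
  --message-body='{"table": "mydataset.orders", "destination": "gs://my-bucket/exports/"}'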

After the job is executed, Cloud Scheduler provides detailed information about its status and results. This information is stored in logs and can be accessed using the Google Cloud console, APIs, or third-party tools. It’s especially useful to set up Cloud Logging and Cloud Monitoring, which provide real-time visibility into the status and performance of scheduled jobs, enabling you to monitor job execution, troubleshoot issues, and optimize performance.

If you’re thinking, “This all sounds great, but am I going to have to give an arm as well as my weekends to make this happen?”, then no, keep your arm. Cloud Scheduler is a cost-effective solution where you only pay for the jobs that are executed, and it has a free tier that provides up to three jobs a month at no cost (at the account level, not the project level).

With Cloud Scheduler you can schedule:

Data backups for databases, file systems, and other data sources, ensuring your organization’s critical data is backed up regularly and can be restored in case of a disaster

Resource management tasks, such as starting and stopping VM instances, deleting temporary files and cleaning up unused resources, which can help optimize resource usage and reduce costs

Data pipeline tasks that move and transform data across different systems both inside and outside of Google Cloud. 

Fundamentally, Cloud Scheduler is more than just a simple job scheduling tool. Its flexibility and integration with other Google Cloud services make it a powerful tool for orchestrating complex workflows, managing resources, automating tasks and saving weekends. 

For some hands-on play time with Cloud Scheduler right now, check out this interactive tutorial where you can run your first job for free!


Introducing BigQuery differential privacy and partnership with Tumult Labs

We are excited to announce the public preview of BigQuery differential privacy, an SQL building block that analysts and data scientists can use to anonymize their data. In the future, we’ll integrate differential privacy with BigQuery data clean rooms to help organizations anonymize and share sensitive data, all while preserving privacy. 

This launch adds differential privacy to GoogleSQL for BigQuery, building on the open-source differential privacy library that is used by Ads Data Hub and the COVID-19 Community Mobility Reports. Google’s research in differentially private SQL was published in a 2019 paper and was recognized with the Future of Privacy Forum’s 2021 Award for Research Data Stewardship.

We’re also excited to announce our partnership with Tumult Labs, a leader in differential privacy for companies and government agencies. Tumult Labs offers technology and professional services to help Google Cloud customers with their differential privacy implementations. Learn more about how Tumult Labs can help you below.

What is differential privacy?

Differential privacy is an anonymization technique that limits the personal information that is revealed by an output. It is commonly used to enable analysis and data sharing while preventing anyone from learning information about a specific entity in the dataset.

Advertising, financial services, healthcare, and education companies use differential privacy to perform analysis without exposing individual records. Differential privacy is also used by public sector organizations like the U.S. Census and by companies that comply with the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and the California Consumer Privacy Act (CCPA).

What can I do with BigQuery differential privacy?

With BigQuery differential privacy, you can:

Anonymize results with individual-record privacy

Anonymize results without copying or moving your data, including data from AWS and Azure with BigQuery Omni

Anonymize results that are sent to Dataform pipelines so that they can be consumed by other applications

Anonymize results that are sent to Apache Spark stored procedures

Use additional differential privacy features by calling external frameworks and platforms like PipelineDP.io and Tumult Analytics 

[Coming soon] Use differential privacy with authorized views and authorized routines

[Coming soon] Share anonymized data with BigQuery Data Clean Rooms 

BigQuery differential privacy also works with your existing security controls so you can:

Anonymize results while using row- and column-level security, dynamic data masking, and column-level encryption

Prevent sensitive data from being queried without proper permission using Data profiles for BigQuery data

How do I get started?

Differential privacy is now part of GoogleSQL for BigQuery and is available in all editions and the on-demand pricing model.

You can apply differential privacy to the following aggregate functions to anonymize the results:

SUM

COUNT

AVG

PERCENTILE_CONT

Here is a sample differential privacy query on a BigQuery public dataset that computes the 50th and 90th percentiles of Medicare beneficiaries by provider type. This query anonymizes the percentile results that are calculated using the physician identifier to protect physician privacy.

Note: The parameters in the DIFFERENTIAL_PRIVACY OPTIONS clause in this sample query are not recommendations. You can learn more about how privacy parameters work in the differential privacy clause documentation, and you can work with your privacy officer or with a Google partner to determine the optimal privacy parameters for your dataset and organization.

SELECT
  WITH
    DIFFERENTIAL_PRIVACY
    OPTIONS (
      epsilon = 1,
      delta = 1e-7,
      privacy_unit_column = npi)
  provider_type,
  PERCENTILE_CONT(
    bene_unique_cnt, 0.5, contribution_bounds_per_row => (0, 10000))
    percentile_50th,
  PERCENTILE_CONT(
    bene_unique_cnt, 0.9, contribution_bounds_per_row => (0, 10000))
    percentile_90th
FROM `bigquery-public-data.cms_medicare.physicians_and_other_supplier_2015`
WHERE provider_type IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

-- Query results may differ slightly with each run due to noise being applied
/*--------------------------------------+-----------------+-----------------*
 | provider_type                        | percentile_50th | percentile_90th |
 +--------------------------------------+-----------------+-----------------+
 | Peripheral Vascular Disease          | 132.95          | 3134.24         |
 | Ambulance Service Supplier           | 101.81          | 697.79          |
 | Multispecialty Clinic/Group Practice | 75.03           | 2316.40         |
 | Addiction Medicine                   | 68.38           | 3811.18         |
 | Public Health Welfare Agency         | 67.27           | 597.46          |
 | Neuropsychiatry                      | 63.85           | 375.88          |
 | Emergency Medicine                   | 62.86           | 272.00          |
 | Centralized Flu                      | 52.97           | 216.98          |
 | Clinical Laboratory                  | 52.04           | 744.01          |
 | Ophthalmology                        | 49.93           | 282.12          |
 *--------------------------------------+-----------------+-----------------*/

How can Tumult Labs help me?

Some uses of differential privacy require features like privacy accounting or variants like zero-concentrated differential privacy. Through our partnership with Tumult Labs, you can ensure that your use of BigQuery differential privacy:

Aligns with compliance and regulatory requirements

Certifies that your use of differential privacy provides end-to-end privacy guarantees

Balances data sharing with privacy risk

Learn more about how Tumult Labs can help you with BigQuery differential privacy here.

Where can I learn more?

Learn more about BigQuery differential privacy at:

The differential privacy clause

Let us know where you need help with BigQuery differential privacy.


Looker now available from Google Cloud console

Looker helps make it easy to get insights from business data and to build data-driven applications with unified metrics for ease of collaboration. Today, we are bringing Looker to the Google Cloud console, making it simpler than ever to acquire, deploy, and manage Looker in your Google Cloud environment, in a solution we call Looker (Google Cloud core). 

The ability to configure and create a Looker instance from the console empowers customers to test Google’s business intelligence solution in their environment, including the option of a no-cost 30-day trial.

Looker (Google Cloud core) offers organizations a fresh, consistent, real-time view of their business data, and extends the benefits that a commissioned study by Forrester Consulting on behalf of Google, “The Total Economic Impact™ Of Google BigQuery and Looker” (April 2023), found can lead to an ROI of greater than 200%, while bringing the offering closer to Google Cloud’s array of leading products. This new offering builds upon the semantic modeling and data exploration capabilities Looker has been known for over the last decade and adds expanded security options, Google Cloud integrations, and instance management features, such as:

Enterprise-grade security features, including support for customer-managed encryption keys, private IP access, and the ability to deploy Looker within a VPC-SC perimeter.

Built-in connectivity support that simplifies the connection to BigQuery.

Integration with Google Cloud Identity Access Management (IAM), enabling customers to manage Looker administration and user access.

Control over instance management and maintenance with self-defined maintenance and maintenance exclusion windows, custom URL configuration, and custom analytics and auditing based on log data.

“Google Cloud has made significant investments to make it easy for Looker customers to leverage other products like BigQuery and Connected Sheets,” said Doug Henschen, vice president and principal analyst at Constellation Research. “With the launch of Looker in the Google Cloud console, Google is making it easier for existing users to explore Looker’s capabilities, including its semantic model, alongside security features, integrations, and instance management.”

Introducing Looker (Google Cloud core) editions

Looker (Google Cloud core) debuts with two editions for internal business intelligence use cases: Standard and Enterprise, as well as a dedicated Embed variant.

Looker (Google Cloud core) Standard edition is tailored for small teams and small or medium-sized businesses with up to 50 internal platform users. In addition to existing Looker features, the Standard Edition brings new functionality including Google Cloud identity access management and simplified BigQuery connectivity.

Looker (Google Cloud core) Enterprise edition includes all features found in the Standard edition, plus expanded capabilities to support larger deployments: no cap on the number of users, additional security features like VPC-SC and Private IP, and more robust monitoring through Elite System Activity.

Looker (Google Cloud core) Embed makes it easy for you to create innovative data applications quickly and with less code by embedding Looker into your applications, products, or internal portals. In addition to all the features included in the Enterprise edition, Looker (Google Cloud core) Embed offers 500,000 Query API calls per month and 100,000 Admin API calls per month, SSO embed, and private-label capabilities.

Next Steps

Looker (Google Cloud core) editions are now generally available. Start by accessing the Looker product page in the Google Cloud console to learn more and start your trial. Existing Looker customers interested in Looker (Google Cloud core) should contact their Google Cloud or partner representative.

