Export Google Cloud data into Elastic Stack with Dataflow templates

At Google Cloud, we’re focused on solving customer problems while supporting a thriving partner ecosystem. Many of you use third-party monitoring solutions to keep tabs on your multi-cloud or hybrid cloud environments, be it for IT operations, security operations, application performance monitoring, or cost analysis. At the same time, you’re looking for a cloud-native way to reliably export your Google Cloud logs, events, and alerts at scale.

As part of our efforts to expand the set of purpose-built Dataflow templates for these common data movement operations, we launched three Dataflow templates to export Google Cloud data into your Elastic Cloud or your self-managed Elasticsearch deployment: Pub/Sub to Elasticsearch (streaming), Cloud Storage to Elasticsearch (batch) and BigQuery to Elasticsearch (batch).

In this blog post, we’ll show you how to set up a streaming pipeline to export your Google Cloud logs to Elastic Cloud using the Pub/Sub to Elasticsearch Dataflow template. Using this Dataflow template, you can forward to Elasticsearch any message that can be delivered to a Pub/Sub topic, including logs from Cloud Logging or events such as security findings from Cloud Security Command Center.

The step-by-step walkthrough covers the entire setup, from configuring the originating log sinks in Cloud Logging, to setting up Elastic integration with GCP in Kibana UI, to visualizing GCP audit logs in a Kibana dashboard.

Push vs. Pull

Traditionally, Elasticsearch users have had the option to pull logs from Pub/Sub topics into Elasticsearch via Logstash or Beats as a data collector. This documented solution works well, but it comes with tradeoffs that need to be taken into account:

It requires managing one or more data collectors, which adds operational complexity to ensure high availability and scale-out as log volume increases.

It requires giving those external data collectors access to Google Cloud resources, with permissions to establish subscriptions and pull data from one or more Pub/Sub topics.

We’ve heard from you that you need a more cloud-native approach that streams logs directly into your Elasticsearch deployment without the need to manage an intermediary fleet of data collectors. This is where the managed Cloud Dataflow service comes into play: A Dataflow job can automatically pull logs from a Pub/Sub topic, parse payloads and extract fields, apply an optional JavaScript user-defined function (UDF) to transform or redact the logs, then finally forward to the Elasticsearch cluster.

Set up logging export to Elasticsearch

This is how the end-to-end logging export looks:

Below are the steps that we’ll walk through:

Set up Pub/Sub topics and subscriptions

Set up a log sink

Set IAM policy for Pub/Sub topic

Install Elastic GCP integration

Create API key for Elasticsearch

Deploy the Pub/Sub to Elasticsearch Dataflow template

View and analyze GCP logs in Kibana

Set up Pub/Sub topics and subscriptions

First, set up a Pub/Sub topic that will receive your exported logs, and a Pub/Sub subscription that the Dataflow job can later pull logs from. You can do so via the Cloud Console or via CLI using gcloud. For example, using gcloud looks like this:
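Assuming a topic named my-logs and a subscription named my-logs-sub (the names are illustrative, so substitute your own):

```shell
# Create the Pub/Sub topic that the log sink will publish exported logs to
gcloud pubsub topics create my-logs

# Create the subscription that the Dataflow job will pull logs from
gcloud pubsub subscriptions create my-logs-sub --topic=my-logs
```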

Note: It is important to create the subscription before setting up the Cloud Logging sink to avoid losing any data added to the topic prior to the subscription getting created.

Repeat the same steps for the Pub/Sub dead-letter topic, which holds any undeliverable messages resulting from pipeline misconfigurations (e.g. a wrong API key) or an inability to connect to the Elasticsearch cluster:
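For instance, with an illustrative dead-letter topic name of my-logs-deadletter:

```shell
# Create the dead-letter topic for undeliverable messages
gcloud pubsub topics create my-logs-deadletter

# Optionally, create a subscription so you can inspect or replay failed messages
gcloud pubsub subscriptions create my-logs-deadletter-sub --topic=my-logs-deadletter
```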

Set up a Cloud Logging sink

Create a log sink with the previously created Pub/Sub topic as destination. Again, you can do so via the Logs Viewer, or via CLI using gcloud logging. For example, to capture all logs in your current Google Cloud project (replace [MY_PROJECT]), use this code:
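A sketch of the sink creation, assuming the topic name my-logs and an illustrative sink name of my-logs-sink:

```shell
# Route all logs in the current project to the Pub/Sub topic.
# Add --log-filter to narrow down which logs are exported.
gcloud logging sinks create my-logs-sink \
  pubsub.googleapis.com/projects/[MY_PROJECT]/topics/my-logs
```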

Note: To export logs from all projects or folders in your Google Cloud organization, refer to aggregated exports for examples of ‘gcloud logging sinks’ commands. For example, provided you have the right permissions, you may choose to export Cloud Audit Logs from all projects into one Pub/Sub topic to be later forwarded to Elasticsearch.

The output of this last command is similar to this:
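(Shown here only as an illustration; the exact wording varies by gcloud version.)

```
Created [https://logging.googleapis.com/v2/projects/[MY_PROJECT]/sinks/my-logs-sink].
Please remember to grant `serviceAccount:[LOG_SINK_SERVICE_ACCOUNT]` the Pub/Sub
Publisher role on the topic.
```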

Take note of the service account [LOG_SINK_SERVICE_ACCOUNT] returned. It typically ends with @gcp-sa-logging.iam.gserviceaccount.com.

Set IAM policy for Pub/Sub topic

For the sink export to work, you need to grant the returned sink service account a Cloud IAM role so it has permission to publish logs to the Pub/Sub topic:
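Using gcloud, with an illustrative topic name of my-logs and the [LOG_SINK_SERVICE_ACCOUNT] placeholder standing in for the service account returned by the sink creation:

```shell
# Allow the log sink's service account to publish to the topic
gcloud pubsub topics add-iam-policy-binding my-logs \
  --member="serviceAccount:[LOG_SINK_SERVICE_ACCOUNT]" \
  --role="roles/pubsub.publisher"
```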

If you created the log sink using the Cloud Console, it will automatically grant the new service account permission to write to its export destinations, provided you own the destination. In this case, it’s Pub/Sub topic my-logs.

Install Elastic GCP integration

From Kibana web UI, navigate to ‘Integrations’ and search for GCP. Select ‘Google Cloud Platform (GCP)’ integration, then click on ‘Add Google Cloud Platform (GCP)’.

In the following screen, make sure to uncheck ‘Collect Google Cloud Platform (GCP) … (input: gcp-pubsub)’, since we will not rely on pollers to pull data from the Pub/Sub topic, but rather on the Dataflow pipeline to stream that data in.

Create API key for Elasticsearch

If you don’t already have an API key for Elasticsearch, navigate to ‘Stack Management’ > ‘API keys’ to create an API key from Kibana web UI. Refer to Elastic docs for more details on Elasticsearch API keys. Take note of the base64-encoded API key which will be used later by your Dataflow pipeline to authenticate with Elasticsearch.

Before proceeding, also take note of your Cloud ID, which can be found in the Elastic Cloud UI under ‘Cloud’ > ‘Deployments’.

Deploy the Pub/Sub to Elasticsearch Dataflow pipeline

The Pub/Sub to Elasticsearch pipeline can be executed from the Console, the gcloud CLI, or a REST API call (more detail here). Using the Console as an example, navigate to the Dataflow Jobs page, click ‘Create Job from Template’, then select the “Cloud Pub/Sub to Elasticsearch” template from the dropdown menu. After filling out all required parameters, the form should look similar to this:
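Equivalently, from the gcloud CLI, the run command would look roughly like the following sketch. The template path and parameter names here are assumptions that may differ by template version, so check the template reference docs; the subscription, topic, and project names are illustrative placeholders:

```shell
# Launch the Pub/Sub to Elasticsearch flex template (parameter names may vary)
gcloud dataflow flex-template run "pubsub-to-elasticsearch-$(date +%Y%m%d-%H%M%S)" \
  --region=us-central1 \
  --template-file-gcs-location=gs://dataflow-templates/latest/flex/PubSub_to_Elasticsearch \
  --parameters=\
inputSubscription=projects/[MY_PROJECT]/subscriptions/my-logs-sub,\
connectionUrl=[ELASTIC_CLOUD_ID],\
apiKey=[BASE64_ENCODED_API_KEY],\
errorOutputTopic=projects/[MY_PROJECT]/topics/my-logs-deadletter
```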

Click on ‘Show Optional Parameters’ to expand the list of optional parameters.

Enter ‘audit’ for ‘The type of logs…’ parameter to specify the type of dataset we’re sending in order to populate the corresponding GCP audit dashboard available in the GCP integration you enabled previously in Kibana:

Once you click “Run job”, the pipeline will start streaming events to Elastic Cloud after a few minutes. You can visually check correct operation by clicking on the Dataflow job and selecting the “Job Graph” tab, which should look like the one below. In our test project, the Dataflow step WriteToElasticsearch was sending a little over 2,800 elements per second at that point in time:

Now head over to the Kibana UI, and navigate to ‘Observability’ > ‘Overview’ to quickly verify that your GCP audit logs are being ingested into Elasticsearch:

Visualize GCP Audit logs in Kibana

You can now view Google Cloud audit logs from your Kibana search interface. Navigate to either ‘Observability’ > ‘Logs’ > ‘Stream’ or ‘Analytics’ > ‘Discover’, and type the following simple KQL query to filter for GCP audit logs only:
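With the Elastic GCP integration, audit logs land in the gcp.audit dataset, so a filter along these lines should work (the exact field name may vary by integration version):

```
data_stream.dataset : "gcp.audit"
```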


The above table was produced after selecting the following fields as columns in order to highlight who did what to which resource:

protoPayload.authenticationInfo.principalEmail – Who

protoPayload.methodName – What

protoPayload.serviceName – Which (service)

protoPayload.resourceName – Which (resource)

Open GCP Audit dashboard in Kibana

Navigate to ‘Analytics’ > ‘Dashboards’, and search for ‘GCP’. Select ‘[Logs GCP] Audit’ dashboard to visualize your GCP audit logs. Among other things, this dashboard displays a map view of where your cloud activity is coming from, a timechart of activity volume, and a breakdown of top actions and resources acted on.

But wait, there’s more!

The Pub/Sub to Elasticsearch Dataflow template is meant to abstract away the heavy lifting of reliably collecting voluminous logs in near real-time. At the same time, it offers advanced customizations to tune the pipeline to your own requirements with optional parameters such as delivery batch size (in number of messages or bytes) for throughput, retry settings (in number of attempts or duration) for fault tolerance, and a custom user-defined function (UDF) to transform the output messages before delivery to Elasticsearch. To learn more about Dataflow UDFs along with specific examples, see Extend your Dataflow templates with UDFs.

In addition to the Pub/Sub to Elasticsearch Dataflow template, there are two new Dataflow templates to export to Elasticsearch, depending on your use case:

Cloud Storage to Elasticsearch: Use this Dataflow template to export rows from CSV files in Cloud Storage into Elasticsearch as JSON documents.

BigQuery to Elasticsearch: Use this Dataflow template to export rows from a BigQuery table (or results from a SQL query) into Elasticsearch. This is particularly handy to forward billing data by Cloud Billing or assets metadata snapshots by Cloud Asset Inventory, both of which can be natively exported to BigQuery. 

What’s next?

Refer to our user docs for the latest reference material on all Google-provided Dataflow templates, including the Elastic Dataflow ones described above. We’d like to hear your feedback and feature requests. You can create an issue in the corresponding GitHub repo, open a support case directly from your Cloud Console, or ask questions in our Stack Overflow forum.

To get started with Elastic Cloud on Google Cloud, you can subscribe via Google Cloud Marketplace and start creating your own Elasticsearch cluster on Google Cloud within minutes. Refer to the Elastic getting started guide for step-by-step instructions.


We’d like to thank several contributors within and outside Google for making these Elastic Dataflow templates available for our joint customers:

Prathap Kumar Parvathareddy, Strategic Cloud Engineer, Google

Adam Quan, Solutions Architect, Elastic

Michael Yang, Product Manager, Elastic

Suyog Rao, Engineering Manager, Elastic


Faster time to value with Data Analytics Design Patterns

Companies today are inundated with vast amounts of data from various sources. This data is meant to benefit the company, but it often leaves data teams feeling overwhelmed, which can create data bottlenecks and result in a slow time to value. In fact, only twenty-seven percent of companies agree that data and analytics projects produce insights and recommendations that are highly actionable (Accenture). This means that nearly 3 in 4 companies are not unlocking value in their data, which poses a huge challenge for organizations trying to move the needle and drive real business results. We at Google Cloud, however, saw opportunity in this challenge, which is why we created Design Patterns: cross-product technical solutions designed to accelerate a customer’s path to value realization with their data. These industry solutions bring together product capabilities alongside design methodology, open source deployable code, data models, and reference architectures to accelerate your business outcomes.

With Data Analytics Design Patterns, you get access to more than 30 ready-to-deploy data analytics solutions. Design patterns leverage the best of Google and our rich partner ecosystem, including Technology Partners & System Integrators. In this blog, we’ll cover three examples of how a design pattern can be applied to unlock the value of data:

Improve mobile app experience with Unified App Analytics

Maximize a digital shop’s revenue with Price Optimization

Protect internal systems from security and malware threats with Anomaly Detection

Unified App Analytics

If mobile apps are part of your Go-to-market strategy, you have several data sources that can provide invaluable customer insights. In addition to tools, such as CRM (e.g. Salesforce) and customer care (e.g. Zendesk), you likely use Google Analytics to log app events and Firebase Crashlytics to gather data about app errors. But can you easily combine back-end server data with app front-end data to unlock customer insights? 

With the Unified App Analytics design pattern, it’s easy to plug all the disparate data sources into a single warehouse (BigQuery) and start analyzing them with a Business Intelligence tool (Looker). Once you have a complete, real-time view of your customers’ experience with your app, you can take action. For example, if you notice an increase in app errors, you can quickly combine your Crashlytics data with your CRM data to narrow down the crashes with the highest revenue impact and prioritize their resolution. Further, you can automate your issue resolution workflow by creating a rule for any future crash that impacts a subset of VIP customers.

Unified App Analytics turns your data warehouse into an actionable customer insights tool

With the Unified App Analytics design pattern, you’ll gain valuable insights into how users experience your app, so you can inform your future app strategy. For example, NPR, an American media company, increased user engagement by showing content that better mapped to listener interests and behaviors.

Price Optimization

In a competitive and hectic global marketplace, strategic pricing matters more than ever, but often projects are consumed by the tedium of standardizing, cleaning, and preparing data—from transactions, inventory, demand, among other sources.

The Price Optimization solution allows retailers to build a data-driven pricing model. The solution consists of three main components:

Dataprep by Trifacta: integrates different data sources into a single Common Data Model (CDM). Dataprep is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.

BigQuery: allows you to create and store pricing models in a consistent and scalable way as a serverless cloud data warehouse service.

Looker dashboards: surface insights and enable business teams to take action with an enterprise-ready BI platform.

With the Price Optimization design pattern from Google Cloud and our partner Trifacta, you’ll be able to rapidly unify multiple data sources and create real-time, ML-powered analyses, leveraging predictive models to estimate future sales. For example, PDPAOLA, an online jewelry company, doubled sales with dynamic pricing adjustments enabled by a single data view.

Anomaly Detection

Organizations need to anticipate and act on risks and opportunities to stay competitive in a digitally transforming society. Anomaly detection helps organizations identify and respond to data points and data trends in high velocity, high volume data sets that deviate from historical standards and expected behaviors, allowing them to take action on changing user needs, mitigate malicious actors and behaviors, and prevent unnecessary costs and monetary losses.

The Anomaly Detection solution uses Google Pub/Sub, BigQuery, Dataflow, and Looker to:

Stream events in real time 

Process the events, extract useful data points, train the detection algorithm of choice

Apply the detection algorithm in near-real time to the events to detect anomalies

Update dashboards and/or send alerts

The challenge of finding the important insights and anomalies in vast amounts of data applies to organizations across all industries and lines of business, but it is especially important for protecting the security of an organization. For example, TELUS, a national communications company, modernized its security analytics platform using this pattern, allowing it to detect and mitigate suspicious activity in near real time.

Get started

Turn your data into business outcomes with Google Cloud and our broad partner ecosystem by deploying Data Analytics Design Patterns at your organization. There are more than 30 data analytics design patterns ready for you to use, and more than 200 ideas in the pipeline, so be sure to check in regularly as new design patterns are added.

To dive deeper and find out more about how Data Analytics design patterns can help your organization accelerate use cases and create faster time to value, check out this video.


Debunking myths about Python on Dataflow

For many developers that come to Dataflow, Google Cloud’s fully managed data processing service, the first decision they have to make is which programming language to use. Dataflow developers use the open-source Apache Beam SDK to author their pipelines, and have several choices for language to use: Java, Python, Go, SQL, Scala, and Kotlin. In this post, we’ll focus on one of our fastest growing languages: Python.

The Python SDK for Apache Beam was introduced shortly after Dataflow’s general availability was announced in 2015, and is the primary choice for several of our largest customers. However, it has suffered from a reputation for being incomplete and inferior to its predecessor, the Java SDK. While historically there was some truth to this perception, Python’s feature set has caught up to the Java SDK and offers new capabilities that are specifically catered to Python developers. We’ll take the rest of this blog to inspect some popular myths, and conclude with a brief review of the latest & greatest for the Python SDK.

Myth 1: Python doesn’t support streaming pipelines.

BUSTED. Streaming support for Python has been available for more than two years, released as part of Beam 2.16 in October 2019. This means the unique capabilities of streaming Dataflow, including Streaming Engine, update, drain, and snapshots, are all available to Python users.

Myth 2: SqlTransform isn’t available for Python.

BUSTED. Tired of writing tedious code to join together datastreams? Use SqlTransform for Python. Apache Beam introduced support for SqlTransform to the Python SDK last year as part of our advancements with multi-language pipelines (more on that later). Take a look at this example to get started.

Myth 3: State and Timer APIs aren’t available in Python.

BUSTED. Two of the most powerful features of the Beam SDK are the State and Timer APIs, which allow for more fine-grained control over aggregations than windows and triggers do. These APIs are available in Python, and offer parity with the Java SDK for the most common use cases. Reference the Beam programming guide for some examples of what you can do with these APIs.

Myth 4: Python supports only a limited set of I/O connectors.

BUSTED. The most glaring disparity between the Java and Python SDKs is the discrepancy between I/O connectors, which facilitate read & write operations for Dataflow pipelines. Our support for multi-language pipelines puts this myth to rest. With cross-language transformations, Python pipelines can invoke a Java transformation underneath the hood to provide access to the entire library of Java-based I/O connectors. In fact, that’s how we implemented the KafkaIO module for Python (see example). Developers can invoke their own cross-language transformations using the instructions in the patterns library.

Myth 5: There are fewer code samples in Python.

PLAUSIBLE: Apache Beam maintains several repos for Python examples: one for snippets, one for notebook samples, and one for complete examples. However, there are a couple of notable exceptions where Python is missing, namely our Dataflow Templates repository. This is attributable to the fact that most of Dataflow’s initial users were Java developers. But this quick observation ignores two key factors: 1) the unique assets that are only available for Python developers, and 2) the tremendous momentum behind the Beam Python community.

Python developers love writing exploratory code in JupyterLab notebook environments. Beam offers an interactive module that allows you to interactively build and run your Beam Python pipelines in a Jupyter Notebook. We make deploying these notebooks really simple with Beam Notebooks, which spins up a managed Notebook that contains all the required Beam libraries to prototype your pipelines. We also have a number of helpful examples & tutorials that show how you can sample data from a streaming source, or attach GPUs to your notebooks to accelerate your processing. The notebook also contains a learning track for new Beam developers that covers everything from basic operations and aggregations to streaming concepts. You can review the documentation here.

Over the past few years, we have seen a number of extensions built on top of the Beam Python SDK. Cruise Automation published the Terra library, which enables 70+ Cruise data scientists to submit jobs without having to understand the underlying infrastructure. Spotify open-sourced Klio, a framework built on top of Beam Python that simplifies common tasks required for processing audio and media files. I have even pointed customers to beam-nuggets, a lightweight collection of Beam Python transformations used for reading/writing from/to relational databases. Open-source developers and large organizations are doubling down on Beam Python, and these brief examples underscore that trend.

What’s new:

The Dataflow team has a slew of new capabilities that will help Python developers advance their use case. Here’s a quick run-down of the newest features:

Custom containers: Users can now specify their own container image when they launch their Dataflow job. This is a common ask from our Python audience, who like to package their pipeline code with their own libraries and dependencies. We’re excited to announce that this feature is generally available—take a look at the documentation so you can try for yourself!

GPUs: Dataflow recently announced the general availability of GPU support on Dataflow. You can now accelerate your data processing by provisioning GPUs on your Dataflow job, another common request from machine learning practitioners on Dataflow. You can review the details of the launch here.

Beam DataFrames: Beam DataFrames brings the magic of pandas to the Beam SDK, allowing developers to convert PCollections to DataFrames and use the standard methods available with the pandas DataFrame API. DataFrames gives developers a more natural way to interact with their datasets and create their pipelines, and will be a stepping stone to future efficiency improvements. Beam DataFrames are generally available starting Beam 2.32, which was released in August. Learn more about Beam DataFrames here.

We invite you to try out our new features using Beam Notebooks today!

Do you have an interesting idea that you want to share with our Beam community? You can reach out to us through various modes, all found here. We are excited to see what’s next for Beam Python.

JOIN 2021: Sharing our product vision with the Looker community

Welcome to JOIN 2021! We’re so excited to kick off Looker’s annual user conference. It’s an event we look forward to each year as it’s a terrific opportunity to connect with the Looker community, interact with our partners and customers, showcase our product capabilities, and share our product vision and strategy for the year ahead.

This is the second year in a row that we’re hosting our conference virtually. Even though we were hoping to see you all in person this year, we’re delighted to connect with the Looker community in a virtual setting. This year we have some great content prepared that we hope you will find insightful and educational. JOIN 2021 brings you 3 days of content that includes 5 keynotes, 33 breakouts, 12 How-tos, 27 Data Circles of Success (live!), and our popular Hackathon. 

One of the most exciting parts of this event is the highly anticipated product keynote session, where you’ll learn about our product vision, key investments, new product features, and the roadmap ahead. It’s also our opportunity to share with you all the cool and exciting projects the team has been working on. Here is a sneak preview of some of the things you will hear in the product keynote:

Composable Analytics

The idea that different people in different roles need to work with data in different ways is a conviction that guides Looker’s product direction. Composable analytics is about enabling organizations to deliver these bespoke data experiences tailored to the different ways people work, in a way that transcends conventional business intelligence (BI). We at Looker see a world where people can assemble different data experiences, quickly and easily, with reusable components with low code deployment options, without requiring any specialized technical skills. 

Looker’s investment in the Extension Framework and Looker Components lays the foundation for our developers building composable analytics applications. Looker’s Extension Framework is an accelerator and makes it significantly easier and faster for developers to build a variety of experiences on Looker without having to worry about things like hosting, authentication, authorization, and local API access. 

“It took one developer one day to stand up an application using the Extension Framework! I’ve seen a lot of great Looker features built over the years. This has the potential to be the most ground-breaking.” —Jawad Laraqui, CEO, Data Driven (Don’t miss Jawad’s session “Building and Monetizing Custom Data Experiences with Looker”)

Looker Components lower the barrier to developing beautiful data experiences that utilize native Looker functionality through extension templates, componentized Looker functionality, a library of foundational components, and theming. In July we released Filter components, which allow developers to bring the filters they declare on any Looker dashboard into any embedded application or extension. Today, we announce Visualization components (screenshot below), which is an open-source toolkit that makes it easy to connect your visualization to Looker, customize its presentation, and easily add it to your external application.

In addition, we are also announcing the new Looker public marketplace, where developers can explore Beyond BI content like applications, blocks, and plug-ins.

Augmented Analytics

Augmented analytics is fundamentally changing the way that people interact with data. We see tremendous opportunity to help organizations take advantage of artificial intelligence and machine learning capabilities to deliver a better and more intuitive experience with Looker. We are delivering augmented analytics capabilities via two Solutions – Contact Center AI (CCAI) and Healthcare NLP.

Looker’s Solution for Contact Center AI (CCAI), helps businesses gain a deeper understanding and appreciation of their customers’ full journey by unlocking insights from all their company’s first-party data. We’ve partnered with the product teams at CCAI to build Looker’s Block for CCAI Insights, which sets you on the path to integrating the advanced insights into your first-party data in Looker, overlaying business data with the customer experience. 

Looker’s Block for the Healthcare NLP API serves as a critical bridge between existing care systems and applications hosted on Google Cloud, providing a managed solution for storing and accessing healthcare data in Google Cloud. Healthcare providers, payers, and pharma companies can quickly understand the context and relationships of medical concepts within text, such as medications, procedures, conditions, and clinical history, and begin to link this to other clinical data sources for downstream AI/ML.

We are also investing in the way you can interact with data in Looker. With Ask Looker, you can explore and visualize data using natural language queries. By combining Google’s expertise in AI with Looker’s modern semantic layer, Ask Looker will deliver a uniquely valuable experience to our users that dramatically lowers the barrier to interacting with data (this feature is currently available in preview and we expect to roll it out more broadly in 2022).

For more information on the newest Looker solutions, click here.

Universal Semantic Model

A core differentiator since the beginning has been Looker’s semantic model. Our LookML modeling language enables developers to describe their organization’s business rules and calculations using centralized and reusable semantic definitions. This means everyone across the organization can trust that they’re working with consistent and reliable interpretations of the data, allowing them to make more confident business decisions. Soon, different front-end applications like Tableau, Google Sheets, Google Slides, and more will be able to connect to Looker’s semantic model.

With Looker’s universal semantic model, organizations can deliver trusted and governed data to every user across heterogeneous tools. This eliminates the reliance on stale, unsecured data that increases the risk of data exfiltration. The universal semantic model gives companies a way to tie disparate data sets together in a central repository that provides a complete understanding of their business.

Data Studio is now part of Looker

We are also excited to announce that Data Studio is now a part of the Looker team. Data Studio has built a strong product enabling reporting and ad-hoc analysis on top of Google Ads and other data sources. This is a distinct and complementary use case to Looker’s enterprise BI and analytics platform. Bringing Data Studio and Looker together opens up exciting opportunities to combine the strengths of both products to reimagine the way organizations work with data. 

As we reach the end of 2021, we feel proud of the product capabilities we shipped this year and we’re excited about the investments we’ll make in 2022. We’re grateful for the continued support of our customers and partners, and our work is inspired every day by the innovative applications you build and the use cases you support. We hope you enjoy JOIN 2021 and all the amazing content we have available for you in this digital format. Make sure to register to access all the event sessions. We hope to see you all in person in 2022.

Using Google Cloud Vision API from within a Data Fusion Pipeline

Cloud Data Fusion (CDF) provides enormous opportunity to help cultivate new data pipelines and integrations. With over 200 plugins, Data Fusion gives you the tools to wrangle, coalesce, and integrate with many data providers like Salesforce, Amazon S3, BigQuery, Azure, Kafka Streams, and more. Deploying scalable, resilient data pipelines based on the open source CDAP gives organizations the flexibility to enrich data at scale. Sometimes, though, the integration you need, such as a custom REST API or other tool, is not already in the plugin library, and you may need to connect your own REST API.

Many modern REST APIs (like Google’s AI APIs, and other Google APIs) and data sources use OAuth 2.0 authorization. OAuth 2.0 is a great authorization protocol; however, it can be challenging to figure out how to interface with it when integrating with tools like Cloud Data Fusion (CDF). So, we thought we would show you how to configure a CDF HTTP source that calls the Google Vision AI API using OAuth 2.0.

First, let’s look at the Vision AI API itself. Specifically, we will be using the Vision AI “annotate” API. We will pass the API an HTTP URL to an image and the API will return a JSON document that provides AI generated information about the image. Here are the official docs for the API.

Let’s start with an example of how we would call the API interactively with curl on Google Cloud Shell where we can authenticate with a gcloud command.
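The command itself appeared as a screenshot in the original post; the sketch below is a hedged reconstruction of that interactive call, where the image URL and the feature list are illustrative stand-ins rather than values from the original:

```shell
# Hypothetical image URL; the annotate API also accepts inline base64 content.
IMAGE_URI="https://example.com/demo.jpg"

# Request body: ask Vision AI for label and face annotations of the image.
REQUEST_BODY=$(cat <<EOF
{
  "requests": [{
    "image": {"source": {"imageUri": "${IMAGE_URI}"}},
    "features": [{"type": "LABEL_DETECTION"}, {"type": "FACE_DETECTION"}]
  }]
}
EOF
)

# In Cloud Shell, gcloud can mint a short-lived OAuth access token for the
# logged-in user; only call the API if we can actually obtain one.
if command -v gcloud >/dev/null 2>&1 && ACCESS_TOKEN=$(gcloud auth print-access-token 2>/dev/null); then
  curl -s -X POST \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "${REQUEST_BODY}" \
    "https://vision.googleapis.com/v1/images:annotate"
fi
```

Because `gcloud auth print-access-token` supplies the credentials, no OAuth client setup is needed for this interactive test, which is exactly the convenience CDF cannot rely on later.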

This will produce the AI API’s JSON response, a document describing the labels, faces, and other features detected in the image.

Now that we can call the API with curl, it’s time to figure out how we translate that API call to a CDF HTTP source. Some things will be the same, like the API URL and the request body. Some things will be different, like the authorization process.

The CDF HTTP source can’t get an authorization token from calling “gcloud auth print-access-token” like we did above. Instead, we will need to create an OAuth 2.0 Client ID in our GCP project and we will need to get a refresh token for that Client ID that CDF will be able to use to generate a new OAuth 2.0 token when CDF needs to make a request.

Let’s get started by filling in all the properties of the HTTP Source. The first few are simple, same as we used with curl:

The next setting is Format. You might think we should pick JSON here, and we could — since JSON is what is returned. However, CDF sources will expect a JSON record per line, and we really want the entire response to be a single record. So, we will mark the Format as blob and will convert the blob to string in the pipeline later (and could even split out records like object detections, faces, etc.):

The next and final section is the hardest — the OAuth 2.0 properties. Let’s look at the properties we will need to find and then start finding them:

The documentation for getting most of these settings is here:


Auth URL and Token URL…

The first two properties are listed in the doc above:

Auth URL:  https://accounts.google.com/o/oauth2/v2/auth

Token URL: https://oauth2.googleapis.com/token

Client ID and Client Secret…

For the Client ID and Client Secret, we will need to create those credentials at https://console.cloud.google.com/apis/credentials. It may seem odd to specify a redirect URI of http://localhost:8080, but that is just to get the refresh token later.

After specifying these options above and clicking Create, we will get our Client ID and Client Secret:


For the Scopes, we can use either of the two scopes mentioned in the Vision API docs.




Refresh Token…

Lastly, we need the refresh token, which is the hardest property to get. There are two steps to this process. First, we have to authenticate and authorize with the Google Auth server to get an authorization “code”, and then we have to use that authorization code with the Google Token server to get an “access token” and a “refresh token” that CDF will use to get future access tokens. The access token has a short life, so wouldn’t be useful to give to CDF. Instead, CDF will use the refresh token so that it can get its own access tokens whenever the pipeline is run.

To get the authorization “code”, you can copy the URL below, change to use your client_id, and then open that URL in a browser window:
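The exact URL was shown as an image in the original post; as a hedged sketch of how such an authorization URL is typically assembled (the client ID and scope below are placeholders), it looks like this:

```shell
# Placeholders: substitute your own OAuth client ID. The redirect URI must
# match the one registered when the Client ID was created.
CLIENT_ID="YOUR_CLIENT_ID.apps.googleusercontent.com"
REDIRECT_URI="http://localhost:8080"
# In real use the scope value should be URL-encoded.
SCOPE="https://www.googleapis.com/auth/cloud-platform"

# access_type=offline and prompt=consent ask Google to issue a refresh token,
# not just a short-lived access token -- which is what CDF needs.
AUTH_URL="https://accounts.google.com/o/oauth2/v2/auth?client_id=${CLIENT_ID}&redirect_uri=${REDIRECT_URI}&response_type=code&scope=${SCOPE}&access_type=offline&prompt=consent"
echo "$AUTH_URL"
```

Opening the printed URL in a browser starts the login-and-consent flow.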


Initially, this will prompt you to log in, then prompt you to authorize this client for the specified scopes, and then redirect to http://localhost:8080. It will look like an error page, but notice that the URL you were redirected to includes a “code” query parameter. In a normal web application, that is how the authorization code is returned to the requesting web application.

NOTE: You may see an error like this “Authorization Error — Error 400: admin_policy_enforced”. If so, your GCP User’s organization has a policy that restricts you from using Client IDs for third party products. In that case, you’ll need to get that restriction lifted, or use a different GCP user in a different org.

With that authorization code, we can now call the Google Token server to get the “access token” and the “refresh token”. Just set your “code”, “client_id”, and “client_secret” in the curl command below and run it in a Cloud Shell terminal.

curl -X POST -d "code=4/0AX4XfWjgRdrWXuNxqXOOtw_9THZlwomweFrzcoHMBbTFkrKLMvo8twSXdGT9JramIYq86w&client_id=199375159079-st8toco9pfu1qi5b45fkj59unc5th2v1.apps.googleusercontent.com&client_secret=q2zQ-vc3wG5iF5twSwBQkn68&redirect_uri=http%3A//localhost:8080&grant_type=authorization_code" https://oauth2.googleapis.com/token


At long last, you will have your “refresh_token”, which is the last OAuth 2.0 property that the CDF HTTP source needs to authorize with the Google Vision API!

Now, we have all the information needed to populate the OAuth 2.0 properties of the CDF HTTP Source:

Next, we need to set the output schema of the HTTP Source to have a column called “body” with a type of “bytes” (since bytes is the format we selected in the properties), and then we can validate and close the HTTP source properties:

In the Projection properties, we simply convert the body from bytes to string and then validate:

Now, we can add a BigQuery sink (or any sink) in CDF Studio and run a preview:

If we click Preview Data on the Projection step, we can see our Vision AI response both as a byte array (on the left) and projected as a string (on the right):

Lastly, we can name the pipeline, deploy it, and run it. Here are the results of the run as well as a screenshot of the data in BigQuery.

Final thoughts…

Further processing…

This example just stored the Vision AI response JSON in a string column of a BigQuery table. The pipeline could easily be extended to use a Wrangler transform to parse the Vision AI JSON response into more fine grained columns, or even pull out parts of the response JSON into multiple rows/records (for example, a row for each face or object found in the image).
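To make that fan-out concrete, here is a rough shell sketch of the idea using a tiny hand-written response; the `labelAnnotations`/`description` field names follow the Vision API's response shape, but the values are invented, and `grep` is only a crude stand-in for the JSON parsing a Wrangler transform would do:

```shell
# A miniature, hand-written Vision-style response with two label annotations.
RESPONSE='{"responses":[{"labelAnnotations":[
  {"description":"cat","score":0.98},
  {"description":"whiskers","score":0.95}]}]}'

# Fan the single response out into one row per detected label.
ROWS=$(echo "$RESPONSE" | grep -o '"description":"[^"]*"' | cut -d'"' -f4)
echo "$ROWS"
```

The same response-to-rows reshaping is what a Wrangler directive would perform inside the pipeline itself.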

We also hard coded the image url in the pipeline above. That’s not terribly useful for reuse. We could have used a runtime parameter like ${image_url} and then we could specify a different image URL for each pipeline run.

Other OAuth 2.0 APIs…

This example was focused on the Google APIs, so it would work similarly for any other Google API that needs to be called with OAuth 2.0 authorization. But the HTTP plugin is generic (not specific to Google APIs), so it can work with other OAuth 2.0-protected services outside of Google as well. Of course, the auth server and token server will be different (since they are not Google’s), but hopefully this at least gives an example of using an OAuth 2.0-protected service.

The final example pipeline…

Below is a link to the finished pipeline in case you want to import it and look it over in more detail.


The sample pipeline created above… drive.google.com



Bring governance and trust to everyone with Looker’s universal semantic model

As digital transformation accelerates, the data available to organizations about their products, customers, and markets is growing exponentially. Turning this data into insights will be critical to innovating and growing. Furthermore, the capacity to make data-driven decisions must be pervasive in organizations, accessible to a broad range of employees, and simple enough to apply to day-to-day decisions as well as the most strategic. However, a major barrier to achieving this vision is the gap in access to analysis tools and in building broad-based usage.

Looker’s semantic model enables complex data to be simplified for end users with a curated catalog, pre-defined business metrics, and built-in transformation. We’ve always been platform first, and this is an expansion of that strategy. We are extending our investment here to power more data-rich experiences and aim to provide access to the Looker model more directly in other tools. By bringing governed data directly to end users in tools they are already familiar with, we will democratize access to trusted data across organizations.

Google Workspace is everything you need to get anything done, all in one place. Google Workspace includes all of the productivity apps you know and love—Gmail, Calendar, Drive, Docs, Sheets, Slides, Meet, and many more. Whether you’re returning to the office, working from home, on the frontlines with your mobile device, or connecting with customers, Google Workspace is the best way to create, communicate, and collaborate.

Now, with the Connected Sheets integration with Looker, users can interactively explore data from Looker in a familiar, spreadsheet interface.  The integration creates a live connection between Looker and Sheets, meaning that data is always up to date and access is secured based on the user exploring the data.  Common tools like formulas, formatting, and charts make it easy to perform ad-hoc analysis.  This integration will be available in preview by December 2021, with GA next year.  You can sign up to hear when the Connected Sheets and Looker integration is available.

And that’s not all. Looker dashboards are critical to keep teams focused on the metrics that matter.  We will be releasing a Google Workspace Add-on that enables visuals from those dashboards to be embedded in Google Slides.  Embedded visuals in Slides can be refreshed so that people are always seeing the latest data.

Self-service business intelligence (BI) tools have also gained widespread adoption among business users, with a drag-and-drop, graphical user interface that makes exploring data easy. This flexibility is critical in democratizing analytics, but it also introduces the risk that the most important data may be duplicated or that inconsistent definitions are introduced. With Looker’s upcoming Governed BI Connectors, we will give organizations the flexibility of self-service while allowing users to leverage their governed, trusted data in those tools too. Users will be able to live-connect from BI tools to Looker’s semantic model. From there, they can use the familiar self-service tool they are used to for analytics, even mashing up governed data with their own data. Connectors will be available for Google Data Studio, Tableau, and more. These connectors will be made available as they are ready in the coming year, with the full set being available in preview by the end of 2022. You can register to stay up to date on preview availability for Governed BI Connectors.

With these new capabilities, everyone in an organization can be empowered to make more data-driven decisions, in whatever tool they are familiar with, all powered by consistent, trusted data. To learn more about Looker’s semantic model as well as these new capabilities, tune into our JOIN session discussing ‘What’s New for Data Modelers’.


Solving business problems with data

Data science can be applied to business problems to improve practices and help to strengthen customer satisfaction. In this blog, we address how the addition of Looker extends the value of your Google Cloud investments to help you understand your customer journey, unlock value from first-party data, and optimize existing cloud infrastructure.    

Data is a powerful tool 

While not everyone needs to understand the nuts and bolts of data technologies, most people do care about the value data can create for them: how it can help them do their jobs better. Within the data space, we are seeing a trend of ongoing failure to make data accessible to “humans”: the industry still hasn’t figured out how to put data into the hands of people when and where they need it, how they need it.

What if everyone in your organization could analyze data at scale and make more informed, fact-based decisions? Data and the insights derived from it are valuable, but only if your users can see them. We think Looker helps solve that.

Solutions tell a larger story of how it all fits together 

Across all verticals and industries, businesses benefit from knowing and understanding their customers better. Common business goals include increasing revenue through better product recommendations and pricing optimization, improving the user experience through targeted marketing and personalization, and reducing churn while improving retention rates. To help reach these goals, key strategies should focus on understanding customer needs, motivations, likes, and dislikes, using all available data; in other words, put yourself in the shoes of your customer.

Our goal with Looker solutions is to offer the right level of out-of-box support that allows customers to get to value quickly, while maintaining the necessary flexibility. We aim to offer a library of data-driven solutions that accelerate data projects. Many solutions include Looker Blocks (pre-built pieces of code that accelerate data exploration environments) and Actions (custom integrations) that get customers up and running quickly and lets you build business-friendly access points for Google Cloud functionality like BQML, App Engine and Cloud Functions.  

Below, you’ll find a sampling of the newest Looker solutions. 

Listening to customers by looking at the data

Looker’s solution for Contact Center AI (CCAI) helps businesses gain a deeper understanding and appreciation of their customers’ full journey by unlocking insights from all their company’s first-party data. Call centers can converse naturally with customers and deliver outstanding experiences by leveraging artificial intelligence. CCAI’s newest product, CCAI Insights, reviews the conversations support agents are having, annotates the data with the important information, and identifies the calls that need review. We’ve partnered with the product teams at CCAI to build Looker’s Block for CCAI Insights, which sets you on the path to integrating these advanced insights with your first-party data in Looker, overlaying business data with the customer experience.

Sentiment Analysis Dashboard: identifies how customers feel about their interactions

Businesses can better understand contact center experiences and take immediate action when necessary to make sure the most valuable customers receive the best service. 

Realizing full business value of First-Party data

Looker for Google Marketing Platform (GMP) provides marketers the power to unlock the value of their company’s first-party data to more effectively target audiences.  The Looker Blocks and Actions for GMP offer interactive data exploration, slices of data with built-in ML predictions and activation paths back to the GMP.  This strategic solution continues to evolve with the Looker Action for Google Ads (Customer Match), the Looker Action for Google Analytics (Data Import) and the Looker Block for Google Analytics 4 (GA4).

The Looker Action for Customer Match allows marketers to send segments and audiences based on first-party data directly into Google Ads. Reach users cross-device and across the web’s most powerful channels such as Display, Video, YouTube, and Gmail.  The entire process is performed within a single screen in Looker, and is able to be completed in a few minutes by a non-technical user. 

The Looker Action for Data Import can be used to enhance user segmentation and remarketing audiences in Google Analytics by taking advantage of user information accessible in Looker, such as in CRM systems or transactional data warehouses.

The Looker Block for Google Analytics 4 (GA4) expands the solution’s support with out-of-the-box dashboards and pre-baked BigQuery ML models for the newest version of Google Analytics. The Looker Block offers reports with flexible configuration capabilities to unlock custom insights beyond the standard GA reporting. Customize audience segments, define custom goals to track, and share these reports with teams who do not have access to the GA console.

From clinical notes to patient insights at scale

Taking a look at the Healthcare vertical, the Looker Healthcare NLP API Block serves as a critical bridge between existing care systems and applications hosted on Google Cloud, providing a managed solution for storing and accessing healthcare data in Google Cloud. The Healthcare NLP API uses natural language models to extract healthcare information from medical text, rapidly unlocking insights from unstructured medical text and providing medical providers with simplified access to intelligent insights. Healthcare providers, payers, and pharma companies can quickly understand the context and relationships of medical concepts within the text, such as medications, procedures, conditions, and clinical history, and begin to link this to other clinical data sources for downstream AI/ML.

Specifically, the natural language processing (NLP) Patient View (pictured below) allows you to review a single selected patient of interest, surfacing their clinical notes history over time. It informs clinical diagnosis with family history insights, which are not currently captured in claims, and captures additional procedure coding for revenue cycle purposes.

NLP Patient View Dashboard: details on a specific patient

The dashboard below shows the NLP Term View which allows users to focus on chosen medical terms across the entire patient population in the dataset so they can start to view trends and patterns across groups of patients. 

NLP Term View Dashboard: relate medical terms across your dataset

This data can be used to: 

Enhance patient matching for clinical trials 

Identify re-purposed medications

Drive advancements for research in cancer and rare diseases

Identify how social determinants of health impact access to care

Managing Costs Across Clouds

Effective cloud cost management is important for reasons beyond cost control — it provides you the ability to reduce waste and predictably forecast both costs and resource needs. Looker’s solution for Cloud Cost Management offers quick access to necessary reporting and clear insights into cloud expenditures and utilization. 

This solution brings together billing data from different cloud providers in a phased approach: get up and running quickly with Blocks optimized for where the data is today (Google Cloud, AWS or Azure) as you work towards more sophisticated analysis for cross-platform planning and even cloud spend optimization with the mapping of tags, labels and cost centers across clouds.

Multi-cloud Summary Dashboard: a single view into spend across AWS, Azure and GCP

The Looker Cloud Cost Management solution provides operational teams struggling to monitor, understand, and manage the costs and needs associated with their cloud technology with a comprehensive view into what, where and why they are spending money.

Making better decisions with Looker-powered data

Leading companies are discovering ways to get value from all of that data beyond displaying it in a report or a dashboard. They want to enable everyone to make better decisions but that’s only going to happen if everyone can ask questions of the data, and get reliable, correct answers without using outdated or incomplete data and without waiting for it. People and systems need to have data available to them in the way that makes the most sense for them at that moment.  

It’s clear that successful data-driven organizations will lead their respective segments not because they use data to create reports, but because they use it to power data experiences tailored to every part of the business, including employees, customers, operational workflows, products and services.   

As people’s way of experiencing data has evolved, more than ever before, dashboards alone are not enough. You can use data as fuel for data-driven business workflows, and to power digital experiences that improve customer engagement, conversions, and advocacy. 

From Nov. 9 – 11th, Looker is hosting its annual conference JOIN, where we’ll be showing new features, including how we help to:

Build data experiences at the speed of business

Accelerate the business value with packaged experiences 

Unleash more insights for more people in the right way – Deliver data experiences at scale

There is no cost to attend JOIN. Register here and learn how Looker helps organizations build and deliver custom data-driven experiences that go beyond just reports and dashboards, scale and grow with your business, allow developers to build innovative data products faster, and ensure data reaches everyone.


Going beyond BI with the Looker Marketplace

For many businesses, business intelligence (BI) consists of data visualizations and dashboards.  The problem is that dashboards are not the answer to every data need. Many users today are looking for rich data-experiences that are immediately accessible and seamlessly part of their day to day work. Surfacing insights in collaboration apps like Google Chat and Slack, infusing data into productivity apps like Google Docs or Slides, or triggering auto generated business processes such as updating paid media bids or using AI-powered bots are just a few of the ways information can be provided, integrated and operationalized.

For these reasons, businesses need to think about delivering data and analytics to workers in a way that makes it meaningful and immediately productive. We call this moving beyond BI. 

Reflecting everyday data experiences

On a daily basis, in almost everything we do, we use data analytics without being aware of it. When we sit down to watch our favorite streaming service, navigate traffic, shop online, or work out using a smart watch, we rely on integrated data insights in our day-to-day activities.

These experiences have influenced what we expect from data systems at work, and why businesses need a new approach for delivering data and analytics on the job.

The Looker Marketplace

The Looker platform helps businesses realize this new approach and move beyond BI by providing tailored off-the-shelf products or Looker Blocks™ which are ready for deployment. These data-driven experiences are focused  on the needs of the business or department. 

These accelerators support:

Modern BI & analytics: Democratize easy access to timely and trustworthy data enabling people to make better, faster, more confident data-driven decisions every day. 

Integrated insights: Infuse relevant information into the tools and products people already use, enhancing the experience of those tools and making teams more effective. Without even thinking about it, everyone at the company is making data-informed decisions.

Data-driven workflows: Super-charge operational workflows with complete, near-real time data to optimize business processes. This allows companies to save time and money by putting their data to work in every part of their business.

Custom applications: Create purpose-built tools to deliver data in an experience tailored to the job. By building the exact experiences people need, you can make your employees, customers and partners more effective and more efficient.

The Looker Marketplace helps you find content to either deploy data-experiences into your Looker instance or to build new data experiences faster by taking building blocks from the marketplace and extending them in a reusable fashion.

Moving Beyond BI with Solutions

An example of such a solution is Contact Center AI (CCAI) Insights, which you can explore in the Looker Marketplace.

With CCAI Insights, Looker can leverage the power of AI to converse naturally with customers and resolve basic issues quickly, as well as improve future experiences and drive long-term customer satisfaction by measuring and analyzing customer interactions while improving overall efficiency. 

Using Google’s machine learning capabilities and Looker Blocks, you can easily identify resolutions that have worked well, and use data actions to automate contact center operations based on new insights.

Another example is the Google Cloud Cost Management solution.  Effective cloud cost management is important for reasons beyond cost control. A good cloud cost management solution provides you the ability to reduce waste and predictably forecast both costs and resource needs.

Leveraging the Cloud Cost Management Block is a simple way to understand your cloud spend without data movement — your existing billing and platform utilization data remains in existing siloed cloud data warehouse platforms. Looker connects directly to the billing data in each respective cloud data warehouse, providing a consolidated reporting view of cloud spend across Google Cloud, AWS and Azure. You can see all your spend information in a single dashboard that can contain mutual filters such as date, skill teams, application name(s), and more. You can activate alerting notifications and schedule reports to be sent automatically via email, a messaging service, and to other destinations. This means that on day one, you’ll have real-time, accurate multi-cloud cost reporting.

Build new data experiences and upload to the Looker Marketplace

Developers and Looker partners can create new content and publish it in the Looker Marketplace.  The easiest way for you to get started is to visit the Looker Developer Portal and discover all the types of content you can build with our platform capabilities.

From Nov. 9 – 11th, Looker is hosting its annual conference JOIN, where we’ll be showing how to:

Build data experiences at the speed of business

Accelerate the business value with marketplace content

Grow your business with reusable analytics components

There is no cost to attend JOIN. Register here and learn how Looker helps organizations build and deliver custom data-driven experiences that go beyond just reports and dashboards, scale and grow with your business, allow developers to build innovative data products faster, and ensure data reaches everyone.


Predict hospital readmission rates with Google Cloud Platform

Today’s challenge with healthcare data:
The amount of data collected today is at an all time high and the demand to leverage and understand that data is rapidly growing. Organizations across every industry want convenient, fast, and easy access to data and insights, while allowing users to take action on it in real-time. Healthcare is no exception. 

In this recent GCP blog post, the importance of Electronic Health Records (EHR) systems and healthcare interoperability is explained. EHR systems by default do not speak to one another, and this makes it difficult to track patients within a health system across different hospitals or clinics. EHR data is highly complex, containing numerous diagnosis codes, procedure codes, visits, provider data, prescriptions, etc. Moreover, it becomes challenging to track a patient’s clinical history if a hospital upgrades their EHR system or when a patient switches hospitals (even within the same system). 

The solution? A common data schema that can act as a mechanism for normalizing this messy real-world data across different EHR systems. This is known as the FHIR (Fast Healthcare Interoperability Resources) schema. 

Google Cloud has seen a number of organizations implement solutions utilizing the Healthcare Data Engine (HDE) to produce FHIR records from streaming clinical data and then analyzing that data via BigQuery and Looker in order to uncover insights and improve clinical outcomes.

This shift toward the Cloud and business intelligence (BI) modernization provides organizations with a single platform that has the flexibility to scale, the ability to create a unified, trusted view of business metrics and logic, and an extensible activation layer to drive decisions in real time.

Background and business opportunity:
According to the Mayo Clinic, the number of patients who experience unplanned readmissions to a hospital is one way of tracking and evaluating the quality and performance of provider care. By definition, the 7-day readmission rate is the percentage of admitted patients who return to the hospital for an unplanned visit within 7 days of discharge. This indicator can reflect the breadth and depth of care that a patient has received. Not only is a high readmission rate a reflection of low quality of care, but unnecessary readmissions are also expensive. This is especially relevant to hospitals and providers in a value-based reimbursement environment.

When it comes to accurately analyzing and understanding hospital readmission rates, amongst many other quality and performance metrics in a healthcare setting, common obstacles include: latency, scalability, speed, governance, security, and overall accessibility in sharing results.

Recently, we examined a real-world use case for predicting 7-day hospital readmission rates utilizing FHIR data stored in BigQuery, along with BigQuery ML, Looker and Cloud Functions.

BigQuery is Google Cloud’s fully managed, serverless SQL data warehouse and data lake. It’s highly performant for fast querying, and it is secure and fully encrypted in the Cloud and in transit to other locations. It also has a feature called BigQuery ML, which allows users to execute machine learning (ML) models in BigQuery using standard SQL. BigQuery ML offers models such as linear regression, binary logistic regression, multiclass logistic regression, K-means clustering, matrix factorization, time series, boosted trees, and deep neural networks (DNNs). You can also use its AutoML feature, which searches through a variety of model architectures based on the input data and chooses the best model for you. BigQuery ML increases development speed by eliminating the need to move data and allows data science teams to focus their time and effort on more robust and complex modeling.

Looker is Google Cloud’s cloud-native BI and analytics platform that gives users access to data in real-time through its in-database architecture and semantic modeling layer. Looker connects directly to BigQuery (as well as most other SQL-compliant databases), meaning you do not have to move or make copies of the data, and you are not limited to cubes or extracts. This enables governance at scale where Looker acts as the single source of truth for users to go for information and take action on insights. 

Cloud Functions offer a serverless execution environment for building and connecting cloud services. You can write simple, single-purpose functions that can be activated when triggered. Cloud Functions can act as the bridge for communication between insights in Looker and BigQuery.

Our goals with this use case solution were to (1) help hospital clinicians and administrators know where to most focus their attention when it comes to 7-day readmission rates, and (2) be able to initiate proactive interventions through alerting, self-service, and data-driven actions, and finally, (3) scale, govern, and secure the data on a modern, unified platform in the Cloud.

The solution and how it works:

Once our data is in BigQuery and we’ve connected it to Looker, we can begin our analysis. Looker’s semantic modeling layer leverages LookML, an abstraction of SQL that simplifies it by turning queries into reusable components. We can use LookML to build transformations and define unified metrics. Then we can write a BigQuery ML model directly in Looker’s semantic modeling layer by implementing the BigQuery ML Looker Block for Classification and Regression with AutoML Tables.
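To give a flavor of what those reusable components look like, here is a minimal, hypothetical LookML view; the table and column names are invented for illustration and are not from the actual use case:

```lookml
view: patient_admissions {
  sql_table_name: `ehr.discharges` ;;

  dimension: readmitted_within_7d {
    type: yesno
    sql: ${TABLE}.readmitted_within_7d ;;
  }

  # A unified metric defined once and reused in every dashboard and Explore.
  measure: readmission_rate_7d {
    type: average
    sql: CAST(${TABLE}.readmitted_within_7d AS INT64) ;;
    value_format_name: percent_1
  }
}
```

Because the metric lives in the model rather than in any one report, every consumer of the data computes the readmission rate the same way.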

The Block walks through the components of how to train, evaluate, and predict for our target variable. In our use case, the target variable is the propensity score for a 7-day readmission. As mentioned, BigQuery ML gives us the ability to do this quickly using standard SQL. We can assess our model performance using the out-of-the-box evaluation functions provided by BigQuery ML and easily tune hyperparameters as needed using model options in the CREATE MODEL syntax.
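For a hypothetical model named `mydataset.readmission_model`, those evaluation and tuning hooks look like this in standard SQL:

```sql
-- Out-of-the-box metrics (precision, recall, F1 score, log loss, ROC AUC):
SELECT * FROM ML.EVALUATE(MODEL `mydataset.readmission_model`);

-- Hyperparameters are tuned by re-creating the model with different
-- values in the OPTIONS clause of CREATE MODEL, for example:
--   l1_reg = 0.1, max_iterations = 20, data_split_method = 'AUTO_SPLIT'
```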

The benefits of building the model in Looker are:

In contrast to traditional data science methods, we can keep our code in a single location for ease of use and access

We can choose the refresh frequency to continue to automatically re-run the model based on new incoming data

It is fast to implement the code, and it’s easy to visualize and explore the results in the Looker UI

We can create a dashboard within Looker that highlights the key performance indicators of the model. Using other data science methods, accuracy and precision results might be inaccessible or difficult to share. A Looker dashboard will provide transparency into how the model performed, and because Looker reads directly from BigQuery, as new data comes in, we are able to view changes in predictions in real-time, as well as view any variations in model performance KPIs.

Model Performance Dashboard: Highlights accuracy and precision metrics that show how the BigQuery ML model performed on unseen data, including the confusion matrix and the ROC curve. We can also surface the training duration, F1 score, recall, and log loss.
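The confusion matrix and ROC curve tiles can each be fed by a built-in BigQuery ML function; again assuming a hypothetical model `mydataset.readmission_model`:

```sql
-- Confusion matrix on the evaluation split.
SELECT * FROM ML.CONFUSION_MATRIX(MODEL `mydataset.readmission_model`);

-- Points for plotting the ROC curve.
SELECT * FROM ML.ROC_CURVE(MODEL `mydataset.readmission_model`);
```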

In addition to building out a dashboard for model performance, we can also analyze readmission rate across the hospital and at the patient level. We built examples to show how an overview dashboard allows hospital clinicians, care managers, and administrators to see how the hospital is performing overall, by facility, by specialty, or by condition, while the patient view shows an individual patient and their average readmission rate score.

Hospital Readmission Overview Dashboard:  Sample showing overall readmission rate of 21%

Patient Readmission Dashboard: Sample showing average readmission rate risk score of 34.16% for patient John Doe

*Disclaimer: Sample synthetic data was used in the exploration of this use case (meaning, no real PII or PHI data was used)

Within Looker, clinicians or care managers can then set up alerts to monitor their individual patients based on the predicted score. With Looker alerts, they can also set a threshold and be notified whenever that threshold is reached. This allows care managers to be more proactive rather than reactive when putting together discharge care plans for their patients.

A care manager can also quickly send an email follow-up to a patient with a high-risk score directly from within the platform.

This is an example of a Looker Action. There are many possibilities, such as:

Send a text message with Twilio

Send data to Google Sheets

Send data to a repository, such as Google Cloud Storage

Use a form to write-back to BigQuery

When it comes to write-backs, Cloud Functions make the process simple. In our use case, in order to collect patient feedback and satisfaction data at discharge, we built a form in LookML. A Looker Action then triggers a write-back any time the form is submitted. The write-back is executed by a Cloud Function behind the scenes. The form makes it seamless and easy for hospitals to quickly collect survey data and store it in a structured format for analysis. Once in BigQuery, the data can then ultimately be passed back into our BigQuery ML model as additional features for retraining and predicting readmission rate risk scores.
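The actual sample code is on GitHub (linked below). As a rough, hypothetical sketch of the pattern, an HTTP-triggered Cloud Function in Python could parse the Looker Action form payload and append a row to a survey table; the table ID, field names, and payload shape here are all assumptions:

```python
# Hypothetical sketch of the write-back Cloud Function: an HTTP-triggered
# function that receives a Looker Action form payload and appends a row
# to a BigQuery survey table. Names and payload shape are illustrative.
import json
from datetime import datetime, timezone

TABLE_ID = "mydataset.patient_survey"  # hypothetical destination table


def build_row(form_params):
    """Shape the submitted form parameters into a BigQuery row."""
    return {
        "patient_id": form_params["patient_id"],
        "satisfaction_score": int(form_params["satisfaction_score"]),
        "comments": form_params.get("comments", ""),
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }


def handle_survey(request):
    """HTTP entry point: parse the action payload and insert the row."""
    payload = request.get_json()
    row = build_row(payload["form_params"])
    # Imported here so the pure row-shaping logic above stays testable
    # without BigQuery credentials.
    from google.cloud import bigquery
    errors = bigquery.Client().insert_rows_json(TABLE_ID, [row])
    status = 200 if not errors else 500
    return (json.dumps({"errors": errors}), status)
```

Keeping the row-shaping separate from the insert makes the function easy to unit test before deploying.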

New Survey Dashboard: Sample showing how a Looker write-back works with a Looker Action form

Check out the sample Cloud Function code on GitHub here.

The value and potential future work:
Google Cloud provides a seamless experience with the data. This solution addresses challenges of latency, scalability, speed, governance, security, and overall accessibility in sharing results. Looker’s in-database architecture and semantic modeling layer inherit the power, speed, and security of BigQuery and BigQuery ML, and when implemented with Cloud Functions, Looker can enhance both data and operational workflows. These workflows impact how clinicians, care managers, hospital administrators, and data scientists manage their day-to-day work, which in turn helps keep healthcare costs down and improves the quality of patient care.

Future work in building out this solution may include leveraging the GCP Healthcare NLP API, which converts unstructured data, such as clinical notes, into a structured format for exploration and additional downstream AI/ML. 

Keep an eye out for more to come on Looker Healthcare & Life Sciences Solutions!


Source : Data Analytics Read More

How Looker is helping marketers optimize workflows with first-party data

It’s no secret that first-party data will become increasingly important to brands and marketing teams as they brace for transformation and unlock new, engaging ways of providing consumer experiences. The pain points marketers are experiencing today (siloed data preventing a complete customer view, non-actionable insights, and general data access issues) are coming into focus as marketers prepare to wean off the third-party data the industry has grown so accustomed to and begin efforts to truly harness their organizations’ first-party data.

While gathering first-party data is a critical first step, simply collecting it doesn’t guarantee success. To pull away from the competition and truly move the needle on business objectives, marketers need to be able to put their first-party data to work.  How can they begin realizing value from their organizations’ first-party data? 

In May, Looker and Media.Monks (formerly known as MightyHive)  hosted a webinar, “Using Data to Gain Marketing Intelligence and Drive Measurable Impact,” that shared strategies brands have used to take control of their data. Brands are relying on technology to make that happen, and in these cases, the Looker analytics platform was key to their success. 

However, technology alone isn’t a silver bullet for data challenges. It’s the ability to closely align with the principles outlined below that will help determine a brand’s success in realizing the value of its first-party data. While technology is important, you need the right mix of talent, processes, and strategies to bring it to life.

If you have data that nobody can access, what use is it?

Siloed or difficult-to-access data has only a fraction of the value that portable, frictionless data has. And data, insights and capabilities that can’t be put in the right hands (or are difficult to understand) won’t remain competitive in a market where advertisers are getting nimbler and savvier every day.

A large part of making data frictionless and actionable lies in the platforms you use to access it. Many platforms are either too technical for teams to understand (SQL databases) or pose too much of a security risk (site analytics platforms) to be shared with the wider team of stakeholders who could benefit from data access. 

Ease of use becomes incredibly important when considering the velocity and agility that marketers require in a fast-changing world. In the webinar’s opening, Elena Rowell, Outbound Product Manager at Looker, noted that the foundational data requirement for brands is to have “the flexibility to match the complexity of the data.”

Knowing your customer 

While the transformation to become a best-in-class, data-driven marketing organization may look like a long, arduous process, it really is not. Elena instead views it as an iterative journey along which brands can quickly capture value. “Every incremental insight into customer behavior brings us one step closer to knowing the customer better,” she said. “It’s not a switch you flip – there’s so much value to be gained along the way.”

She showed how Simply Business, a UK-based insurance company, took this approach. First, they implemented more data-driven decision-making by building easy access to more trustworthy data using Looker. They could look in depth at marketing campaigns and implement changes that needed to be made along the way. They started building lists in Looker that enabled better targeting and scheduled outbound communication campaigns.

The original goal was to better understand what was going on with marketing campaigns, but as they put the data and intelligence stored in it to work, Simply Business found value at every step.

Insights for impact   

Ritual, an e-commerce, subscription-based multivitamin service, needed a way to measure the true impact of its acquisition channels and advertising efforts. They understood that an effective way to grow the business is to have insight into which ads and messaging were creating the greatest impact and resonating with current and prospective customers.

Ritual chose to use Looker, which offers marketers interactive access to their company’s first-party data. During the webcast, A 360-View of Acquisition, Kira Furuichi, Data Scientist, and Divine Edem, Associate Manager of Acquisition at Ritual, explained how an e-commerce startup makes use of platform and on-site attribution data to develop a multifaceted understanding of customer acquisition and its performance.

Ritual now has insight not only into the traffic each channel brings to the site, but also into how customers from each of those channels behave after they arrive. This leads to a better overall understanding of how products are really resonating with customers. Their consumer insights team shares that information with the acquisition team so they can collaborate on overall decisions, especially in Google Ads. Deeper insights into ad copy and visuals help the teams home in on what the prospective customer is looking for as they browse for new products to add to their daily routine. They also found it fosters collaboration with other internal teams invested in this part of the business, as growth and acquisition are a huge part of the overall company strategy.

Edem added, “Having an acquisition channel platform like Google synced into our Looker helps other key stakeholders on the team outside of acquisition, whether it’s the product team, the engineering team, or the operations team, understand the channel happenings and where there’s room for opportunity. Because it’s such a huge channel, having Looker centralize and sync all that information makes the process a little less overwhelming and really helps us see things through a nice magnifying glass.”

Access, analyze, and activate first-party data

Marketing is typically in the backlog of integration work, and IT teams often don’t have marketing domain knowledge. Every brand’s data needs are different. This is where Media.Monks can step in with expertise to guide the process and marshal technical resources to deliver against your roadmap. With a combination of domain marketing expertise and deep experience in engineering and data science, Media.Monks helps brands fill the void left by a lack of internal resources and accelerate their data strategies.

With the Looker extension framework, partners can develop custom applications that sit on top of and interact with Looker to fit unique customer needs. These custom Looker applications allow Media.Monks to equip customers not just with read-only dashboards, but with tools and interfaces that actually trigger action across other platforms and ecosystems.

For instance, Media.Monks recently developed a Looker application that allows a marketer to define a first-party audience from CRM data, submit that audience to Google’s Customer Match API, and ultimately send it to a specific Google Ads account for activation. The entire end-to-end process is performed on a single screen in Looker, letting a non-technical user complete a previously daunting, error-prone process in a few minutes.

Media.Monks-built product for activating first-party data from Looker into Google Ads

We know that breaking down silos, making data available across your organization, and building paths to activation are critical. Looker as a technology makes this much easier, but it still takes expertise, effort, and time to make it work for your needs, and that is where our experience and the GC [Google Cloud] partner ecosystem come in.

With the Looker platform, the potential for new data experiences is endless, and with the right partner to support your adoption, deployment, and use cases, you can accelerate your speed to value at every step of your transformation journey.

Gain a more accurate understanding of ad channel performance to minimize risk of ad platforms over-reporting their own effectiveness

Uncover insights on what resonates with customers to enable better optimization of ad copy and creative

Democratize data within the company for stakeholders that need it. Integrations with collaboration tools like Google Sheets and Slack provide value for people on the team even if they have limited access to Looker.

To learn more about how to approach first-party data and how brands have found success using Looker, check out the webinar and see some of these strategies in action. Additionally, you can register to attend Looker’s annual JOIN conference, where we will be presenting “3 Ways to Help Your Marketing Team Stay Ahead of the Curve.” Look for it under the Empowering Others with Data category.

There is no cost to attend JOIN. Register here to attend sessions on how Looker helps organizations build and deliver custom data-driven experiences that go beyond just reports and dashboards, scale and grow with your business, allow developers to build innovative data products faster, and ensure data reaches everyone.
