Building streaming data pipelines on Google Cloud

Many customers build streaming data pipelines to ingest, process and then store data for later analysis. We’ll focus on a common pipeline design shown below. It consists of three steps: 

1. Data sources send messages with data to a Pub/Sub topic.

2. Pub/Sub buffers the messages and forwards them to a processing component.

3. After processing, the processing component stores the data in BigQuery.

For the processing component, we’ll review three alternatives, ranging from basic to advanced: 

A. BigQuery subscription.

B. Cloud Run service.

C. Dataflow pipeline.

Example use cases

Before we dive deeper into the implementation details, let’s look at a few example use cases of streaming data pipelines:

Processing ad clicks. Receiving ad clicks, running fraud prediction heuristics on a click-by-click basis, and discarding or storing them for further analysis.

Canonicalizing data formats. Receiving data from different sources, canonicalizing them into a single data model, and storing them for later analysis or further processing. 

Capturing telemetry. Storing user interactions and displaying real-time statistics, such as active users or average session length grouped by device type.

Keeping a change data capture log. Logging all updates from a database to BigQuery through Pub/Sub.

Ingesting data with Pub/Sub

Let’s start at the beginning. You have one or multiple data sources that publish messages to a Pub/Sub topic. Pub/Sub is a fully managed messaging service. You publish messages, and Pub/Sub takes care of delivering them to one or many subscribers. The most convenient way to publish messages to Pub/Sub is to use the client library.
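For example, a minimal Python publisher might look like the sketch below (the project, topic, and payload names are illustrative):

from google.cloud import pubsub_v1

# Illustrative project and topic names.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "ad-clicks")

# Message payloads are raw bytes; keyword arguments become message attributes.
data = b'{"ad_id": "1234", "clicked_at": "2023-01-01T12:00:00Z"}'
future = publisher.publish(topic_path, data, origin="web")

# result() blocks until Pub/Sub acknowledges the publish and returns the message ID.
print(future.result())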

To authenticate with Pub/Sub you need to provide credentials. If your data producer runs on Google Cloud, the client libraries take care of this for you and use the built-in service identity. If your workload doesn’t run on Google Cloud, you should use workload identity federation, or, as a last resort, download a service account key (but make sure to have a strategy to rotate these long-lived credentials).

Three alternatives for processing

It’s important to realize that some pipelines are straightforward, and some are complex. Straightforward pipelines do little or no processing before persisting the data. Advanced pipelines aggregate groups of data to reduce storage requirements and can have multiple processing steps.

We’ll cover how to do processing using one of the following three options:

A BigQuery subscription, a no-code pass-through solution that stores messages unchanged in a BigQuery dataset.

A Cloud Run service, for lightweight processing of individual messages without aggregation.

A Dataflow pipeline, for advanced processing (more on that later). 

Approach 1: Storing data unchanged using a BigQuery subscription

The first approach is the most straightforward one. You can stream messages from a Pub/Sub topic directly into a BigQuery dataset using a BigQuery subscription. Use it when you’re ingesting messages and don’t need to perform any processing before storing the data. 

When setting up a new subscription to a topic, you select the Write to BigQuery option, as shown here:

The details of how this subscription is implemented are completely abstracted away from users, which means there is no way to execute any code on the incoming data. In essence, it is a no-code solution, so you can’t filter data before storing it.

You can also use this pattern if you want to first store, and perform processing later in BigQuery. This is commonly referred to as ELT (extract, load, transform).

Tip: One thing to keep in mind is that there are no guarantees that messages are written to BigQuery exactly once, so make sure to deduplicate the data when you’re querying it later. 
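As a sketch, deduplication can be as simple as keeping one row per Pub/Sub message ID. This assumes the subscription was created with the write-metadata option so each row carries message_id and publish_time; the table and column names are illustrative:

from google.cloud import bigquery

client = bigquery.Client()

# Keep the first row per message_id; assumes the subscription writes
# Pub/Sub metadata columns (message_id, publish_time) to the table.
query = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY message_id ORDER BY publish_time) AS row_num
  FROM `my-project.my_dataset.raw_events`
)
WHERE row_num = 1
"""
for row in client.query(query).result():
    print(row)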

Approach 2: Processing messages individually using Cloud Run 

Use Cloud Run if you do need to perform some lightweight processing on the individual messages before storing them. A good example of a lightweight transformation is canonicalizing data formats – where every data source uses its own format and fields, but you want to store the data in one data format.

Cloud Run lets you run your code as a web service directly on top of Google’s infrastructure. You can configure Pub/Sub to send every message as an HTTP request using a push subscription to the Cloud Run service’s HTTPS endpoint. When a request comes in, your code does its processing and calls the BigQuery Storage Write API to insert data into a BigQuery table. You can use any programming language and framework you want on Cloud Run.
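A minimal sketch of such a push endpoint in Python (Flask) is shown below. The handler, table, and field names are illustrative, and for brevity it writes rows with the client library’s insert_rows_json helper, which uses the legacy insertAll API rather than the Storage Write API recommended in the tip further down:

import base64
import json

from flask import Flask, request
from google.cloud import bigquery

app = Flask(__name__)
bq = bigquery.Client()

@app.route("/", methods=["POST"])
def handle_message():
    # Pub/Sub push requests wrap the message in a JSON envelope;
    # the payload itself is base64-encoded in message.data.
    envelope = request.get_json()
    message = envelope["message"]
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))

    # Lightweight processing: canonicalize the source-specific format.
    row = {"user_id": payload.get("uid"), "event": payload.get("event_type")}

    errors = bq.insert_rows_json("my-project.my_dataset.events", [row])
    if errors:
        # A non-2xx response makes Pub/Sub retry the delivery.
        return ("", 500)
    # A success status acknowledges the message.
    return ("", 204)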

As of February 2022, push subscriptions are the recommended way to integrate Pub/Sub with Cloud Run. A push subscription automatically retries requests if they fail and you can set a dead-letter topic to receive messages that failed all delivery attempts. Refer to handling message failures to learn more. 

There might be moments when no data is submitted to your pipeline. In this case, Cloud Run automatically scales the number of instances to zero. Conversely, it scales all the way up to 1,000 container instances to handle peak load. If you’re concerned about costs, you can set a maximum number of instances. 

It’s easier to evolve the data schema with Cloud Run. You can use established tools to define and manage data schema migrations like Liquibase. Read more on using Liquibase with BigQuery. 

For added security, set the ingress policy on your Cloud Run microservices to internal so that they can only be reached from Pub/Sub (and other internal services), create a service account for the subscription, and give only that service account access to the Cloud Run service. Read more about setting up push subscriptions in a secure way.

Consider using Cloud Run as the processing component in your pipeline in these cases:

You can process messages individually, without requiring grouping and aggregating messages.

You prefer using a general programming model over using a specialized SDK.

You’re already using Cloud Run to serve web applications and prefer simplicity and consistency in your solution architecture. 

Tip: The Storage Write API is more efficient than the older insertAll method because it uses gRPC streaming rather than REST over HTTP.

Approach 3: Advanced processing and aggregation of messages using Dataflow

Cloud Dataflow, a fully managed service for executing Apache Beam pipelines on Google Cloud, has long been the bedrock of building streaming pipelines on Google Cloud. It is a good choice for pipelines that aggregate groups of data to reduce data volume and for those that have multiple processing steps. Cloud Dataflow has a UI that makes it easier to troubleshoot issues in multi-step pipelines.

In a data stream, grouping is done using windowing. Windowing functions group unbounded collections by their timestamps. There are multiple windowing strategies available, including tumbling, hopping, and session windows. Refer to the documentation on data streaming to learn more.
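As an illustrative sketch (the topic, table, and field names are assumptions), a streaming Beam pipeline that counts events per device type in one-minute tumbling windows might look like this:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/telemetry")
        | "Parse" >> beam.Map(json.loads)
        # Tumbling (fixed) one-minute windows; hopping and session windows
        # are available as SlidingWindows and Sessions.
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "KeyByDevice" >> beam.Map(lambda event: (event["device_type"], 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.Map(lambda kv: {"device_type": kv[0], "events": kv[1]})
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.device_counts",
            schema="device_type:STRING,events:INTEGER",
        )
    )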

Cloud Dataflow can also be leveraged for AI/ML workloads and is suited for users who want to preprocess, train, and make predictions with a machine learning model using TensorFlow. Here’s a list of great tutorials that integrate Dataflow into end-to-end machine learning workflows.

Cloud Dataflow is geared toward massive scale data processing. Spotify notably uses it to compute its yearly personalized Wrapped playlists. Read this insightful blogpost about the 2020 Wrapped pipeline on the Spotify engineering blog. 

Dataflow can autoscale its clusters both vertically and horizontally. Users can even go as far as using GPU-powered instances in their clusters, and Cloud Dataflow will take care of bringing new workers into the cluster to meet demand and destroying them afterwards when they are no longer needed.

Tip: Cap the maximum number of workers in the cluster to reduce cost and set up billing alerts. 

Which approach should you choose?

The three tools have different capabilities and levels of complexity. Dataflow is the most powerful option and the most complex, requiring users to use a specialized SDK (Apache Beam) to build their pipelines. On the other end, a BigQuery subscription doesn’t allow any processing logic and can be configured using the web console. Choosing the tool that best suits your needs will help you get better results faster. 

For massive (Spotify scale) pipelines, or when you need to reduce data using windowing, or have a complex multi-step pipeline, choose Dataflow. In all other cases, starting with Cloud Run is best, unless you’re looking for a no-code solution to connect Pub/Sub to BigQuery. In that case, choose the BigQuery subscription.

Cost is another factor to consider. Cloud Dataflow does apply automatic scaling, but won’t scale to zero instances when there is no incoming data. For some teams, this is a reason to choose Cloud Run over Dataflow.  

This comparison summarizes the key differences:

BigQuery subscription: no-code pass-through, no processing or filtering, configured from the web console; deduplicate when querying.

Cloud Run service: lightweight per-message processing in any language or framework; scales to zero and up to 1,000 instances.

Dataflow pipeline: advanced multi-step processing with windowing and aggregation, built with the Apache Beam SDK; autoscales, but not to zero.

Next steps

Read more about BigQuery subscriptions, Cloud Run, and Dataflow.

Check out this hands-on tutorial on GitHub by Jakob Pörschmann that explores all three types of processing.

I’d like to thank my co-author Graham Polley from Zencore for his contributions to this post – find him on LinkedIn or Twitter. I also want to thank Mete, Sara, Jakob, Valentin, Guillaume, Sean, Kobe, Christopher, Jason, and Wei for their review feedback.


No cash to tip? No problem. How TackPay built its digital tipping platform on Google Cloud

Society is going cashless. While convenient for consumers, that’s caused a drastic decrease in income for tipped workers, and this is the problem TackPay addresses. TackPay is a mobile platform that allows users to send, receive, and manage tips in a completely digital way, providing tipped workers with a virtual tip jar that makes it easy for them to receive cashless tips directly.

Digitizing the tipping process not only allows individuals to receive tips without cash, but also streamlines a process that has frequently been unfair, inefficient, and opaque, especially in restaurants and hotels. Through TackPay’s algorithm, venues can define the rules of distribution and automate the tip management process, saving them time. And because tips no longer go through a company’s books but through TackPay, it simplifies companies’ tax accounting, too.

With a simple, fast and web-based experience accessible by QR code, customers can leave a cashless tip with total flexibility, and transparency.

Technology in TackPay

Without question, our main competitor is cash. From the very beginning, TackPay has worked to make the tipping experience as easy and as fast as giving a cash tip. For this reason, the underlying technology has to deliver the highest level of performance to ensure customer satisfaction and increase their tipping potential.

For example, we need to be able to balance request load across countries at peak times to avoid congestion. Serving the page within a few milliseconds allows us to avoid a high dropout rate and user frustration. Transactions can also take place in remote locations with little signal, so it is crucial for the business to offer a powerful and accessible service with offline availability options. These are a few of the reasons TackPay chose Google Cloud.

Functional components

TackPay interfaces include a website, web application and a mobile app. The website is mostly informational, containing sign-up, login and forms for mailing list subscription and partnerships. The web app is the application’s functional interface itself. It has four different user experiences based on the user’s persona: partner, tipper, tipped, and group.

The partner persona has a customized web dashboard.

The tipper sees the tipping page, the application’s core functionality. It is designed to provide a light-weight and low-latency transaction to encourage the tipper to tip more efficiently and frequently.

The tipped, i.e., the receiver, can use the application to onboard into the system, manage their tip fund, and track their transactions via a dashboard.

The group persona allows the user to combine tips for multiple tip receivers across several services as an entity. 

The mobile interface offers a similar experience to the web for the tipped and group personas. A user dashboard that spans several personas covers the feedback, wallet, transactions, network, profile, settings, bank details, withdrawal, and docs features for the Tipped persona. In addition to those features, the dashboard also covers the venue details for the Group persona.

Technical architecture to enable cashless tipping

Below is the technical architecture diagram at a high level:

Ingestion
Data comes in from the web application, mobile app, third-party finance application APIs and Google Analytics. The web application and mobile app perform the core business functionality. The website and Google Analytics serve as the entry point for business analytics and marketing data. 

Application
The web application and mobile app provide the platform’s core functionality and share the same database — Cloud Firestore.

The tipper persona typically is not required to install the mobile app; they interact with the web application, reached by scanning a QR code, to tip for the service. The mobile app is mainly for the tipped and the group categories.

Some important functional triggers are also enabled between the database and application using Google Cloud Functions Gen 2. The application also uses Firebase Authentication, Cloud IAM and Logging.
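For illustration, a minimal 2nd gen Cloud Function reacting to Firestore document events might look like the sketch below. The function name and behavior are assumptions, not TackPay’s actual code; a real trigger would also decode the Firestore event payload:

import functions_framework

# Minimal sketch of a 2nd-gen Cloud Function wired to Firestore document
# events via Eventarc. Names are illustrative.
@functions_framework.cloud_event
def on_tip_written(cloud_event):
    # The CloudEvent subject identifies the changed Firestore document,
    # e.g. "documents/tips/{tipId}".
    print(f"Firestore change detected: {cloud_event['subject']}")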

Database and storage
Firestore collections are used to hold functional data. The collections include payments data for businesses, teams, the tipped, and tippers, plus data for users, partners, feedback, social, etc. BigQuery stores and processes all Google Analytics and website data, while Cloud Storage for Firebase stores and serves user data generated from the app.

Analytics and ML
We use BigQuery data for analytics and Vertex AI AutoML for machine learning. At this stage, we’re using Data Studio for on-demand, self-serve reporting, analysis, and data mashups across the data sets. The goal is to eventually integrate it with Google Cloud’s Looker in order to bring in the semantic layer and standardize on a single data access layer for all analytics in TackPay.

Building towards a future of digital tipping

The TackPay product has been online for a few months and is actively processing tips in many countries, including Italy, Hungary, the UK, Spain, and Canada. The solution has recently been adopted by leading companies in the European hospitality industry, becoming a reliable partner for them. There is an ambitious plan to expand into the Middle East market in the coming months.

To enable this expansion, we’ll need to validate product engagement in specific target countries and scale up by growing the team and the product. Our technical collaboration with Google Cloud will help make that scaling process effortless. If you are interested in tech considerations for startups, fundamentals of database design with Google Cloud, and other developer / startup topics, check out my blog.

If you want to learn more about how Google Cloud can help your startup, visit our page here to get more information about our program, and sign up for our communications to get a look at our community activities, digital events, special offers, and more.


The Denodo Platform meets BigQuery

It’s only natural that the Denodo Platform would achieve the Google Cloud Ready – BigQuery designation earlier this month; after all, the Denodo Platform and Google BigQuery have much in common.

The Denodo Platform, powered by data virtualization, enables real-time access across disparate on-premises and cloud data sources, without replication, and BigQuery, the cloud-based enterprise data warehouse (EDW) on Google Cloud, enables blazing-fast query response across petabytes of data, even when some of that data is stored outside of BigQuery in on-premises systems.

For users of the Denodo Platform on Google Cloud, BigQuery certification offers confidence that the Denodo Platform’s data integration and data management capabilities work seamlessly with BigQuery, as Google only confers this designation on technology that meets stringent functional and interoperability requirements.

In addition to storage “elbow room,” BigQuery brings new analytical capabilities to Denodo Platform users on Google Cloud, including out-of-the-box machine learning (ML) capabilities like Apache Zeppelin for Denodo, as well as geospatial, business intelligence (BI), and other types of data analysis tools.

But it gets better.

The Denodo Platform on Google Cloud + BigQuery

Combining the full power of the Denodo Platform with BigQuery enables easy access to a wider breadth of data, all with a single tool. The Denodo Platform’s ability to deliver data in real time over BigQuery cloud-native APIs enables frictionless data movement between on-premises, cloud, and Google Cloud Storage data sources.

Enhanced BigQuery support combines Google’s native connectivity with the Denodo Platform’s query pushdown optimization features, to process massive big-data workloads with better performance and efficiency. For further performance, BigQuery can be leveraged as a high-performance caching database for the Denodo Platform in the cloud. This supports advanced optimization techniques like multi-pass executions based on intermediate temporary tables.

Users also benefit from the same flexible pricing available on Google Cloud, letting them start small with BigQuery, and scale as needed.

Use Cases Abound

Combining the Denodo Platform with BigQuery enables a wide variety of use cases, such as:

Machine Learning/Artificial Intelligence (ML/AI) and Data Science in the Cloud

Users can leverage the Denodo Platform’s data catalog to search the available datasets and tag the right ones for analytics and ML projects. This also helps data scientists to combine data stored in BigQuery and data virtualization layers to build models in a quick and easy manner, putting cloud elasticity to work. Using the metadata and data lineage capabilities of the Denodo Platform, users can access all of the data in a governed fashion.

Zero-Downtime Migrations and Modernizations

The Denodo Platform acts as a common access point between two or more data sources, providing access to multiple sources, simultaneously, even when the sources are moved, while hiding the complexities of access from the data consumers. This enables seamless, zero-downtime migrations from on-premises systems or other cloud data warehouses (such as Oracle or SQL Server) to BigQuery. Similarly, the Denodo Platform makes it possible for stakeholders to modernize their systems, in this case their BigQuery instance, with zero impact on users.

Data Lake Creation 

Users can easily create virtual data lakes, which combine data across sources, regardless of type or location, while also enabling the definition of a common semantic model across all of the disparate sources.

Data-as-a-Service (DaaS) 

The Denodo Platform also facilitates easy delivery of BigQuery and Google Cloud Storage data (structured and semi-structured) to users as an API endpoint. With this support, the platform lets companies expose data in a controlled, curated manner, delivering only the data that is suitable for specific business partners and other external companies, and easily monetizing relevant datasets when needed.

The Dream of a Hybrid Data Warehouse, Realized

Let’s look at one way that the Denodo Platform and BigQuery can work together on Google Cloud. In the architecture illustrated below, the two technologies enable a hybrid (on-premises/cloud) data warehouse configuration.

I’d like to point out a few things in this diagram (see the numbered circles). You can:

Move your relational data for interactive querying and offline analytics to BigQuery.

Move your relational data from large scale databases and applications to Google Spanner, when you need high I/O and global consistency.

Move your relational data from Web frameworks and existing applications to Google Cloud SQL.

Combine all of these sources with the relational data sitting on-premises in a traditional data warehouse, creating a single centralized data hub.

Run real-time queries on virtual data from other applications.

Build operational reports and analytical dashboards on top of the Denodo Platform to gain insights from the data, and use Looker or other BI tools to serve thousands of end users.

Getting Started

BigQuery certification provides Denodo Platform users on Google Cloud with yet another reason to appreciate Google Cloud. Visit the Denodo Platform for Google Cloud page for more information.

If you are new to the Denodo Platform on Google Cloud, there is no better way to discover its power than to try it out for yourself. Denodo offers not only a way to do that, for free for 30 days, but also built-in guidance and support.


The top five global data and AI trends in 2023

How will your organization manage this year’s data growth and business requirements? Your actions and strategies involving data and AI will improve or undermine your organization’s competitiveness in the months and years to come. Our teams at Google Cloud have an eye on the future as we evolve our strategies to protect technology choice, simplify data integration, increase AI adoption, deliver needed information on demand, and meet security requirements. 

Google Cloud worked with IDC* on multiple studies involving global organizations across industries in order to explore how data leaders are successfully addressing key data and AI challenges. We compiled the results in our 2023 Data and AI Trends report. In it, you’ll find the metrics-rich research behind the top five data and AI trends, along with tips and customer examples for incorporating them into your plans.

1: Show data silos the door

Given the increasing volumes of data we’re all managing, it’s no surprise that siloed transactional databases and warehousing strategies can’t meet modern demands. Organizations want to improve how they store, manage, analyze, and govern all their data, while reducing costs. They also want to eliminate conflicting insights from replicated data and empower everyone with fresh data.

“A unified data cloud enables the integration of data and insights into transformative digital experiences and better decision making.” – Andi Gutmans, GM and VP of Engineering for Databases, Google Cloud

In the report, you can learn how to adopt a unified data cloud that supports every stage of the data lifecycle so that you can improve data usage, accessibility, and governance. Inform your strategy with examples from other organizations, such as a data fabric that improves customer experiences by connecting more than 80 data silos, as well as other unified data clouds that save money and simplify growth.

2: Usher in the age of the open data ecosystem

Data is the key to unlocking AI, speeding up development cycles, and increasing ROI. To protect against data and technology lock-in, more organizations are adopting open source software and open APIs.

“Organizations want the freedom to create a data cloud that includes all formats of data from any source or cloud.” – Gerrit Kazmaier, VP and GM, Data & Analytics, Google Cloud

Understand how you can simplify data integration, facilitate multicloud analytics, and use the technologies you want with an open data ecosystem, as described in the report. Learn from metrics about global open source adoption and public dataset usage. And explore how global companies adopted open data ecosystems to improve patient outcomes, increase website traffic by 25%, and cut operating costs by 90%.

3: Embrace the AI tipping point

Pulling useful information out of data is easier with AI and ML. Not only can you identify patterns and answer questions faster, but the technologies also make it easier to solve problems at scale.

“We’ve reached the AI tipping point. Whether people realize it or not, we’re already using applications powered by AI—every day. Social media platforms, voice assistants, and driving services are easy examples.” – June Yang, VP, Cloud AI and Industry Solutions, Google Cloud

Organizations share how they’re reaching their goals using AI and ML by empowering “citizen data scientists” and having them focus on small wins first. Gain tips from Yang and other experts for developing your AI strategy. And read how organizations achieve outcomes such as a reduction of 7,400 tons per year in carbon emissions and a more than 200% increase in ROI from ad spend by using pattern recognition and other AI capabilities. 

4: Infuse insights everywhere

Yesterday’s BI solutions have led to outdated insights and user fatigue with the status quo, based on generic metrics and old information. Research shows that as new tools come online, expectations for BI are changing, with companies revising their strategies to improve decision making, speed up the development of new revenue streams, and increase customer acquisition and retention by providing individuals with needed information on demand.

“Organizations are equipping business decision-makers with the tools they need to incorporate required insights into their everyday workflows.” – Kate Wright, Senior Director, Product Management, Google Cloud

In the report, you’ll discover why and how data leaders are rethinking their BI analytics strategies and applications to improve users’ trust and use of data in automated workflows, customizable dashboards, and on-demand reports. Global companies also share how they improve decision making with self-service BI, customer experiences with IoT analysis, and threat mitigation with embedded analytics.

5: Get to know your unknown data

Increasing data volumes can make it harder for organizations to know where and what data they store, which may create risk. Case in point: if a customer unexpectedly shares personally identifiable information during a recorded customer support call or chat session, that data might require specialized governance, which the standardized storage process may not provide.

“If you don’t know what data you have, you cannot know that it’s accurately secured. You also don’t know what security risks you are incurring, or what security measures you need to take.” – Anton Chuvakin, Senior Staff Security Consultant, Google Cloud

Check out the report to learn about data security risks that are often overlooked and how to develop proactive governance strategies for your sensitive data. You can also read how global organizations have increased customer trust and productivity by improving how they discover, classify, and manage their structured and unstructured data.  

Be ready for what’s next

What’s exciting about these trends is that they’re enabling organizations across industries to realize very different goals using their choice of technologies. And although all the trends depend on each other, research shows you can realize measurable benefits whether you adopt one or all five.

Review the report yourself and learn how you can refine your organization’s data and AI strategies by drawing on the collective insights, experiences, and successes of more than 800 global organizations. 

*commissioned by Google Cloud


Building your own private knowledge graph on Google Cloud

A Knowledge Graph ingests data from multiple sources, extracts entities (e.g., people, organizations, places, or things), and establishes relationships among the entities (e.g., owner of, related to) with the help of common attributes such as surnames, addresses, and IDs.

Entities form the nodes in the graph and the relationships are the edges or connections. This graph building is a valuable step for data analysts and software developers for establishing entity linking and data validation.

The term “Knowledge Graph” was first introduced by Google in 2012 as part of a new Search feature to provide users with answer summaries based on previously collected data from other top results and sources.

Advantages of a Knowledge Graph

Building a Knowledge Graph for your data has multiple benefits:

Clustering text together that is identified as one single entity like “Da Vinci,” “Leonardo Da Vinci,” “L Da Vinci,” “Leonardo di ser Piero da Vinci,” etc. 

Attaching attributes and relationships to this particular entity, such as “painter of the Mona Lisa.”

Grouping entities based on similarities, e.g., grouping Da Vinci with Michelangelo because both are famous artists from the late 15th century.

It also provides a single source of truth that helps users discover hidden patterns and connections between entities. These linkages would have been more challenging and computationally intensive to identify using traditional relational databases.

Knowledge Graphs are widely deployed for various use cases, including but not limited to: 

Supply chain: mapping out suppliers, product parts, shipping, etc.

Lending: connecting real estate agents, borrowers, insurers, etc.

Know your customer: anti-money laundering, identity verification, etc.

Deploying on Google Cloud

Google Cloud has introduced two new services (both in Preview as of today): 

The Entity Reconciliation API lets customers build their own private Knowledge Graph with data stored in BigQuery.

Google Knowledge Graph Search API lets customers search for more information about their entities from the Google Knowledge Graph.

To illustrate the new solutions, let’s explore how to build a private knowledge graph using the Entity Reconciliation API and use the generated ID to query the Google Knowledge Graph Search API. We’ll use the sample data from zoominfo.com for retail companies available on Google Cloud Marketplace (link 1, link 2). 

To start, enable the Enterprise Knowledge Graph API and then navigate to the Enterprise Knowledge Graph from the Google Cloud console.

The Entity Reconciliation API can reconcile tabular records of organization, local business, and person entities in just a few clicks. Three simple steps are involved:

1. Identify the data sources in BigQuery that need to be reconciled and create a schema mapping file for each source.

2. Configure and kick off a reconciliation job through our console or API.

3. Review the results after job completion.

Step 1

For each job and data source, create a schema mapping file to inform how Enterprise Knowledge Graph ingests the data and maps to a common ontology using schema.org. This mapping file will be stored in a bucket in Google Cloud Storage.

For the purposes of this demo, I am choosing the organization entity type and passing in the database schema that I have for my BigQuery table. Note: always use the latest schema mapping format from our documentation.

prefixes:
  ekg: http://cloud.google.com/ekg/0.0.1#
  schema: https://schema.org/

mappings:
  organization:
    sources:
      - [yourprojectid:yourdataset.yourtable~bigquery]
    s: ekg:company_$(id_column_from_table)
    po:
      - [a, schema:Organization]
      - [schema:name, $(name_column_from_table)]
      - [schema:streetAddress, $(address_column_from_table)]
      - [schema:postalCode, $(ZIP_column_from_table)]
      - [schema:addressCountry, $(country_column_from_table)]
      - [schema:addressLocality, $(city_column_from_table)]
      - [schema:addressRegion, $(state_column_from_table)]
      - [ekg:recon.source_name, (chosen_source_name)]
      - [ekg:recon.source_key, $(id_column_from_table)]

Step 2

The console page shows the list of existing entity reconciliation jobs available in the project.

Create a new job by clicking on the “Run A Job” button in the action bar, then select an entity type for entity reconciliation.

Add one or more BigQuery data sources and specify a BigQuery dataset destination where EKG will create new tables with unique names under the destination data set. To keep the generated cluster IDs constant across different runs, advanced settings like “previous BigQuery result table” are available. 

Click “DONE” to create the job.

Step 3

After the job completes, navigate to the output BigQuery table, then use a simple join query similar to the one below to review the output:

SELECT *
FROM `<dataset>.clusters_14002307131693260818` AS RS
JOIN `<dataset>.retail_companies` AS SRC
  ON RS.source_key = SRC.COMPANY_ID
ORDER BY cluster_id;

This query joins the output table with the input table(s) of our Entity Reconciliation API and orders by cluster ID. Upon investigation, we can see that two entities are grouped into one cluster.

The confidence score indicates how likely it is that these entities belong to this group. Last but not least, the cloud_kg_mid column returns the linked Google Cloud Knowledge Graph machine ID, which can be used for our Google Knowledge Graph Search API.
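As a representative sketch (the machine ID and API key below are placeholders; in practice you would use a cloud_kg_mid value from the results table), an entity lookup against the public Knowledge Graph Search API can be made with cURL:

curl "https://kgsearch.googleapis.com/v1/entities:search?ids=%2Fm%2F0dl567&key=API_KEY&limit=1&indent=True"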

Running the above cURL command returns a response containing a list of entities, presented in JSON-LD format and compatible with schema.org schemas, with limited external extensions.

For more information, visit our documentation.

Special thanks to Lewis Liu, Product Manager and Holt Skinner, Developer Advocate for the valuable feedback on this content.


8 ways to cut costs and drive profits using data and AI

We are increasingly seeing one question arise in virtually every customer conversation: How can the organization save costs and drive new revenue streams? 

Everyone would love a crystal ball, but what you may not realize is that you already have one. It’s in your data. By leveraging Data Cloud and AI solutions, you can put your data to work to achieve your financial objectives. Combining your data and AI reveals opportunities for your business to reduce expenses and increase profitability, which is especially valuable in an uncertain economy. 

Google Cloud customers globally are succeeding in this effort, across industries and geographies. They are improving efficiency and ROI by saving money and creating new revenue streams. We have distilled the strategies and actions they are implementing—along with customer examples and tips—in our eBook, “Make Data Work for You.” In it, you’ll find ways you can pare costs, increase profitability, and monetize your data.  

Find money in your data 

Our Google Cloud teams have identified eight strategies that successful organizations are pursuing to trim expenses and uncover new sources of revenue through intelligent use of data and AI. These use cases range from scaling small efficiencies in logistics to accelerating document-based workflows, monetizing data, and optimizing marketing spend.

The results are impressive. They include massive cost savings and additional revenue. On-time deliveries have increased sharply at one company, and procure-to-pay processing costs have fallen by more than half at another. Other organizations have reaped big gains in ecommerce upselling and customer satisfaction.

We’ve found that businesses across every industry and around the globe are able to take action on at least one of these eight strategies. Contrary to common misperceptions, implementation does not require massive technology changes, crippling disruption to your business, or burdensome new investments. 

What success looks like 

If you worry your business is not ready or you need to gain buy-in from leadership, the success stories of the 15 companies in this report are helpful examples. Learning how organizations big and small, in different industries and parts of the world, have implemented these data and AI strategies makes the opportunities more tangible.

Carrefour 
Among the world’s largest retailers, Carrefour operates supermarkets, ecommerce, and other store formats in more than 30 countries. To retain leadership in its markets, the company wanted to strengthen its omnichannel experience.

Carrefour moved to Google Data Cloud and developed a platform that gives its data scientists secure, structured access to a massive volume of data in minutes. This paved the way for smarter models of customer behavior and enabled a personalized recommendation engine for ecommerce services. 

The company saw a 60% increase in ecommerce revenue during the pandemic, which it partly attributes to this personalization. 

ATB Financial 
ATB Financial, a bank in the Canadian province of Alberta, uses its data and AI to provide real-time personalized customer service, generating more than 20,000 AI-assisted conversations monthly. Machine learning models enable agents to offer clients real-time tailored advice and product suggestions. 

Moreover, marketing campaigns and month-end processes that used to take five to eight hours now run in seconds, saving over CA$2.24 million a year. 

Bank BRI
Bank BRI, which is owned by the Indonesian government, has 75.5 million clients. Through its use of digital technologies, the institution amasses a lot of valuable data about this large customer base. 

Using Google Cloud, the bank packages this data through more than 50 monetized open APIs for more than 70 ecosystem partners who use it for credit scoring, risk management, and other applications. Fintechs, insurance companies, and financial institutions don’t have the talent or the financial resources to do quality credit scoring and fraud detection on their own, so they are turning to Bank BRI. 

Early in the effort, the project generated an additional $50 million in revenue, showing how data can drive new sources of income. 

How to get going now

“Make Data Work for You” will help you launch your financial resiliency initiatives by outlining the steps to get going. The process lays the groundwork for realizing your own cost savings and new revenue streams by leveraging data and AI.

These steps include building frameworks to operate cost-efficiently, make informed decisions related to spending, and optimize your data and AI budgets.

Operate: Billing that’s specific to your use-case
Control your costs by choosing data and analytics vendors who offer industry-leading data storage solutions and flexible pricing options. For example, multiple pricing options such as flat rate and pay-as-you-go allow you to optimize your spend for best price-performance.

Inform: Make informed decisions based on usage
Use your cloud vendor’s dashboards or build a billing data report to gain insights on your spending over time. Make use of cost recommendations and other forecasting tools to predict what your future expenses are going to be.

Optimize: Never pay more than you use 
While planning data analytics capacity, organizations often overprovision and pay for more than they actually use. Consider migrating workloads that have unpredictable demand to a data warehousing solution that offers granular autoscaling so that you never pay for more than what you use.

There are other key moves that will set your initiative up for success, including shortening time to value in building AI models and measuring impact. You can find details in the report.

A brighter future

The teams at Google Cloud helped the companies in “Make Data Work for You,” along with many more organizations, use their data and AI to achieve meaningful results. Download the full report to see how you can too.


Optimizing the retail experience and developing customer loyalty with advanced analytics

E-commerce transformation has moved at an astounding pace over the last decade. Retailers of all sizes have grappled with immense changes, as new technologies reinvent what’s possible and what consumers have come to expect from the shopping experience.

Retailers that embrace evolving shopper expectations by putting customers at the center of everything they do are finding countless opportunities to thrive. 

At Quantum Metric, we help businesses solve many problems, but virtually all of them involve understanding customers through better data visibility and analytics, and then acting on those insights.

Canadian Tire Corporation (CTC), one of Canada’s oldest and largest retailers, is an exciting example of what we have been able to accomplish through our independent software vendor (ISV) partnership with Google Cloud. Since 2018, we have worked with Google Cloud to help CTC build and optimize the digital journey of its Triangle loyalty program, which boasts 11 million active members across the country.

Now, we would like to share how our partnership with Google Cloud helped CTC achieve a 15% omnichannel sales increase by tailoring digital customer experiences.

Why Canadian Tire is a retail leader in Canada

CTC is one of Canada’s biggest retailers, with over 1,700 retail and gas outlets, alongside 13 businesses and brands, including Canadian Tire Retail, Mark’s, Sport Chek, FGL Sports, and Partsource. In 2022, Canadian Tire celebrated 100 years in business and continued to offer a wide assortment of goods that appeal to Canadians, 90% of whom live within 15 minutes of a retail location.

Even with such an extensive catalog of brands and products, Canadian Tire has always been extremely focused on customers, tailoring many of its business processes to fit the needs, demands, and preferences of the people and communities it serves.

This is why the company recognized the need to leverage the cloud to harmonize increasing digital customer and offline data sources in real-time and offer customers the e-commerce experience they expect.  

Around this time, we began to work with CTC to enable the retailer to more efficiently identify customer pain points, quantify their impact and prioritize action.

CTC was facing common problems in their digital properties, such as difficulties with adding items to the cart, API failures during checkout, and scripts conflicting with each other on certain device and browser combinations, resulting in lower conversion rates and NPS scores.

However, with the implementation of BigQuery and Quantum Metric’s advanced data capture technology, CTC was able to automate and expedite identification and resolution of issues affecting customer experience. This allowed for quicker resolution of problems, resulting in improved conversion rates and NPS scores.

To stay ahead of ongoing digital challenges, we now analyze insights from over 65 million sessions a month across all of CTC’s brands. That’s over 1 terabyte of data every month! This data from customer interactions is then reformatted into behavioral dashboards or individual session replays of the customer’s experience to help CTC teams easily understand the information. We accomplish this by using our patented DOM reconstruction technology to take data and translate it with a focus on security and performance. The result is a 360-degree view into customer digital activities that enables the Canadian Tire team to fully understand what customers experience on the digital path.

Through the Quantum Metric platform, CTC can quickly identify and address issues impacting customers, ranging from shopping cart checkout problems and promo code issues to slow-loading pages.

Leveraging the power of Google Cloud in retail

Quantum Metric is proud to be built on, and partner with, Google Cloud. By capturing 100% of digital customer interactions even on the busiest days, massive brands like Canadian Tire have complete visibility year-round. This is incredibly important because it allows us to not just fix problems, but also use the full breadth of data to make decisions.

Our use of BigQuery and integration with Google Cloud lets us move away from vague details about customers, engagement averages, and overly broad analytics to uncover exact figures that highlight individual interactions. This was especially beneficial for CTC when the COVID-19 pandemic struck.

When stores shut down, Canadian Tire saw a drop off in its loyalty program engagement because people were not going to stores and did not have an intuitive way to add their loyalty information on websites and apps. Thanks to the power of BigQuery and Quantum Metric’s platform, we helped CTC understand the issue on a granular level. 

Responding to what the data told us, we delivered a new feature that gave customers the flexibility to add their loyalty information at any touchpoint. Once the initial feature was released, Canadian Tire used our session replays and UX analytics to understand customers better and quickly tweak the feature for greater engagement.

CTC took an iterative approach to improve the feature and, before long, saw a 72% increase in the number of people adding their loyalty information. Rather than seeing engagement drop during an extremely difficult time, Canadian Tire was able to expand its loyalty program with the help of personalization and a customer-centric design approach.

At the end of the day, Quantum Metric’s integration with BigQuery enables Canadian Tire to respond to customer needs, demands, and preferences faster and smarter. Canadian Tire also takes advantage of our unique ability to offer ungated access to all its data in Google BigQuery, as it merges data sets from Google Analytics, transactional information, and Quantum Metric itself.

Quantum Metric got started through the Google Cloud for Startups program and ultimately ended up building highly integrated tools that work seamlessly with BigQuery. We’re capturing petabyte-scale data and allowing companies of all sizes to quickly manage and understand their data, take action, and ultimately drive better experiences and higher sales.

Learn more about how Google Cloud partners can help your business solve its challenges here.


Built with BigQuery: Aible’s serverless journey to challenge the cost vs. performance paradigm

Aible is the leader in generating business impact from AI in less than 30 days by helping teams go from raw data to business value with solutions for customer acquisition, churn prevention, demand prediction, preventative maintenance, and more. These solutions help IT and data teams identify valuable data through automated data validation, enabling collaborative open-world exploration of data, and deliver AI recommendations in enterprise applications to help teams achieve business goals while considering unique business circumstances such as marketing budgets and changing market conditions. 

For example, if a sales optimization model would require a 10% increase in sales resources for optimal revenue/profit outcome, the user can specify whether or not such a resource shift is possible, and Aible would choose the best predictive model and threshold level across thousands of model-hyperparameter combinations of models autotrained by Aible to satisfy the business needs. Thus, Aible combines business optimization and machine learning by saving off the hyperparameter-model search space and then searching for the optimal model settings given the users business goals and constraints.

As economic conditions change, many companies shift their data warehouse use cases away from standard subscription models that procure static/fixed-size infrastructure configurations, regardless of the actual utilization or demand rate. However, the paradigm breaks down for most organizations the moment they want to analyze or build predictive models based on the data in the data warehouse – all of a sudden, data scientists start bringing up server clusters that they keep running for six to nine months during the duration of the analytics or data science project because most data science and analytics platforms are not serverless today and accrue expenses if they are “always on.”

Aible’s Value Proposition (Ease of use, automation and faster ROI) powered by BigQuery’s serverless architecture 

Serverless architectures overcome unnecessary server uptime and allow for significant cost efficiencies. Instead of needing to keep servers running for the duration of analytics and data science & machine learning projects, serverless approaches let the users interact with the system in a highly responsive manner using metadata and browsers while ramping up compute resources for short lengths of time – when absolutely necessary. A serverless, fully managed enterprise data warehouse like BigQuery can save state until the next invocation or access is required and also provides beneficial security and scalability characteristics. 

Aible leverages Google Cloud to bring serverless architecture and a unique augmented approach to most analytics and data science use cases across user types while realizing significant cost efficiencies. Aible realized a simple fact – in the time a human can ask a handful of questions, an AI can ask millions of questions and save off the answers as metadata. Then, if you had a truly serverless end-to-end system, users could get their questions answered without hitting the server with the raw data again. 

For example, one user may create a dashboard focused on sales channels, while another user may analyze geographical patterns of sales, and a third user might benchmark different salespeople’s performance; but all of these could be done based on manipulating the metadata. Aible’s end-to-end serverless user interface runs directly in the user’s browser and accesses saved off metadata in the customer’s cloud account.

The big question was whether the cost would indeed be lower if the AI asked a million questions all at once. In January 2023, Google and Aible worked with a joint Fortune 500 customer to test out this architecture. The test was run using Aible on BigQuery without any special optimizations. The customer had sole discretion over what datasets they used. The results were outstanding. Over two weeks, more than 75 datasets of various sizes were evaluated. The total number of rows exceeded 100 million, and the total number of questions answered and then saved off was over 150 million. The total cost across all that evaluation was just $80.

At this customer, traditional analytics and data science projects typically take about four months to complete. Based on their typical completion time, they estimated that it would have cost more than $200,000 in server and associated costs to conduct these 75 projects. As shown in the table above, the AI-first end-to-end serverless approach was more than 1,000 times more efficient than traditional servers.

The following diagram shows exactly why the combined Aible and Google, AI-first, end-to-end serverless environment was so efficient. Note that because Aible could run the actual queries serverlessly on BigQuery, it was able to analyze data of any size in a truly end-to-end serverless environment. Aible supports AWS and Azure as well. The architecture would work exactly the same way using Lambdas and Function Apps for small and medium-sized datasets. But for larger datasets on AWS and Azure, Aible today brings up Spark, and at that point the efficiency of the system drops significantly compared to the end-to-end serverless capabilities offered on Google Cloud.

As shown in the example below, a typical data analysis project may run for six months, requiring 4,320 hours of server time, while Aible may actively conduct ‘analysis’ activities for just six hours during the entire project. That translates to a 720-times reduction in server time. Aible’s serverless analysis is also three times more cost-effective than the same analysis on comparable servers, according to this benchmark by Intel and Aible.

When Aible needs to evaluate, transform, analyze data, or create predictive models, it pushes the relevant queries to the customer-owned BigQuery datasets or BigQueryML models, as appropriate. It then saves the relevant metadata (including analysis results and models) in the customer’s own private Google Cloud project in Cloud Storage or BigQuery as appropriate. Whenever a user interacts with the analysis results or models, all of the work is done in their browsers, and the metadata is securely accessed as necessary. Aible never gets access to the customer’s data, which remains securely in the customer’s own private Google Cloud project.
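As a hedged illustration of that push-down pattern (not Aible’s actual queries; the dataset, table, and column names are placeholders), training a model inside a customer-owned BigQuery project with BigQuery ML looks roughly like this:

from google.cloud import bigquery

client = bigquery.Client()

# Illustrative push-down workload: a logistic regression trained entirely
# inside the customer's BigQuery project using BigQuery ML.
client.query("""
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT churned, tenure_months, monthly_spend, support_tickets
FROM `my_dataset.customers`
""").result()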

Aible built on Google Cloud Platform services

1. Aible Sense

Aible Sense starts you on the data journey and helps you go from overwhelming data to valuable data. With no upfront effort, Aible Sense completely automates the data engineering and data science tasks to ensure a dataset is of sufficient quality (running tests like outlier detection, inclusion probabilities, SHAP values, etc.) to generate statistically valid insights, high-impact predictive models, and high-value data warehouses.

The image below depicts the Aible Sense architecture deployed on Google Cloud. Aible is pushing the analysis workload to BigQuery, BigQueryML, and Vertex AI as appropriate to do the feature creation and tests described above:

2. Aible Explore

Aible Explore enables your team to brainstorm with their data. Open-world exploration reveals new paths for discovery and helps to identify patterns and relationships among variables. With guided data exploration and augmented analytics, Aible Explore helps business users visually understand business drivers, uncover root causes, and identify contextual insights in minutes. Aible exports dynamic Looker dashboards with a single click, creating the necessary LookML (the language used to build the semantic model) and pointing to the underlying data in BigQuery. By generating the LookML code without further user intervention, Aible enables rapid deployment of Looker dashboards on BigQuery data, drastically reducing cycle time.

The image below depicts the Aible Explore architecture deployed on Google Cloud. Because BigQuery scales exceptionally well for large and complex data, by pushing the queries to BQ, Aible was finally able to enable analysis on any size data without resorting to bringing up spark clusters:

3. Aible Optimize

Aible Optimize considers your unique benefit of correct predictions and cost of incorrect predictions, and business constraints such as marketing budget limits that may prevent you from acting upon every AI recommendation. It then shows you exactly how the AI recommendations would impact your business given such business realities. The optimal predictive model is automatically deployed as a serverless (CloudRun) restful endpoint that can be consumed from enterprise applications or systems such as Looker and Salesforce.

The image below depicts the Aible Optimize architecture deployed on Google Cloud. Because BigQueryML and Vertex AI scale exceptionally well for large and complex datasets, Aible can train predictive models on data of any size without bringing up Spark clusters, while adding extra levels of resilience beyond those provided by the Spark framework.

The proof is in the pudding – Overstock’s customer journey: 

Overstock.com used Aible to improve speed to data-quality evaluation from weeks to minutes per dataset. The entire Aible project took just 5 days, including installation and integration with Overstock’s BigQuery to Executive review and acceptance of results. 

Joel Weight, Overstock.com’s CTO wrote, “We extensively use Google BigQuery. Aible’s seamless integration with BigQuery allowed us to analyze datasets with a single click, and in a matter of minutes automatically get to a dynamic dashboard showing us the key insights we need to see. This would have taken weeks of work using our current best practices. When we can analyze data in minutes, we can get fresh insights instantly as market conditions and customer behavior changes.”

Joel’s comment points to a far more valuable reason to use Aible, beyond massive analysis cost reduction. In rapidly changing markets, the most actionable patterns will be the ‘unknown unknowns.’ Of course, dashboards can be quickly refreshed with new data, but they still ask the same data questions they always have. What about new insights hidden in the data, the questions we have yet to think to ask? Traditional manual analysis would take weeks or months to detect such insights, and even then, analysts can’t ask all possible questions. Aible on BigQuery can ask millions of questions and present the key insights ordered by how they affect business KPIs such as revenue and costs. And it can do so in minutes. This completely changes the art of the possible: who can conduct analysis, and how quickly it can generate results.

Aible natively leverages Google BigQuery, part of Google’s data cloud, to parallelize these data evaluations, data transformations, explorations, and model training across virtually unlimited resources. Aible seamlessly analyzes data from various sources by securely replicating it into the customer’s own BigQuery dataset. Aible also seamlessly generates native Looker dashboards on top of data staged in BigQuery (including data from other sources that Aible automatically stages there), automatically taking care of all necessary steps, including custom LookML generation.

Conclusion

Google’s data cloud provides a complete platform for building data-driven applications from simplified data ingestion, processing, and storage to powerful analytics, AI, ML, and data sharing capabilities — all integrated with the open, secure, and sustainable Google Cloud platform. With a diverse partner ecosystem, open-source tools, and APIs, Google Cloud can provide technology companies the portability and differentiators they need.

To learn more about Aible on Google Cloud, visit Aible.com

Click here to learn more about Google Cloud’s Built with BigQuery initiative. 

We thank the Google Cloud team member who contributed to the blog: Christian Williams, Principal Architect, Cloud Partner Engineering


Meet our Data Champions: Emily Bobis, driving road intelligence in Australia


Editor’s note: This is the second blog in Meet the Google Cloud Data Champions, a series celebrating the people behind data and AI-driven transformations. Each blog features a champion’s career journey, lessons learned, advice they would give other leaders, and more. This story features Emily Bobis, Co-Founder of Compass IoT, an award-winning Australian road intelligence company that uses connected vehicle data to improve road safety, infrastructure and city planning. Read more about Compass IoT’s work.

Tell us about yourself. Where did you grow up? What did your journey into tech look like? 

My journey into tech was unintentional — I always had a perception about the type of people who worked in tech, and I wasn’t “it.” I was in the last year of my undergrad degree at the University of Sydney when I applied for a scholarship for a short-term exchange in Singapore. It turns out there were four scholarships available and I was one of only four people who applied, so I got the scholarship entirely based on luck. On that trip I met my co-founder, Angus McDonald.

I worked with Angus on his first startup, a bike sharing service called Airbike. This was my first experience in tech-enabled mobility. Airbike exposed a big data gap in smart cities — how could we design better and safer cities that reflect the way people actually move around them?  This problem became the foundation of why we started Compass IoT.

Which leaders and/or companies have inspired you along your career journey?

We’re very fortunate in Sydney to have a startup ecosystem full of founders who genuinely want to help each other succeed. Alex Carpenter, who runs the Guild of Entrepreneurs online community, and Murray Hurps, who spearheads University of Technology Sydney startups, are two of the most generous and kind people you could ask to represent the space.

It might sound odd, but my Taekwondo instructor, Alan Lau, also deserves some recognition. The skills I’ve learnt during my training with Alan — resilience, perseverance, and integrating constant feedback to improve — are skills that directly translate into me being a better entrepreneur. Something people don’t know about me is that I’m a black belt in Taekwondo! 2023 will be my 13th year of training.

Why was having a data/AI strategy important in developing your company, Compass IoT?

Managing data on a large scale can become very expensive and messy. A long-term strategy helps to build products that scale, without huge increases in complexity, cost, or compromising the quality of the product you’re delivering. In the case of Compass, having a strategy and using globally scalable tools from Google Cloud such as BigQuery, Pub/Sub, Cloud Run, and Google Kubernetes Engine enabled us to grow without impacting our data latency and end-user experience.

What’s the coolest thing you and/or your team has accomplished by leveraging data/AI?

We’re incredibly lucky that our customers are great people who jump at opportunities to apply data-driven problem-solving, so it’s difficult to narrow it down to a single project. The coolest thing is seeing all the different applications of connected vehicle data, from understanding freight routes and improving road safety to helping local governments prioritize road maintenance and repair after severe flooding.

One of the coolest things was seeing our data used to halve crashes on one of Sydney’s busiest roads, and to reduce rear-end crashes on a highway offramp simply by changing how long drivers had to wait at the traffic lights — read the full case study. We ingest billions of data points across Australia every day; Pub/Sub is critical to our ability to deliver near real-time results to our customers with incredibly low latency. Google Cloud’s data processing capabilities make it possible to monitor changes on a road network where the lives and safety of drivers could be at stake. Road accidents are one of the biggest killers of young Australians under the age of 24, so it’s awesome to know that our technology is being used to proactively save the lives of some of the most vulnerable road users.

What was the best advice you received as you were starting your data/AI journey?

I always refer to one piece of advice that we received from a mentor and friend, Brad Deveson: when in doubt, do something. It’s so easy to become overwhelmed and hyper-focused on making the ‘right’ decision and avoiding failure that you don’t make any decision at all. If you’re not sure what decision to make, doing something is better than doing nothing. And if you make a mistake? Take a page out of Ross Geller’s book and pivot.

What’s an important lesson you learned along the way to becoming more data/AI driven? Were there challenges you had to overcome?

The most important lesson I’ve learned, particularly in a data-driven and relatively new industry, is that innovation does not equal acceptance. There is no guarantee your intended customers will be chasing you down for your product or service, or even know that you exist. It is incredibly important to invest a lot of time, patience, and empathy into educating and upskilling your customers. Your customers are the main character of your brand story and you are the guide that helps them succeed, not the other way around.

One of the biggest challenges for non-technical founders of a tech company is understanding realistic timeframes for development tasks and then managing customers’ expectations accordingly. Being able to communicate why timelines need to be a certain length is crucial for delivering high-quality results while keeping both your team and your customers happy. Having a development team you can trust is essential here.

Want to learn more about the latest innovations in Google Data Cloud for databases, data analytics, business intelligence, and AI? Join us at the Google Data Cloud & AI Summit to gain expert insights and data strategies.



Built with BigQuery: How BigQuery helps Leverege deliver business-critical enterprise IoT solutions at scale


Introduction

Leverege is a software company that enables market leaders around the globe to quickly and cost-effectively build enterprise IoT applications that provide data-centric decision capability, optimize operations, improve customer experience, deliver customer value, and increase revenue. Leverege’s premier SaaS product, the Leverege IoT Stack, runs natively on Google Cloud and seamlessly integrates with Google’s vast array of AI/ML products.

Leverege uses BigQuery as a key component of its data and analytics pipeline to deliver innovative IoT solutions at scale. BigQuery provides an ideal foundation for IoT systems with its data warehousing capabilities, out-of-the-box data management features, real-time analytics, cross-cloud data integration, and security and compliance standards. These features enable customers to easily integrate data processes and use the resulting datasets to identify trends and apply insights to operations.

Context and IoT industry background

The Internet of Things (IoT) connects sensors, machines, and devices to the internet, allowing businesses in every industry to move data from the physical world to the digital world, on the edge and in the cloud. The adoption of large-scale IoT solutions gives businesses the data they need to improve efficiency, reduce costs, increase revenue, and drive innovation. 

The power of IoT solutions, and their impact on the global economy, are driving demand for robust and secure enterprise data warehouse capabilities. IoT presents a particular challenge on the infrastructure level because many technical requirements at scale cannot be predicted in advance. Some customers need to manage massive IoT datasets while others require real-time data streaming or fine-grained access controls. 

The breadth of infrastructure requirements in the IoT space means Leverege depends on partnering with a best-in-class cloud computing provider. On the technical side, a full-featured data warehouse is required to meet customer needs and bring them to scale. On the financial side, the end-to-end solution must be designed to manage and reduce overall costs, accounting for each of the solution’s components (hardware, connectivity, infrastructure, and software).

By leveraging the scalability and flexibility of Google Cloud Platform and BigQuery, Leverege’s customers can affordably store, process, and analyze data from millions of connected devices and extract the value they need from sensor data.

Introduction to Leverege using Google Cloud

Leverege offers a customizable multi-layer IoT stack to help organizations quickly and easily build and deploy IoT solutions that provide measurable business value. The Leverege IoT Stack consists of three components:

Leverege Connect is focused on device management, enabling the secure provisioning, connection, and management of distributed IoT devices. Leverege Connect serves as a replacement for Google IoT Core, which will be retired in August 2023, and supports protocols such as MQTT, HTTP, UDP, and CoAP (a device-side sketch follows this list).

Leverege Architect is focused on data management, enabling the ingestion, organization, and contextualization of device and business data with the ability to apply AI/ML for powerful insights and/or expose via APIs to external services.

Leverege Build optimizes application development, enabling the generation, configuration, and branding of end-user applications with tailored experiences on a per-role basis, all with no-code tooling.
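As a concrete illustration of the device side, here is a minimal sketch of a tracker publishing telemetry over MQTT, one of the protocols Leverege Connect supports. The broker hostname, topic layout, credentials, and payload schema are hypothetical placeholders, not Leverege’s actual interface:

```python
# Hypothetical device publisher; paho-mqtt is a common Python MQTT client.
import json
import time

import paho.mqtt.publish as publish

payload = json.dumps({
    "ts": time.time(),   # report timestamp
    "lat": -33.87,       # GPS fix
    "lng": 151.21,
    "battery": 0.93,     # remaining battery fraction
})

publish.single(
    "devices/device-0001/telemetry",   # hypothetical topic
    payload,
    hostname="mqtt.example.com",       # hypothetical broker
    port=8883,
    auth={"username": "device-0001", "password": "device-secret"},
    tls={},                            # enable TLS with library defaults
    qos=1,                             # at-least-once delivery
)
```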

The Leverege IoT Stack is deployed with Google Kubernetes Engine (GKE), a fully managed Kubernetes service for managing collections of microservices. Leverege uses Google Cloud Pub/Sub, a fully managed messaging service, as the primary means of message routing for data ingestion, and Google Firebase for real-time data and user interface hosting. For long-term data storage, historical querying and analysis, and real-time insights, Leverege relies on BigQuery.

Leveraging BigQuery to deliver and manage IoT solutions at scale

Use case #1: Automating vehicle auctions for the world’s largest automobile wholesaler

The world’s leading used vehicle marketplace faced the costly challenge of efficiently orchestrating and executing simultaneous in-person and online car auctions on parking lots up to 600 acres in size. Before the IoT solution was deployed, manually staging thousands of vehicles each day involved hundreds of people finding specific vehicles based on hard-to-discover information and attempting to arrange them in precise order. This manual process was highly inefficient, unreliable, and negatively impacted the customer experience since vehicles routinely missed the auction or were out of sequence. 

To solve the problem, the customer built low-cost, long battery life GPS trackers and placed them inside all of the vehicles on the lot. Leverege integrated the devices into a holistic end-to-end solution, providing full awareness and visibility into precise car location, diagnostics, automated querying, analysis reports, and movement with walking directions to vehicles of interest. This digital transformation saved the customer millions of dollars a year while simultaneously increasing customer satisfaction by a significant amount.

After the solution scaled nationwide, monitoring the health of the devices and the system was paramount for operational success. BigQuery data partitioning and autonomous analysis jobs allowed for a cost-effective way to manage and segment system alerts and reports on overall system health across very large datasets.

Use case #2: Analyzing the state and readiness of boats anywhere in the world in real-time 

Working with the largest boat engine manufacturer in the world, Leverege delivered an IoT solution providing boat owners and fleet managers with real-time, 24/7 access to the state, readiness, and location of their boats around the globe.

Seamlessly and reliably providing real-time marine data to boat owners requires technical integration across hardware, software, and connectivity, a problem uniquely suited for an IoT solution. The customer’s “Connected Boat” product reports a high volume of disparate data including the status of every electrical, mechanical, and engine subsystem. Some of this data is only important historically when incidents and issues arise and boat owners need to investigate. 

BigQuery allows Leverege to record the full volume of historical data at a low storage cost, while only paying to access small segments of data on demand using table partitioning.
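As a rough illustration of that access pattern, here is a minimal sketch of querying one narrow time slice of an ingestion-time partitioned history table, using a dry run to estimate the bytes scanned (and therefore the cost) before actually paying for the query. The project, dataset, table, and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT ts, subsystem, status
    FROM `my-project.marine.engine_history`               -- hypothetical table
    WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2023-03-01')  -- prune partitions
                             AND TIMESTAMP('2023-03-02')
      AND boat_id = @boat_id
"""
job_config = bigquery.QueryJobConfig(
    dry_run=True,  # estimate cost without running the query
    query_parameters=[bigquery.ScalarQueryParameter("boat_id", "STRING", "hull-77")],
)
job = client.query(query, job_config=job_config)
print(f"Query would scan {job.total_bytes_processed} bytes")
```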

For each of these examples, historical analysis using BigQuery can help identify pain points and improve operational efficiency. It can also do so with both public and private datasets. This means an auto wholesaler can expose data for specific vehicles without opening the entire dataset to API queries. Likewise, a boat engine manufacturer can make subsets of data available to different end users.

Leverege IoT Stack reference architecture: Integrating components to deliver robust, scalable, and secure solutions 

The Leverege IoT Stack is built on top of Google Cloud’s infrastructure, making use of several core components that work together to deliver a robust, scalable, and secure solution. These components include:

GKE: Leverege uses GKE to deploy a collection of microservices and easily scale end-to-end IoT solutions. These microservices handle tasks such as device management, data ingestion, and real-time data processing. In addition, GKE provides a high degree of business continuity and enables self-healing and fault tolerance, which allows Leverege to provide enterprise-grade availability and uptime. These capabilities are crucial for Leverege to meet the requirements specified in Service-Level Agreements.

Pub/Sub: Leverege uses Pub/Sub to orchestrate the routing of messages for data ingestion, allowing customers to process data in near real-time. This provides an auto-scaling, fault-tolerant message queuing system (a subscriber sketch follows this list).

Firebase: Leverege uses Firebase for real-time data and UI hosting, providing customers with a responsive and interactive user experience. With Firebase, customers can easily access and visualize IoT data, as well as build and scale applications with minimal effort.

BigQuery: BigQuery is a fundamental part of the Leverege solution. It provides long-term data storage and supports complex SQL queries over historical data. These queries can run on large amounts of data in real time, providing customers actionable insights that help improve operational efficiency.
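To show how two of these components can meet, here is a minimal sketch that pulls device messages from a Pub/Sub subscription and streams each one into BigQuery. The subscription, table, and message schema are hypothetical, and a production pipeline would batch rows rather than insert them one at a time:

```python
import json
from concurrent.futures import TimeoutError

from google.cloud import bigquery, pubsub_v1

bq = bigquery.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "device-telemetry-sub")
TABLE_ID = "my-project.iot.telemetry"  # hypothetical table

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    row = json.loads(message.data)  # e.g. {"device_id": "...", "ts": ...}
    errors = bq.insert_rows_json(TABLE_ID, [row])
    if not errors:
        message.ack()    # acknowledge only after the row is safely stored
    else:
        message.nack()   # redeliver so the insert can be retried

streaming_pull = subscriber.subscribe(subscription, callback=callback)
with subscriber:
    try:
        streaming_pull.result(timeout=60)  # process messages for one minute
    except TimeoutError:
        streaming_pull.cancel()
        streaming_pull.result()            # block until shutdown completes
```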

Solution: Leveraging core BigQuery features for IoT use cases 

Many technology companies make extensive use of specific BigQuery features to deliver business-critical outcomes. Some use cases demand sub-second latency; others require adaptable ML models. By contrast, enterprise IoT use cases typically include a broad set of requirements necessitating the use of the full breadth of BigQuery’s core features. For example, Leverege uses an array of BigQuery features, including: 

Data Storage: BigQuery serves as a near-limitless storage platform, allowing Leverege customers to store and manage large-scale IoT data with high availability, including both real-time and historical data. Some of Leverege’s integrated devices report thousands of times a day; at a scale of millions of devices, Leverege’s customers need a scalable data warehouse.

Real-Time Streaming: BigQuery also provides a powerful streaming capability, which allows the Leverege IoT Stack to ingest and process large amounts of data in near real-time. This is crucial to components of Leverege Build, which offers out-of-the-box charts and graphs using historical data. These tools are more valuable with the integration and use of real-time data. Streaming capabilities ensure customers easily access full-scope data without searching Google Firebase.

Data Partitioning: BigQuery enables cost-effective, fast queries through customizable data partitioning. The Leverege IoT Stack partitions nearly all historical tables by ingestion time. Because most internal history queries are time-based, this results in significant cost savings (see the DDL sketch after this list).

Data Encryption: BigQuery provides built-in encryption at rest by default, allowing customers to securely store sensitive data and protect it against unauthorized access.

Access Control: BigQuery provides numerous secure data-sharing capabilities. Leverege uses linked datasets and authorized views with row-level policies to enforce strict access control. These policies are critical because many IoT projects involve multi-tenancy and data siloing.

Data Governance: BigQuery provides a robust set of data governance and security features, including fine-grained access controls, which Leverege uses to enforce intricate access control policies down to the row level.
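Here is a minimal DDL sketch of the partitioning and row-level patterns just described: an ingestion-time partitioned table clustered by device, plus a row access policy that restricts one tenant’s analysts to their own rows. All project, dataset, table, and group names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Ingestion-time partitioning plus clustering on the device identifier.
client.query("""
    CREATE TABLE IF NOT EXISTS `my-project.iot.telemetry_history` (
      device_id STRING,
      tenant_id STRING,
      ts        TIMESTAMP,
      payload   JSON
    )
    PARTITION BY _PARTITIONDATE
    CLUSTER BY device_id
""").result()

# Row-level security: one tenant's analysts see only their own devices.
client.query("""
    CREATE ROW ACCESS POLICY IF NOT EXISTS tenant_acme_only
    ON `my-project.iot.telemetry_history`
    GRANT TO ('group:acme-analysts@example.com')
    FILTER USING (tenant_id = 'acme')
""").result()
```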

In addition to BigQuery’s core features, Leverege uses BigQuery Analytics Hub private data exchanges and authorized views on datasets, which provide distinct advantages over older methods (e.g., CSV exports and FTP drops). Authorized views on Leverege’s BigQuery datasets allow intricate access policies to be enforced, while also giving Leverege’s customers the ability to analyze data using tools like Looker. Using these BigQuery features, Leverege can give customers controlled and metered access to source data without providing direct access. This capability is fundamental to meeting governance requirements across the enterprise.
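As a rough sketch of the authorized-view pattern, the snippet below creates a view in a customer-facing dataset and then authorizes it against the private source dataset, so customers can query the view without any access to the underlying table. All names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. Create the view in the dataset the customer is allowed to see.
client.query("""
    CREATE OR REPLACE VIEW `my-project.customer_share.fleet_summary` AS
    SELECT device_id, DATE(ts) AS day, COUNT(*) AS reports
    FROM `my-project.iot.telemetry_history`
    WHERE tenant_id = 'acme'
    GROUP BY device_id, day
""").result()

# 2. Grant the view itself read access to the private source dataset.
source = client.get_dataset("my-project.iot")
view = client.get_table("my-project.customer_share.fleet_summary")
entries = list(source.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source.access_entries = entries
client.update_dataset(source, ["access_entries"])
```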

BigQuery’s built-in machine learning capabilities also allow for advanced analysis and prediction of trends and patterns within the data, providing valuable insights for our customers without moving the data to external systems. Furthermore, the ability to set up automatic data refresh and materialized views in BigQuery ensures that our customers are always working with the most up-to-date and accurate data, while improving performance and reducing unnecessary costs.
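For instance, here is a minimal sketch of a materialized view that precomputes a per-device daily aggregate; BigQuery keeps it refreshed automatically by default, so dashboards read fresh results without rescanning the raw history. Names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS `my-project.iot.daily_device_counts` AS
    SELECT device_id, DATE(ts) AS day, COUNT(*) AS reports
    FROM `my-project.iot.telemetry_history`
    GROUP BY device_id, day
""").result()
```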

Benefits and outcomes

Google Cloud infrastructure and BigQuery features enable Leverege to provide a highly scalable IoT stack. In IoT, the central challenge isn’t deploying small-scale solutions; it’s deploying and managing large-scale, performant solutions and applications that can scale up in a short span of time without rearchitecting.

BigQuery table partitioning splits a table into smaller segments divided by a chosen time range. For many Leverege customers, data is divided by day, and the partition filter is enforced when querying data through the Leverege IoT Stack. Partitioning data tables by time range guarantees queries are restricted to the small subset of data falling within the targeted time range. By using partitioning, Leverege can deliver a performant solution at minimal cost.

BigQuery clustering further enhances performance by organizing data around designated fields. To make queries more efficient, Leverege uses clustering to query data that meets pre-designated filter criteria. In a large-scale solution with 100,000 devices, Leverege can cluster data tables and query the history of a single device, greatly accelerating searches and making the system much more performant. In addition, reclustering happens seamlessly in the background at no extra cost.
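Putting the two together, here is a minimal sketch of the single-device history lookup described above, where the partition filter prunes the scan to one day and the clustered column narrows it to one device. Names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query("""
    SELECT ts, payload
    FROM `my-project.iot.telemetry_history`
    WHERE _PARTITIONDATE = CURRENT_DATE()  -- prune to one partition
      AND device_id = 'device-0001'        -- clustered column filter
    ORDER BY ts
""").result()

for row in rows:
    print(row.ts, row.payload)
```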

The integration of the Leverege IoT Stack and Google Cloud, including BigQuery, today powers business-critical enterprise IoT solutions at scale. The continued rapid pace of development at the infrastructure and application levels will be essential in delivering the next generation of IoT solutions.

Click here to learn more about Leverege’s capabilities or to request a demo.

The Built with BigQuery advantage for ISVs

Google is helping tech companies like Leverege build innovative applications on Google’s data cloud with simplified access to technology, helpful and dedicated engineering support, and joint go-to-market programs through the Built with BigQuery initiative, launched in April as part of the Google Data Cloud Summit. Participating companies can: 

Get started fast with a Google-funded, pre-configured sandbox. 

Accelerate product design and architecture through access to designated experts from the ISV Center of Excellence who can provide insight into key use cases, architectural patterns, and best practices. 

Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.

BigQuery gives ISVs the advantage of a powerful, highly scalable data warehouse that’s integrated with Google Cloud’s open, secure, sustainable platform. And with a huge partner ecosystem and support for multi-cloud, open source tools and APIs, Google provides technology companies the portability and extensibility they need to avoid data lock-in. 

Click here to learn more about Built with BigQuery.

We thank the Google Cloud and Leverege team members who co-authored the blog: Leverege: Tony Lioon, Director, DevOps. Google: Sujit Khasnis, Solutions Architect & Adnan Fida, Transformation Technical Lead

