Why data clean rooms are key to the future of data sharing and privacy for retailers

Editor’s note: Today we hear from Lytics, whose customer data platform is used by marketers to build personalized digital experiences and one-to-one marketing campaigns using a data science and machine learning decision engine powered by Google Cloud.

Data clean rooms are a powerful, and often underutilized, tool for leveraging information across a business, between its brands, and with its partners. Think of a data clean room as a data-focused equivalent of a physical clean room, where the objective is to keep what’s inside the clean room protected. From marketing to IT to the C-suite, clean rooms are a way to transform how teams use data, and according to IAB data, are also now essential business solutions for audience insights, measurement, and data activation. Still, this only scratches the surface of what’s possible with clean room technology. And as a result, IAB predicts that companies will invest 29% more in 2023 to make the most of their data clean room capabilities as they look ahead.

Lytics’ customer data platform (CDP) enables organizations to connect more meaningfully with their customers. And together with Google Cloud, we recognize that it’s time to fundamentally reimagine what retailers and consumer packaged goods (CPG) brands can really accomplish with data — and in particular, with data clean rooms.

Why are clean rooms so valuable?

Clean rooms have been used for many years in the finance and healthcare industries to enrich data while maintaining security, but they are now becoming ubiquitous among retail and CPG brands navigating an increasingly privacy-conscious landscape.

First-party data, collected by brands directly rather than through third-party, cross-site cookies, is only going to continue to rise in value. Clean rooms make it possible to embrace the following with ease and confidence:

Data sharing and activation. Having better information available to marketing, IT, and executive teams strengthens the entire enterprise. At the same time, you can keep your PII protected and eliminate concerns about data risks.

Enhanced customer profiles. Data today comes from a myriad of sources. By pooling data in clean rooms, you can augment your existing customer profiles with more details, enriching the view of your customers across teams.

Contact sharing. Contact information is critical for outreach. Profiles without accurate contact information are not actionable, so adding these details using clean rooms is invaluable.

Match lists. Comparing lists and matching contacts to certain traits or identities becomes possible, with more data points available.

Joint campaigns. Campaigns run in collaboration with other parts of the enterprise help with efficiency, accuracy, and consistency of message.

Identifying clean room use cases across an enterprise 

Data clean rooms are critical to retail and CPG organizations that want to commoditize the information they collect and store, but they are often mistaken for a tool designed primarily to benefit the technical and data teams responsible for enterprise-wide data quality and pipeline health. On the contrary, there is a long list of core benefits to using a data clean room across internal units.

Some of the top use cases for clean rooms as part of your marketing organization include:

1. Ensuring compliance

Compliance is an ever-present, ever-growing aspect of marketing operations today, as the compliance landscape is constantly shifting, adding new updates and mandates to consider. Using a data clean room provider lets marketers eliminate the guesswork of compliance: staying up to speed on all the possible variances and changes. For marketing teams, it’s a tremendous savings of time and money to ensure that your data policies and usage are compliant.

2. Leveraging data anonymization

Imagine the possibilities of gaining information on your customers or groups of customers without knowing their identities. Anonymization is an effective way to market while protecting customer data privacy. In fact, the well-known personalization-privacy paradox can be addressed using anonymized data via data clean rooms. Even if that data contains personally identifiable information, it remains usable because it can be scrubbed via encryption or hashing in the clean room.
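
As a minimal sketch of the hashing approach, the BigQuery SQL below pseudonymizes an email column before data is shared; the crm.customers table and its columns are hypothetical, while SHA256 and TO_HEX are standard BigQuery functions.

-- Hypothetical example: pseudonymize email addresses before sharing.
-- Normalizing (lowercase, trimmed) before hashing keeps hashes consistent
-- across parties so records can still be matched without exposing raw PII.
SELECT
  TO_HEX(SHA256(LOWER(TRIM(email)))) AS hashed_email,
  loyalty_tier,
  last_purchase_date
FROM crm.customers;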

3. Embracing better profile optimization

Data clean rooms let you analyze, manipulate and filter data contained in consumer profiles. For example, you can strip away third-party data and look at just first-party data, or take your existing first-party data and overlay it with third-party data. Clean rooms let you consider different factors, such as groups that may respond to one email message or product feature but not another. For A/B testing, segmenting and list building, the data clean room gives you powerful capabilities. All the while, data will remain protected and secure.

For IT and data teams, clean rooms offer possibilities that can enhance and expand the impact of your data:

1. Leveling up your data privacy and security

Privacy and security are top of mind for IT professionals across industries, but especially in retail and CPG where customer-centricity is critical to success. Every day, new cyberthreats emerge that can jeopardize business operations and brand reputations. A compromised system due to a ransomware attack or a data loss can be exceedingly costly. A data clean room can offer your brand a safe environment in which the business can manage, organize and use data. It’s a powerful solution that allows you to work with data while reducing the risk of exposure or compromise.

2. Enhancing machine learning (ML)

Artificial intelligence and machine learning are increasingly used across IT teams in multiple applications. The key is to have enough data to feed into the algorithms to improve learning. Data clean rooms offer expansive arrays of rich, actionable information. This information can help improve how machines learn and adapt, building better models that are smarter and more accurate. With better models, you’ll be able to expand the insights provided and generate better results.

For executives and leadership teams, the list of use cases continues:

1. Driving more revenue and reducing costs

Data sharing creates opportunity, both in reducing the costs of managing data and in opening new financial possibilities. For one, streamlining data management processes lets organizations significantly lower data management expenses. Efficient access, analysis, and use of the data you already have lowers operational costs as well.

2. Locking in more partnerships

By partnering to share data, organizations can create new relationships that safely and richly expand the brand. With new data partners and new datasets, you can find commonalities that create new and unforeseen business opportunities. The potential impacts on the organization are expansive. Retailers can, for example, leverage the data and relationships to forge new opportunities in product development, customer service, marketing, and sales.

All of that data inevitably leads to deeper insights and discoveries. These new observations can open income streams that monetize data in previously unimagined ways, whether through new products and services or, in some cases, entirely new business models.

About the Lytics Clean Room Solution, built with BigQuery

Lytics and Google Cloud have developed a scalable and repeatable offering for securely sharing data, utilizing Analytics Hub, a secure data exchange, within BigQuery and the Lytics Platform. 

Lytics Clean Room Solution is a secure data sharing and enrichment offering from Lytics that runs on Google Cloud. Its integration with BigQuery makes Lytics well suited to simplify and unlock data sharing by unifying and coalescing datasets, helping businesses build or expand existing BigQuery data warehouses. With Analytics Hub, Lytics improves data management for organizations focused on maximizing the value of their data and decreases time to value in complex data sharing scenarios, so partnership collaboration stays safe and secure and cross-brand activation can be done in just a few hours.

With the Lytics Clean Room Solution, retailers and CPG brands alike can securely share data hosted on BigQuery and activate shared data into customer profiles. The solution provides tighter control of mission-critical data for faster activation, and can also be leveraged to comply with stringent privacy constraints, industry compliance standards, and newer regulations.
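
To make the cross-party idea concrete, here is a hedged sketch of the kind of query that becomes possible once a partner's dataset shared through Analytics Hub appears as a linked dataset in your BigQuery project; every table and column name below is hypothetical.

-- Hypothetical: enrich first-party profiles with a partner's shared data.
-- partner_share.loyalty_events is a linked dataset subscribed to via Analytics Hub.
SELECT
  p.hashed_email,
  p.lifetime_value,
  COUNT(l.event_id) AS partner_loyalty_events
FROM brand_crm.profiles AS p
JOIN partner_share.loyalty_events AS l
  ON p.hashed_email = l.hashed_email
GROUP BY p.hashed_email, p.lifetime_value;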

Data clean rooms provide an extraordinary opportunity to transform the way you do business and connect with customers. Especially in retail, complementary, integrated, and scalable technologies like clean rooms let you get the most out of your enterprise data tools and turn them into business accelerators, but only if you have the foresight to make (and maximize) the investment.

Read the new ebook from Lytics and Google Cloud to learn more about how retail and consumer brands can unlock business value with data that is connected, intelligent and secure.


Unify your data assets with an open analytics lakehouse

For over a decade, the technology industry has searched for ways to store and analyze vast amounts of data that can handle an organization’s volume, latency, resilience, and varying data access requirements. Companies have been making the best of existing technology stacks to tackle these issues, which typically involves trying to either make a data lake behave like an interactive data warehouse or make a data warehouse act like a data lake — processing and storing vast amounts of semi-structured data. Both approaches have resulted in unhappy users, high costs, and data duplication across the enterprise. 

Organizations need an architecture designed to address complex data needs for all users, including data analysts, data engineers, and data scientists.

Historically for analytics, organizations have implemented different solutions for different data use cases: data warehouses for storing and analyzing structured aggregate data primarily used for business intelligence (BI) and reporting, and data lakes for unstructured and semi-structured data, in large volumes, primarily used for data exploration and machine learning (ML) workloads. This approach often resulted in extensive data movement, processing, and duplication, requiring complex extract, transform, and load (ETL) pipelines. Operationalizing and governing this architecture took time and effort and reduced agility. As organizations move to the cloud, they want to break these silos.

Moving to the cloud brings otherwise disparate data sources together and paves the way for everyone to become part of the data and AI ecosystem. It is undeniable that organizations want to leverage data science capabilities at scale, but many have yet to realize a return on their investments. According to a recent study, 91% of organizations are increasing investments in data and AI, yet only 20% see their models go into production deployment. Business users, data analysts, data engineers, and data scientists all want to become part of the data and AI ecosystem.

The rise of the analytics lakehouse

Google Cloud’s analytics lakehouse combines the key benefits of data lakes and data warehouses without the overhead of each.  We discuss the architecture in detail throughout the “Build an analytics lakehouse on Google Cloud” technical whitepaper. However, in a nutshell, this end-to-end architecture enables organizations to extract data in real-time regardless of which cloud or datastore the data resides in and use it in aggregate for greater insight and artificial intelligence (AI), with governance and unified access across teams.

Once the barriers between data sources are broken down and serverless architectures are in place, the game becomes choosing the optimal processing framework that suits your skills and business requirements. The building blocks of the analytics lakehouse architecture, described below, simplify the experience while removing silos, risk, and cost.

What makes Google’s analytics lakehouse approach unique?

Google’s analytics lakehouse is not a completely new product but is built on Google’s trusted services such as Cloud Storage, BigQuery, Dataproc, Dataflow, Looker, Dataplex, Vertex AI and others. Leveraging Google Cloud’s resiliency, durability, and scalability, Google enables customers to innovate faster with an open, unified, intelligent data platform. This data platform is the foundation for Google’s analytics lakehouse, which blurs the lines between traditional data warehouses and data lakes to provide customers with the benefits of both. Bring your analytics to your data wherever it resides with the analytics lakehouse architecture. Its components include:

Ingestion: Users can ingest data from various sources, including but not limited to real-time streams, change logs directly from transactional systems, and structured, semi-structured, and unstructured data on files.

Data processing: Data is then processed and moved through a series of zones. First, data is stored as is within the raw zone. The next layer handles typical ETL/ELT operations such as data cleansing, enrichment, filtering, and other transformations within the enriched zone. Finally, business-level aggregates are stored in the curated layer for consumption. (A short SQL sketch of this flow follows the list.)

Flexible storage options: The analytics lakehouse approach lets users leverage open-source formats such as Apache Parquet and Iceberg alongside BigQuery managed storage, providing flexible storage options and meeting users where they are based on their requirements.

Data consumption: At any stage, data can be accessed directly from BigQuery, serverless Spark, Apache Beam, BI tools, or machine learning (ML) applications. With a choice of compute platforms and unified serverless services, organizations can leverage any framework that meets their needs. Data consumption does not impact processing, thanks to the complete separation of compute and storage. Users are free to choose serverless applications and run queries within seconds. In addition, the lakehouse provides a dynamic platform to scale advanced new data science use cases. With built-in ML inside the lakehouse, you can accelerate time to value.

Data governance: A unified data governance layer provides a centralized place to manage, monitor, and govern your data in the lakehouse and make this data securely accessible to various analytics and data science tools.

FinOps: Google’s Data Cloud can automatically adjust to fluctuations in demand and can intelligently manage capacity, so you don’t pay for more than you use. Capabilities include dynamic autoscaling, in combination with right-fitting, which saves up to 40% in committed compute capacity for query analysis.
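
To make the zone flow above concrete, here is a minimal, hypothetical sketch in BigQuery SQL: an external table over Parquet files in Cloud Storage stands in for the raw zone, and a cleansed copy is materialized into an enriched dataset. The bucket, dataset, and column names are assumptions, not part of the whitepaper.

-- Raw zone: expose Parquet files in Cloud Storage without copying them.
CREATE EXTERNAL TABLE raw_zone.orders_ext
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://example-lakehouse-raw/orders/*.parquet']
);

-- Enriched zone: cleanse and standardize into BigQuery managed storage.
CREATE OR REPLACE TABLE enriched_zone.orders AS
SELECT
  order_id,
  customer_id,
  LOWER(TRIM(status)) AS status,
  order_total
FROM raw_zone.orders_ext
WHERE order_total IS NOT NULL;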

“BigQuery’s flexible support for pricing allows PayPal to consolidate data as a lakehouse. Compressed storage along with autoscale options in BigQuery help us provide scalable data processing pipelines and data usage in a cost-effective manner to our user community.”  — Bala Natarajan, VP Enterprise Data Platforms at PayPal 

Freedom of ‘data architecture’ choice without more data silos

Every organization has its own data culture and capabilities, yet each is expected to use popular technology and solutions like everyone else. Your organization may be built on years of legacy applications, and you may have developed a considerable amount of expertise and knowledge, yet you may be asked to adopt a new approach based on the latest technology trend. On the other end of the spectrum, you may come from a digital-native organization with no legacy systems at all, but be expected to follow the same principles as process-driven, established organizations. The question is: should you use data processing technology that doesn’t match your organization’s style, or should you focus on leveraging your culture and skills?

At Google Cloud, we believe in providing choice to our customers: the option of an open platform that minimizes dependencies on a specific framework, vendor, or file format. Not only organizations, but also the teams in each organization, should be able to leverage their skills and do what’s right for them. Going against the conventional school of thought, we decouple storage and compute, and we do this physically rather than just logically, unlike most solutions. At the same time, we remove the burden of provisioning compute with fully managed serverless services, as mentioned earlier. The game then becomes leveraging the optimal application framework to solve your data challenges and meet your business requirements. In this way, you can capitalize on your team’s skill sets and improve time to market.

Organizations that want to build their analytics lakehouse using open-source technologies can easily do so by using low-cost object storage provided by Cloud Storage or by other clouds, storing data in open formats like Parquet and Iceberg. Processing engines and frameworks like Spark and Hadoop use these and many other file types, and can run on Dataproc or regular virtual machines (VMs) to enable transactions. This open-source-based approach has the benefits of portability, community support, and flexibility, though it requires extra effort in terms of configuration, tuning, and scaling. Dataproc, as a managed version of Hadoop, minimizes the management overhead of Hadoop systems while still being able to access non-proprietary, open-source data types.

Bring ML to your data

There are many users within an organization who have a part to play in the end-to-end data lifecycle. Consider a data analyst, who can simply write SQL queries to create data pipelines and analyze insights from BigQuery. Or a data scientist who dabbles with different aspects of building and validating models. Or an ML engineer who is responsible for ensuring the model works without issues for end users in production systems. Users like data engineers, data analysts, and data scientists all have different needs, and we have intentionally built a comprehensive platform with them in mind.

Google Cloud also offers cloud-native tools to build an analytics lakehouse with the cost and performance benefits of the cloud. These include a few key pieces that we will discuss throughout the whitepaper:

Different storage options and optimizations depending on the data sources and end users consuming the data.

Several serverless and stateful compute engines, balancing the benefits of speed and costs as required by each use case for processing and analytics.

Democratized and self-service BI and ML tools, to maximize the value of data stored in the lakehouse.

Governance, ensuring productive and accountable use of data so that bureaucracy does not inhibit innovation and enablement.

Advanced analytics and AI

BigQuery supports predictive analytics through BigQuery ML, an in-database ML capability for ML training and predictions using SQL. It helps users with classification, regression, time-series forecasting, anomaly detection, and recommendation use cases. Users can also do predictive analytics with unstructured data for vision and text, leveraging Google’s state-of-the-art pre-trained model services like Vertex Vision, Natural Language Processing (Text) and Translate. This can be extended to video and text, with BigQuery’s built-in batch ML inference engine, which enables users to bring their own models to BigQuery, thereby simplifying data pipeline creation. Users can also leverage Vertex AI and third-party frameworks. 
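As a minimal sketch of that in-database workflow, the statements below train a simple classifier with BigQuery ML and then score new rows; the dataset, table, and column names are hypothetical, not taken from the whitepaper.

-- Train a churn classifier directly in BigQuery (hypothetical table/columns).
CREATE OR REPLACE MODEL demo.churn_model
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, support_tickets, churned
FROM demo.customer_features;

-- Score new customers with the trained model.
SELECT customer_id, predicted_churned, predicted_churned_probs
FROM ML.PREDICT(
  MODEL demo.churn_model,
  (SELECT customer_id, tenure_months, monthly_spend, support_tickets
   FROM demo.new_customers));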

Generative AI is a powerful, emerging technology, but organizations lack an easy way to activate AI and move from experimentation into production. Integration with Cloud AI for Generative AI will embed advanced text analysis in your analytics lakehouse. This opens up new possibilities for your data teams to use AI for sentiment analysis, data classification, enrichment, and language translation.

Automate orchestration of repeatable tasks

Underpinning these architectural components is resilient automation and orchestration of repeatable tasks. With automation, data accuracy improves as data moves through the system, giving end users the confidence to trust it and making them more likely to interact with and evangelize the analytics lakehouse.

To learn more about the analytics lakehouse on Google Cloud, download the complimentary white paper. 

Further resources

Learn how Squarespace reduces the number of escalations by 87% with the analytics lakehouse.

Learn how Dun & Bradstreet improved performance by 5X with the analytics lakehouse.


Accelerate Procure-to-Pay insights with Google Cloud Cortex Framework

Enterprises running SAP for procurement and accounts payable are always looking to create greater efficiencies by combining these data sets to monitor vendor performance for quality and reliability, analyze global spend across the organization and optimize the use of working capital to make timely vendor payments that earn the highest discounts. 

But in today’s inflationary environment, the need to analyze data from procure-to-pay processes is more important than ever as rising prices threaten to reduce purchasing power and erode real income. Many of these enterprises are looking for accelerated ways to link their enterprise procure-to-pay data with surrounding non-SAP data sets and sources to gain more meaningful insights and business outcomes. Getting there faster, given the complexity and scale of managing and tying this data together, can be an expensive and challenging proposition.

To embark on this journey, many companies choose Google’s Data Cloud to integrate, accelerate and augment business insights through a cloud-first data platform approach with BigQuery to power data-driven innovation at scale. Next, they take advantage of best practices and accelerator content delivered with Google Cloud Cortex Framework to establish an open, scalable data foundation that can enable connected insights across a variety of use cases. Today, we are excited to announce the next offering of accelerators available that expand Cortex Data Foundation to include new packaged analytics solution templates and content for procure-to-pay processes. 

Introducing new analytics content for procure-to-pay

Release 4.2 of Google Cloud Cortex Framework includes new data marts, semantic views and template Looker dashboards to support operational analytics on three procure-to-pay topics and the relevant metrics for each.

Leverage the metrics delivered in our new Accounts Payable content to identify potential issues and areas for optimization with respect to short-term obligations to creditors and suppliers that have not yet been paid (a short SQL sketch of one such metric follows the list):

Accounts Payable Balance Due

Days Payable Outstanding

Accounts Payable Aging

Accounts Payable Turnover

Accounts Payable by Top Vendors

Upcoming Payments and Penalties

Cash Discount Utilization

Blocked and Parked Invoices
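
As one hedged illustration of the kind of calculation behind these dashboards, Days Payable Outstanding is commonly computed as average accounts payable divided by cost of goods sold for the period, multiplied by the number of days in that period. The table and column names below are hypothetical, not the actual Cortex views or Looker dashboards.

-- Hypothetical: Days Payable Outstanding over a 90-day window.
SELECT
  SAFE_DIVIDE(AVG(accounts_payable_balance), SUM(cost_of_goods_sold)) * 90
    AS days_payable_outstanding
FROM finance.ap_daily_snapshot
WHERE snapshot_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);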

Analyze procurement spend on goods and services across your organization and identify opportunities to reduce costs and improve efficiency with the following metrics delivered in our new Spend Analysis content:

Total Spend

Spend by Top Vendors

Spend by Purchasing Organization

Spend by Purchasing Group

Spend by Vendor Country

Spend by Material Type

Active Vendors

Use the metrics included with our new Vendor Performance content to improve efficiency and profits. Comprehensively analyze the delivery performance, quality, accuracy, and unit costs of your suppliers, then make tactical decisions to improve your bottom line by increasing business with your most reliable, lowest-cost vendors:

On-time Delivery Performance

In-full Delivery Performance

Rejections Rate

Invoice Accuracy

Purchase Price Variance

Vendor Lead Time

Open Purchase Orders

What’s next

This release builds on prior content releases for SAP and other data sources to further enhance the value of Cortex Data Foundation across private, public, and community data sources. Google Cloud Cortex Framework continues to expand its content to help better meet our customers’ needs on their data analytics transformation journeys. Stay tuned for more announcements coming soon!

To learn more about Google Cloud Cortex Framework, visit our solution page, and try out Cortex Data Foundation today to discover what’s possible!


Bring analytics to your data: What’s new with BigQuery federated queries

Google Cloud provides a unified, open, and intelligent data cloud for all your operational and analytical data. Whether your data is stored in Cloud Storage, Cloud Bigtable (NoSQL) or even another cloud, with BigQuery, you can run analytical queries directly on the data. One strength in particular for BigQuery is the reduction of toil that normally comes with ingesting data into your data warehouse from your operational databases. If you use a fully managed operational database such as Cloud Spanner, Cloud SQL or Cloud Bigtable, BigQuery can help simplify and unify your operational and analytical databases. 

One simplified approach for data movement from MySQL, PostgreSQL, AlloyDB, and Oracle databases directly into BigQuery is Datastream for BigQuery, a serverless and easy-to-use change data capture replication service that lets you synchronize Cloud SQL or AlloyDB data into BigQuery. Another approach, used by thousands of customers both for data movement and for querying data in place from Cloud SQL or Cloud Spanner, is BigQuery federated queries, which let you send a query statement to an operational database and get the result back as BigQuery data (including a conversion to BigQuery data types).

In this post, we will take a look at the different ways BigQuery federated query customers BT Group and MadHive take advantage of the Data Cloud’s unified data warehouse and operational databases. We will also review what’s new in federated queries, including:

SQL Pushdown

Private IP Access 

The setting of priority queues for Spanner federation

Spanner to BigQuery JSON type mapping 

How customers use federated queries 

There are thousands of customers already using federated queries for various reasons and in different ways. We have seen financial institutions who need to maintain regulated projects of customer data use federated queries to push down calculations into their operational stores and then bring back snapshots of aggregated data into BigQuery for further processing. Network security vendors have reduced the pain of maintaining and managing hundreds of ETL jobs by moving to federated queries that are triggered as BigQuery scheduled queries. Healthcare providers who have found themselves restricted on the use of CDC have been able to provide their data analysts with federated query access to the operational data from BigQuery in near real time.

Madhive, a leading technology company for modern TV advertising, makes heavy use of BigQuery to perform campaign analytics and data ingestion at petabyte scale. Madhive has adopted federated queries to enrich large amounts of data in their ETL pipelines with data stored in Cloud SQL. As the lead of the data engineering team explains, “federated queries vastly simplify our data pipelines by providing a seamless integration between BigQuery and CloudSQL, that is both easy to maintain and does not require additional tooling.”

BT Group’s Digital unit takes a different approach to federated queries. Instead of using federated queries for ETL, Digital prefers to reduce data duplication. Digital’s teams create views on top of the external query functions and let their analysts query the underlying operational databases via BigQuery views. This performs well enough for most use cases because of a new SQL pushdown feature described below. Crucially, this delivers a speed boost at reduced effort, improving the experience for the teams and enabling them to more rapidly support new and existing customer requirements.
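
A minimal sketch of this pattern, with a hypothetical connection ID and table names: a BigQuery view wraps an EXTERNAL_QUERY call so analysts can query live operational data without needing to know where it lives.

-- Hypothetical: expose a live Cloud SQL table to analysts as a BigQuery view.
CREATE OR REPLACE VIEW reporting.live_orders AS
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.orders-connection',
  'SELECT order_id, status, total, updated_at FROM orders');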

New BigQuery federated query features

SQL pushdown

SQL pushdown is an optimization technique in which BigQuery delegates operations like filtering down to the external data source (e.g., a Cloud SQL or Cloud Spanner database) instead of performing them itself. Because a smaller amount of data needs to be transferred back to BigQuery, the overall query execution time is lower, and the cost can be significantly reduced since less data is processed. SQL pushdown encompasses both column pruning (SELECT clauses) and filter pushdowns (WHERE clauses).

Let’s take a look at how pushdowns work under the hood, using the example of the SQL that is now generated for Cloud SQL via federated queries.

BigQuery

SELECT COUNT(*) FROM
  (SELECT * FROM
    EXTERNAL_QUERY("<connection>",
      "select * from operations_table")
  )
WHERE a = 'Y'
  AND b NOT IN ('COMPLETE', 'CANCELLED') AND c = 'Y';

Cloud SQL – no pushdowns

SELECT *
FROM operations_table

Cloud SQL – with pushdowns

SELECT
  "a",
  "b",
  "c"
FROM (
  SELECT *
  FROM operations_table) t
WHERE
  (("a" = $1)
    AND (NOT "b" IN ($2, $3))
    AND ("c" = $4))

The first example shows the BigQuery query. As you can see, it uses the EXTERNAL_QUERY function to communicate with a Cloud SQL database. The second example shows the query that would be sent to Cloud SQL without pushdowns: it is exactly the same query that was provided by the user. If the source table has millions of rows and hundreds of columns, all of them would be sent to BigQuery even though only some of them are really needed. Finally, the last example shows how the original query is rewritten with pushdowns. This time, only some columns and some rows are sent back to BigQuery.

SQL pushdowns increase the flexibility with which you can use federated queries in BigQuery. This is the feature that allows BT Group to simply put views on top of their federated query functions and get performant cross-database queries. BT Group’s Digital unit reviewed its queries and found that a query similar to the one above originally took 10 minutes to execute via BigQuery federated queries. With pushdowns, it now takes only 26 seconds!

Currently, SQL pushdowns are only applied to queries of the form SELECT * FROM T, since these queries constitute a significant percentage of all federated queries. Another limitation is that pushdowns are not yet supported in all cases; for example, not all data types are supported for filter pushdowns. We plan to support more queries and more types of pushdowns, so stay tuned.

Private IP access

Many customers have sensitive datasets they need to keep isolated from the public internet. This new release allows customers to use federated queries on private IP instances while keeping that data off the public internet.

Manage Spanner execution priority over federation

Cloud Spanner offers a request priority feature, which lets customers assign HIGH, MEDIUM, and LOW priorities to specific queries. Queries made to Spanner via a federated BigQuery connection tend to be analytical in nature; for analytical queries, the best practice is to avoid contending with transactional or application requests. However, an analytical query should take precedence over something like a background job or scheduled backup. For most customer use cases, the default setting of MEDIUM meets this requirement.

However, we had requests from gaming companies who wanted to ensure that their games are never interrupted by analytical queries, social media platforms that need to be up 24/7, and automotive companies that did not want to risk any analytical queries contending with their Spanner operations. For these outlier situations where prioritizing operations at all costs is paramount, customers can now set a LOW priority on their federated query. Use caution with this setting, because LOW priority jobs can be preempted, which could lead to failed queries.

Here is a sample query that uses this new feature:

SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.example-db',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
     FROM orders
     GROUP BY customer_id''',
  '{"query_execution_priority":"low"}');

Spanner to BigQuery JSON type mapping

Both Spanner and BigQuery offer native data types for working with JSON. Unfortunately, because the Spanner JSON data type is intended for operational queries and the BigQuery native data type is optimized for analytical queries, the two were not initially compatible. However, with the introduction of a new JSON data type mapping for Spanner, the Spanner JSON type (including the Spanner PostgreSQL JSONB type) is automatically converted into the BigQuery native JSON data type, allowing you to work with semi-structured, schema-changing data across both operational and analytical databases.
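
A hedged example of what this enables, with a hypothetical connection ID and field names: the JSON column returned by the Spanner federated query arrives as BigQuery's native JSON type, so standard functions such as JSON_VALUE can be applied directly.

-- Hypothetical: query semi-structured order metadata stored as JSON in Spanner.
SELECT
  order_id,
  JSON_VALUE(metadata, '$.shipping.carrier') AS carrier
FROM EXTERNAL_QUERY(
  'my-project.us.spanner-connection',
  'SELECT order_id, metadata FROM orders');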

Getting started

There are a few ways to get started with BigQuery. New customers get $300 in free credits to spend on BigQuery. All customers get 10 GB storage and up to 1 TB queries free per month, not charged against their credits. You can get these credits by signing up for the BigQuery free trial. Not ready yet? You can use the BigQuery sandbox without a credit card to see how it works. 

To learn more about federated queries with BigQuery, see the documentation. To learn more about Google Cloud databases such as Cloud SQL, Spanner and Bigtable, get started here.


Datastream’s PostgreSQL source and BigQuery destination now generally available

Last year, we announced the preview launch of Datastream for BigQuery, which provides seamless replication of data from operational databases, directly into BigQuery, Google Cloud’s serverless data warehouse, enabling organizations to quickly and easily make decisions based on real-time data. We’re happy to announce that Datastream for BigQuery is now generally available. 

Overview

Datastream for BigQuery delivers a unique, truly seamless and easy-to-use experience that enables real-time insights in BigQuery with just a few steps. Using BigQuery’s newly developed change data capture (CDC) and Storage Write API’s UPSERT functionality, Datastream efficiently replicates updates directly from source systems into BigQuery tables in real-time. You no longer have to waste valuable resources building and managing complex data pipelines, self-managed staging tables, tricky DML merge logic, or manual conversion from database-specific data types into BigQuery data types. Just configure your source database, connection type, and destination in BigQuery and you’re all set. Datastream for BigQuery will backfill historical data and continuously replicate new changes as they happen. 

How customers are using Datastream for BigQuery

Falabella, Latin America’s largest retail platform, has a physical presence in 100 locations and an online store. To monitor and continuously improve their business, Falabella relies on data analytics in their day-to-day business for various use cases, including:

Customer analytics: monitor customer behavior, preferences, and purchasing habits to optimize marketing efforts and improve customer experience.

Seller analytics: monitor seller performance, track sales and revenue data, and identify trends or issues that may impact the business.

Logistics analytics: monitor and improve shipping and delivery processes.

Sales and revenue management: monitor sales and revenue data, especially during sales events.

“Previously, data was replicated using full database snapshots which took hours to generate and load to BigQuery,” says René Delgado, Head of Data Solutions at Falabella. “This process was orchestrated using some custom tools that were created internally. When something failed we had to do manual checks in many places, and these custom tools were difficult to debug and repair. The first immediate benefit of Datastream is that we no longer have to maintain/monitor these custom data tools: ‘best code = no code!’”

In other cases, data scientists would spin up expensive database replicas to run their analytics queries. “With all the data now available in BigQuery, simply eliminating the need to create and manage these databases helped save Falabella ~$10,000 USD/month,” says Delgado.

“With Datastream, we have a single tool to perform seamless, near real-time replication of our operational data to BigQuery. Datastream helps us get much quicker insights on our operational data. This enables us to deliver more stable data products and to better address our business needs.”

New PostgreSQL Source

We are also excited to announce the general availability of Datastream’s PostgreSQL source. With the PostgreSQL source, Datastream can now ingest changes from a range of PostgreSQL databases, including AlloyDB, Cloud SQL, Amazon RDS, and self-hosted PostgreSQL. Datastream’s PostgreSQL source reads from PostgreSQL’s write-ahead log (WAL) using logical decoding, which gives you more flexibility and has a minimal impact on the database server’s load.
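
For orientation, source-side preparation for logical decoding typically looks like the sketch below, run against the PostgreSQL instance. The object names are hypothetical, and exact requirements vary by flavor (on managed services such as Cloud SQL, logical decoding is enabled via a database flag rather than ALTER SYSTEM), so treat this as an assumption-laden outline rather than the official setup procedure.

-- Enable logical decoding (requires a restart on self-managed instances;
-- on Cloud SQL this is controlled by the cloudsql.logical_decoding flag).
ALTER SYSTEM SET wal_level = logical;

-- Create a publication covering the tables to replicate.
CREATE PUBLICATION datastream_publication FOR ALL TABLES;

-- Create a replication slot for the stream using the pgoutput plugin.
SELECT PG_CREATE_LOGICAL_REPLICATION_SLOT('datastream_slot', 'pgoutput');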

What we learned during the preview

Since our preview announcement, many customers have used Datastream to move data from PostgreSQL and other databases into BigQuery. They repeatedly praised Datastream’s ease of use, sharing how quickly they were able to successfully replicate data using Datastream, and comparing their experience with other solutions that took weeks or even months to achieve the same task. For example, one customer stated: “It’s brilliant. I can do an entire proof of concept in one week and be ready for production the next week.” Customers also highlighted Datastream’s robustness, noting how easily and transparently it handles typical scenarios such as upgrading the source database, handling database restarts, and managing failovers.

Getting started

Check out our quickstart for a detailed guide on creating a new Datastream stream. You can also try out this SkillsBoost lab for a step-by-step walkthrough of replicating from PostgreSQL to BigQuery.


Why next-gen analytics needs comprehensive data quality monitoring with Anomalo

We often hear that being data-driven is essential for organizations. But there’s something missing from this statement. You need to be driven by high-quality data — or else you might be driving in the entirely wrong direction.

Google BigQuery unlocks powerful next-generation analytics and ML applications. To use BigQuery with high-quality data, you need a comprehensive data quality monitoring solution that alerts you of unexpected changes in the data itself and pinpoints the reasons behind those changes.

In this article, we’ll show you how to instill trust in your next-generation analytics strategy by setting up a comprehensive data quality monitoring strategy with BigQuery and Anomalo.

How data issues can affect the quality of your analytics

With a fully managed data warehouse like BigQuery, you can quickly spin up a modern data stack that supports a variety of analytics use cases. However, just as it’s easier than ever to take advantage of more data sources and greater data volume, there is also more scope for data issues. Unexpected changes in data values are a deep issue that’s particularly difficult to detect and understand, since there are often many possible explanations to evaluate. Was a spike in purchases caused by an error somewhere in your data pipeline, or by a seasonal trend? Is your ML model performing poorly because its input data has suddenly changed, or because there was a flaw in the model design?

Without confidence in the data quality, we see that teams face problems such as:

An executive notices some unusual numbers on a dashboard and suspects that there is a data quality bug, causing a fire drill for the analytics/data engineering teams as they try to find the root cause. 

An ML model starts performing erratically, affecting customers, because the distribution of the production data has drifted from the distribution of the training data without anyone on the team being alerted to the change.

In early 2022, there was a high-profile example of what can go wrong, when Equifax reported millions of incorrect credit scores. Lacking the automated data quality checks needed to detect the issue, financial institutions used this data in production and ended up denying loans to qualified individuals.

With comprehensive data quality, you can detect complex issues at scale

Enterprises need data quality tools that can help them detect and resolve complicated data issues, before issues affect BI dashboards and reports or downstream ML models. These tools can answer questions like:

Is my data correct and consistent, based on the recent past and subject matter experts’ expectations of what the data should look like? 

Are there significant changes in my metrics? 

Why is my data unexpectedly changing? What is causing these issues?

Are my ML inputs drifting? Why?

Foundational data observability includes a number of basic tests, such as whether data pipelines completed successfully, or whether the volume of data ingested was in line with expectations. These tests focus more on the process than on the data itself. On the other hand, comprehensive data quality monitoring goes beyond basic observability checks and looks at the actual contents of the data. It helps with the hardest parts of data quality, like tracking data drift and monitoring metrics changes. 

Data observability might be sufficient if you’re in the early innings of your data journey, but if you’re using data to make decisions or as an input into ML models, as our customers are, then basic checks are not enough to ensure your data is accurate and trustworthy.
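
For contrast, here is the kind of narrow, hand-written check a team might otherwise maintain per table in BigQuery (table and column names are hypothetical). It catches a missing load or a null spike, but says nothing about drift in the values themselves, which is where automated, comprehensive monitoring earns its keep.

-- Hypothetical manual check: daily row counts and null rate for one column.
SELECT
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS row_count,
  COUNTIF(customer_id IS NULL) / COUNT(*) AS customer_id_null_rate
FROM analytics.events
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY event_date
ORDER BY event_date;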

Dataplex is an intelligent data fabric that provides a way to manage, monitor, and govern your distributed data at scale. Dataplex provides two options to validate data quality: auto data quality (in Public Preview) and the Dataplex data quality task. Together, these enable next-generation data quality solutions that automate rule creation and at-scale deployment of data quality checks. We also work with partners like Anomalo, whose richer feature set provides solution completeness for our customers.

Go beyond basic data observability with BigQuery and Anomalo 

Anomalo, a Google Cloud Ready – BigQuery partner, is a comprehensive data quality monitoring platform that plugs directly into your data stack. Anomalo’s deep data quality monitoring goes beyond checks like data freshness to automatically detect key metric changes and table anomalies. Below, you can see how Anomalo is deployed on GCP and integrates with BigQuery and other services to identify customers’ data quality issues:

With continuous monitoring, you have peace of mind that your data is always accurate even as it evolves over time. The platform’s no-code UI makes it easy for anyone to be a data steward or consumer.

Anomalo provides rich, visual alerts that integrate with Slack and Gmail. Here’s an example of how Anomalo can let you know when data falls outside of an expected range. Observers get a quick overview of how unusual the data is compared to the norm, and can drill deeper for a complete breakdown of the data factors that contributed to the deviation.

Alert in Slack:

Expanded root cause analysis in Anomalo:

Anomalo supports a one-click integration with BigQuery. Simply enter a few details about your account, and Anomalo will automatically start providing both data observability and automated data quality for all of your selected tables.

In conclusion

BigQuery offers unprecedented scalability and speed for your analytics needs. As you use these capabilities to do more with your data, it’s important to have confidence that the data itself is high quality. Leveraging data insights without automated data quality can expose your business, products, and users to unwanted risk. When quality is in place, not only is risk minimized, but everyone knows they can trust the data, increasing the adoption of analytics across the organization. 

With Anomalo, BigQuery users can monitor for data issues and resolve them quickly. Click here to learn more about the BigQuery and Anomalo partnership. Click here to learn more about Anomalo.


Squarespace reduces number of escalations by 87 percent with analytics lakehouse on Google Cloud

Editor’s note: Today we hear from Squarespace, an all-in-one website building and ecommerce platform, about its migration from a self-hosted Hadoop ecosystem to Google Cloud, running BigQuery, Dataproc, and Google Kubernetes Engine (GKE). Read on to learn how they planned and executed the migration, and what kinds of results they’re seeing.

Imagine you are a makeup artist who runs an in-home beauty business. As a small business owner, you don’t have a software development team who can build you a website. Yet you need to showcase your business, allow users to book appointments directly with you, handle the processing of payments, and continuously market your services to new and existing clients. 

Enter Squarespace, an all-in-one platform for websites, domains, online stores, marketing tools, and scheduling that allows users to create an online presence for their business. In this scenario, Squarespace enables the makeup artist to focus entirely on growing and running the business while Squarespace handles the digital website presence and administrative scheduling tasks. Squarespace controls and processes data for customers and their end users, and stores data to help innovate customer-facing features and drive customer-driven priority investments in the platform.

Until Q4 2022, Squarespace had a self-hosted Hadoop ecosystem composed of two independently managed Hadoop clusters. Both clusters were “duplicated” because we utilized an active/passive model for geo-redundancy, and staging instances also existed. Over time, the software and hardware infrastructure started to age. We quickly ran out of disk space and came up against hard limits on what we could store. “Data purges,” or bulk deletion of files to free up disk space, became a near-quarterly exercise. 

By early 2021 we knew we couldn’t sustain the pace of growth that we saw in platform usage, particularly with our on-premises Presto and Hive deployments, which housed several hundred tables. The data platform team tasked with maintaining this infrastructure was also small and unable to keep up with scaling and maintaining the infrastructure while delivering platform functionality to users. We had hit a critical decision point: double-down on running our infrastructure or move to a cloud-managed solution. Supported by leadership, we opted for Google Cloud because we had confidence in the platform and knew we could migrate and deploy quickly.

Project planning

Dependency mapping

We took the necessary time upfront to map our planned cloud infrastructure. We decided what we wanted to keep, update or replace. This strategy was beneficial, as we could choose the best method for a given component. For example, we decided to replace our on-premises Hadoop Distributed File System (HDFS) with Cloud Storage using the Hadoop GCS Connector but to keep our existing reporting jobs unchanged and not rewrite them.

In-depth dependency tracking

We identified stakeholders and began communicating our intent early. After we engaged the teams, we began weekly syncs to discuss blockers. We also relied on visual project plans to manage the work, which helped us understand the dependencies across teams to complete the migration.

Radical deprioritization

We worked backward from our target date and ensured the next two months of work were refined during the migration. With the help of top-down executive sponsorship, the team consistently pushed back on interrupting work by saying ‘no, not now’ to non-critical requests. This allowed us to provide realistic timelines for when we would be able to complete those requests post-migration.

Responsibilities

We broke down responsibilities by team, allowing us to optimize how our time was spent.

The data platform teams built reusable tooling to rename and drop tables and to programmatically ‘diff’ entire database tables to check for exact matching outputs (a SQL sketch of this idea follows the list). They also created a Trino instance on Google Cloud that mirrored the on-prem Presto.

The data engineering teams leveraged these tools while moving each of their data pipelines.
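
A minimal sketch of the table-diffing idea in BigQuery SQL, with hypothetical dataset names: rows returned by either direction of the set difference indicate a mismatch between the on-prem output and the migrated output.

-- Hypothetical: rows in the on-prem copy that are missing or different in the cloud copy.
SELECT * FROM onprem_mirror.daily_report
EXCEPT DISTINCT
SELECT * FROM cloud_dataset.daily_report;

-- And the reverse direction, to catch rows only present in the cloud copy.
SELECT * FROM cloud_dataset.daily_report
EXCEPT DISTINCT
SELECT * FROM onprem_mirror.daily_report;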

Technology strategy

Approach

We took an iterative approach to migrating the compute and storage for all of our systems and pipelines. Changing only small, individual pieces at a time let us validate the outputs at every migration phase and realize iterative wins, all while saving hours of manual validation time, to achieve the overall architecture shown below.

Phase 1: Cutover platform and policies

Our first step was to cut over our query engine, Presto, to run on Google Kubernetes Engine (GKE). Deploying this service gave end users a place to start experimenting with queries with an updated version of the software, now called Trino. We also cut over our platform policies, meaning we established usage guidelines for using Cloud Storage buckets or assigning IAM privileges to access data. 

Phase 2: Move compute to the Cloud

Once we had a Trino instance in Google Cloud, we granted it access to our on-prem data stores. We updated our Trino Airflow operator to run jobs in Google Cloud instead of on-premises. Spark processes were migrated to Dataproc. One by one, stakeholder teams switched from executing on-premises compute to cloud-based compute while still reading and writing data on-premises. They also had the option to migrate straight to Google Cloud-based storage. We let teams decide what sequencing fit in their current commitment schedule but also stipulated a hard deadline to end on-premises storage.

Phase 3: Storage cutover 

After confirming the jobs were all successfully running in Google Cloud and validating the output, we started redirecting downstream processes to read from Google Cloud data sources (Cloud Storage/BigQuery). Over time we watched the HDFS read/writes to the old clusters get to zero. At that point we knew we could shut down the on-premises hardware.

Phase 4: Cutover orchestration

Our orchestration platform, Airflow, runs on-premises. We’re investigating moving to Cloud Composer in 2023. This component is a thin slice of functionality, but represents one of the largest lifts due to the number of teams and jobs involved.

Leadership support

Our internal project sponsor was the Infrastructure team, which needed to sunset the hardware. To give the data team room to focus solely on the Google Cloud migration, the Infrastructure leadership team found every possible opportunity to take responsibilities off of the data group.

Leadership within the data group shielded their engineers from all other asks from other parts of the organization, giving them plenty of support to say “no” to all non-migration related requests.

Looking forward, what’s next for Squarespace? 

Following the successful migration to Google Cloud of our Hadoop ecosystem, we have seen the significant maintenance burden of the infrastructure disappear. From the months before our migration compared to the months immediately after, we’ve seen an 87% drop in the number of escalations. The data platform and data infrastructure teams have turned their attention away from monitoring the health of various services/filesystems, and are now focused on delivering new features and better software that our internal users need to move our business forward.

Next in our analytics lakehouse plan is to continue the success we had migrating our data lake and move more of our infrastructure responsibilities to Google Cloud. We’re actively planning the migration of our on-prem data warehouse to BigQuery and have begun to explore moving our Airflow instances to Cloud Composer.

To learn how you can get started on your data modernization journey, contact Google Cloud.


How BCW is using decentralized technology on Google Cloud

Editor’s note: Today’s blog discusses efforts by Arkhia, the organization developed by BCW Group, to provide Infrastructure-as-a-Service (IaaS) and Web3 backend solutions for blockchain & DLT developers. It is the first of three posts. 

BCW is a Web3 venture studio and enterprise consulting firm serving enterprise clients who want to integrate existing products or otherwise branch into the Web3 space. In this first of a three-part blog series, we will look at how BCW products use decentralized technology on Google Cloud. 

BCW provides companies with an arsenal of support services that include consulting, technical, and go-to-market. This ensures our customers’ early and ongoing success in accommodating the sophisticated demands that new entrants into the space face.

Arkhia and the Hedera Network

The first product outlined in this series, Arkhia, focuses on Hedera network infrastructure. Hedera has seen some of the largest inflows of Web3 developer activity over the past several months, which Arkhia is poised to serve with infrastructure and API solutions. 

For at least 15 years, the Web2 space has been dominated by traditional CRUD or central data storage applications. These applications worked well enough and were a decidedly significant improvement over the slow and rigid server structures they replaced. With the rise of Web3, however, a new paradigm in assets, computing, and security has arisen offering new forms of human interaction. As with any new paradigm, new hurdles have emerged that can inhibit adoption, such as growing hardware demands, increased need for reliability from relatively new protocols and codebases, and a lack of consistent definitions and access points.

To mitigate these challenges, Arkhia provides a gateway and workbench for aspiring Web3 developers and enterprises to access the underlying infrastructure of decentralized ledger technology (DLT). For example, Arkhia’s Workbench tool lets builders instantly view data schemas on the Hedera mirror node. The Watchtower tool enables subscribing to Hedera Consensus Service (HCS) through WebSockets by automatically translating HTTP2 to HTTP1, a critical functionality for higher-end development goals. Additionally, Arkhia provides enhanced APIs such as streams, subscriptions, and queries through the HCS gRPC, REST API, and JSON RPC Relay.

A fundamental aspect of our offerings is the use of Google Cloud products, which support the ability to build at-scale without compromising the principles of Web3. Pairing product process and service selection with Google Cloud technologies has empowered Arkhia to apply successful principles from the Web2 world to this new decentralized model of interaction.

Leveraging Google Cloud data services

The robust and wide-ranging data services on Google Cloud, such as BigQuery, Dataflow, and Cloud Composer (managed Airflow), along with core infrastructure offerings such as Cloud Load Balancing and Cloud Storage, have enabled Arkhia to build a highly scaled, wide-reaching application that transforms a DLT into a data lake. This merges the worlds of analytics and observation with participation, meeting the demands of both internally and externally oriented strategies.
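
As a rough illustration of the “DLT as data lake” idea, the sketch below queries ledger data that has already landed in BigQuery. It uses the public Ethereum dataset as a stand-in, since any ingested chain data could be analyzed the same way; it assumes the google-cloud-bigquery client and application-default credentials.

```python
# Sketch: treating on-chain data landed in BigQuery as an analytical data lake.
# Uses the public Ethereum dataset as a stand-in for any ingested DLT data.
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials

query = """
    SELECT
      DATE(block_timestamp) AS day,
      COUNT(*) AS tx_count
    FROM `bigquery-public-data.crypto_ethereum.transactions`
    WHERE block_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY day
    ORDER BY day
"""

for row in client.query(query).result():
    print(f"{row.day}: {row.tx_count} transactions")
```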

Looking at the customer journey, we find that a good DLT use case is grounded in a sound data strategy. When deciding how to apply a data or blockchain strategy, we start from the client’s needs for specific use cases; anyone familiar with design-thinking frameworks will find this first step a no-brainer. Most importantly, we must determine how data will fit into that strategy.

When thinking about our customers, we tend to think of data in terms of external versus internal posturing. With external posturing, data is proactively applied when making decisions concerning customer success and marketing efforts in order to cultivate the initiatives and businesses of our clients. In an internal posture, data is applied to accommodate digital transformations, regulatory compliance, and optimization processes.

Building a DLT posture is similar. When we think externally, our customers begin to see blockchain as a core component of products and services such as payment rails, remittance services, exchanges (both DEX and CEX), and proof-of-reserve systems, among many others.

Thinking internally, we often see customers looking to high-performance applications to serve their organizations, whether as mechanisms to reduce costs between departments or to power CRM and employee incentive applications. It should be noted that the lines between external and internal thinking are often initially drawn between private and public chains. By making Google Cloud an inextricable part of its operations, Arkhia’s flexible set of tools allows clients to blur these lines and simplify their stack.

Arkhia meets both postures simultaneously by creating access to the unique service layers of Hedera. By building a workbench of tooling around core out-of-the-box Hedera Hashgraph services such as HCS, Hedera Token Service (HTS), and the mirror node layers, access points for data and transactions are merged. This also enables technology teams to build effective technologies and strategies as they venture into the DLT wilds.

The Arkhia team leverages Google Cloud to mitigate challenges that arise from nascent code and libraries in DLT and creates a stable platform on which products can be built.

The collaboration between Google Cloud and Arkhia addresses concerns about the ubiquity of cloud technologies in the growing conversation around open ledgers (e.g., Bitcoin, Ethereum) and the anonymity they ostensibly support. In Hedera’s case, there is a tolerance, if not an expectation, that known parties handle the infrastructure, which is important in data systems.

BigQuery, Cloud Billing, and Pub/Sub lay the groundwork for the Arkhia team to investigate node-as-a-service offerings, highly scaled JSON-RPC deployments, and even independent ancillary services that can speed up builds, let developers compose Hedera services, and allow for further application-level flexibility.

As BigQuery expands the possibilities for the world of Web3 through ETL initiatives and data ingestion, Arkhia ensures that users can move forward confidently despite the storage-intensive paradigms typical of the DLT ecosystem. From a business management perspective, BigQuery eliminates the overhead of high-cost infrastructure because it lets teams stream and consume blockchain data in near real time for decentralized apps (dApps) and other services, with real-time analytics on all kinds of data to improve both internal and external insights.

For business managers at Arkhia who work in the DLT IaaS space, Cloud Billing provides several improvements. Controlling billing with a detailed overview of services, costs, and computation/storage units can give managers the critical edge they need to not just survive, but thrive. Arkhia uses Google Cloud’s sophisticated features to adequately manage its scale and costs.

A comprehensive suite of APIs and Pub/Sub offerings

Google Cloud also has a comprehensive suite of APIs and Pub/Sub offerings with baked-in functionality, enabling quick-start proofs of concept to rapidly build and iterate at the speed of Web3 innovation. Thanks to a high level of internal networking sophistication, Pub/Sub integrates easily with services such as BigQuery and Google Kubernetes Engine, reducing development time and tooling for real-time blockchain data. Using Pub/Sub to improve data streaming performance between data systems and clients is attractive to clients who want high availability, security, and performance for data that is sensitive to time and accuracy.
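
As a hedged sketch of how ledger events might enter such a pipeline, the snippet below publishes a message to a Pub/Sub topic that a BigQuery subscription or a GKE consumer could read from. The project, topic, and message fields are hypothetical.

```python
# Sketch: publishing ledger events to Pub/Sub for downstream consumers.
# Project, topic, and event fields are placeholder values.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "hcs-messages")  # hypothetical names

event = {"topic_id": "0.0.1234", "sequence_number": 42, "payload": "..."}
future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),
    source="hcs",  # message attribute, useful for routing or filtering
)
print("Published message id:", future.result())
```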

Google Cloud’s compute and services empower platforms such as Arkhia to quickly build scalable and stable infrastructure on top of the rapidly changing state of blockchain environments. As a greater number of firms transition their operations to environmentally sustainable tools that support Web3 functionality, Google Cloud will help infrastructure developers maintain an ethos of decentralization.

Finally, Google Cloud’s commitment to sustainability allows Arkhia to participate in a carbon-neutral decentralized service without violating the core values or benchmarks established by the curators of the Hedera protocol.

As we continue to combine principles of data strategy, practice, and capabilities, we see Google Cloud’s offerings as a critical piece of tooling for our system. From end-to-end, Arkhia and Google Cloud seek to build decentralized services for our clients and customers while supporting developers who grapple with the unique challenges of DLT and blockchain.


New BigQuery editions: flexibility and predictability for your data cloud

When it comes to their data platforms, organizations want flexibility, predictable pricing, and the best price performance. Today at the Data Cloud & AI Summit, we are announcing BigQuery editions with three pricing tiers — Standard, Enterprise and Enterprise Plus — for you to choose from, with the ability to mix and match for the right price-performance based on your individual workload needs. 

BigQuery editions come with two innovations. First, we are announcing compute capacity autoscaling that adds fine-grained compute resources in real-time to match the needs of your workload demands, and ensure you only pay for the compute capacity you use. Second, compressed storage pricing allows you to only pay for data storage after it’s been highly compressed. With compressed storage pricing, you can reduce your storage costs while increasing your data footprint at the same time. These updates reflect our commitment to offer new, flexible pricing models for our cloud portfolio.

With over a decade of continuous innovation and working together with customers, we’ve made BigQuery one of the most unified, open, secure and intelligent data analytics platforms on the market, and a central component of your data cloud. Unique capabilities include BigQuery ML for using machine learning through SQL, BigQuery Omni for cross-cloud analytics, BigLake for unifying data warehouses and lakes, support for analyzing all types of data, an integrated experience for Apache Spark, geospatial analysis, and much more. All these capabilities build on the recent innovations we announced at Google Cloud Next in 2022.

“Google Cloud has taken a significant step to mature the way customers can consume data analytics. Fine-grained autoscaling ensures customers pay only for what they use, and the new BigQuery editions are designed to provide more pricing choice for their workloads.” — Sanjeev Mohan, Principal at SanjMo & former Gartner Research VP. 

With our new flexible pricing options, the ability to mix and match editions, and multi-year usage discounts, BigQuery customers can gain improved predictability and lower total cost of ownership. In addition, with BigQuery’s new granular autoscaling, we estimate customers can reduce their current committed capacity by 30-40%. 

“BigQuery’s flexible support for pricing allows PayPal to consolidate data as a lakehouse. Compressed storage along with autoscale options in BigQuery helps us provide scalable data processing pipelines and data usage in a cost-effective manner to our user community.” – Bala Natarajan, VP Enterprise Data Platforms at PayPal. 

More flexibility to optimize data workloads for price-performance

BigQuery editions allow you to pick the right feature set for individual workload requirements. For example, the Standard Edition is best for ad-hoc, development, and test workloads, while Enterprise has increased security, governance, machine learning and data management features. Enterprise Plus is targeted at mission-critical workloads that demand high uptime, availability and recovery requirements, or have complex regulatory needs. The table below describes each packaging option.

Prices above are for the US. For regional pricing, refer to the detailed pricing page.

Pay only for what you use

The BigQuery autoscaler manages compute capacity for you. You can set a maximum and an optional baseline compute capacity, and let BigQuery take care of provisioning and optimizing compute capacity based on usage, without any manual intervention on your part. This ensures you get sufficient capacity while reducing management overhead and underutilized capacity. 

Unlike alternative VM-based solutions that charge for a full warehouse with pre-provisioned, fixed capacity, BigQuery harnesses the power of a serverless architecture to provision additional capacity in increments of slots with per-minute billing, so you only pay for what you use. 
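
To make that difference concrete, here is a purely illustrative back-of-the-envelope comparison. The slot-hour rate and usage pattern are hypothetical, not published BigQuery prices.

```python
# Illustrative arithmetic only: fixed pre-provisioned capacity vs. per-minute
# autoscaled billing. The rate and workload shape below are hypothetical.
SLOT_HOUR_RATE = 0.06   # hypothetical $/slot-hour
HOURS_IN_MONTH = 730

# Fixed provisioning sized for a 400-slot peak that only lasts ~2 hours per day.
fixed_cost = 400 * HOURS_IN_MONTH * SLOT_HOUR_RATE

# Autoscaling: 100 baseline slots all month, plus 300 extra slots for 2 hours/day
# (30 days), billed only while in use.
autoscaled_cost = (100 * HOURS_IN_MONTH + 300 * 2 * 30) * SLOT_HOUR_RATE

print(f"Fixed capacity: ${fixed_cost:,.0f}/month")
print(f"Autoscaled:     ${autoscaled_cost:,.0f}/month")
```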

“BigQuery’s new pricing flexibility allows us to use editions to support the needs of our business at the most granular level.” — Antoine Castex, Group Data Architect at L’Oréal.

Here are a few examples of customers benefiting from autoscaling: 

Retailers experiencing spikes in demand, scaling for a few hours a couple of times per year

Analysts compiling quarterly financial reports for the CFO

Startups managing unpredictable needs in the early stages of their business

Digital natives preparing for variable demand during new product launches

Healthcare organizations scaling usage during seasonal outbreaks like the flu

Lower your data storage costs

As data volumes grow exponentially, customers find it increasingly complex and expensive to store and manage data at scale. With the compressed storage billing model you can manage complexity across all data types while keeping costs low.

Compressed storage in BigQuery is grounded in our years of innovation in storage optimization, columnar compression, and compaction. With this feature, security operations leader Exabeam has achieved a compression rate of more than 12:1 and can store more data at a lower cost, which helps its customers solve the most complex security challenges. Whether customers migrate to BigQuery editions or continue to use the on-demand model, they can take advantage of the compressed storage billing model to store more data cost-efficiently.
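
If you want to estimate your own ratio, one approach is to compare logical and physical bytes from BigQuery’s storage metadata, as in the sketch below. It assumes your tables live in the US multi-region; adjust the region qualifier as needed.

```python
# Sketch: estimating a compression ratio from BigQuery storage metadata.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      table_schema,
      SUM(total_logical_bytes) AS logical_bytes,
      SUM(total_physical_bytes) AS physical_bytes,
      SAFE_DIVIDE(SUM(total_logical_bytes), SUM(total_physical_bytes)) AS compression_ratio
    FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
    GROUP BY table_schema
    ORDER BY logical_bytes DESC
"""

for row in client.query(query).result():
    print(f"{row.table_schema}: {(row.compression_ratio or 0):.1f}:1 compression")
```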

Next steps for BigQuery customers 

Starting on July 5, 2023, BigQuery customers will no longer be able to purchase flat-rate annual, flat-rate monthly, and flex slot commitments. Customers already leveraging existing flat-rate pricing can begin migrating their flat and flex capacity to the right edition based on their business requirements, with options to move to edition tiers as their needs change. 

Taking into account BigQuery’s serverless functionality, query performance, and capability improvements, we are increasing the price of the on-demand analysis model by 25% across all regions, starting on July 5, 2023. 

Irrespective of which pricing model you choose, the combination of these innovations with multi-year commitment usage discounts can help you lower your total cost of ownership. Refer to the latest BigQuery cost optimization guide to learn more.

Customers will receive more information about the changes coming to BigQuery’s commercial model through a Mandatory Service Announcement email in the next few days. In the meantime, check out the FAQs, pricing information and product documentation, and register for the upcoming BigQuery roadmap session on April 5, 2023 to learn more about BigQuery’s latest innovations.



Solving for the next era of innovation and efficiency with data and AI

Even in today’s changing business climate, our customers’ needs have never been more clear: They want to reduce operating costs, boost revenue, and transform customer experiences. Today, at our third annual Google Data Cloud & AI Summit, we are announcing new product innovations and partner offerings that can optimize price-performance, help you take advantage of open ecosystems, securely set data standards, and bring the magic of AI and ML to existing data, while embracing a vibrant partner ecosystem. Our key innovations will enable customers to:

Improve data cost predictability using BigQuery editions

Break free from legacy databases with AlloyDB Omni

Unify trusted metrics across the organization with Looker Modeler

Extend AI & ML insights to BigQuery and other third-party platforms  

Help reduce operating costs for BigQuery

In the face of fast-changing market conditions, organizations need smarter systems that provide the required efficiency and flexibility to adapt. That is why today, we’re excited to introduce new BigQuery pricing editions along with innovations for autoscaling and a new compressed storage billing model.

BigQuery editions provide more choice and flexibility for you to select the right feature set for various workload requirements. You can mix and match among Standard, Enterprise, and Enterprise Plus editions to achieve the preferred price-performance by workload.

BigQuery editions include the ability for single or multi-year commitments at lower prices for predictable workloads and new autoscaling that supports unpredictable workloads by providing the option to pay only for the compute capacity you use. And unlike alternative VM-based solutions that charge for a full warehouse with a pre-provisioned, fixed capacity, BigQuery harnesses the power of a serverless architecture to provision additional capacity in granular increments to help you not overpay for underutilized capacity. Additionally, we are offering a new compressed storage billing model for BigQuery editions customers, which can reduce costs depending on the type of data stored. 

Break free from legacy databases with AlloyDB

For many organizations, reducing costs means migrating from expensive legacy databases. But sometimes, they can’t move as fast as they want, because their workloads are restricted to on-premises data centers due to regulatory or data sovereignty requirements, or they’re running their application at the edge. Many customers need a path to support in-place modernization with AlloyDB, our high performance, PostgreSQL-compatible database, as a stepping stone to the cloud. 

Today, we’re excited to announce the technology preview of AlloyDB Omni, a downloadable edition of AlloyDB designed to run on-premises, at the edge, across clouds, or even on developer laptops. AlloyDB Omni offers the AlloyDB benefits you’ve come to love, including high performance, PostgreSQL compatibility, and Google Cloud support, all at a fraction of the cost of legacy databases. In our performance tests, AlloyDB Omni is more than 2x faster than standard PostgreSQL for transactional workloads, and delivers up to 100x faster analytical queries than standard PostgreSQL. Download the free developer offering today at https://cloud.google.com/alloydb/omni.

And to make it easy for you to take advantage of our open data cloud, we’re announcing Google Cloud’s new Database Migration Assessment (DMA) tool, as part of the Database Migration Program. This new tool provides easy-to-understand reports that demonstrate the effort required to move to one of our PostgreSQL databases — whether it’s AlloyDB or Cloud SQL. Contact us today at g.co/cloud/migrate-today to get started with your migration journey.

Securely set data standards

Data-driven organizations need to know they can trust the data in their business intelligence (BI) tools. Today we are announcing Looker Modeler, which allows you to define metrics about your business using Looker’s innovative semantic modeling layer. Looker Modeler is the single source of truth for your metrics, which you can share with the BI tools of your choice, such as Power BI, Tableau, and ThoughtSpot, or Google solutions like Connected Sheets and Looker Studio, providing users with quality data to make informed decisions. 

In addition to Looker Modeler, we are also announcing BigQuery data clean rooms to help organizations share and match datasets across companies while respecting user privacy. In Q3, you should be able to use BigQuery data clean rooms to share data and collaborate on analysis with trusted partners, all while preserving privacy protections. One common use case for marketers could be combining ads campaign data with your first-party data to unlock insights and improve campaigns.
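
As a conceptual sketch only, the query below shows the kind of aggregate-only analysis a clean room is designed to enforce. The table names are hypothetical, and this does not use the BigQuery data clean rooms feature itself; it simply illustrates the privacy posture of no row-level output and minimum group sizes.

```python
# Conceptual sketch: aggregate-only join across a shared dataset and first-party data.
# Dataset and table names are hypothetical; this is not the clean rooms API itself.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      c.campaign_id,
      COUNT(DISTINCT f.customer_id) AS matched_customers,
      SUM(f.lifetime_value) AS total_ltv
    FROM `my_project.partner_share.campaign_exposures` AS c  -- hypothetical shared dataset
    JOIN `my_project.first_party.customers` AS f             -- hypothetical first-party data
      ON c.hashed_email = f.hashed_email
    GROUP BY c.campaign_id
    HAVING COUNT(DISTINCT f.customer_id) >= 50                -- suppress small groups
"""

for row in client.query(query).result():
    print(row.campaign_id, row.matched_customers, row.total_ltv)
```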

We are also extending our vision for data clean rooms with several new partnerships. Habu will integrate with BigQuery to support privacy-safe data orchestration and its data clean room service. LiveRamp on Google Cloud will enable privacy-centric data collaboration and identity resolution right within BigQuery to help drive more effective data partnerships. And Lytics, a customer data platform built on BigQuery, will help activate insights across marketing channels.

Bring ML to your data

BigQuery ML, which empowers data analysts to use machine learning through existing SQL tools and skills, saw over 200% year-over-year growth in usage in 2022. Since BigQuery ML became generally available in 2019, customers have run hundreds of millions of prediction and training queries. Google Cloud provides infrastructure for developers to work with data, AI, and ML, including Vertex AI, Cloud Tensor Processing Units (TPUs), and the latest GPUs from Nvidia. To bring ML closer to your data, we are announcing new capabilities in BigQuery that will allow users to import models built in frameworks such as PyTorch, host remote models on Vertex AI, and run pre-trained Vertex AI models.
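
As a quick sketch of the SQL-first workflow these capabilities build on, the snippet below trains and queries a simple BigQuery ML model from Python. The dataset, table, and column names are hypothetical.

```python
# Sketch: training and querying a BigQuery ML model through plain SQL from Python.
# Dataset, table, and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

train_sql = """
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT churned, tenure_months, monthly_spend, support_tickets
    FROM `my_dataset.customer_features`
"""
client.query(train_sql).result()  # wait for training to finish

predict_sql = """
    SELECT *
    FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                    (SELECT tenure_months, monthly_spend, support_tickets
                     FROM `my_dataset.customer_features` LIMIT 10))
"""
for row in client.query(predict_sql).result():
    print(dict(row))
```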

Building on our open ecosystem for AI development, we’re also announcing partnerships to bring more choice and capabilities for customers to turn their data into insights from AI and ML, including new integrations between: 

DataRobot and BigQuery, providing users with repeatable code patterns to help developers modernize deployment and experiment with ML models more quickly. 

Neo4j and BigQuery, allowing users to extend SQL analysis with graph data science and ML using BigQuery, Vertex AI and Colab notebooks. 

ThoughtSpot and multiple Google Cloud services — BigQuery, Looker, and Connected Sheets — which will provide more AI-driven, natural language search capabilities to help users more quickly get insights from their business data.

Accelerate your Data Cloud with an open ecosystem

Over 900 software partners power their applications using Google’s Data Cloud. Partners have extended Google Cloud’s open ecosystem by introducing new ways for customers to accelerate their data journeys. Here are a few updates from our data cloud partners: 

Crux Informatics is making more than 1,000 new datasets available on Analytics Hub, with plans to increase to over 2,000 datasets later this year. 

Starburst is deepening its integration with BigQuery and Dataplex so that customers can bring analytics to their data no matter where it resides, including data lakes, multi and hybrid cloud sources. 

Collibra introduced new features across BigQuery, Dataplex, Cloud Storage, and AlloyDB to help customers gain a deeper understanding of their business with trusted data.

Informatica launched a cloud-native, AI-powered master data management service on Google Cloud to make it easier for customers to connect data across the enterprise for a contextual 360-degree view and insights  in BigQuery. 

Google Cloud Ready for AlloyDB is a new program that recognizes partner solutions that have met stringent integration requirements with AlloyDB. Thirty partners have already achieved the Cloud Ready – AlloyDB designation, including Collibra, Confluent, Datadog, Microstrategy, and Striim. 

At Google Cloud, we believe that data and AI have the power to transform your business. Join all our sessions at the Google Data Cloud & AI Summit for more on the announcements we’ve highlighted today. Dive into customer and partner sessions, and access hands-on content on the summit website. Finally, join our Data Cloud Live events series happening in a city near you.
