Spark on Google Cloud: Serverless Spark jobs made seamless for all data users

Apache Spark has become a popular platform because it can serve data engineering, data exploration, and machine learning use cases alike. However, Spark still requires the on-premises way of managing clusters and tuning infrastructure for each job. Also, end-to-end use cases require Spark to be used along with technologies like TensorFlow and programming languages like SQL and Python. Today, these operate in silos, with Spark on unstructured data lakes, SQL on data warehouses, and TensorFlow in completely separate machine learning platforms. This increases costs, reduces agility, and makes governance extremely hard, preventing enterprises from making insights available to the right users at the right time.

Announcing Spark on Google Cloud, now serverless and integrated

We are excited to announce Spark on Google Cloud, bringing the industry’s first autoscaling serverless Spark, seamlessly integrated with the best of Google Cloud and open source tools, so you can effortlessly power ETL, data science, and data analytics use cases at scale. Google Cloud has been running large-scale, business-critical Spark workloads for enterprise customers for 6+ years, using open source Spark in Dataproc. Today, we are furthering our commitment by enabling customers to:

Eliminate time spent managing Spark clusters: With serverless Spark, users submit their Spark jobs and let them auto-provision and autoscale to completion.

Enable data users of all levels: Connect, analyze, and execute Spark jobs from the interface of users’ choice including BigQuery, Vertex AI or Dataplex, in 2 clicks, without any custom integrations.

Retain flexibility of consumption: No one size fits all. Use Spark as serverless, deploy on Google Kubernetes Engine (GKE), or on compute clusters based on the requirements.

With Spark on Google Cloud, we are providing a way for customers to use Spark in a cloud-native manner (serverless), and seamlessly with the tools used by data engineers, data analysts, and data scientists for their use cases. These tools will help customers realize the data platform redesign they have embarked on.

“Deutsche Bank is using Spark for a variety of different use cases. Migrating to GCP and adopting Serverless Spark for Dataproc allows us to optimize our resource utilization and reduce manual effort so our engineering teams can focus on delivering data products for our business instead of managing infrastructure. At the same time we can retain the existing code base and knowhow of our engineers, thus boosting adoption and making the migration a seamless experience.”—Balaji Maragalla, Director Big Data Platform, Deutsche Bank

“We see serverless Spark playing a central role in our data strategy. Serverless Spark will provide an efficient, seamless solution for teams that aren’t familiar with big data technology or don’t need to bother with idiosyncrasies of Spark to solve their own processing needs. We’re excited about the serverless aspect of the offering, as well as the seamless integration with BigQuery, Vertex AI, Dataplex and other data services.” —Saral Jain, Director of Engineering, Infrastructure and Data, Snap Inc.

Dataproc Serverless for Spark

Per IDC, developers spend 40% of their time writing code and 60% of their time tuning infrastructure and managing clusters. Furthermore, not all Spark developers are infrastructure experts, which results in higher costs and lower productivity. With serverless Spark, developers can spend all their time on the code and logic. They do not need to manage clusters or tune infrastructure. They submit Spark jobs from their interface of choice, and processing is autoscaled to match the needs of the job. Furthermore, while Spark users today pay for the time the infrastructure is running, with serverless Spark they only pay for the job duration.
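
To make the billing difference concrete, here is a minimal arithmetic sketch in Python. Every number in it is hypothetical: the hourly rate, the 24-hour cluster uptime, and the job durations are illustrative assumptions, not Google Cloud prices, and the function names are invented for this example. It also assumes the same hourly rate applies in both models, which may not hold in practice.

```python
from typing import Sequence


def cluster_cost(uptime_hours: float, hourly_rate: float) -> float:
    """Cost when billing covers every hour the cluster is running."""
    return uptime_hours * hourly_rate


def serverless_cost(job_hours: Sequence[float], hourly_rate: float) -> float:
    """Cost when billing covers only the time jobs actually run."""
    return sum(job_hours) * hourly_rate


# Hypothetical inputs: three jobs in one day and an illustrative rate.
JOB_HOURS = [0.5, 1.25, 0.75]   # total of 2.5 hours of actual processing
HOURLY_RATE = 4.0               # assumed cost per hour, same in both models

always_on = cluster_cost(24.0, HOURLY_RATE)        # cluster left up all day
per_job = serverless_cost(JOB_HOURS, HOURLY_RATE)  # billed per job duration

print(f"Always-on cluster: {always_on:.2f}")
print(f"Job duration only: {per_job:.2f}")
```

Under these assumptions the gap comes entirely from idle time: the cluster is billed for 24 hours while the jobs occupy only 2.5 of them. A cluster that is kept busy most of the day would narrow the difference.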

Spark through BigQuery

BigQuery, the leading data warehouse, now provides a unified interface for data analysts to write SQL or PySpark. The code is executed using serverless Spark seamlessly, without the need for infrastructure provisioning. BigQuery has been the pioneer for serverless data warehousing, and now supports serverless Spark for Spark-based analytics.

Spark through Vertex AI

Data scientists no longer need to go through custom integrations to use Spark with their notebooks. Through Vertex AI Workbench, they can connect to Spark with a single click and do interactive development. With Vertex AI, Spark can easily be used together with other ML frameworks like TensorFlow, PyTorch, scikit-learn, and BigQuery ML. Google Cloud security, compliance, and IAM controls are automatically applied across Vertex AI and Spark. Once you are ready to deploy the ML models, the notebook can be executed as a Spark job in Dataproc, and scheduled as part of Vertex AI Pipelines.

Spark through Dataplex

Dataplex is an intelligent data fabric that enables organizations to centrally manage, monitor, and govern their data across data lakes, data warehouses, and data marts with consistent controls, providing access to trusted data and powering analytics at scale. Now, you can use Spark on distributed data natively through Dataplex. Dataplex provides a collaborative analytics interface, with 1-click access to SparkSQL, Notebooks, or PySpark, and the ability to save, share, search notebooks and scripts alongside data.

Flexibility of consumption

We understand one size does not fit all. Spark is available for consumption in 3 different ways based on your specific needs. For customers standardizing on Kubernetes for infrastructure management, run Spark on Google Kubernetes Engine (GKE) to improve resource utilization and simplify infrastructure management. For customers looking for Hadoop-style infrastructure management, run Spark on Google Compute Engine (GCE). For customers who are looking for a no-ops Spark deployment, use serverless Spark!

ESG Senior Analyst Mike Leone commented, “Google Cloud is making Spark easier to use and more accessible to a wide range of users through a single, integrated platform. The ability to run Spark in a serverless manner, and through BigQuery and Vertex AI will create significant productivity improvement for customers. Further, Google’s focus on security and governance makes this Spark portfolio useful to all enterprises as they continue migrating to the Cloud.”

Getting started

Dataproc Serverless for Spark will be Generally Available within a few weeks. BigQuery and Dataplex integration is in Private Preview. Vertex AI Workbench is available in Public Preview; you can get started here. For all capabilities, you can request Preview access through this form.

You can work with Google Cloud partners to get started as well.

“We are excited to partner with Google Cloud as we look to provide our joint customers with the latest innovations on Spark. We see Spark being used for a variety of analytics and ML use cases. Google is taking Spark a step further by making it serverless, and available through BigQuery, Vertex AI and Dataplex for a wide spectrum of users.” —Sharad Kumar, Cloud First data and AI Lead at Accenture

For more information, visit our website or watch the announcement video and our conversation with Snap at Next 2021.

Here’s what you missed at Next ’21

Google Cloud Next ’21 is over, but the learning is just beginning. With three days of keynotes, deep dives, and announcements, there was a lot to take in! But don’t worry if you missed something: the Google Cloud Blog team is here to round up our favorite announcements of Next ’21.

The biggest announcements

You can catch up on all the Next announcements in this comprehensive list, but we know that’s a lot! Here are the standouts.

Living on the edge

You get a cloud … and you get a cloud! We think Oprah would approve of Google Distributed Cloud, announced during Monday’s Thomas Kurian keynote: a portfolio of fully managed hardware and software solutions that extend Google Cloud’s infrastructure and services to data centers and the edge. Distributed Cloud is powered by Anthos, which also got a slate of upgrades this week including VM support, and you’ll find it useful in all sorts of situations, from running low-latency edge workloads or private 5G/LTE solutions to meeting local sovereignty requirements. Reality at the edge is messy, but managing it doesn’t have to be.

Google security on your side

The Google Cybersecurity Action Team (GCAT) might sound like a cult-classic 80s Saturday morning cartoon lineup, but it’s also a group of security experts we’ve assembled to bring Google-grade security chops to governments and businesses around the world. You can rely on them for threat briefings, proven security blueprints, and strategic sessions designed to help you build a trusted cloud. To get things started, GCAT has released a Security and Resilience Framework using Google Cloud and partner technologies. Now we just need to work on a theme song.

AI breakthroughs for industry

Buzzwords begone. The whole point of machine learning and AI is to do something with it, something that helps your business. So we’ve made Contact Center AI (CCAI) Insights generally available, and also added Contract DocAI to our DocAI lineup. CCAI Insights helps you mine contact center interactions to create better customer experiences—whether your call center is staffed by humans or virtual agents. Contract DocAI makes it faster and less expensive to analyze contracts, the most critical documents of all. Both are business tools that solve real problems—no buzzwords necessary.

Sprucing up with the cleanest cloud

Google Cloud is proud of our sustainability track record as the cleanest cloud in the industry. But we want to help you go even further. With the newly-announced Carbon Footprint tool, every Google Cloud user—that means you!—can access the gross carbon emissions associated with the services you use in Google Cloud. Now you can measure, track, and report your carbon footprint. Plus we’ve integrated sustainability into Unattended Project Recommender, so you can reduce your footprint even further by deleting unattended projects.

Data analytics unite!

Unification was the theme of this year’s data announcements. Vertex AI Workbench launched in public preview—a single Jupyter-based environment for data scientists to complete all of their ML work, from experimentation, to deployment, to managing and monitoring models. But it’s not just for Vertex AI—you can also analyze data from BigQuery, Dataproc, Spark, and Looker in one interface.

And don’t sleep on the private alpha launch of BigQuery Omni, which takes that theme of unification even further by allowing you to analyze data from other clouds using BigQuery. Later this month, you’ll be able to securely query S3 data in AWS or Azure Blob Storage data in Azure directly through the familiar BigQuery user interface, bringing the power of BigQuery to where the data resides. 

Collaboration gets cloudier

Google Workspace (the artist formerly known as G Suite) is a core part of Google Cloud, and we announced all sorts of exciting updates and integrations to our collaboration products this year. Client-side encryption for Google Meet, Data Loss Prevention (DLP) for Chat, and Drive labels for sensitive files are all new at Next. Perhaps most critical for organizations is our just-announced Work Safer Program, which helps protect your Google Workspace users against rising cybersecurity threats with industry-leading solutions from Google and our partners.

Keeping the band together

Throughout Next, we were proud to celebrate our most innovative customers, partners, and community members. And we want to keep the good vibes going all year long with our new Innovators community. You’ll get the inside scoop on our roadmap, get access to exclusive events, and much more. Everyone is welcome to join, and we’ll have all sorts of cool opportunities for Innovators coming up. Join the program today to stay informed and come along on the journey. 

Keynotes and sessions

The live sessions are over, but you can still register to view sessions on demand through November 5th. We’ve created a collection of themed playlists to guide you—whether you’re a developer, an executive, or an industry expert, you’ll find something helpful here. If you’ve got time for just one session, we recommend CEO Thomas Kurian’s keynote, which covers many of this year’s biggest announcements.

Thanks again for learning and growing with us in 2021. We’ll have more Next recaps and breakdowns coming up on the blog in the weeks to come—stay tuned!

Accelerate SAP innovation with Google Cloud Cortex Framework

Digital transformation is about gaining speed, agility, and efficiency. The faster and more easily your organization operates on a modern cloud platform, the sooner it can experience the benefits.

Today, we are excited to introduce Google Cloud Cortex Framework, a foundation of endorsed solution reference templates and content for customers to accelerate business outcomes with less risk, complexity, and cost. Google Cloud Cortex Framework allows you to kickstart insights and reduce time-to-value with reference architectures, packaged services, and deployment accelerators that guide you from planning to delivery so you can get up and running quickly. You can deploy templatized solutions from Google Cloud and our trusted partners for specific use cases and business scenarios in a faster, more cost-effective way.

Our data foundation release

In our first release, customers can take advantage of a rich data foundation of building blocks and templates for SAP environments. Customers can leverage our:

Scalable data cloud foundation to combine the best of SAP and non-SAP data to drive new insights; 

Pre-defined BigQuery operational data marts and change data capture (CDC) processing scripts to take the guesswork out of modeling and data processing; and 

BigQuery ML templates, which provide advanced machine-learning capabilities for common business scenarios such as Product Recommendations and Customer Segmentation. 

See below for an example of some of these templates within BigQuery.

Together with plug-and-play Looker dashboard templates, customers can gain fast insights into sales, orders, products, customers, and much more. But this is just the beginning. We see Google Cloud Cortex Framework as a “content factory” that will expand to address new use cases, incorporate best practices, industry scenarios, and build on our cumulative experiences in enterprise environments.

“At Google Cloud, our goal is to make it as easy as possible for SAP customers to modernize in the cloud,” says Abdul Razack, VP, Solutions Engineering, Technology Solutions and Strategy, Google Cloud. “Google Cloud Cortex Framework is our latest innovation to that end. With readily available reference architectures and other tools, SAP customers now have what they need to design, build, and deploy advanced cloud solutions and accelerate business outcomes.”

Get up and running quickly

The Google Cloud Cortex Framework helps us answer a common question we hear from our customers: “How do I get started?” Google Cloud Cortex Framework gives customers an off-the-shelf, packaged approach that they can implement and customize to their own specifications, and it provides multiple benefits:

Accelerate business outcomes with easy-to-leverage, scenario-driven reference architectures and content that remove the guesswork from deployments. Expedite value with line-of-business and industry example solutions and packaged services from Google Cloud and partners.

Reduce risk, complexity, and cost with proven deployment templates. Deploy the industry’s most advanced cloud-native capabilities at a fraction of the time and cost of from-scratch, in-house efforts. Support business process improvement with accurate and relevant insights to quickly deliver differentiating capabilities to your customers.

Leverage a scalable technology strategy for future innovation by standardizing on a reusable data and analytics architecture. Easily identify and support the innovative technologies required to deliver a full range of current and future scenarios. Provide the building blocks and blueprints you need to prepare for the future, and upskill your team so they can deploy the technology you need to support your business objectives today and tomorrow.

Our partner ecosystem makes Google Cloud Cortex Framework possible

Today’s launch of Google Cloud Cortex Framework includes support from a large ecosystem of partners such as Accenture, Infosys, Palantir, C3.AI, Informatica, HVR, Qlik, Pluto7, ATOS, CapGemini, Cognizant, Deloitte, HCL, Lemongrass, NIMBL, PwC, SpringML and TCS who will be offering solutions and services to accelerate customer innovation. These partners are adopting and augmenting Google Cloud Cortex Framework to enable customers to more rapidly deploy and drive value for their organizations. With vast customer and partner interest in advancing data landscapes leveraging Google Cloud, we will continue to develop the ecosystem of Google Cloud Cortex Framework partners.

As foundational partners, Accenture and Infosys have been instrumental in our solution engineering efforts, leveraging their strengths in the data and analytics space.  

“Organizations today rely on increasing volumes of data to quickly react and respond to change. To handle the high volume and variety of data from disparate sources, our clients need a modern data foundation that can respond rapidly to those growing demands. Google’s Cortex enables us to align our assets and industry solution models into a consistent architecture for our clients to drive business agility, customer intimacy, and real-time decision-making.” – Tom Stuermer, global lead of Accenture Google Business Group at Accenture.

“Infosys is excited to partner with Google Cloud to drive the adoption of Google Cloud Cortex Framework, unlocking value from SAP and non-SAP data and enabling insights-driven digital enterprises across multiple industry domains within our large SAP customer base. Google Cloud Cortex Framework complements Infosys Cobalt that brings together our extensive SAP, data analytics and Google Cloud capabilities to help clients fast-track their cloud adoption and accelerate their business transformation.” – Sunil Senan, SVP and Business Head – Data & Analytics, Infosys

Building on decades of innovation with Google Cloud Cortex Framework

To illustrate the opportunities that Google Cloud Cortex Framework will bring to our customers, we developed an initial release that combines multiple Google data sets with SAP enterprise data. By leveraging machine learning and other Google technologies, companies can deliver new analytics and gain new insights. An example of this is demand shaping.

Demand shaping will benefit line-of-business executives and supply-chain professionals, who can leverage the Google Cloud Cortex Framework reference architecture to improve supply-chain operations by improving business processes or accelerating time-to-insight with analytics. Chief data officers (or any executive responsible for data and analytics) will also benefit by saving time, building on reusable components, and following best practices to get innovative cloud solutions up and running as quickly, effectively, and efficiently as possible. Today’s enterprises can use Google Cloud Cortex Framework to create a reusable architecture that can adapt and expand to new scenarios to gain better visibility into signals that influence demand forecasts. 

Of course, Google Cloud customers aren’t just interested in scenarios that apply to the data and analytics space. Future Google Cloud Cortex Framework offerings will help provide recommended approaches to better implement use cases in consumer-facing industries, including consumer packaged goods, supply chain, and the delivery of improved customer experiences, as well as infrastructure and application workload management integration, all to drive insights to execution and improve automation of business processes. The common denominator will always be the ability to not only reduce the time and effort to deploy and manage each solution, but also to develop a technology strategy that can scale above and beyond an individual scenario or use case.

Are you interested in learning more? Watch our session at Google Cloud Next ’21 and fill out this form to connect with our solution experts on the latest content, deployment options and free tailor-made innovation discovery workshops.

Turn data into value with a unified and open data cloud

Today at Google Cloud Next we are announcing innovations that will enable data teams to simplify how they work with data and derive value from it faster. These new solutions will help organizations build modern data architectures with real-time analytics to power innovative, mission-critical, data-driven applications. 

Too often, even the best minds in data are constrained by ineffective systems and technologies. A recent study showed that only 32% of companies surveyed gained value from their data investments. Previous approaches have resulted in systems that are difficult to access, slow, unreliable, complex, and fragmented.

At Google Cloud, we are committed to changing this reality by helping customers simplify their approach to data to build their data clouds. Google Cloud’s data platform is simply unmatched for speed, scale, security, and reliability for any size organization with built-in, industry-leading machine learning (ML) and artificial intelligence (AI), and an open standards-based approach.

Vertex AI and data platform services unlock rapid ML modeling 

With the launch of Vertex AI in May 2021, we empowered data scientists and engineers to build reliable, standardized AI pipelines that take advantage of the power of Google Cloud’s data pipelines. Today, we are taking this a step further with the launch of Vertex AI Workbench, a unified user experience to build and deploy ML models faster, accelerating time-to-value for data scientists and their organizations. We’ve integrated data engineering capabilities directly into the data science environment, which lets you ingest and analyze data, and deploy and manage ML models, all from a single interface.

Data scientists can now build and train models 5X faster on Vertex AI than on traditional notebooks. This is primarily enabled by integrations across data services (like Dataproc, BigQuery, Dataplex, and Looker), which significantly reduce context switching. The unified experience of Vertex AI lets data scientists coordinate, transform, secure, and monitor Machine Learning Operations (MLOps) from within a single interface, for their long-running, self-improving, and safely managed AI services.

“As per IDC’s AI StrategiesView 2021, model development duration, scalable deployment, and model management are three of the top five challenges in scaling AI initiatives,” said Ritu Jyoti, Group Vice President, AI and Automation Research Practice at IDC. “Vertex AI Workbench provides a collaborative development environment for the entire ML workflow – connecting data services such as BigQuery and Spark on Google Cloud, to Vertex AI and MLOps services. As such, data scientists and engineers will be able to deploy and manage more models, more easily and quickly, from within one interface.”

Ecommerce company Wayfair has transformed its merchandising capabilities with data and AI services. “At Wayfair, data is at the center of our business. With more than 22 million products from more than 16,000 suppliers, the process of helping customers find the exact right item for their needs across our vast ecosystem presents exciting challenges,” said Matt Ferrari, Head of Ad Tech, Customer Intelligence, and Machine Learning; Engineering and Product at Wayfair. “From managing our online catalog and inventory, to building a strong logistics network, to making it easier to share product data with suppliers, we rely on services including BigQuery to ensure that we are able to access high-performance, low-maintenance data at scale. Vertex AI Workbench and Vertex AI Training accelerate our adoption of highly scalable model development and training capabilities.”

BigQuery Omni: Breaking data silos with cross-cloud analytics and governance

Businesses across a variety of industries are choosing Google Cloud to develop their data cloud strategies and better predict business outcomes, and BigQuery is a key part of that solution portfolio. To address complex data management across hybrid and multicloud environments, this month we are announcing the general availability of BigQuery Omni, which allows customers to analyze data across Google Cloud, AWS, and Azure. Healthcare provider Johnson and Johnson was able to combine data in Google Cloud and AWS S3 with BigQuery Omni without needing to migrate data.

This flexible, fully-managed, cross-cloud analytics solution allows you to cost-effectively and securely answer questions and share results from a single pane of glass across your datasets, wherever you are. In addition to these multicloud capabilities, Dataplex will be generally available this quarter to provide an intelligent data fabric that enables you to keep your data distributed while making it securely accessible to all your analytics tools.

Spark on Google Cloud simplifies data engineering 

To help make data engineering even easier, we are announcing the general availability of Spark on Google Cloud, the world’s first autoscaling and serverless Spark service for the Google Cloud data platform. This allows data engineers, data scientists, and data analysts to use Spark from their preferred interfaces without data replication or custom integrations. Using this capability, developers can write applications and pipelines that autoscale without any manual infrastructure provisioning or tuning. This new service makes Spark a first-class citizen on Google Cloud, and enables customers to get started in seconds and scale infinitely, regardless of whether they start in BigQuery, Dataproc, Dataplex, or Vertex AI.

Spanner meets PostgreSQL: global, relational scale with a popular interface

We’re continuing to make Cloud Spanner, our fully managed, globally scalable, relational database, available to more customers with a PostgreSQL interface, now in preview. With this new PostgreSQL interface, enterprises can take advantage of Spanner’s unmatched global scale, 99.999% availability, and strong consistency using skills and tools from the popular PostgreSQL ecosystem.

This interface supports Spanner’s rich feature set that uses the most popular PostgreSQL data types and SQL features to reduce the barrier to entry for building transformational applications. Using the tools and skills they already have, developer teams gain flexibility and peace of mind because the schemas and queries they build against the PostgreSQL interface can be easily ported to another Postgres environment. Complete this form to request access to the preview.

Our commitment to the PostgreSQL ecosystem is long-standing. Customers choose Cloud SQL for the flexibility to run PostgreSQL, MySQL, and SQL Server workloads. Cloud SQL provides a rich extension collection, configuration flags, and an open ecosystem, without the hassle of database provisioning, storage capacity management, or other time-consuming tasks.

Auto Trader has migrated approximately 65% of their Oracle footprint to Cloud SQL, which remains a strategic priority for the company. By using Cloud SQL, BigQuery, and Looker to facilitate access to data for their users, and by relying on Cloud SQL’s fully managed services, Auto Trader has improved its release cadence by over 140% (year-over-year), enabling an impressive peak of 458 releases to production in a single day.

Looker integrations make augmented analytics a reality

We are announcing a new integration between Tableau and Looker that will allow customers to operationalize analytics and more effectively scale their deployments with trusted, real-time data, and less maintenance for developers and administrators. Tableau customers will soon be able to leverage Looker’s semantic model, enabling new levels of data governance while democratizing access to data. They will also be able to pair their enterprise semantic layer with Tableau’s leading analytics platform. The future might be uncertain, but together with our partners we can help you plan for it. 

We remain committed to developing new ways to help organizations go beyond traditional business intelligence with Looker. In addition to innovating within Looker, we’re continuing to integrate within other parts of Google Cloud. Today, we are sharing new ways to help customers deliver trusted data experiences and leverage augmented analytics to take intelligent action. 

First, we’re enabling you to democratize access to trusted data in tools you are already familiar with. Connected Sheets already allows you to interactively explore BigQuery data in a familiar spreadsheet interface, and it will soon be able to leverage the governed data and business metrics in Looker’s semantic model. This capability will be available in preview by the end of this year.

Another integration we’re announcing is Looker’s Solution for Contact Center AI, which helps you gain a deeper understanding and appreciation of your customers’ full journey by unlocking insights from all of your company’s first-party data, such as contextualizing support calls to make sure your most valuable customers receive the best service. 

We’re also sharing the new Looker Block for Healthcare NLP API, which provides simplified access to intelligent insights from unstructured medical text. The block is compatible with Fast Healthcare Interoperability Resources (FHIR), so healthcare providers, payers, and pharma companies can quickly understand the context and relationships of medical concepts within the text, and in turn, can begin to link this to other clinical data sources for additional AI and ML actions.

Bringing the best of Google together with Google Earth Engine and Google Cloud

We are thrilled to announce the preview of Google Earth Engine on Google Cloud. This launch makes Google Earth Engine’s 50+ petabyte catalog of satellite imagery and geospatial data sets available for planetary-scale analysis. Google Cloud customers will be able to integrate Earth Engine with BigQuery, Google Cloud’s ML technologies, and Google Maps Platform. This gives data teams a way to better understand how the world is changing and what actions they can take — from sustainable sourcing, to saving energy and materials costs, to understanding business risks, to serving new customer needs. 

For over a decade, Earth Engine has supported the work of researchers and NGOs from around the world, and this new integration brings the best of Google and Google Cloud together to empower enterprises to create a sustainable future for our planet and for your business.

At Google Cloud, we are deeply grateful to work with companies of all sizes, and across industries, to build their data clouds. Join my keynote session to hear how organizations are leveraging the full power of data, from databases to analytics that support decision making to AI and ML that predict and automate the future. We’ll also highlight our latest product innovations for BigQuery, Spanner, Looker, and Vertex AI.

I can’t wait to hear how you will turn data into intelligence and look forward to connecting with you.

Introducing Intelligent Products Essentials: helping manufacturers build AI-powered smart products, faster

Expectations for both consumer and commercial products have changed. Consumers want products that evolve with their needs, adapt to their preferences, and stay up-to-date over time. Manufacturers, in turn, need to create products that provide engaging customer experiences not only to better compete in the marketplace, but also to provide new monetization opportunities. 

However, embedding intelligence into new and existing products is challenging. Updating hardware is costly, and existing connected products do not have the capability to add new features. Furthermore, manufacturers do not have sufficient customer insights due to product telemetry and customer data silos, and may lack the AI expertise to quickly develop and deploy these features. 

That’s why today we’re launching Intelligent Products Essentials, a solution that allows manufacturers to rapidly deliver products that adapt to their owners, update features over-the-air using AI at the edge, and provide customer insights using analytics in the cloud. The solution is designed to assist manufacturers in their product development journeys—whether developing a new product or enhancing existing ones. 

With Intelligent Products Essentials, manufacturers can:

Personalize customer experiences: Provide a compelling ownership experience that evolves over the lifetime of the product. For example, a chatbot that contextualizes responses based on product status and customer profile.

Manage and update products over-the-air: Deploy updates to products in the field, gather performance insights, and evolve capabilities over time with monetization opportunities.

Predict parts and service issues: Detect operating thresholds and anomalies, and predict failures so that you can proactively recommend service using AI, reducing warranty claims, decreasing parts shortages, and increasing customer satisfaction.

In order to help manufacturers quickly deploy these use cases and many more, Intelligent Products Essentials provides the following:

Edge connections: Connect and ingest raw or time-series product telemetry from various device platforms utilizing IoT Core or Pub/Sub and enable deployment and management of firmware over-the-air and machine learning models with Vertex AI at the edge.

Ownership App Template: Easily build connected product companion apps that work on smartphones, tablets, and computers. Use a pre-built API and accompanying sample app that can incorporate product or device registration, identity management, and provide application behavior analytics using Firebase.

Product fleet management: Manage, update and analyze fleets of connected products via APIs, Google Kubernetes Engine, and Looker.

AI services: Create new features or capabilities for your products using AI and machine learning products such as DialogFlow, Vision AI, AutoML, all from Vertex AI.

Enterprise data integration: Integrate data sources such as Enterprise Asset Management (EAM), Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) systems and others using Dataflow and BigQuery.

Intelligent Products Essentials helps manufacturers build new features across consumer, industrial, enterprise, and transportation products. Manufacturers can implement the solution in-house, or work with one of our certified solution integration partners like Quantifi and Softserve.

“The focus on intelligent products that Google Cloud is deploying provides a digital option for manufacturers and users. At its heart, systems like Intelligent Product Essentials are all about decision making. IDC sees faster and more effective decision-making as the fundamental reason for the drive to digitize products and processes. It’s how you can make faster and more effective decisions to meet heightened customer expectations, generate faster cash flow, and better revenue realization,” said Kevin Prouty, Group Vice President at IDC. “Digital offerings like Google’s Intelligent Product Essentials potentially go the last mile with the ability to connect the digital thread all the way through to the final user.”

Customers adopting Intelligent Products Essentials

GE Appliances, a Haier company, are enhancing their appliances using new AI-powered intelligent features to enable:

Intelligent cooking: Help cook the perfect meal to personal preferences, regardless of your expertise and abilities in the kitchen.

Frictionless service: Build smart appliances that know when they need maintenance and make it simple to take action or schedule services.

Integrated digital lifestyle: Make appliances useful at every step of the way by integrating them with digital lifestyle services – for example, automating appliance behaviors according to customer calendars, such as oven preheating or scheduling the dishwasher to run in the late evening.

“Intelligent Products Essentials enhances our smart appliances ecosystem, offering richer consumer habit insights. This enables us to develop and offer new features and experiences to integrate with their digital lifestyle.” —Shawn Stover, Vice President, Smart Home Solutions at GE Appliances.

Serial 1, Powered by Harley-Davidson, is using Intelligent Products Essentials to manage and update its next-generation eBicycles, and personalize its customers’ digital ownership experiences.

“At Serial 1, we are dedicated to creating the easiest and most intuitive way to experience the fun, freedom, and adventure of riding a pedal-assist electric bicycle. Connectivity is a key component of delivering that mission, and working together to integrate Intelligent Product Essentials into our eBicycles will ensure that our customers enjoy the best possible user experience.”— Jason Huntsman, President, Serial 1. 

Magic Leap, an augmented reality pioneer with industry-leading hardware and software, is building field service solutions with Intelligent Products Essentials with the goal of connecting manufacturers, dealers, and customers to more proactive and intelligent service.

“We look forward to using Intelligent Products Essentials to enable us to rapidly integrate manufacturers’ product data with dealer service partners into our field service solution. We’re excited to partner with Google Cloud as we continue to push the boundaries of physical interaction with the digital world.” — Walter Delph, Chief Business Officer, Magic Leap

Intelligent Products Essentials is available today. To learn more, visit our website.

Analyzing Twitter sentiment with new Workflows processing capabilities

The Workflows team recently announced the general availability of iteration syntax and connectors.

Iteration syntax supports easier creation and better readability of workflows that process many items. You can use a for loop to iterate through a collection of data in a list or map, and keep track of the current index. If you have a specific range of numeric values to iterate through, you can also use range-based iteration.

Connectors have been in preview since January. Think of connectors like client libraries for workflows to use other services. They handle authentication, request formats, retries, and waiting for long-running operations to complete. Check out our previous blog post for more details on connectors. Since January, the number of available connectors has increased from 5 to 20.

The combination of iteration syntax and connectors enables you to implement robust batch processing use cases. Let’s take a look at a concrete sample. In this example, you will create a workflow to analyze sentiments of the latest tweets for a Twitter handle. You will be using the Cloud Natural Language API connector and iteration syntax.

APIs for Twitter sentiment analysis

The workflow will use the Twitter API and Natural Language API. Let’s take a closer look at them.

Twitter API 

To use the Twitter API, you’ll need a developer account. Once you have the account, you need to create an app and get a bearer token to use in your API calls. Twitter has an API to search for Tweets. 

Here’s an example to get 100 Tweets from the @GoogleCloudTech handle using the Twitter search API:
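As a rough illustration, the request can be sketched in Python. This is a minimal sketch, not the original sample: it assumes the Twitter API v2 recent search endpoint with a `from:` query, the helper name `build_search_request` is hypothetical, and the token is a placeholder. It only builds the URL and headers and does not call the API.

```python
from urllib.parse import urlencode

# Twitter API v2 recent search endpoint (assumed to be the search API meant here).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(handle, max_results, bearer_token):
    """Return (url, headers) for a recent-search call; performs no network access."""
    params = {"query": f"from:{handle}", "max_results": max_results}
    url = f"{SEARCH_URL}?{urlencode(params)}"
    # The bearer token from your Twitter app goes into the Authorization header.
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return url, headers


url, headers = build_search_request("GoogleCloudTech", 100, "YOUR_BEARER_TOKEN")
print(url)
```

Passing the resulting URL and headers to any HTTP client performs the actual search.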

Natural Language API

Natural Language API uses machine learning to reveal the structure and meaning of text. It has methods such as sentiment analysis, entity analysis, syntactic analysis, and more. In this example, you will use sentiment analysis. Sentiment analysis inspects the given text and identifies the prevailing emotional attitude within the text, especially to characterize a writer’s attitude as positive, negative, or neutral.

You can see a sample sentiment analysis response here. You will use the score of documentSentiment to identify the sentiment of each post. Scores range between -1.0 (negative) and 1.0 (positive) and correspond to the overall emotional leaning of the text. You will also calculate the average and minimum sentiment score of all processed tweets.
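To make the score handling concrete, here is a small Python sketch that reads the score of documentSentiment from a response shaped like the sentiment analysis output. The sample values and the 0.25 cut-off for "neutral" are illustrative assumptions, not values defined by the API, and `sentiment_label` is a hypothetical helper.

```python
# Abbreviated response in the shape returned by sentiment analysis;
# the numbers are made up for illustration.
sample_response = {
    "documentSentiment": {"magnitude": 0.8, "score": 0.8},
    "language": "en",
}


def sentiment_label(response, threshold=0.25):
    """Map a documentSentiment score in [-1.0, 1.0] to a coarse label.

    The threshold is an assumption for this sketch; pick one that suits your data.
    """
    score = response["documentSentiment"]["score"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


print(sentiment_label(sample_response))
```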

Define the workflow

Let’s start building the workflow in a workflow.yaml file.

In the init step, read the bearer token, Twitter handle, and max results for the Twitter API as runtime arguments. Also initialize some sentiment analysis related variables:

In the searchTweets step, fetch tweets using the Twitter API:

In the processPosts step, analyze each tweet and keep track of the sentiment scores. Notice how each tweet is analyzed using the new for-in iteration syntax with its access to the current index.

Under the processPosts step, there are multiple substeps. The analyzeSentiment step uses the Language API connector to analyze the text of a tweet and the next two steps calculate the total sentiment and keep track of the minimum sentiment score and index:

Once outside the processPosts step, calculate the average sentiment score, and then log and return the results.
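The bookkeeping done by these steps can be mirrored in plain Python to show the logic. This is a sketch of the same idea, not the workflow definition itself: `summarize_sentiment` is a hypothetical name, and the `analyze` callable stands in for the Language API connector call.

```python
def summarize_sentiment(texts, analyze):
    """Iterate over texts with their index, tracking total and minimum score."""
    total = 0.0
    min_score = None
    min_index = -1
    for index, text in enumerate(texts):
        score = analyze(text)  # stand-in for the sentiment analysis call
        total += score
        if min_score is None or score < min_score:
            min_score = score
            min_index = index
    # Average is computed once the loop is done, as in the final workflow step.
    average = total / len(texts) if texts else 0.0
    return {
        "average": average,
        "min_score": min_score,
        "min_index": min_index,
        "count": len(texts),
    }


# Fake scores so the sketch runs without calling any API.
fake_scores = {"great launch": 0.9, "this is broken": -0.6, "release notes": 0.0}
summary = summarize_sentiment(list(fake_scores), fake_scores.get)
print(summary)
```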

Deploy and execute the workflow

To try out the workflow, let’s deploy and execute it.

Deploy the workflow:

Execute the workflow (don’t forget to pass in your own bearer token):

After a minute or so, you should see the result with sentiment scores:

Next

Thanks to the iteration syntax and connectors, we were able to read and analyze Tweets in an intuitive and robust workflow with no code. Please reach out to @meteatamel and krisabraun@ for questions and feedback.

Twitter sentiment analysis on GitHub.

Share feedback, interesting use cases and customer requests

Dataflow Pipelines, deploy and manage data pipelines at scale

We see data engineers use Dataflow for a wide variety of their data processing needs, ranging from ingesting data into their data warehouses and data lakes, to processing data for machine learning use cases, to implementing sophisticated streaming analytics applications. While the use cases and what customers do vary, there is one common need that all of these users have: the need to create, monitor, and manage dozens, if not hundreds, of Dataflow jobs. As a result, users have asked us for a scalable way to schedule, observe, and troubleshoot Dataflow jobs.

We are excited to announce a new capability in Preview, Dataflow Pipelines, that addresses the problem of managing Dataflow jobs at scale. Dataflow Pipelines introduces a new management abstraction, Pipelines, that maps to the logical pipelines that users care about and provides a single-pane-of-glass view for observation and management.

With Data Pipelines, data engineers can easily perform tasks such as the following. 

Running jobs on a recurring schedule: With Data Pipelines, users can “schedule” recurrent batch jobs by just providing a schedule in cron format. The pipeline will then automatically create Dataflow jobs as per the schedule. The input file names can be parameterized for incremental batch pipeline processing. Dataflow uses Cloud Scheduler to schedule the jobs.

Creating and tracking SLOs: One of the key monitoring goals is to ensure that data pipelines are delivering data that the downstream business teams need. In the past, it was not easy to define SLOs and set up alerts on those. With Data Pipelines, SLO configuration and alerting are natively supported, and users can define them easily at the pipeline level.

Health monitoring & Tracking: Data Pipelines makes it easy to monitor and reason about your pipelines by providing aggregated metrics on a project and at a pipeline level. These metrics (both batch and streaming) along with history of previous execution runs provide a detailed overview of the pipelines at a glance. In addition, the ability to easily identify problematic jobs and dive into the job level pages makes troubleshooting easier.
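To illustrate the recurring-schedule task above, here is a small Python sketch of the two inputs involved: a schedule in the standard five-field cron format and an input file name parameterized per run. The `{date}` placeholder, the bucket name, and both helper names are hypothetical; the placeholder is not the parameter syntax used by Data Pipelines.

```python
from datetime import date

# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def input_path_for(template, run_date):
    """Fill a hypothetical {date} placeholder so each run reads its own file."""
    return template.format(date=run_date.strftime("%Y-%m-%d"))


schedule = parse_cron("0 2 * * *")  # every day at 02:00
path = input_path_for("gs://example-bucket/events-{date}.csv", date(2021, 10, 12))
print(schedule, path)
```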

Here is a short video that provides an overview of Data Pipelines.

If you have any feedback or questions, please write to us at google-data-pipelines-feedback@googlegroups.com.

Google Cloud joins forces with EDM Council to build a more secure and governed data cloud

Google Cloud joins the EDM Council to announce the release of the CDMC framework v1.1.1. This has been an industry-wide effort that started in the summer of 2020, in which leading cloud providers, data governance vendors, and experts worked together to define best practices for data management in the cloud. The CDMC framework captures expertise from the group and defines clear criteria to manage, govern, secure, and ensure the privacy of data in the cloud. Google Cloud implements most of the mission-critical controls and automations in Dataplex, Google Cloud’s own first-party solution to organize, manage, and ensure data governance for data across Google Cloud’s native data storage systems. Leveraging Dataplex and following the best practices in the CDMC framework can ensure adequate control over sensitive data and sensitive data workloads. Additionally, Google Cloud’s data services allow a high degree of configurability which, together with the integration with specialised data management software provided by our partners like Collibra, provides a rich ecosystem for customers to implement solutions that adhere to the CDMC best practices.

The CDMC framework is a joint venture between hundreds of organizations across the globe, including major cloud service providers, technology service organizations, privacy firms, and major consultancy and advisory firms, who have come together to define best practices. The framework spans governance and accountability, cataloging and classification, accessibility and usage, protection and privacy, and data lifecycle management. The framework represents a milestone in the adoption of industry best practices for data management, and we believe that it will help build trust, confidence, and accountability for the adoption of cloud, particularly for sensitive data. Capitalising on this, Google Cloud is going to make Dataplex publicly available; it will implement cataloging, lifecycle management, governance, and most of the other controls in the framework (others are available on a per-product basis).

“Google Cloud customers, who include financial services, regulated entities, and privacy minded organizations continue to benefit from Google’s competency in handling sensitive data. The CDMC framework ensures that Google’s best practices are shared and augmented from feedback across the industry,” said Evren Eryurek, Director of Product Management at Google Cloud and a key leader for Big Data in Google Cloud.

The EDM Council, the organizing body of which Google Cloud is a member, is a global non-profit trade association with over 250 member organizations from the US, Canada, UK, Europe, South Africa, Japan, Asia, Singapore, and Australia, and over 10,000 data management professionals as members. The EDM Council provides a venue for data professionals to interact, communicate, and collaborate on the challenges and advances in data management as a critical organizational function. The Council provides research, education, and exposure to how data, as an asset, is being curated today, and a vision of how it must be managed in the future.

For more about Dataplex

For more information about CDMC Framework, and a downloadable doc

For more about the EDM Council

Building the data analyst driven organization from the first principles

In this blog series, a companion to our white paper, we’re exploring different types of data-driven organizations. As discussed in the previous blogs of this series, a data scientist driven organization seeks to maximize value derived from data by making it highly accessible and discoverable, while also applying robust governance and operational rigor to rapidly deploy ML models. A data engineering driven organization typically has three categories of data workers, with data engineers acting as the stewards of data that an analytics team uses to generate analyses for consumption by business users. Many of the same design decisions and technologies come into play between these organization types, but the social and organizational aspects are different.

Regardless of the composition of your organization’s data workers and their exact roles, you’re probably facing a lot of the same challenges. Some of these may be familiar to you:

Your data is stale, noisy, or otherwise untrustworthy

You need reliable data quickly in order to make rapid business decisions, but integrating new data sources is time consuming and costly

You struggle to find a balance among reducing risk, increasing profitability, and innovation

A lot of your time is spent on pulling reports for regulatory compliance instead of generating insights for the business

Some of these challenges are more profound for companies in a highly regulated industry, but data freshness, time to insights, reduction of risk, and innovation are key to any company. The common thread is the tremendous pressure to transform insights into business value, as fast as possible. Your customers are demanding accurate and faster interactions driven by data. As a result, your organization needs to sharpen your data analytics capabilities to stay competitive.

At the same time, technology is evolving around you, creating a skill gap with the introduction of new technologies such as data lakes or data processing frameworks such as Spark. These technologies are powerful but require programming skills in languages such as Java or Scala. They present a radically different paradigm to the classic SQL declarative approach. There is a delicate balance of data workers within a company, and more traditional data architectures require very specific technical skills. Any new technology stack that disrupts this balance requires a redistribution of technical skills or a different ratio of engineering resources to other data workers. It’s often easier for a department head to justify an additional person on the team with new skills than it is to make broad sweeping changes to a central IT department, and as a result, evolution and new skill sets only occur in pockets of the org chart.

So, why doesn’t technology adapt to your needs?

The rise and fall of technologies such as Hadoop has revealed the elephant in the room (pun intended). Technology needs to fit into your culture and needs to build on your capabilities. This allows you to be more productive, reflect business needs, and preserve your subject matter expertise. You don’t need to become an engineering driven organization to leverage new technology!

We’re going to explore how a platform like BigQuery, a pioneer in the concept of a cloud structured data lake, can provide a scalable processing engine and storage layer that can deal with the new and diverse data sources, via a familiar, SQL-based user interface.

Figure 1 – Data analysts’ skill set gap on a data warehouse + data lake architecture vs. a Structured Data Lake architecture

How do you build the “data-driven” agenda for data analyst driven organizations?

Before discussing the main levers to pull for the transformation, let’s define what we mean by a data analyst driven organization. It should be noted that whether an organization is analyst driven is not a binary concept, but instead presents a wide range of overlapping characteristics:

Mature industry. At the macro level, these organizations are red-brick established names with legacy systems. Generally, the industry in which they operate can be considered mature and stable.

Competition from emerging digital natives. From a competitive standpoint, in addition to other similar organizations, there are also emerging digital organizations (for instance, fintech) that aim to capture the fastest growing digital areas and customer segments that have the highest potential.

EDW + Batch ETL. Technically speaking, the central information piece comes in the form of an enterprise data warehouse (EDW) built over the years with a high level of technical debt and legacy technology. The transformation of the data within the data warehouse is carried out through scheduled ETL (Extract Transform Load) processes such as nightly batches. This batch process adds to the latency of serving the data. 

Business Intelligence. Most data workers in the organization are used to answering business questions by launching SQL queries against a centralized data warehouse and creating reports and dashboards using BI tools. In addition, spreadsheets are used to access similar data. Thus, the internal talent pool is most comfortable with SQL, BI tools, and spreadsheets.

Narrowing the focus to the data department, the main personas and processes in these types of organizations can be generalized as follows:

Data analysts, focused on receiving, understanding, and serving the requests coming from the business and making sense of the relevant data.

Business analysts, who put the information into context and act upon the analytical insights.

Data engineers, focused on the downstream data pipeline and the first phases of data transformation, such as loading and integrating new sources, as well as managing the data governance and data quality processes.

Finally, given its relevance, it is also worth digging deeper into what we mean by a data analyst. As a data analyst, your goal is to meet the information needs of your organization. You are responsible for the logical design and maintenance of the data itself. Some of the tasks may include creating the layout and design of tables to meet the business processes, and reorganizing and transforming sources. In addition, you’re also responsible for generating reports and insights that effectively communicate the trends, patterns, or predictions that the business asks for.

Going back to our original question of how we can build the mission for data analyst driven organizations, the answer is: by using and expanding the experience and skill set of the data analyst community.

Figure 3 – Data analysts domain expansions for the development of a data-driven strategy

On one hand, we promote the trend of data analysts making steps into the business side. As discussed earlier, data analysts bring deep knowledge of business domains, along with sufficient technical skills to analyze data regardless of its volume or size.

Cloud-based data warehouses and serverless technologies such as BigQuery contribute to this expansion of responsibilities toward the right (as highlighted in Figure 3). They allow data analysts to focus on adding value rather than spending time and effort on administrative and technical management tasks. Furthermore, you can now invest that extra time going deeper into the business without being limited by the volume or type of data that the storage system supports.

On the other hand, new data processing paradigms enable a movement in the opposite direction for the data analysts’ area of responsibility. You can use SQL as the fundamental query tool for data analysis, but now you can also use it for data processing and transformation. In the process, data analysts are able to take on some of the data engineering work: data integration and enrichment.

Figure 4 – ELT paradigm – a SQL-first approach to data engineering

Data analyst driven organizations embrace the concept of ELT (Extract-Load-Transform) rather than the traditional ETL (Extract-Transform-Load). The main difference is that the common data processing tasks are handled after the data is loaded into the data warehouse. ELT makes extensive use of SQL logic to enhance, cleanse, normalize, refine, and integrate data and make it ready for analysis. There are several benefits of such an approach: it reduces time to act, data is loaded immediately, and it is made available to multiple users concurrently.
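The ELT ordering can be sketched with Python's built-in sqlite3 module standing in for the data warehouse. This is an illustrative assumption only: the table names and sample rows are made up, and a real deployment would run the same kind of SQL in the warehouse itself. The raw data is loaded untouched first, and the cleansing happens afterwards in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse

# Extract + Load: land the source rows as-is, everything as text.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
raw_rows = [("1", " 10.50", "us"), ("2", "7", "US "), ("3", None, "de")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: cleanse and normalize after loading, using SQL only.
conn.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER)   AS order_id,
           CAST(TRIM(amount) AS REAL)  AS amount,
           UPPER(TRIM(country))        AS country
    FROM raw_orders
    WHERE amount IS NOT NULL
    """
)

clean_rows = conn.execute(
    "SELECT order_id, amount, country FROM orders_clean ORDER BY order_id"
).fetchall()
print(clean_rows)
```

Because the raw table is kept, the transformation can be revised and rerun later without extracting the data from the source system again.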

A robust, transformational, actionable architecture for data analyst driven organizations

So far we have talked briefly about the technological innovations that enable the data transformation. In this section, we focus on a more detailed description of these building blocks.

To define a high-level architecture, we are going to start by defining the first principles from which we derive the components and interrelationships. It goes without saying that a real organization must adapt these principles and therefore the architecture decisions to its reality and existing investments.

Principle #1: SQL as the analytics “lingua franca”

Technology should adapt to the current organizational culture. Prioritize components that offer a SQL interface, no matter where they are in the data processing pipeline.

Principle #2: Rise of the Structured Data Lake

Information systems infrastructure and its data should converge, to help expand the possibilities of analytical processing on new and diverse data sources. This may mean merging a traditional data warehouse with a data lake to eliminate silos. 

Principle #3: Assume and plan for “data/schema liquidity”

Storage is cheap, so your organization no longer needs to impose rigid rules regarding data structures before data arrives. Moving away from a schema-on-write model to a schema-on-read model enables real-time access to data. Data can be kept in its raw form and then transformed into whichever schema is useful. In addition, the data platform can manage the process of keeping the resulting copies in sync (for instance, using materialized views or change data capture (CDC)), so do not be afraid to maintain several copies of the same data asset.
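A minimal Python sketch of schema-on-read, under the assumption that raw records are stored as JSON lines: nothing is enforced when the data arrives, and each consumer applies the schema it needs at read time. The field names, sample events, and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events are stored exactly as they arrived; no structure is enforced on write.
raw_events = [
    '{"user": "a", "amount": "12.5", "ts": "2021-10-12T08:00:00"}',
    '{"user": "b", "amount": 3, "extra": "ignored"}',
]


def read_with_schema(raw_lines, schema):
    """Project and cast raw JSON records to the given {field: cast} schema."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the whole read.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data through different schemas.
billing_view = read_with_schema(raw_events, {"user": str, "amount": float})
activity_view = read_with_schema(raw_events, {"user": str, "ts": str})
print(billing_view)
print(activity_view)
```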

Combining these principles we can define a high-level architecture like the one shown in the following diagram.

Figure 5 – A high-level informational architecture for the data analyst driven organizations

What components do we observe in the informational architecture of this type of organization?

First of all, a modern data platform should support an increasing number of data analysis patterns: 

the “classic” Business Intelligence workloads with tools such as Looker

a SQL-based ad hoc analytics interface that allows management of data pipelines through ELT

data science use cases enabled by machine learning techniques

real-time event processing

Although the first two patterns are quite close to the traditional SQL data warehousing world, the last two present innovations in the form of SQL abstractions over more advanced analytical patterns. In the realm of machine learning, for example, we have BigQuery ML, which lets us create and execute machine learning models in BigQuery using standard SQL queries. And Dataflow SQL streaming extensions enable aggregating data streams with the underlying Dataflow sources like Pub/Sub or Kafka. Think for a moment about the world of possibilities that this technology enables without the need to invest in new profiles or roles.

For a data analyst driven organization, the data preparation and transformation challenge sends a loud and clear message in the choice between ELT and ETL: use ELT wherever possible. The significant difference with this new paradigm is where the data is transformed: inside the Structured Data Lake, using SQL.

It is possible to transform data with SQL without sacrificing functionalities offered by extensive data integration suites. But how do you handle scheduling, dependency management, data quality, or operations monitoring? Products such as dbt or BigQuery Dataform bring a software engineering approach to data modeling and building data workflows. At the same time, they allow non-programmers to carry out robust data transformations. 

Modelling techniques such as Data Vault 2.0 are making a comeback due to the power of ELT in the new cloud-driven data warehouses. It is important to note that the logical distribution of the data remains unaltered, following classical patterns such as the Inmon or Kimball reference architectures. [1] [2]

In data analyst driven organizations, data engineering teams generally control extraction of data from source systems. While this can be made easier through the use of SQL-based tools, enabling data analysts to do some of that work, there is still a need for a strong data engineering team. Some batch jobs still require data pipelines that are more suitable for ETL. For example, bringing data from a mainframe to a data warehouse requires additional processing steps: data types need to be mapped, COBOL copybooks need to be converted, and so on. In addition, for use cases like real-time analytics, the data engineering teams will configure the streaming data sources such as Pub/Sub or Kafka topics. The way that you deal with generic tasks is still the same: they can be written as generic ETL pipelines and then reconfigured by the analysts. One example is applying data quality validation checks from various source datasets to the target environment. The main point is that with the power of cloud data warehouses, it is now possible to use ELT instead of traditional ETL tasks. However, as described above, there are use cases, such as data quality applications, for which we still need ETL.
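The idea of a generic pipeline that analysts reconfigure can be sketched in Python as a validation step driven entirely by a list of rules. The rule format, check names, and sample rows below are hypothetical and only illustrate the split of responsibilities: engineers own `run_checks`, analysts own the rules.

```python
def run_checks(rows, rules):
    """Apply configured data quality rules; return (row_index, column, check) failures."""
    failures = []
    for index, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule["column"])
            if rule["check"] == "not_null" and value is None:
                failures.append((index, rule["column"], "not_null"))
            elif rule["check"] == "min" and value is not None and value < rule["value"]:
                failures.append((index, rule["column"], "min"))
    return failures


# Analysts change only this configuration, not the pipeline code.
rules = [
    {"column": "id", "check": "not_null"},
    {"column": "amount", "check": "min", "value": 0},
]
rows = [{"id": 1, "amount": 5.0}, {"id": None, "amount": -1.0}]

failures = run_checks(rows, rules)
print(failures)
```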

In summary 

In this article, we have identified the data analyst driven organization and reviewed the challenges it faces. We have seen how it is possible to build a transformation plan around one of your most valuable assets: data analysts. We have also reviewed the main components of the modern, scalable informational architecture needed to run such an organization efficiently. Data analysts’ responsibilities are expanding into advanced areas such as machine learning and real-time event processing. All of these are still possible through our familiar and beloved interface: SQL. To get started with the Google Cloud data ecosystem, please feel free to contact us or start a free trial.

Source : Data Analytics Read More

Liberating your mainframe data with Confluent and Google Cloud

Are you looking for the best way to migrate and replicate your mainframe data? Google Cloud and Confluent have teamed up to provide an end-to-end solution for connecting your mainframe application data with the advanced analytics capabilities of Google Cloud.

In this article, we will discuss how you can use Confluent Connect to replicate messages from IBM MQ and Db2 to Google Cloud. This allows you to work with your mainframe data in the cloud, and enables you to build new applications and analytical capabilities using Google Cloud’s machine learning solutions. You also benefit by reducing impact on your production mainframe workloads, and reducing general purpose compute costs. In other words, you can continue using your mainframe to run your mission-critical business workloads while setting your data in motion for innovation.
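As a rough illustration of what setting up such a replication might look like, the Python sketch below builds a connector definition for an IBM MQ source and submits it to a Kafka Connect worker through the standard `POST /connectors` REST endpoint. The connector class and the property names in the `config` block are written from memory of the Confluent IBM MQ source connector and should be treated as assumptions to verify against the connector documentation. The host, queue, and topic values used with it are hypothetical.

```python
import json
import urllib.request


def mq_source_connector(name, queue, topic, host, queue_manager, channel, port=1414):
    """Build a Kafka Connect request body for an IBM MQ source connector."""
    return {
        "name": name,
        "config": {
            # Assumed class and property names; verify against the connector docs.
            "connector.class": "io.confluent.connect.ibm.mq.IbmMQSourceConnector",
            # A single task, chosen here so one reader drains the queue in order.
            "tasks.max": "1",
            "kafka.topic": topic,
            "mq.hostname": host,
            "mq.port": str(port),
            "mq.queue.manager": queue_manager,
            "mq.channel": channel,
            "jms.destination.name": queue,
            "jms.destination.type": "queue",
        },
    }


def register(connect_url, connector):
    """POST the connector definition to a Kafka Connect worker's REST API."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as register("http://localhost:8083", mq_source_connector(...)) would submit the definition to a locally running worker. A fully managed Confluent Cloud cluster is configured through its own console or API instead, so this sketch applies to self-managed Connect workers only.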

Here’s an example use case that demonstrates how using the Confluent MQ connector with Google Cloud can impact your bottom line. One of our customers is saving millions of dollars per year on mainframe cycles by leveraging z Integrated Information Processor (zIIP) engines for data processing.

Moving these workloads to zIIP, off of general purpose (GP) compute, and away from Channel Initiator (CHINIT) routes directly reduces MSU-based licensing costs. As an example, a customer in the financial services industry saw a 50% reduction in CPU usage per message. These cost savings can enable you to direct budget resources toward differentiating activities, such as commercializing your valuable mainframe data to open up new revenue streams and improve customer service.

On the technical side, Confluent guarantees exactly-once message semantics, preserves message order, and makes that data available to existing and new applications that need a high-throughput, low-latency, event-driven architecture. This means that you can rely on the accuracy and consistency of your data in Google Cloud as if you were querying it directly from your mainframe database.
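To see why ordering and exactly-once processing matter for consistency, consider a downstream application that rebuilds mainframe state from a stream of change events. The Python sketch below is a simplified illustration and not Confluent's mechanism: a hypothetical OrderedApplier tracks the last applied offset per partition and ignores any redelivered event, so the rebuilt state matches what applying each event exactly once, in order, would produce.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedApplier:
    """Applies each (partition, offset) event at most once, in offset order."""

    state: dict = field(default_factory=dict)
    last_offset: dict = field(default_factory=dict)

    def apply(self, partition: int, offset: int, key: str, value) -> bool:
        """Apply one change event; return False if it was already applied."""
        if offset <= self.last_offset.get(partition, -1):
            return False  # redelivered or stale event, skip it
        self.state[key] = value
        self.last_offset[partition] = offset
        return True


# Hypothetical change events for one account, including a redelivery of offset 1.
events = [
    (0, 0, "acct-42", 100),
    (0, 1, "acct-42", 250),
    (0, 1, "acct-42", 250),  # duplicate delivery
    (0, 2, "acct-42", 175),
]

applier = OrderedApplier()
results = [applier.apply(*event) for event in events]
```

If events arrived out of order or were applied twice without such a check, the final balance could differ from the mainframe's, which is the inconsistency the guarantees described above are meant to rule out.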

Once you have this data in your Confluent cluster, you can leverage the combined capabilities of Confluent and Google Cloud. You can modernize the way your consumers access your data by providing a single, standard source of truth without impacting production services. Confluent integrates directly with Apigee, Google Cloud’s API platform for developing and managing APIs.

Because Confluent integrates with BigQuery, you can also leverage the advanced analytical capabilities of BigQuery ML and Vertex AI to realize value from your latent mainframe data, and build new systems of insight that were not possible on the mainframe. And most of all, you can open up new avenues for innovation by allowing consumers to access the data when they need it, speeding up time to value and enabling faster business decisions.

You now have a bridge to the cloud for your mainframe application data. Get started by deploying Confluent from the Google Cloud Marketplace.
