Google Cloud Next Rollup for Data Analytics

October 23rd (this past Saturday!) was my 4th Googleversary, and we are wrapping up an incredible Google Next 2021!

When I started in 2017, we had a dream of making BigQuery the intelligent data warehouse that would power every organization’s data-driven digital transformation.

This year at Next, it was amazing to see Google Cloud’s CEO, Thomas Kurian, kick off his keynote with Walmart CTO Suresh Kumar, talking about how his organization is giving its data the “BigQuery treatment”.

As I recap Next 2021 and reflect on our amazing journey over the past 4 years, I’m so proud of the opportunity I’ve had to work with some of the world’s most innovative companies, from Twitter to Walmart to Home Depot, Snap, PayPal, and many others.

So much of what we announced at Next is the result of years of hard work, persistence and commitment to delivering the best analytics experience for customers. 

I believe one of the reasons customers choose Google for data is that we have shown strong alignment between our strategy and theirs, and that we have relentlessly delivered innovation at the speed they require.

Unified Smart Analytics Platform 

Over the past 4 years, our focus has been to build the industry’s leading unified smart analytics platform. BigQuery is at the heart of this vision and seamlessly integrates with all our other services. Customers can use BigQuery to query data in BigQuery storage, Google Cloud Storage, AWS S3, Azure Blob Storage, and various databases like Bigtable, Spanner, and Cloud SQL. They can also use any engine, such as Spark, Dataflow, or Vertex AI, with BigQuery. BigQuery automatically syncs all its metadata with Data Catalog, and users can then run the Data Loss Prevention service to identify and tag sensitive data. These tags can then be used to create access policies.
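
To make the federation point concrete, here is a minimal sketch, using the google-cloud-bigquery Python client, of registering Parquet files in Cloud Storage as an external table and querying them with standard SQL; the project, dataset, and bucket names are hypothetical.

```python
# A minimal sketch of querying Cloud Storage data through BigQuery,
# assuming a hypothetical project, dataset, and bucket.
from google.cloud import bigquery

client = bigquery.Client()

# Describe the Parquet files sitting in Cloud Storage.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

# Register them as an external table; no data is copied into BigQuery.
table = bigquery.Table("my-project.analytics.events_external")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Query the external table with standard SQL, just like native storage.
rows = client.query(
    "SELECT event_type, COUNT(*) AS n "
    "FROM `my-project.analytics.events_external` "
    "GROUP BY event_type"
).result()
for row in rows:
    print(row.event_type, row.n)
```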

In addition to Google services, all our partner products also integrate with BigQuery seamlessly. Some of the key partners highlighted at Next 21 included data ingestion (Fivetran, Informatica, and Confluent), data preparation (Trifacta, dbt), data governance (Collibra), data science (Databricks, Dataiku), and BI (Tableau, Power BI, Qlik, and others).

Planet Scale analytics with BigQuery

BigQuery is an amazing platform, and over the past 11 years we have continued to innovate across it. Scalability has always been a huge differentiator for BigQuery: many customers store more than 100 petabytes of data, our largest customer is now approaching an exabyte, and our large customers have run queries over trillions of rows.

But scale for us is not just about storing or processing a lot of data. Scale is also about how we can reach every organization in the world. This is the reason we launched BigQuery Sandbox, which enables organizations to get started with BigQuery without a credit card and has helped us reach tens of thousands of customers. Additionally, to make it easy to get started with BigQuery, we have built integrations with various Google tools like Firebase, Google Ads, and Google Analytics 360.

Finally, to simplify adoption, we now provide options for customers to choose whether they would like to pay per query, buy flat-rate subscriptions, or buy per-second capacity. With our autoscaling capabilities, we can provide customers the best value by combining flat-rate subscription discounts with autoscaling through flex slots.

Intelligent Data Warehouse to empower every data analyst to become a data scientist

BigQuery ML is one of the biggest innovations we have brought to market over the past few years. Our vision is to make every data analyst a data scientist by democratizing machine learning. Today, 80% of time is spent moving, prepping, and transforming data for the ML platform. This also causes a huge data governance problem, because now every data scientist has a copy of your most valuable data. Our approach was very simple. We asked: what if we could bring ML to the data rather than taking data to an ML engine?

That is how BigQuery ML was born: simply write two lines of SQL and create ML models.
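
As a rough illustration of that idea, the sketch below trains and applies a BigQuery ML model entirely with SQL, submitted through the BigQuery Python client; the dataset, table, and column names are hypothetical.

```python
# A sketch of the "bring ML to the data" idea: train and use a model with
# SQL alone, run here through the BigQuery Python client. Dataset and
# column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Train a logistic regression model directly over data in BigQuery.
client.query("""
    CREATE OR REPLACE MODEL `my-project.demo.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my-project.demo.customers`
""").result()

# Score new rows with the trained model, still without moving any data.
predictions = client.query("""
    SELECT customer_id, predicted_churned
    FROM ML.PREDICT(MODEL `my-project.demo.churn_model`,
                    (SELECT * FROM `my-project.demo.new_customers`))
""").result()
for row in predictions:
    print(row.customer_id, row.predicted_churned)
```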

Over the past 4 years we have launched many model types, including regression, matrix factorization, anomaly detection, time series, XGBoost, and DNNs. Customers use these models to solve complex business problems simply, from segmentation and recommendations to time series forecasting and package delivery estimation. The service is very popular: more than 80% of our top customers use BigQuery ML today. When you consider that the average adoption rate of ML/AI is in the low 30s, 80% is a pretty good result!

We also announced tighter integration of BigQuery ML with Vertex AI. Model explainability will provide the ability to explain the results of predictive ML classification and regression models by understanding how each feature contributes to the predicted result. Users will also be able to manage, compare, and deploy BigQuery ML models in Vertex AI, and leverage Vertex Pipelines to train BigQuery ML models and run predictions with them.

Real-time streaming analytics with BigQuery 

Customer expectations are changing and everyone wants everything in an instant: according to Gartner, by the end of 2024, 75% of enterprises will shift from piloting to operationalizing AI, driving a 5X increase in streaming data and analytics infrastructures.

BigQuery’s storage engine is optimized for real-time streaming. BigQuery supports streaming ingestion of tens of millions of events in real time with no impact on query performance. Additionally, customers can use materialized views and BI Engine (which is now GA) on top of streaming data. We guarantee always-fast, always-fresh data: our system automatically keeps materialized views and BI Engine up to date.
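
As a small sketch of what streaming ingestion looks like from the client side, the snippet below streams rows into a table with the BigQuery Python client’s legacy streaming API (the Storage Write API is another option); the table and fields are hypothetical.

```python
# A minimal sketch of streaming rows into BigQuery with the legacy
# streaming API (insert_rows_json); table and fields are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.clickstream"

rows = [
    {"user_id": "u123", "event": "page_view", "ts": "2021-10-23T12:00:00Z"},
    {"user_id": "u456", "event": "add_to_cart", "ts": "2021-10-23T12:00:01Z"},
]

# Rows become queryable within seconds; materialized views and BI Engine
# pick them up automatically.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Streaming insert errors:", errors)
```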

Many customers also use our Pub/Sub service to collect real-time events and process them through Dataflow prior to ingesting into BigQuery. This streaming ETL pattern is very popular. Last year, we announced Pub/Sub Lite, which provides customers with a 90% lower price point and a TCO that is lower than any DIY Kafka deployment.
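
Here is a minimal sketch of that streaming ETL pattern with the Apache Beam Python SDK: read messages from a Pub/Sub topic, parse them, and write them to BigQuery on the Dataflow runner. The topic, table, schema, and bucket names are hypothetical.

```python
# A sketch of the streaming ETL pattern: Pub/Sub -> transform -> BigQuery,
# run on Dataflow. Topic, table, and field names are hypothetical.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="user_id:STRING,event:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```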

We also announced Dataflow Prime, our next-generation platform for Dataflow. Big data processing platforms have focused only on horizontal scaling to optimize workloads, but we are seeing new patterns and use cases, like streaming AI, where a few steps in a pipeline perform data prep and then a GPU-based model has to run. Customers want to use machines of different sizes and shapes to run these pipelines in the most optimal manner, and that is exactly what Dataflow Prime does: it delivers vertical autoscaling with right fitting for your pipelines. We believe this should lower pipeline costs significantly.

With Datastream, our change data capture service (built on Alooma technology), we have solved the last key problem space for customers: automatically detecting changes in operational databases like MySQL, PostgreSQL, and Oracle and syncing them into BigQuery.

Most importantly, all these products work seamlessly with each other through a set of templates. Our goal is to make this even more seamless over the next year.

Open Data Analytics with BigQuery

Google has always been a big believer in open source. Our customers love using open source offerings like Spark, Flink, Presto, and Airflow. With Dataproc and Composer, our customers have been able to run many of these open source frameworks on Google Cloud and leverage our scale, speed, and security. Dataproc is a great service and delivers massive savings to customers moving from on-premises Hadoop environments. But customers want to focus on jobs, not clusters.

That’s why we launched the Dataproc Serverless Spark offering (GA) at Next 2021. This new service adheres to one of the key design principles we started with: make data simple.

Just like with BigQuery, you can simply RUN QUERY. With Spark on Google Cloud, you simply RUN JOB.  ZDNet did a great piece on this.  I invite you to check it out!

Many of our customers are moving to Kubernetes and want to use it as the platform for Spark. Our upcoming Spark on GKE offering will make it possible to deploy Spark workloads on existing Kubernetes clusters.

But for me, the most exciting capability is the ability to run Spark directly on BigQuery storage. BigQuery storage is highly optimized analytical storage; by running Spark directly on it, we again bring compute to the data and avoid moving data to compute.
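
As a sketch of what that looks like in practice, the PySpark job below reads a BigQuery table through the open source spark-bigquery-connector, aggregates it, and writes the result back to BigQuery; the table names and staging bucket are hypothetical, and the connector is assumed to be available to the job.

```python
# A sketch of running Spark against BigQuery storage using the open source
# spark-bigquery-connector; table names and staging bucket are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-on-bigquery").getOrCreate()

# Read straight from BigQuery storage -- compute comes to the data.
orders = (
    spark.read.format("bigquery")
    .option("table", "my-project.sales.orders")
    .load()
)

daily_revenue = orders.groupBy("order_date").sum("amount")

# Write aggregated results back to BigQuery.
(
    daily_revenue.write.format("bigquery")
    .option("table", "my-project.sales.daily_revenue")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("overwrite")
    .save()
)
```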

BigSearch to power Log Analytics

We are bringing the power of Search to BigQuery. Customers already ingest massive amounts of log data into BigQuery and perform analytics on it. Our customers have been asking us for better support for native JSON and Search. At Next 21 we announced the upcoming availability of both these capabilities.

Fast cross-column search will provide efficient indexing of structured, semi-structured, and unstructured data. User-friendly SQL functions let customers rapidly find data points without having to scan all the text in a table or even know which column the data resides in.

This will be tightly integrated with native JSON support, allowing customers to get BigQuery performance and storage optimizations on JSON data as well as search over unstructured or constantly changing data structures.
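
To give a feel for the direction, here is a sketch of what indexed log search could look like in SQL; since these capabilities were still upcoming at announcement time, the exact syntax may differ, and the table and column names are hypothetical.

```python
# A sketch of what log search over BigQuery might look like with the
# announced search capability; exact syntax may differ from what ships.
# Table and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Index the log table once so point lookups don't require full scans.
client.query("""
    CREATE SEARCH INDEX IF NOT EXISTS logs_index
    ON `my-project.ops.app_logs` (ALL COLUMNS)
""").result()

# Find every row, in any column, that mentions a request ID.
rows = client.query("""
    SELECT timestamp, severity, payload
    FROM `my-project.ops.app_logs`
    WHERE SEARCH(app_logs, 'req-7f3a91')
""").result()
for row in rows:
    print(row.timestamp, row.severity)
```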

Multi & Cross Cloud Analytics

Research on multi-cloud adoption is unequivocal: 92% of businesses in 2021 report having a multi-cloud strategy. We have always believed in providing customers choice and meeting them where they are. It was clear that customers wanted us to take gems like BigQuery to other clouds, because their data is distributed across different clouds.

Additionally, it was clear that customers wanted cross-cloud analytics, not just multi-cloud solutions that can run in different clouds. In short, they want to see all their data in a single pane of glass, perform analysis on top of any data without worrying about where it is located, avoid egress costs, and perform cross-cloud analysis across datasets on different clouds.

With BigQuery Omni, we deliver on this vision with a new way of analyzing data stored in multiple public clouds. Unlike competitors, BigQuery Omni does not create silos across different clouds. BigQuery provides a single control plane that shows an analyst all the data they have access to across all clouds. The analyst just writes the query, and we send it to the right cloud, whether AWS, Azure, or Google Cloud, to execute it locally, so no egress costs are incurred.

We announced BigQuery Omni GA for both AWS and Azure at Google Next 21, and I’m really proud of the team for delivering on this vision. Check out Vidya’s session and learn from Johnson & Johnson how they innovate in a multi-cloud world.

Geospatial Analytics with BigQuery and Earth Engine

Over the years, we have partnered with the Google Geospatial team to deliver GIS functionality inside BigQuery. At Next, we announced that customers will be able to integrate Earth Engine with BigQuery, Google Cloud’s ML technologies, and Google Maps Platform.

Think about all the scenarios and use cases your team is going to be able to enable: sustainable sourcing, saving energy, or understanding business risks.

We’re integrating the best of Google and Google Cloud together to – again – make it easier to work with data to create a sustainable future for our planet.  

BigQuery as a Data Exchange & Sharing Platform

BigQuery was built to be a sharing platform. Today, more than 3,000 organizations share over 250 petabytes of data across organizational boundaries. Google also provides more than 150 public datasets to be used across various use cases. In addition, we are bringing some of the most unique datasets, like Google Trends, to BigQuery, which will enable organizations to understand trends in real time and apply them to their business problems.
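
As a quick sketch of consuming a shared dataset, the query below pulls the latest top search terms from the Google Trends dataset in the public bigquery-public-data project; the exact table name is an assumption based on the announcement.

```python
# A sketch of tapping a shared public dataset from BigQuery; the Google
# Trends table name below is an assumption based on the announcement.
from google.cloud import bigquery

client = bigquery.Client()

rows = client.query("""
    SELECT term, rank, week
    FROM `bigquery-public-data.google_trends.top_terms`
    WHERE week = (SELECT MAX(week)
                  FROM `bigquery-public-data.google_trends.top_terms`)
    ORDER BY rank
    LIMIT 10
""").result()

for row in rows:
    print(row.rank, row.term)
```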

I am super excited about the Analytics Hub preview announcement. Analytics Hub will let organizations build private and public analytics exchanges that include data, insights, ML models, and visualizations, all built on top of BigQuery’s industry-leading security capabilities.

Breaking Data Silos

Data is distributed across many systems in an organization, and making it easy to break down these data silos so that all of this data is accessible is critical. I’m also particularly excited about the Migration Factory we’re building with Informatica and the work we are doing on data movement and intelligent data wrangling with players like Trifacta and Fivetran, with whom we share over 1,000 customers (and growing!). Additionally, we continue to deliver native Google services to help our customers.

We acquired Cask in 2018 and launched our self-service data integration service, Data Fusion. Data Fusion allows customers to create complex pipelines with simple drag and drop. This year we focused on unlocking SAP data for our customers and have launched various SAP connectors and accelerators to achieve this.

At Next we also announced our BigQuery Migration Service in preview. Many of our customers are migrating their legacy data warehouses and data lakes to BigQuery, and BigQuery Migration Service provides end-to-end tools to simplify these migrations.

And today, to make migrations to BigQuery easier for even more customers, I am super excited to announce the acquisition of CompilerWorks. CompilerWorks’ Transpiler is designed from the ground up to facilitate SQL migration in the real world and will help our customers accelerate their migrations. It supports migrations from over 10 legacy enterprise data warehouses, and we will be making it available as part of our BigQuery Migration Service in the coming months.

Data Democratization with BigQuery

Over the past 4 years we have focused a lot on making it easy to derive actionable insights from data in BigQuery. Our priority has been to provide a strong ecosystem of partners with great tools to achieve this, while also delivering native Google capabilities.

BI Engine, which we introduced in 2019, previewed earlier this year, and showcased with tools like Microsoft Power BI and Tableau, is now generally available for everyone to use.

BigQuery and Data Studio are like peanut butter and jelly: they just work well together. We launched BI Engine first with Data Studio and scaled it to all users, and more than 40% of our BigQuery customers use Data Studio. Once we knew BI Engine worked extremely well, we made it an integral part of the BigQuery API and launched it for all our internal and partner BI tools.

We announced GA for BI Engine at Next 2021, but it had already been generally available with Data Studio for the past two years. We recently moved the Data Studio team back into Google Cloud, making the partnership even stronger. If you have not used Data Studio, I encourage you to take a look and get started for free today!

Connected Sheets for BigQuery is one of my favorite combinations. You can give every business user in your organization the ability to analyze billions of records using the standard Google Sheets experience. I personally use it every day to analyze all our product data.

We acquired Looker in February 2020 with a vision of providing customers a semantic modeling layer and a governed BI solution. Looker is tightly integrated with BigQuery, including BigQuery ML. Through our latest partnership with Tableau, Tableau customers will soon be able to leverage Looker’s semantic model, enabling new levels of data governance while democratizing access to data.

Finally, I have a dream that one day we will bring Google Assistant to your enterprise data. This is the vision of Data QnA. We are in early innings on this and we will continue to work hard to make this vision a reality. 

Intelligent Data Fabric to unify the platform

Another important trend that shaped our market is the Data Mesh.  Earlier this year, Starburst invited me to talk about this very topic. We have been working for years on this concept, and although we would love for all data to be neatly organized in one place, we know that our customers’ reality is that it is not (If you want to know more about this, read about my debate on this topic with Fivetran’s George Fraser, a16z’s Martin Casado and Databricks’ Ali Ghodsi).

Everything I’ve learned from customers over my years in this field is that they don’t just need a data catalog or a set of data quality and governance tools, they need an intelligent data fabric.  That is why we created Dataplex, whose general availability we announced at Next.

Dataplex enables customers to centrally manage, monitor, and govern data across data lakes, data warehouses, and data marts, while also ensuring data is securely accessible to a variety of analytics and data science tools.  It lets customers organize and manage data in a way that makes sense for their business, without data movement or duplication. It provides logical constructs – lakes, data zones, and assets – which enable customers to abstract away the underlying storage systems to build a foundation for setting policies around data access, security, lifecycle management, and so on.  Check out Prajakta Damle’s session and learn from Deutsche Bank how they are thinking about a unified data mesh across distributed data.

Closing Thoughts

Analysts have recognized our momentum and, as I look back at this year, I couldn’t thank our customers and partners enough for the support they provided my team and me across our large data analytics portfolio: in March, Google BigQuery was named a Leader in The Forrester Wave™: Cloud Data Warehouse, Q1 2021, and in June, Dataflow was named a Leader in The Forrester Wave™: Streaming Analytics, Q2 2021 report.

If you want to get a taste for why customers choose us over other hyperscalers or cloud data warehousing, I suggest you watch the Data Journey series we’ve just launched, which documents the stories of organizations modernizing to the cloud with us.

The Google Cloud Data Analytics portfolio has become a leading force in the industry and I couldn’t be more excited to have been part of it. I do miss you, my customers and partners, and I’m frankly bummed that we didn’t get to meet in person like we’ve done so many times before (see a photo of my last in-person talk before the pandemic), but this Google Next was extra special, so let’s dive into the product innovations and their themes.

I hope that I will get to see you in person next time we run Google Next!

Google Cloud’s data ingestion principles

Businesses around the globe are realizing the benefits of replacing legacy data silos with cloud-based enterprise data warehouses, including easier collaboration across business units and access to insights within their data that were previously unseen. However, bringing data from numerous disparate data sources into a single data warehouse requires you to develop pipelines that ingest data from these various sources into your enterprise data warehouse. Historically, this has meant that data engineering teams across the organization procure and implement various tools to do so. But this adds significant complexity to managing and maintaining all these pipelines and makes it much harder to effectively scale these efforts across the organization. Developing enterprise-grade, cloud-native pipelines to bring data into your data warehouse can alleviate many of these challenges. But, if done incorrectly, these pipelines can present new challenges that your teams will have to spend their time and energy addressing. 

Developing cloud-based data ingestion pipelines that replicate data from various sources into your cloud data warehouse can be a massive undertaking that requires significant investment of staffing resources. Such a large project can seem overwhelming and it can be difficult to identify where to begin planning such a project. We have defined the following principles for data pipeline planning to begin the process. These principles are intended to help you answer key business questions about your effort and begin to build data pipelines that address your business and technical needs. Each section below details a principle of data pipelines and certain factors your teams should consider as they begin developing their pipelines.

Principle 1: Clarify your objectives

The first principle to consider for pipeline development is to clarify your objectives. This can be broadly defined as taking a holistic approach to pipeline development that encompasses requirements from several perspectives: technical teams, regulatory or policy requirements, desired outcomes, business goals, key timelines, available teams and their skill sets, and downstream data users. Clarifying your objectives means identifying and defining requirements from each key stakeholder at the beginning of the process and continually checking development against those requirements to ensure the pipelines you build meet them.

This is done by first clearly defining the desired end state for each project in a way that addresses a demonstrated business need of downstream data users. Remember that data pipelines are almost always the means to accomplish your end state, rather than the end state itself. An example of an effectively defined end-state is “enabling teams to gain a better understanding of our customers by providing access to our CRM data within our cloud data warehouse” rather than “move data from our CRM to our cloud data warehouse”. This may seem like a merely semantic difference, but framing the problem in terms of business needs helps your teams make technical decisions that will best meet these needs. 

After clearly defining the business problem you are trying to solve, you should facilitate requirement gathering from each stakeholder and use these requirements to guide the technical development and implementation of your ingestion pipelines. We recommend gathering stakeholders from each team, including downstream data users, prior to development to gather requirements for the technical implementation of the data pipeline. These will include critical timelines, uptime requirements, data update frequency, data transformation, DevOps needs, and any security, policy, or regulatory requirements that a data pipeline must meet.

Principle 2: Build your team

The second principle to consider for pipeline development is build your team. This means ensuring you have the right people with the right skills available in the right places to develop, deploy, and maintain your data pipelines. After you have gathered your pipeline requirements, you can begin to develop a summary architecture that will be used to build and deploy your data pipelines. This will help you identify the human talent you will need to successfully build, deploy, and manage these data pipelines and identify any potential shortfalls that would require additional support from either third-party partners or new team members.

Not only do you need to ensure you have the right people and skill sets available in aggregate, but these individuals need to be effectively structured to empower them to maximize their abilities. This means developing team structures that are optimized for each team’s responsibilities and their ability to support adjacent teams as needed.

This also means developing processes that prevent blockers to technical development whenever possible, such as ensuring that teams have all of the appropriate permissions they need to move data from the original source to your cloud data warehouse without violating the concept of least privilege. Developers need access to the original data source (depending on your requirements and architecture) in addition to the destination data warehouse. Examples of this are ensuring that developers have access to develop and/or connect to a Salesforce Connected App or read access to specific Search Ads 360 data fields.

Principle 3: Minimize time to value

The third principle to consider for pipeline development is minimize time to value. This means considering the long-term maintenance burden of a data pipeline prior to developing and deploying it in addition to being able to deploy a minimum viable pipeline as quickly as possible. Generally speaking, we recommend the following approach to building data pipelines to minimize their maintenance burden: Write as little code as possible. Functionally, this can be implemented by:

1. Leveraging interface-based data ingestion products whenever possible. These products minimize the amount of code that requires ongoing maintenance and empower users who aren’t software developers to build data pipelines. They can also reduce development time for data pipelines, allowing them to be deployed and updated more quickly. 

Products like the Google Data Transfer Service and Fivetran allow any user to build managed data ingestion pipelines that centralize data from SaaS applications, databases, file systems, and other tooling. With little to no code required, these managed services enable you to connect your data warehouse to your sources quickly and easily. For workloads managed by ETL developers and data engineers, tools like Google Cloud’s Data Fusion provide an easy-to-use visual interface for designing, managing, and monitoring advanced pipelines with complex transformations.

2. Whenever interface-based products or data connectors are insufficient, use pre-existing code templates. Examples of this include templates available for Dataflow that allow users to define variables and run pipelines for common data ingestion use cases, and the Public Datasets pipeline architecture that our Datasets team uses for onboarding.

3. If neither of these options are sufficient, utilize managed services to deploy code for your pipelines. Managed services, such as Dataflow or Dataproc, eliminate the operational overhead of managing pipeline configuration by automatically scaling pipeline instances within predefined parameters.

Principle 4: Increase data trust and transparency

The fourth principle to consider for pipeline development is increase data trust and transparency. For the purposes of this document, we define this as the process of overseeing and managing data pipelines across all tools. Numerous data ingestion pipelines that each leverage different tools or are not developed under a coordinated management plan can result in “tech sprawl”, which significantly increases the management overhead of data ingestion pipelines as their quantity grows. This becomes especially cumbersome if you are subject to service-level agreements or legal, regulatory, or policy requirements for overseeing data pipelines. The best strategy for dealing with tech sprawl is preventing it in the first place by developing streamlined pipeline management processes that automate reporting. Although this could theoretically be achieved by building all of your data pipelines with a single cloud-based product, we do not recommend doing so, because it prevents you from taking advantage of the features and cost optimizations that come with choosing the best product for each use case.

A monitoring service such as Google Cloud Monitoring or Splunk that automates metrics, events, and metadata collection from various products, including those hosted in on-premises and hybrid computing environments, can help you centralize reporting and monitoring of your data pipelines. A metadata management tool such as Google Cloud’s Data Catalog or Informatica’s Enterprise Data Catalog can help you better communicate the nuances of your data so users better understand which data resources are best fit for a given use case. This significantly reduces your pipeline’s governance burden by eliminating manual reporting processes that often result in inaccuracies or lagging updates.

Principle 5: Manage costs

The fifth principle to consider for pipeline development is manage costs. This encompasses both the cost of cloud resources and the staffing costs necessary to design, develop, deploy, and maintain your cloud resources. We believe that your goal should not necessarily be to minimize cost, but rather to maximize the value of your investment. This means maximizing the impact of every dollar spent by minimizing waste in cloud resource utilization and human time. There are several factors to consider when it comes to managing costs:

Use the right tool for the job – Different data ingestion pipelines will have different requirements for latency, uptime, transformations, etc. Similarly, different data pipeline tools have different strengths and weaknesses. Choosing the right tool for each data pipeline can help your pipelines operate significantly more efficiently. This can reduce your overall cost, free up staffing time to focus on the most impactful projects, and make your pipelines much more efficient.

Standardize resource labeling – Implement and utilize a consistent labeling schema across all tools and platforms to get the most comprehensive view of your organization’s spending. One example is requiring all resources to be labeled with the cost center or team at the time of creation; see the labeling sketch after this list. Consistent labeling allows you to monitor your spend across different teams and calculate the overall value of your cloud spending.

Implement cost controls – If available, leverage cost controls to prevent errors that result in unexpectedly large bills. 

Capture cloud spend – Capture your spend on all cloud resource utilization for internal analysis using a cloud data warehouse and a data visualization tool. Without it, you won’t understand the context of changes in cloud spend and how they correlate with changes in business.

Make cost management everyone’s job – Managing costs should be part of the responsibilities of everyone who can create or utilize cloud resources. To do this well, we recommend making cloud spend reporting more transparent internally and/or implementing chargebacks to internal cost centers based on utilization.
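
As a small sketch of the labeling practice described above, the snippet below creates a BigQuery dataset with cost-center and team labels using the Python client; the label keys and values are hypothetical examples of a schema you would standardize internally.

```python
# A small sketch of consistent resource labeling, here on a BigQuery
# dataset; the label keys and values are hypothetical examples.
from google.cloud import bigquery

client = bigquery.Client()

dataset = bigquery.Dataset("my-project.marketing_analytics")
dataset.location = "US"
dataset.labels = {"cost_center": "cc-1234", "team": "marketing", "env": "prod"}

# Labels set at creation time flow through to billing exports, so spend
# can later be grouped by cost center or team.
client.create_dataset(dataset, exists_ok=True)
```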

Long-term, the increased granularity in cost reporting available within Google Cloud can help you better measure your key performance indicators. You can shift from cost-based reporting (i.e. – “We spent $X on BigQuery storage last month”) to value-based reporting (i.e. – “It costs $X to serve customers who bring in $Y revenue”). 

To learn more about managing costs, check out Google Cloud’s “Understanding the principles of cost optimization” white paper.

Principle 6: Leverage continually improving services

The sixth principle is leverage continually improving services. Cloud services are consistently improving their performance and stability, even if some of these improvements are not obvious to users. These improvements can help your pipelines run faster, cheaper, and more consistently over time. You can take advantage of the benefits of these improvements by:

Automating both your pipelines and pipeline management: Not only should data pipelines be automated, but almost all aspects of managing your pipelines can also be automated. This includes pipeline and data lineage tracking, monitoring, cost management, scheduling, access management, and more. This reduces the long-term operational cost of each data pipeline, which can significantly improve your value proposition, and prevents manual configurations from negating the benefits of later product improvements.

Minimizing pipeline complexity whenever possible: While ingestion pipelines are relatively easy to develop using UI-based or managed services, they also require continued maintenance as long as they are in use. The most easily maintained data ingestion pipelines are typically the ones that minimize complexity and leverage automatic optimization capabilities. Any transformation in a data ingestion pipeline is a manual optimization of the pipeline that may struggle to adapt or scale as the underlying services improve. You can minimize the need for such transformations by building ELT (extract, load, transform) pipelines rather than ETL (extract, transform, load) pipelines. This pushes transformations down to the data warehouse, which uses a highly optimized query engine to transform your data, rather than relying on manually configured pipelines, as sketched below.
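
Here is a minimal ELT sketch along those lines: load raw files into the warehouse unchanged, then express the transformation as SQL that BigQuery’s query engine executes. The bucket, dataset, and table names are hypothetical.

```python
# A minimal ELT sketch: load raw files as-is, then let the warehouse's
# query engine do the transformation. Bucket, dataset, and table names
# are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Extract + Load: land the raw CSV files in BigQuery without reshaping them.
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/orders_*.csv",
    "my-project.raw.orders",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
        skip_leading_rows=1,
    ),
)
load_job.result()

# Transform: push the cleanup down to BigQuery's query engine.
client.query("""
    CREATE OR REPLACE TABLE `my-project.analytics.orders_clean` AS
    SELECT CAST(order_id AS INT64) AS order_id,
           LOWER(TRIM(customer_email)) AS customer_email,
           SAFE_CAST(amount AS NUMERIC) AS amount
    FROM `my-project.raw.orders`
    WHERE order_id IS NOT NULL
""").result()
```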

Next steps

If you’re looking for more information about developing your cloud-based data platform, check out our Build a modern, unified analytics data platform whitepaper. You can also visit our data integration site to learn more and find ways to get started with your data integration journey.

Once you’re ready to begin building your data ingestion pipelines, learn more about how Cloud Data Fusion and Fivetran can help you make sure your pipelines address these principles.

How geospatial insights can help meet business goals

Organizations that collect geospatial data can use that information to understand their operations, help make better business decisions, and power innovation. Traditionally, organizations have required deep GIS expertise and tooling in order to deliver geospatial insights. In this post, we outline some ways that geospatial data can be used in various business applications. 

Assessing environmental risk 

Governments and businesses involved in insurance underwriting, property management, agriculture technology, and related areas are increasingly concerned with risks posed by environmental conditions. Historical models that predict environmental hazards like pollution, flooding, and wildfires are becoming less accurate as real-world conditions change. Therefore, organizations are incorporating real-time and historical data into a geospatial analytics platform and using predictive modeling to more effectively plan for risk and to forecast weather.

Selecting sites and planning expansion

Businesses that have storefronts, such as retailers and restaurants, can find the best locations for their stores by using geospatial data like population density to simulate new locations and to predict financial outcomes. Telecom providers can use geospatial data in a similar way to determine the optimal locations for cell towers. A site selection solution can combine proprietary site metrics with publicly-available data like traffic patterns and geographic mobility to help organizations make better decisions about site selection, site rationalization, and expansion strategy.
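
As an illustration of the kind of analysis involved, the sketch below uses BigQuery GIS functions to count points of interest within one kilometer of each candidate site; the tables and columns are hypothetical.

```python
# A sketch of a site-selection style query with BigQuery GIS: count
# points of interest within 1 km of each candidate site. Table and
# column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

rows = client.query("""
    SELECT
      s.site_id,
      COUNT(p.poi_id) AS nearby_poi_count
    FROM `my-project.retail.candidate_sites` AS s
    JOIN `my-project.geo.points_of_interest` AS p
      ON ST_DWITHIN(ST_GEOGPOINT(s.lng, s.lat),
                    ST_GEOGPOINT(p.lng, p.lat),
                    1000)  -- distance in meters
    GROUP BY s.site_id
    ORDER BY nearby_poi_count DESC
""").result()

for row in rows:
    print(row.site_id, row.nearby_poi_count)
```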

Planning logistics and transport

For freight companies, courier services, ride-hailing services, and other companies that manage fleets, it’s critical to incorporate geospatial context into business decision-making. Fleet management operations include optimizing last-mile logistics, analyzing telematics data from vehicles for self-driving cars, managing precision railroading, and improving mobility planning. Managing all of these operations relies extensively on geospatial context. Organizations can create a digital twin of their supply chain that includes geospatial data to mitigate supply chain risk, design for sustainability, and minimize their carbon footprint. 

Understanding and improving soil health and yield

AgTech companies and other organizations that practice precision agriculture can use a scalable analytics platform to analyze millions of acres of land. These insights help organizations understand soil characteristics and help them analyze the interactions among variables that affect crop production. Companies can load topography data, climate data, soil biomass data, and other contextual data from public data sources. They can then combine this information with data about local conditions to make better planting and land-management decisions. Mapping this information using geospatial analytics not only lets organizations actively monitor crop health and manage crops, but it can help farmers determine the most suitable land for a given crop and to assess risk from weather conditions.

Managing sustainable development

Geospatial data can help organizations map economic, environmental, and social conditions to better understand the geographies in which they conduct business. By taking into account environmental and socio-economic phenomena like poverty, pollution, and vulnerable populations, organizations can determine focus areas for protecting and preserving the environment, such as reducing deforestation and soil erosion. Similarly, geospatial data can help organizations design data-driven health and safety interventions. Geospatial analytics can also help an organization meet its commitments to sustainability standards through sustainable and ethical sourcing. Using geospatial analytics, organizations can track, monitor, and optimize the end-to-end supply chain from the source of raw materials to the destination of the final product.

What’s next

Google Cloud provides a full suite of geospatial analytics and machine learning capabilities that can help you make more accurate and sustainable business decisions without the complexity and expense of managing traditional GIS infrastructure. To get started today and learn how you can use Google Cloud features to get insights from your geospatial data, see the Geospatial analytics architecture.

Acknowledgements: We’d like to thank Chad Jennings, Lak Lakshmanan, Kannappan Sirchabesan, Mike Pope, and Michael Hao for their contributions to this blog post and the Geospatial Analytics architecture.

Spark on Google Cloud: Serverless Spark jobs made seamless for all data users

Apache Spark has become a popular platform because it can serve data engineering, data exploration, and machine learning use cases alike. However, Spark still requires the on-premises way of managing clusters and tuning infrastructure for each job. Also, end-to-end use cases require Spark to be used alongside technologies like TensorFlow and programming languages like SQL and Python. Today, these operate in silos, with Spark on unstructured data lakes, SQL on data warehouses, and TensorFlow on completely separate machine learning platforms. This increases costs, reduces agility, and makes governance extremely hard, preventing enterprises from making insights available to the right users at the right time.

Announcing Spark on Google Cloud, now serverless and integrated

We are excited to announce Spark on Google Cloud, bringing the industry’s first autoscaling serverless Spark, seamlessly integrated with the best of Google Cloud and open source tools, so you can effortlessly power ETL, data science, and data analytics use cases at scale. Google Cloud has been running large-scale, business-critical Spark workloads for enterprise customers for more than six years, using open source Spark in Dataproc. Today, we are furthering our commitment by enabling customers to:

Eliminate time spent managing Spark clusters: With serverless Spark, users submit their Spark jobs and let the service auto-provision and autoscale resources until the job completes.

Enable data users of all levels: Connect, analyze, and execute Spark jobs from the interface of their choice, including BigQuery, Vertex AI, or Dataplex, in two clicks, without any custom integrations.

Retain flexibility of consumption: No one size fits all. Use serverless Spark, deploy on Google Kubernetes Engine (GKE), or run on compute clusters, depending on your requirements.

With Spark on Google Cloud, we are providing a way for customers to use Spark in a cloud-native (serverless) manner, seamlessly alongside the tools data engineers, data analysts, and data scientists use for their work. These capabilities will help customers realize the data platform redesign they have embarked on.

“Deutsche Bank is using Spark for a variety of different use cases. Migrating to GCP and adopting Serverless Spark for Dataproc allows us to optimize our resource utilization and reduce manual effort so our engineering teams can focus on delivering data products for our business instead of managing infrastructure. At the same time we can retain the existing code base and knowhow of our engineers, thus boosting adoption and making the migration a seamless experience.”—Balaji Maragalla, Director Big Data Platform, Deutsche Bank

“We see serverless Spark playing a central role in our data strategy. Serverless Spark will provide an efficient, seamless solution for teams that aren’t familiar with big data technology or don’t need to bother with idiosyncrasies of Spark to solve their own processing needs. We’re excited about the serverless aspect of the offering, as well as the seamless integration with BigQuery, Vertex AI, Dataplex and other data services.” —Saral Jain, Director of Engineering, Infrastructure and Data, Snap Inc.

Dataproc Serverless for Spark

Per IDC, developers spend 40% of their time writing code and 60% tuning infrastructure and managing clusters. Furthermore, not all Spark developers are infrastructure experts, which results in higher costs and lower productivity. With serverless Spark, developers can spend all their time on code and logic. They do not need to manage clusters or tune infrastructure; they submit Spark jobs from their interface of choice, and processing autoscales to match the needs of the job. And while Spark users today pay for the time the infrastructure is running, with serverless Spark they pay only for the job duration.
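
As a sketch of how little there is to manage, the snippet below submits a PySpark batch to Dataproc Serverless with the google-cloud-dataproc Python client; the project, region, and script location are hypothetical, and the exact client surface may differ slightly by library version.

```python
# A sketch of submitting a serverless Spark batch with the Dataproc
# Python client (google-cloud-dataproc); project, region, and script
# location are hypothetical.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/daily_revenue.py"
    )
)

# No cluster to create or size: the service provisions and scales
# resources for the duration of the job, and billing stops when it ends.
operation = client.create_batch(
    parent=f"projects/my-project/locations/{region}",
    batch=batch,
    batch_id="daily-revenue-2021-10-23",
)
print(operation.result().state)
```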

Spark through BigQuery

BigQuery, the leading data warehouse, now provides a unified interface for data analysts to write SQL or PySpark. The code is executed using serverless Spark seamlessly, without the need for infrastructure provisioning. BigQuery has been the pioneer for serverless data warehousing, and now supports serverless Spark for Spark-based analytics.

Spark through Vertex AI

Data scientists no longer need to go through custom integrations to use Spark with their notebooks. Through Vertex AI Workbench, they can connect to Spark with a single click and do interactive development. With Vertex AI, Spark can easily be used together with other ML frameworks like TensorFlow, PyTorch, scikit-learn, and BigQuery ML. All Google Cloud security, compliance, and IAM controls are automatically applied across Vertex AI and Spark. Once you are ready to deploy the ML models, the notebook can be executed as a Spark job in Dataproc and scheduled as part of Vertex AI Pipelines.

Spark through Dataplex

Dataplex is an intelligent data fabric that enables organizations to centrally manage, monitor, and govern their data across data lakes, data warehouses, and data marts with consistent controls, providing access to trusted data and powering analytics at scale. Now, you can use Spark on distributed data natively through Dataplex. Dataplex provides a collaborative analytics interface, with 1-click access to SparkSQL, Notebooks, or PySpark, and the ability to save, share, search notebooks and scripts alongside data.

Flexibility of consumption

We understand that one size does not fit all, so Spark is available for consumption in three different ways based on your specific needs. If you are standardizing on Kubernetes for infrastructure management, run Spark on Google Kubernetes Engine (GKE) to improve resource utilization and simplify infrastructure management. If you want Hadoop-style infrastructure management, run Spark on Google Compute Engine (GCE). And if you are looking for a no-ops Spark deployment, use serverless Spark!

ESG Senior Analyst Mike Leone commented, “Google Cloud is making Spark easier to use and more accessible to a wide range of users through a single, integrated platform. The ability to run Spark in a serverless manner, and through BigQuery and Vertex AI will create significant productivity improvement for customers. Further, Google’s focus on security and governance makes this Spark portfolio useful to all enterprises as they continue migrating to the Cloud.”

Getting started

Dataproc Serverless for Spark will be generally available within a few weeks. The BigQuery and Dataplex integrations are in private preview. Vertex AI Workbench is available in public preview; you can get started here. For all capabilities, you can request preview access through this form.

You can work with Google Cloud partners to get started as well.

“We are excited to partner with Google Cloud as we look to provide our joint customers with the latest innovations on Spark. We see Spark being used for a variety of analytics and ML use cases. Google is taking Spark a step further by making it serverless, and available through BigQuery, Vertex AI and Dataplex for a wide spectrum of users.” —Sharad Kumar, Cloud First data and AI Lead at Accenture

For more information, visit our website, or watch the announcement video and our conversation with Snap at Next 2021.

Here’s what you missed at Next ’21

Google Cloud Next ‘21 is over, but the learning is just beginning. With three days of keynotes, deep dives, and announcements, there was a lot to take in! But don’t worry if you missed something—the Google Cloud Blog team is here to round up our favorite announcements of Next ‘21.

The biggest announcements

You can catch up on all the Next announcements in this comprehensive list, but we know that’s a lot! Here are the standouts.

Living on the edge

You get a cloud … and you get a cloud! We think Oprah would approve of Google Distributed Cloud, announced during Monday’s Thomas Kurian keynote: a portfolio of fully managed hardware and software solutions that extend Google Cloud’s infrastructure and services to data centers and the edge.  Distributed Cloud is powered by Anthos, which also got a slate of upgrades this week including VM support, and you’ll find it useful in all sorts of situations from running low-latency edge workloads or private 5G/LTE solutions to meeting local sovereignty requirements. Reality at the edge is messy, but managing it doesn’t have to be.

Google security on your side

The Google Cybersecurity Action Team (GCAT) might sound like a cult-classic 80s Saturday morning cartoon lineup, but it’s also a group of security experts we’ve assembled to bring Google-grade security chops to governments and businesses around the world. You can rely on them for threat briefings, proven security blueprints, and strategic sessions designed to help you build a trusted cloud. To get things started, GCAT has released a Security and Resilience Framework using Google Cloud and partner technologies. Now we just need to work on a theme song.

AI breakthroughs for industry

Buzzwords begone. The whole point of machine learning and AI is to do something with it, something that helps your business. So we’ve made Contact Center AI (CCAI) Insights generally available, and also added Contract DocAI to our DocAI lineup. CCAI Insights helps you mine contact center interactions to create better customer experiences—whether your call center is staffed by humans or virtual agents. Contract DocAI makes it faster and less expensive to analyze contracts, the most critical documents of all. Both are business tools that solve real problems—no buzzwords necessary.

Sprucing up with the cleanest cloud

Google Cloud is proud of our sustainability track record as the cleanest cloud in the industry. But we want to help you go even further. With the newly-announced Carbon Footprint tool, every Google Cloud user—that means you!—can access the gross carbon emissions associated with the services you use in Google Cloud. Now you can measure, track, and report your carbon footprint. Plus we’ve integrated sustainability into Unattended Project Recommender, so you can reduce your footprint even further by deleting unattended projects.

Data analytics unite!

Unification was the theme of this year’s data announcements. Vertex AI Workbench launched in public preview—a single Jupyter-based environment for data scientists to complete all of their ML work, from experimentation, to deployment, to managing and monitoring models. But it’s not just for Vertex AI—you can also analyze data from BigQuery, Dataproc, Spark, and Looker in one interface.

And don’t sleep on the private alpha launch of BigQuery Omni, which takes that theme of unification even further by allowing you to analyze data from other clouds using BigQuery. Later this month, you’ll be able to securely query S3 data in AWS or Azure Blob Storage data in Azure directly through the familiar BigQuery user interface, bringing the power of BigQuery to where the data resides. 

Collaboration gets cloudier

Google Workspace (the artist formerly known as G Suite) is a core part of Google Cloud, and we announced all sorts of exciting updates and integrations to our collaboration products this year. Client-side encryption for Google Meet, Data Loss Prevention (DLP) for Chat, and Drive labels for sensitive files are all new at Next. Perhaps most critical for organizations is our just-announced Work Safer Program, which helps protect your Google Workspace users against rising cybersecurity threats with industry-leading solutions from Google and our partners.

Keeping the band together

Throughout Next, we were proud to celebrate our most innovative customers, partners, and community members. And we want to keep the good vibes going all year long with our new Innovators community. You’ll get the inside scoop on our roadmap, get access to exclusive events, and much more. Everyone is welcome to join, and we’ll have all sorts of cool opportunities for Innovators coming up. Join the program today to stay informed and come along on the journey. 

Keynotes and sessions

The live sessions are over, but you can still register to view sessions on demand through November 5th. We’ve created a collection of themed playlists to guide you—whether you’re a developer, an executive, or an industry expert, you’ll find something helpful here. If you’ve got time for just one session, we recommend CEO Thomas Kurian’s keynote, which covers many of this year’s biggest announcements.

Thanks again for learning and growing with us in 2021. We’ll have more Next recaps and breakdowns coming up on the blog in the weeks to come—stay tuned!

Accelerate SAP innovation with Google Cloud Cortex Framework

Digital transformation is about gaining speed, agility, and efficiency. The faster and more easily your organization operates on a modern cloud platform, the sooner it can experience the benefits.

Today, we are excited to introduce Google Cloud Cortex Framework, a foundation of endorsed solution reference templates and content for customers to accelerate business outcomes with less risk, complexity, and cost. Google Cloud Cortex Framework allows you to kickstart insights and reduce time-to-value with reference architectures, packaged services, and deployment accelerators that guide you from planning to delivery so you can get up and running quickly. You can deploy templatized solutions from Google Cloud and our trusted partners for specific use cases and business scenarios in a faster, more cost-effective way.

Our data foundation release

In our first release, customers can take advantage of a rich data foundation of building blocks and templates for SAP environments. Customers can leverage our:

Scalable data cloud foundation to combine the best of SAP and non-SAP data to drive new insights; 

Pre-defined BigQuery operational data marts and change data capture (CDC) processing scripts to take the guesswork out of modeling and data processing; and 

BigQuery ML templates, which provide advanced machine-learning capabilities for common business scenarios such as Product Recommendations and Customer Segmentation. 

Together with plug-and-play Looker dashboard templates, customers can gain fast insights into sales, orders, products, customers, and much more. But this is just the beginning. We see Google Cloud Cortex Framework as a “content factory” that will expand to address new use cases, incorporate best practices, industry scenarios, and build on our cumulative experiences in enterprise environments.

“At Google Cloud, our goal is to make it as easy as possible for SAP customers to modernize in the cloud,” says Abdul Razack, VP, Solutions Engineering, Technology Solutions and Strategy, Google Cloud. “Google Cloud Cortex Framework is our latest innovation to that end. With readily available reference architectures and other tools, SAP customers now have what they need to design, build, and deploy advanced cloud solutions and accelerate business outcomes.”

Get up and running quickly

Google Cloud Cortex Framework helps us answer a common question we hear from our customers: “How do I get started?” It offers an off-the-shelf, packaged approach that customers can implement and customize to their own specifications, and it provides multiple benefits:

Accelerate business outcomes with easy-to-leverage, scenario-driven reference architectures and content that remove the guesswork from deployments. Expedite value with line-of-business and industry example solutions and packaged services from Google Cloud and partners.

Reduce risk, complexity, and cost with proven deployment templates. Deploy the industry’s most advanced cloud-native capabilities at a fraction of the time and cost of from-scratch, in-house efforts. Support business process improvement with accurate and relevant insights to quickly deliver differentiating capabilities to your customers.

Leverage a scalable technology strategy for future innovation by standardizing on a reusable data and analytics architecture. Easily identify and support the innovative technologies required to deliver a full range of current and future scenarios. Provide the building blocks and blueprints you need to prepare for the future, and upskill your team so they can deploy the technology you need to support your business objectives today and tomorrow.

Our partner ecosystem makes Google Cloud Cortex Framework possible

Today’s launch of Google Cloud Cortex Framework includes support from a large ecosystem of partners such as Accenture, Infosys, Palantir, C3.AI, Informatica, HVR, Qlik, Pluto7, ATOS, CapGemini, Cognizant, Deloitte, HCL, Lemongrass, NIMBL, PwC, SpringML and TCS who will be offering solutions and services to accelerate customer innovation. These partners are adopting and augmenting Google Cloud Cortex Framework to enable customers to more rapidly deploy and drive value for their organizations. With vast customer and partner interest in advancing data landscapes leveraging Google Cloud, we will continue to develop the ecosystem of Google Cloud Cortex Framework partners.

As foundational partners, Accenture and Infosys have been instrumental in our solution engineering efforts, leveraging their strengths in the data and analytics space.  

“Organizations today rely on increasing volumes of data to quickly react and respond to change. To handle the high volume and variety of data from disparate sources, our clients need a modern data foundation that can respond rapidly to those growing demands. Google’s Cortex enables us to align our assets and industry solution models into a consistent architecture for our clients to drive business agility, customer intimacy, and real-time decision-making.” – Tom Stuermer, global lead of Accenture Google Business Group at Accenture.

“Infosys is excited to partner with Google Cloud to drive the adoption of Google Cloud Cortex Framework, unlocking value from SAP and non-SAP data and enabling insights-driven digital enterprises across multiple industry domains within our large SAP customer base. Google Cloud Cortex Framework complements Infosys Cobalt that brings together our extensive SAP, data analytics and Google Cloud capabilities to help clients fast-track their cloud adoption and accelerate their business transformation.” – Sunil Senan, SVP and Business Head – Data & Analytics, Infosys

Building on decades of innovation with Google Cloud Cortex Framework

To illustrate the opportunities that Google Cloud Cortex Framework will bring to our customers, we developed an initial release that combines multiple Google data sets with SAP enterprise data. By leveraging machine learning and other Google technologies, companies can deliver new analytics and gain new insights. An example of this is demand shaping.

Demand shaping will benefit line-of-business executives and supply-chain professionals, who can leverage the Google Cloud Cortex Framework reference architecture to improve supply-chain operations by improving business processes or accelerating time-to-insight with analytics. Chief data officers (or any executive responsible for data and analytics) will also benefit by saving time, building on reusable components, and following best practices to get innovative cloud solutions up and running as quickly, effectively, and efficiently as possible. Today’s enterprises can use Google Cloud Cortex Framework to create a reusable architecture that can adapt and expand to new scenarios to gain better visibility into signals that influence demand forecasts. 

Of course, Google Cloud customers aren’t interested only in scenarios that apply to the data and analytics space. Future Google Cloud Cortex Framework offerings will provide recommended approaches for use cases in consumer-facing industries such as consumer packaged goods and supply chain, for delivering improved customer experiences, and for integrating infrastructure and application workload management, all to drive insights to execution and improve the automation of business processes. The common denominator will always be the ability not only to reduce the time and effort required to deploy and manage each solution, but also to develop a technology strategy that can scale above and beyond an individual scenario or use case.

Are you interested in learning more? Watch our session at Google Cloud Next ’21 and fill out this form to connect with our solution experts on the latest content, deployment options and free tailor-made innovation discovery workshops.


Turn data into value with a unified and open data cloud

Today at Google Cloud Next we are announcing innovations that will enable data teams to simplify how they work with data and derive value from it faster. These new solutions will help organizations build modern data architectures with real-time analytics to power innovative, mission-critical, data-driven applications. 

Too often, even the best minds in data are constrained by ineffective systems and technologies. A recent study showed that only 32% of companies surveyed gained value from their data investments. Previous approaches have resulted in systems that are difficult to access, slow, unreliable, complex, and fragmented.

At Google Cloud, we are committed to changing this reality by helping customers simplify their approach to data and build their data clouds. Google Cloud’s data platform is unmatched in speed, scale, security, and reliability for organizations of any size, with built-in, industry-leading machine learning (ML) and artificial intelligence (AI) and an open, standards-based approach.

Vertex AI and data platform services unlock rapid ML modeling 

With the launch of Vertex AI in May 2021, we empowered data scientists and engineers to build reliable, standardized AI pipelines that take advantage of the power of Google Cloud’s data pipelines. Today, we are taking this a step further with the launch of Vertex AI Workbench, a unified user experience to build and deploy ML models faster, accelerating time-to-value for data scientists and their organizations. We’ve integrated data engineering capabilities directly into the data science environment, which lets you ingest and analyze data, and deploy and manage ML models, all from a single interface.

Data scientists can now build and train models 5X faster on Vertex AI than on traditional notebooks. This is enabled primarily by integrations across data services (such as Dataproc, BigQuery, Dataplex, and Looker), which significantly reduce context switching. The unified Vertex AI experience lets data scientists coordinate, transform, secure, and monitor machine learning operations (MLOps) from within a single interface for their long-running, self-improving, and safely managed AI services.

“As per IDC’s AI StrategiesView 2021, model development duration, scalable deployment, and model management are three of the top five challenges in scaling AI initiatives,” said Ritu Jyoti, Group Vice President, AI and Automation Research Practice at IDC. “Vertex AI Workbench provides a collaborative development environment for the entire ML workflow – connecting data services such as BigQuery and Spark on Google Cloud, to Vertex AI and MLOps services. As such, data scientists and engineers will be able to deploy and manage more models, more easily and quickly, from within one interface.”

Ecommerce company, Wayfair, has transformed its merchandising capabilities with data and AI services. “At Wayfair, data is at the center of our business. With more than 22 million products from more than 16,000 suppliers, the process of helping customers find the exact right item for their needs across our vast ecosystem presents exciting challenges,” said Matt Ferrari, Head of Ad Tech, Customer Intelligence, and Machine Learning; Engineering and Product at Wayfair. “From managing our online catalog and inventory, to building a strong logistics network, to making it easier to share product data with suppliers, we rely on services including BigQuery to ensure that we are able to access high-performance, low-maintenance data at scale. Vertex AI Workbench and Vertex AI Training accelerate our adoption of highly scalable model development and training capabilities.”

BigQuery Omni: Breaking data silos with cross-cloud analytics and governance

Businesses across a variety of industries are choosing Google Cloud to develop their data cloud strategies and better predict business outcomes, and BigQuery is a key part of that solution portfolio. To address complex data management across hybrid and multicloud environments, this month we are announcing the general availability of BigQuery Omni, which allows customers to analyze data across Google Cloud, AWS, and Azure. Healthcare provider Johnson & Johnson was able to combine data in Google Cloud and AWS S3 with BigQuery Omni without needing to migrate the data.

This flexible, fully managed, cross-cloud analytics solution allows you to cost-effectively and securely answer questions and share results from a single pane of glass across your datasets, wherever they reside. In addition to these multicloud capabilities, Dataplex will be generally available this quarter, providing an intelligent data fabric that lets you keep your data distributed while making it securely accessible to all your analytics tools.

Spark on Google Cloud simplifies data engineering 

To help make data engineering even easier, we are announcing the general availability of Spark on Google Cloud, the world’s first autoscaling and serverless Spark service for the Google Cloud data platform. It allows data engineers, data scientists, and data analysts to use Spark from their preferred interfaces without data replication or custom integrations. Using this capability, developers can write applications and pipelines that autoscale without any manual infrastructure provisioning or tuning. This new service makes Spark a first-class citizen on Google Cloud and enables customers to get started in seconds and scale infinitely, whether they start in BigQuery, Dataproc, Dataplex, or Vertex AI.
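
As an illustration, a serverless Spark batch can be submitted with a single command. The sketch below assumes the Dataproc Serverless batches interface and uses placeholder script, bucket, and region values:

```bash
# Minimal sketch: submit a PySpark script as a serverless Spark batch.
# The script path, bucket, and region are placeholders.
gcloud dataproc batches submit pyspark gs://my-bucket/wordcount.py \
  --region=us-central1 \
  --deps-bucket=gs://my-bucket
```

Dataproc provisions and scales the underlying Spark infrastructure for the lifetime of the batch, so there is nothing to size or tune up front.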

Spanner meets PostgreSQL: global, relational scale with a popular interface

We’re continuing to make Cloud Spanner, our fully managed, globally scalable relational database, available to more customers, now with a PostgreSQL interface in preview. With this new interface, enterprises can take advantage of Spanner’s unmatched global scale, 99.999% availability, and strong consistency using skills and tools from the popular PostgreSQL ecosystem.

The interface exposes Spanner’s rich feature set through the most popular PostgreSQL data types and SQL features, reducing the barrier to entry for building transformational applications. Using the tools and skills they already have, developer teams gain flexibility and peace of mind, because the schemas and queries they build against the PostgreSQL interface can be easily ported to another Postgres environment. Complete this form to request access to the preview.

Our commitment to the PostgreSQL ecosystem is long-standing. Customers choose Cloud SQL for the flexibility to run PostgreSQL, MySQL, and SQL Server workloads. Cloud SQL provides a rich extension collection, configuration flags, and an open ecosystem, without the hassle of database provisioning, storage capacity management, or other time-consuming tasks.

Auto Trader has migrated approximately 65% of its Oracle footprint to Cloud SQL, and completing that migration remains a strategic priority for the company. Using Cloud SQL, BigQuery, and Looker to facilitate access to data for its users, and relying on Cloud SQL’s fully managed services, Auto Trader has improved its release cadence by over 140% year over year, reaching an impressive peak of 458 releases to production in a single day.

Looker integrations make augmented analytics a reality

We are announcing a new integration between Tableau and Looker that will allow customers to operationalize analytics and more effectively scale their deployments with trusted, real-time data, and less maintenance for developers and administrators. Tableau customers will soon be able to leverage Looker’s semantic model, enabling new levels of data governance while democratizing access to data. They will also be able to pair their enterprise semantic layer with Tableau’s leading analytics platform. The future might be uncertain, but together with our partners we can help you plan for it. 

We remain committed to developing new ways to help organizations go beyond traditional business intelligence with Looker. In addition to innovating within Looker, we’re continuing to integrate within other parts of Google Cloud. Today, we are sharing new ways to help customers deliver trusted data experiences and leverage augmented analytics to take intelligent action. 

First, we’re enabling you to democratize access to trusted data in tools you are already familiar with. Connected Sheets already allows you to interactively explore BigQuery data in a familiar spreadsheet interface, and it will soon be able to leverage the governed data and business metrics in Looker’s semantic model. This capability will be available in preview by the end of this year.

Another integration we’re announcing is Looker’s Solution for Contact Center AI, which helps you gain a deeper understanding and appreciation of your customers’ full journey by unlocking insights from all of your company’s first-party data, such as contextualizing support calls to make sure your most valuable customers receive the best service. 

We’re also sharing the new Looker Block for Healthcare NLP API, which provides simplified access to intelligent insights from unstructured medical text. Compatible with Fast Healthcare Interoperability Resources (FHIR), healthcare providers, payers, and pharma companies can quickly understand the context and relationships of medical concepts within the text, and in turn, can begin to link this to other clinical data sources for additional AI and ML actions. 

Bringing the best of Google together with Google Earth Engine and Google Cloud

We are thrilled to announce the preview of Google Earth Engine on Google Cloud. This launch makes Google Earth Engine’s 50+ petabyte catalog of satellite imagery and geospatial data sets available for planetary-scale analysis. Google Cloud customers will be able to integrate Earth Engine with BigQuery, Google Cloud’s ML technologies, and Google Maps Platform. This gives data teams a way to better understand how the world is changing and what actions they can take — from sustainable sourcing, to saving energy and materials costs, to understanding business risks, to serving new customer needs. 

For over a decade, Earth Engine has supported the work of researchers and NGOs from around the world, and this new integration brings the best of Google and Google Cloud together to empower enterprises to create a sustainable future for our planet and for your business.

At Google Cloud, we are deeply grateful to work with companies of all sizes, and across industries, to build their data clouds. Join my keynote session to hear how organizations are leveraging the full power of data, from databases to analytics that support decision making to AI and ML that predict and automate the future. We’ll also highlight our latest product innovations for BigQuery, Spanner, Looker, and Vertex AI.

I can’t wait to hear how you will turn data into intelligence and look forward to connecting with you.


Introducing Intelligent Products Essentials: helping manufacturers build AI-powered smart products, faster

Expectations for both consumer and commercial products have changed. Consumers want products that evolve with their needs, adapt to their preferences, and stay up-to-date over time. Manufacturers, in turn, need to create products that provide engaging customer experiences not only to better compete in the marketplace, but also to provide new monetization opportunities. 

However, embedding intelligence into new and existing products is challenging. Updating hardware is costly, and existing connected products do not have the capability to add new features. Furthermore, manufacturers do not have sufficient customer insights due to product telemetry and customer data silos, and may lack the AI expertise to quickly develop and deploy these features. 

That’s why today we’re launching Intelligent Products Essentials, a solution that allows manufacturers to rapidly deliver products that adapt to their owners, update features over-the-air using AI at the edge, and provide customer insights using analytics in the cloud. The solution is designed to assist manufacturers in their product development journeys—whether developing a new product or enhancing existing ones. 

With Intelligent Products Essentials, manufacturers can:

Personalize customer experiences: Provide a compelling ownership experience that evolves over the lifetime of the product. For example, a chatbot that contextualizes responses based on product status and customer profile.

Manage and update products over-the-air: Deploy updates to products in the field, gather performance insights, and evolve capabilities over time to unlock monetization opportunities.

Predict parts and service issues: Detect operating-threshold breaches and anomalies and predict failures to proactively recommend service using AI, reducing warranty claims, decreasing parts shortages, and increasing customer satisfaction.

In order to help manufacturers quickly deploy these use cases and many more, Intelligent Products Essentials provides the following:

Edge connections: Connect and ingest raw or time-series product telemetry from various device platforms using IoT Core or Pub/Sub, and deploy and manage over-the-air firmware updates and machine learning models at the edge with Vertex AI.

Ownership App Template: Easily build connected-product companion apps that work on smartphones, tablets, and computers. Use a pre-built API and accompanying sample app that can incorporate product or device registration and identity management, and provide application behavior analytics using Firebase.

Product fleet management: Manage, update and analyze fleets of connected products via APIs, Google Kubernetes Engine, and Looker.

AI services: Create new features or capabilities for your products using AI and machine learning products such as DialogFlow, Vision AI, and AutoML, all from Vertex AI.

Enterprise data integration: Integrate data sources such as Enterprise Asset Management (EAM), Enterprise Resource Planning (ERP), Customer Relationship Management (CRM) systems and others using Dataflow and BigQuery.

Intelligent Products Essentials helps manufacturers build new features across consumer, industrial, enterprise, and transportation products. Manufacturers can implement the solution in-house, or work with one of our certified solution integration partners like Quantifi and Softserve.

“The focus on intelligent products that Google Cloud is deploying provides a digital option for manufacturers and users. At its heart, systems like Intelligent Product Essentials are all about decision making. IDC sees faster and more effective decision-making as the fundamental reason for the drive to digitize products and processes. It’s how you can make faster and more effective decisions to meet heightened customer expectations, generate faster cash flow, and better revenue realization,” said Kevin Prouty, Group Vice President at IDC. “Digital offerings like Google’s Intelligent Product Essentials potentially go the last mile with the ability to connect the digital thread all the way through to the final user.”

Customers adopting Intelligent Products Essentials

GE Appliances, a Haier company, is enhancing its appliances with new AI-powered intelligent features to enable:

Intelligent cooking: Help cook the perfect meal to personal preferences, regardless of your expertise and abilities in the kitchen.

Frictionless service: Build smart appliances that know when they need maintenance and make it simple to take action or schedule services.

Integrated digital lifestyle: Make appliances useful at every step of the way by integrating them with digital lifestyle services – for example, automating appliance behaviors according to customer calendars, such as oven preheating or scheduling the dishwasher to run in the late evening.

“Intelligent Products Essentials enhances our smart appliances ecosystem, offering richer consumer habit insights. This enables us to develop and offer new features and experiences to integrate with their digital lifestyle.” — Shawn Stover, Vice President, Smart Home Solutions at GE Appliances.

Serial 1, Powered by Harley-Davidson, is using Intelligent Products Essentials to manage and update its next-generation eBicycles and to personalize its customers’ digital ownership experiences.

“At Serial 1, we are dedicated to creating the easiest and most intuitive way to experience the fun, freedom, and adventure of riding a pedal-assist electric bicycle. Connectivity is a key component of delivering that mission, and working together to integrate Intelligent Product Essentials into our eBicycles will ensure that our customers enjoy the best possible user experience.”— Jason Huntsman, President, Serial 1. 

Magic Leap, an augmented reality pioneer with industry-leading hardware and software, is building field service solutions on Intelligent Products Essentials, with the goal of connecting manufacturers, dealers, and customers to more proactive and intelligent service.

“We look forward to using Intelligent Products Essentials to enable us to rapidly integrate manufacturers’ product data with dealer service partners into our field service solution. We’re excited to partner with Google Cloud as we continue to push the boundaries of physical interaction with the digital world.” — Walter Delph, Chief Business Officer, Magic Leap

Intelligent Products Essentials is available today. To learn more, visit our website.


Analyzing Twitter sentiment with new Workflows processing capabilities

The Workflows team recently announced the general availability of iteration syntax and connectors.

Iteration syntax supports easier creation and better readability of workflows that process many items. You can use a for loop to iterate through a collection of data in a list or map while keeping track of the current index. If you have a specific range of numeric values to iterate through, you can also use range-based iteration.
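
As a minimal, illustrative sketch, both forms look like this (the myList variable and the step names are hypothetical, and myList is assumed to be a list of strings assigned in an earlier step):

```yaml
# Sketch of the for-in iteration syntax over a list, with index tracking.
- iterateList:
    for:
      value: item        # current element
      index: i           # current index, starting at 0
      in: ${myList}
      steps:
        - logItem:
            call: sys.log
            args:
              text: ${"item " + string(i) + " is " + item}

# Sketch of range-based iteration over the numbers 1 through 10.
- iterateRange:
    for:
      value: v
      range: [1, 10]
      steps:
        - logValue:
            call: sys.log
            args:
              text: ${"value " + string(v)}
```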


Connectors have been in preview since January. Think of connectors as client libraries that let workflows call other Google Cloud services: they handle authentication, request formats, retries, and waiting for long-running operations to complete. Check out our previous blog post for more details on connectors. Since January, the number of available connectors has increased from 5 to 20.
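
For example, a single connector step like the following sketch runs a BigQuery query; the project ID and query are placeholders, and the connector handles authentication and retries behind the scenes:

```yaml
# Sketch of a connector call using the BigQuery connector.
- runQuery:
    call: googleapis.bigquery.v2.jobs.query
    args:
      projectId: my-project-id          # placeholder project ID
      body:
        query: SELECT CURRENT_DATE() AS today
        useLegacySql: false
    result: queryResult
```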

The combination of iteration syntax and connectors enables you to implement robust batch processing use cases. Let’s take a look at a concrete sample. In this example, you will create a workflow to analyze sentiments of the latest tweets for a Twitter handle. You will be using the Cloud Natural Language API connector and iteration syntax.

APIs for Twitter sentiment analysis

The workflow will use the Twitter API and Natural Language API. Let’s take a closer look at them.

Twitter API 

To use the Twitter API, you’ll need a developer account. Once you have the account, you need to create an app and get a bearer token to use in your API calls. Twitter has an API to search for Tweets. 

Here’s an example to get 100 Tweets from the @GoogleCloudTech handle using the Twitter search API:
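
A raw request with curl would look roughly like this, assuming the v2 recent search endpoint and a BEARER_TOKEN environment variable:

```bash
# Roughly: fetch the latest 100 tweets from @GoogleCloudTech via the v2 recent search endpoint.
curl --request GET \
  --url "https://api.twitter.com/2/tweets/search/recent?query=from:GoogleCloudTech&max_results=100" \
  --header "Authorization: Bearer $BEARER_TOKEN"
```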

Natural Language API

Natural Language API uses machine learning to reveal the structure and meaning of text. It has methods such as sentiment analysis, entity analysis, syntactic analysis, and more. In this example, you will use sentiment analysis. Sentiment analysis inspects the given text and identifies the prevailing emotional attitude within the text, especially to characterize a writer’s attitude as positive, negative, or neutral.

The sentiment analysis response includes a documentSentiment object; you will use its score to identify the sentiment of each tweet. Scores range between -1.0 (negative) and 1.0 (positive) and correspond to the overall emotional leaning of the text. You will also calculate the average and minimum sentiment score across all processed tweets.

Define the workflow

Let’s start building the workflow in a workflow.yaml file.

In the init step, read the bearer token, Twitter handle, and max results for the Twitter API as runtime arguments. Also initialize some variables related to sentiment analysis:
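
A minimal sketch of this step, with illustrative variable names, could look like the following in workflow.yaml:

```yaml
main:
  params: [args]
  steps:
    - init:
        assign:
          - bearerToken: ${args.bearerToken}
          - twitterHandle: ${args.twitterHandle}
          - maxResults: ${args.maxResults}
          - totalSentiment: 0      # running sum of sentiment scores
          - minSentiment: 1.0      # scores range from -1.0 to 1.0
          - minSentimentIndex: -1  # index of the most negative tweet so far
```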

In the searchTweets step, fetch tweets using the Twitter API:
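
Continuing the sketch, the search step could call the Twitter recent search endpoint over HTTP (the response shape is assumed to follow the v2 API, with tweets under body.data):

```yaml
    - searchTweets:
        call: http.get
        args:
          url: https://api.twitter.com/2/tweets/search/recent
          query:
            query: ${"from:" + twitterHandle}
            max_results: ${maxResults}
          headers:
            Authorization: ${"Bearer " + bearerToken}
        result: searchResult   # tweets are expected under searchResult.body.data
```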

In the processPosts step, analyze each tweet and keep track of the sentiment scores. Notice how each tweet is analyzed using the new for-in iteration syntax, which also provides access to the current index.

The processPosts step contains multiple substeps. The analyzeSentiment substep uses the Natural Language API connector to analyze the text of a tweet, and the next two substeps add to the total sentiment and keep track of the minimum sentiment score and its index:
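
A sketch of the loop, using the for-in syntax and the Cloud Natural Language API connector (substep names are illustrative):

```yaml
    - processPosts:
        for:
          value: tweet
          index: i
          in: ${searchResult.body.data}
          steps:
            - analyzeSentiment:
                call: googleapis.language.v1.documents.analyzeSentiment
                args:
                  body:
                    document:
                      content: ${tweet.text}
                      type: PLAIN_TEXT
                result: sentimentResult
            - addToTotal:
                assign:
                  - totalSentiment: ${totalSentiment + sentimentResult.documentSentiment.score}
            - trackMinimum:
                switch:
                  - condition: ${sentimentResult.documentSentiment.score < minSentiment}
                    steps:
                      - updateMinimum:
                          assign:
                            - minSentiment: ${sentimentResult.documentSentiment.score}
                            - minSentimentIndex: ${i}
```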

Once outside the processPosts step, calculate the average sentiment score, then log and return the results:
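
The closing steps of the sketch could look like this:

```yaml
    - computeAverage:
        assign:
          - averageSentiment: ${totalSentiment / len(searchResult.body.data)}
    - logResults:
        call: sys.log
        args:
          text: ${"Average sentiment: " + string(averageSentiment)}
    - returnResults:
        return:
          averageSentiment: ${averageSentiment}
          minSentiment: ${minSentiment}
          minSentimentIndex: ${minSentimentIndex}
```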

Deploy and execute the workflow

To try out the workflow, let’s deploy and execute it.

Deploy the workflow:
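
For example, assuming the definition above is saved as workflow.yaml and the workflow is named twitter-sentiment:

```bash
# Deploy the workflow; the workflow name and region are illustrative.
gcloud workflows deploy twitter-sentiment \
  --source=workflow.yaml \
  --location=us-central1
```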

Execute the workflow (don’t forget to pass in your own bearer token):
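
For example, with illustrative argument values:

```bash
# Run the workflow and wait for the result; replace the bearer token with your own.
gcloud workflows run twitter-sentiment \
  --location=us-central1 \
  --data='{"bearerToken": "YOUR_TWITTER_BEARER_TOKEN", "twitterHandle": "GoogleCloudTech", "maxResults": 100}'
```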

After a minute or so, you should see the result with the sentiment scores.

Next

Thanks to the iteration syntax and connectors, we were able to read and analyze Tweets in an intuitive and robust workflow with no code. Please reach out to @meteatamel and krisabraun@ for questions and feedback.

You can find the Twitter sentiment analysis sample on GitHub.

Please share your feedback, interesting use cases, and customer requests with us.


Dataflow Pipelines, deploy and manage data pipelines at scale

We see data engineers use Dataflow for a wide variety of data processing needs, ranging from ingesting data into their data warehouses and data lakes, to processing data for machine learning use cases, to implementing sophisticated streaming analytics applications. While the use cases vary, all of these users share one common need: to create, monitor, and manage dozens, if not hundreds, of Dataflow jobs. As a result, users have asked us for a scalable way to schedule, observe, and troubleshoot Dataflow jobs.

We are excited to announce the Preview of a new capability, Dataflow Pipelines, that addresses the problem of managing Dataflow jobs at scale. Dataflow Pipelines introduces a new management abstraction, the pipeline, which maps to the logical pipelines that users care about and provides a single pane of glass for observation and management.

With Data Pipelines, data engineers can easily perform tasks such as the following:

Running jobs on a recurring schedule: With Data Pipelines, users can schedule recurrent batch jobs simply by providing a schedule in cron format (for example, 0 2 * * * runs a job daily at 2 AM). The pipeline then automatically creates Dataflow jobs according to that schedule. Input file names can be parameterized for incremental batch pipeline processing. Dataflow uses Cloud Scheduler to schedule the jobs.

Creating and tracking SLOs: One of the key monitoring goals is to ensure that data pipelines are delivering the data that downstream business teams need. In the past, it was not easy to define SLOs and set up alerts on them. With Data Pipelines, SLO configuration and alerting are natively supported, and users can define them easily at the pipeline level.

Health monitoring and tracking: Data Pipelines makes it easy to monitor and reason about your pipelines by providing aggregated metrics at the project and pipeline level. These metrics (both batch and streaming), along with the history of previous execution runs, provide a detailed overview of the pipelines at a glance. In addition, the ability to easily identify problematic jobs and dive into the job-level pages makes troubleshooting easier.

Here is a short video that provides an overview of Data Pipelines.

If you have any feedback or questions, please write to us at google-data-pipelines-feedback@googlegroups.com.
