Pro tools for Pros: Industry leading observability capabilities for Dataflow

Dataflow is the industry-leading unified platform for batch and stream processing. It is a fully managed service with flexible development options (from Flex Templates and notebooks to the Apache Beam SDKs for Java, Python, and Go) and a rich set of built-in management tools. It integrates seamlessly with Google Cloud products such as Pub/Sub, BigQuery, Vertex AI, Cloud Storage, Spanner, and Bigtable, as well as with third-party services and products such as Kafka and AWS S3, to best meet your data movement use cases.

While our customers value these capabilities, they continue to push us to innovate and provide more value as the best batch and streaming data processing service to meet their ever-changing business needs. 

Observability is a key area where the Dataflow team continues to invest, based on customer feedback. Adequate visibility into the state and performance of Dataflow jobs is essential for business-critical production pipelines.

In this post, we will review Dataflow’s key observability capabilities:

Job visualizers – job graphs and execution details

New metrics & logs

New troubleshooting tools – error reporting, profiling, insights

New Datadog dashboards & monitors

Dataflow observability at a glance

There is no need to configure or manually set up anything; Dataflow offers observability out of the box within the Google Cloud Console, from the time you deploy your job. Observability capabilities are seamlessly integrated with Google Cloud Monitoring and Logging along with other GCP products. This integration gives you a one-stop shop for observability across multiple GCP products, which you can use to meet your technical challenges and business goals.

Understanding your job’s execution: job visualizers

Questions: What does my pipeline look like? What’s happening in each step? Where’s the time spent?

Solution: Dataflow’s Job graph and Execution details tabs answer these questions to help you understand the performance of the various stages and steps within the job.

The Job graph tab illustrates the steps involved in the execution of your job, in the default Graph view. The graph shows how Dataflow has optimized your pipeline’s code for execution by fusing steps into stages. The Table view adds more detail about each step, its associated fused stages, the time spent in each step, and their statuses as the pipeline continues execution. Each step in the graph also displays information such as the input and output collections and output data freshness; these help you analyze the amount of work done at that step (elements processed) and its throughput.

Fig 1. Job graph tab showing the DAG for a job and the key metrics for each stage on the right.

The Execution details tab has all the information to help you understand and debug the progress of each stage within your job. In the case of streaming jobs, you can view the data freshness of each stage. The Data freshness by stages chart includes anomaly detection: it highlights “potential slowness” and “potential stuckness” to help you narrow down your investigation to a particular stage. Learn more about using the Execution details tab for batch and streaming here.

Fig 2. The execution details tab showing data freshness by stage over time, providing anomaly warnings in data freshness.

Monitor your job with metrics and logs

Questions:  What’s the state and performance of my jobs? Are they healthy? Are there any errors? 

Solution:  Dataflow offers several metrics to help you monitor your jobs. 

A full list of Dataflow job metrics can be found in our metrics reference documentation. In addition to the Dataflow service metrics, you can view worker metrics, such as CPU utilization and memory usage. Lastly, you can generate Apache Beam custom metrics from your code.
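As an illustration, here is a minimal sketch of an Apache Beam custom metric defined in a Python DoFn. The namespace, metric name, and field names are placeholders; custom counters and distributions declared this way are reported by the runner alongside the Dataflow service metrics.

```python
import apache_beam as beam
from apache_beam.metrics import Metrics


class CountLargeOrders(beam.DoFn):
    """Hypothetical DoFn that counts records above a configurable threshold."""

    def __init__(self, threshold=1000):
        self.threshold = threshold
        # Custom counter; namespace and name are arbitrary placeholders.
        self.large_orders = Metrics.counter('orders', 'large_order_count')

    def process(self, order):
        if order.get('amount', 0) > self.threshold:
            self.large_orders.inc()
        yield order
```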

The Job metrics tab is a one-stop shop for the most important metrics when reviewing or troubleshooting a job’s performance. Alternatively, you can access this data from Metrics Explorer to build your own Cloud Monitoring dashboards and alerts.
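The same data can also be read programmatically through the Cloud Monitoring API. The sketch below, using the google-cloud-monitoring Python client, lists recent points for one Dataflow job metric; the project ID is a placeholder, and the exact metric type should be confirmed against the metrics reference documentation.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# List the last hour of one Dataflow job metric (metric type is an example).
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": 'metric.type = "dataflow.googleapis.com/job/element_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    print(series.resource.labels.get("job_name"), len(series.points))
```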

Job and worker logs are among the first things to look at when you deploy a pipeline. You can access both of these log types in the Logs panel on the Job details page.

Job logs include information about startup tasks, fusion operations, autoscaling events, worker allocation, and more. Worker logs include information about work processed by each worker within each step in your pipeline.

You can configure and modify the logging level and route the logs using the guidance provided in our pipeline log documentation. 

Logs are seamlessly integrated into Cloud Logging. You can write Cloud Logging queries, create log-based metrics, and create alerts on these metrics. 
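As a small example, the sketch below uses the google-cloud-logging Python client to pull recent error-level entries for a single job; the project and job IDs are placeholders, and the same filter string can be pasted into the Logs Explorer.

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project ID

# Error-level Dataflow logs for one job (job ID is a placeholder).
log_filter = (
    'resource.type="dataflow_step" '
    'resource.labels.job_id="2023-01-01_00_00_00-1234567890" '
    'severity>=ERROR'
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.severity, entry.payload)
```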

New: Metrics for streaming Jobs

Questions: Is my pipeline slowing down or getting stuck? How is my code impacting the job’s performance? How are my sources and sinks performing with respect to my job?

Solution: We have introduced several new metrics for Streaming Engine jobs that help answer these questions, and all of them are instantly accessible from the Job metrics tab.

The engineering teams at the Renault Group have been using Dataflow for their streaming pipelines as a core part of their digital transformation journey.

“Deeper observability of our data pipelines is critical to track our application SLOs,” said Elvio Borrelli, Tech Lead – Big Data at the Renault Digital Transformation & Data team. “The new metrics, such as backlog seconds and data freshness by stage, now provide much better visibility into our end-to-end pipeline latencies and areas of bottlenecks. We can now focus more on tuning our pipeline code and data sources for the necessary throughput and lower latency.”

To learn more about using these metrics in the Cloud console, please see the Dataflow monitoring interface documentation.

Fig 3. The Job metrics tab showing the autoscaling chart and the various metrics categories for streaming jobs.

To learn how to use these metrics to troubleshoot common symptoms within your jobs, watch this webinar: Dataflow Observability, Monitoring, and Troubleshooting.

Debug job health using Cloud Error Reporting

Problem: There are a couple of errors in my Dataflow job. Is it my code, data, or something else? How frequently are these happening?

Solution: Dataflow offers native integration with Google Cloud Error Reporting to help you identify and manage errors that impact your job’s performance.

In the Logs panel on the Job details page, the Diagnostics tab tracks the most frequently occurring errors. This is integrated with Google Cloud Error Reporting, enabling you to manage errors by creating bugs or work items or by setting up notifications. For certain types of Dataflow errors, Error Reporting provides a link to troubleshooting guides and solutions.

Fig 4. The diagnostics tab in the Log panel displaying top errors and their frequency.

New: Troubleshoot performance bottlenecks using Cloud Profiler

Problem: What part of my code is taking the most time to process the data? Which operations are consuming the most CPU cycles or memory?

Solution: Dataflow offers native integration with Google Cloud Profiler, which lets you profile your jobs to understand the performance bottlenecks using CPU, memory, and I/O operation profiling support.

Is my pipeline’s latency high? Is it CPU intensive, or is it spending time waiting for I/O operations? Or is it memory intensive? If so, which operations are driving this up? The flame graph helps you find answers to these questions. You can enable profiling for your Dataflow jobs by specifying a flag during job creation or while updating your job. To learn more, see the Monitor pipeline performance documentation.

Fig 5. The CPU time profiler showing the flame graph for a Dataflow job.
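As a minimal sketch of enabling the profiler from the Python SDK, the options below attach Cloud Profiler to a new job; the project, region, and bucket values are placeholders, and the service option name should be verified against the Monitor pipeline performance documentation.

```python
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",                 # placeholder
    "--region=us-central1",                 # placeholder
    "--temp_location=gs://my-bucket/tmp",   # placeholder
    # Dataflow service option that turns on Cloud Profiler for this job.
    "--dataflow_service_options=enable_google_cloud_profiler",
])
```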

New: Optimize your jobs using Dataflow insights

Problem: What can Dataflow tell me about improving my job performance or reducing its costs?

Solution: You can review Dataflow Insights to improve performance or to reduce costs. Insights are enabled by default on your batch and streaming jobs; they are generated by auto-analyzing your jobs’ executions.

Dataflow insights is powered by Google Active Assist’s Recommender service. It is automatically enabled for all jobs and is available free of charge. Insights include recommendations such as enabling autoscaling, increasing the maximum number of workers, and increasing parallelism. Learn more about Dataflow insights in the Dataflow Insights documentation.

Fig 6. Dataflow Insights show up on the Jobs overview page next to the active jobs.

New: Datadog Dashboards & Recommended Monitors

Problem: I would like to monitor Dataflow in my existing monitoring tools, such as Datadog.

Solution: Dataflow’s metrics and logs are accessible in the observability tools of your choice via the Google Cloud Monitoring and Logging APIs. Customers using Datadog can now leverage the out-of-the-box Dataflow dashboards and recommended monitors to monitor their Dataflow jobs alongside other applications within the Datadog console. Learn more about the Dataflow dashboards and recommended monitors in Datadog’s blog post on how to monitor your Dataflow pipelines with Datadog.

Fig 7. Datadog dashboard monitoring Dataflow jobs across projects

ZoomInfo, a global leader in modern go-to-market software, data, and intelligence, is partnering with Google Cloud to enable customers to easily integrate their business-to-business data into Google BigQuery. Dataflow is a critical piece of this data movement journey.

“We manage several hundreds of concurrent Dataflow jobs,” said Hasmik Sarkezians, ZoomInfo Engineering Fellow. “Datadog’s dashboards and monitors allow us to easily monitor all the jobs at scale in one place. And when we need to dig deeper into a particular job, we leverage the detailed troubleshooting tools in Dataflow such as Execution details, worker logs and job metrics to investigate and resolve the issues.”

What’s Next

Dataflow is leading the batch and streaming data processing industry with best-in-class observability experiences.

But we are just getting started. Over the next several months, we plan to introduce more capabilities such as:

Memory observability to detect and prevent potential out of memory errors.

Metrics for sources & sinks, end-to-end latency, bytes being processed by a PTransform, and more.

More insights – quota, memory usage, worker configurations & sizes.

Pipeline validation before job submission.

Debugging user-code and data issues using data sampling.

Autoscaling observability improvements.

Project-level monitoring, sample dashboards, and recommended alerts.

Got feedback or ideas? Shoot them over, or take this short survey.

Getting Started

To get started with Dataflow, see the Cloud Dataflow quickstarts.

To learn more about Dataflow observability, review these articles:

Using the Dataflow monitoring interface

Building production-ready data pipelines using Dataflow: Monitoring data pipelines

Beam College: Dataflow Monitoring

Beam College: Dataflow Logging 

Beam College: Troubleshooting and debugging Apache Beam and GCP Dataflow

Source: Data Analytics

AI Technology Helps eCommerce Brands Optimize for Mobile

Unless you live in the most remote part of the world or somewhere underground, chances are that you have heard something about Artificial Intelligence (AI). But how does AI technology help eCommerce brands optimize for mobile?

Artificial Intelligence is becoming a big part of how different industries operate. The popularity of smart devices, security checks, research in the healthcare industry, and self-checkout registers are just a few examples of areas where AI is prominent.

The eCommerce industry has not been left behind. eCommerce business owners are looking for ways to use AI to improve their customers’ experience, increase sales, and streamline operations. 

Here are a few ways AI technology helps eCommerce brands optimize for mobile:

Consumer Data Analysis

AI technology allows eCommerce brands to develop personalized and targeted marketing messages by analyzing consumer data from their eCommerce apps. These messages are then tailored to the requirements of a mobile app.

Brands obtain consumer patterns and trends from their eCommerce apps using AI. They also gain insights into the preferences of their customers using their mobile apps. This allows them to design the apps to match these preferences.

With such data, they know the kind of ads and targeted messages to send to each of their customers. They are also able to identify the right marketing times for such messages, allowing them to have a constant flow of traffic into their eCommerce mobile applications.

Automation

Advancements in technology have played a major role in pushing businesses towards automation. Today, tasks that would take days only need a couple of minutes to be completed. This is because of automation.

With new trends such as dropshipping in the eCommerce industry, we are seeing companies such as Spark Shipping using technology for eCommerce dropshipping automation. This requires AI technology to identify and give insights into different metrics.

Using AI, eCommerce dropshipping business owners can identify what their customers want when they visit their mobile applications. This information can be used to display the products that a customer is most likely going to buy.

Voice Search

Voice search is reshaping digital marketing in different industries. There is a lot of potential for eCommerce brands that want to use AI to implement voice search in their eCommerce applications. Using AI, eCommerce brands can learn about customer preferences, instructions, requests, queries, and interactions.

Using this data, they can segment and profile all users who access their eCommerce mobile applications. With emerging technologies, they can streamline voice search, ensuring that customers’ voices are easily recognizable.

Immediately after a returning user introduces themselves, the app can surface the products that that particular user wants to see. Customers can interact with the mobile app without having to type anything. This is all made possible by AI technology.

Adding a Personal Touch with Chatbots

A chatbot can be defined as a computer program that is used to streamline conversations between eCommerce applications (or any other web application) and their customers.

Powered by AI, eCommerce brands can use chatbots to handle multiple tasks in their eCommerce businesses. For instance, you can use chatbots to automate all order processes in your mobile application.

When it comes to customer service, an AI-powered chatbot can learn the operations of your app, which means it can answer most questions from your customers. All this happens in your app, without your intervention.

Dynamic Pricing

Initially, running an eCommerce business meant that you had to manually change your product prices whenever the need arose. Today, you can use AI to automatically change these prices instead of keeping fixed ones.

When a customer visits your eCommerce mobile app, they expect reasonable prices depending on the market. If you decide to do this manually, you are going to waste a lot of time, and the chances of errors will be very high.

In addition to dynamic pricing, AI technology can also be used to identify the consumers who need a discount even before they convert. This way, you can make sure that price cuts are only offered to customers who are likely to make a purchase.

Artificial Intelligence is going to change every other industry in the next few years. As you can see above, eCommerce brands can use this technology to optimize their operations for mobile.

Source: SmartData Collective

Cloud Computing Can Improve Human Resource Management

Cloud technology is changing the future of business in many different ways. Countless companies have discovered the benefits cloud computing has to offer. As a result, 60% of companies have migrated to the cloud.

One of the many benefits of cloud technology pertains to human resource management. A growing number of companies are storing employee data on the cloud, which makes it easier to handle certain HR tasks.

In order to appreciate the benefits of using the cloud for HR management, it is necessary to understand the importance of human resource management in general. Keep reading to learn more.

What is Human Resource Management?

Human resource management is the process of recruiting and selecting candidates while also providing them with training and development. Human resource management also decides on appraisals of employees’ performance and the corresponding compensation. It maintains proper relations with employees and ensures a safe and healthy environment for them, in compliance with the labor laws of the land.

Since HR is so important, companies use the latest technology to handle it. This is one of the reasons that cloud technology has become so important in HR. The cloud helps with workforce planning and HR analytics, although there are still some challenges here that companies need to avoid.

HR management mainly deals with essential workplace functions like organizing, planning, directing, and controlling. It focuses on managing human resources for the organization and is responsible for their training, development, and maintenance. The HR department helps the organization achieve its social objectives.

What is human resource management?

HR management is a multidisciplinary subject that studies various fields like psychology, management, communication, and sociology. In recent years, it also involves new technology investments such as cloud computing.

It also helps in promoting teamwork. The HR department handles every factor surrounding employees and manages functions like job analysis, selection of candidates, training, providing benefits and incentives, career planning, and maintaining discipline among employees. It also communicates with employees at all levels and maintains compliance with local as well as state laws.

What is the importance of Human resources?

Human beings have a great capacity, and no product or service can be produced without their participation. People are a basic resource for the production of anything. Every business or organization wants skilled and professional people to make the business a success. It should be no surprise that many startups are investing in cloud HR technology to deal with these issues.

The management basics include the five ‘M’s: men, machines, money, methods, and materials. Human resource management is the branch that deals with the men. Humans are not easy to manage, as each person is different from the other. Among the five ‘M’s, the men play the key role, since they direct and manipulate the other ‘M’s.

People run businesses; a business cannot run by itself. The success of a business lies in its employees and managing them is what human resource management is all about.

Scope of human resource management

Human resource management includes a wide range of factors, so it’s necessary to classify it under the following subheads:

Personnel management – It is also called ‘direct manpower management,’ and it includes the most basic functions of human resource management, such as:

Hiring

Training

Induction and orientation

Transfer, compensation, and benefits

Layoff and termination

Labour relations – HRM focuses on improving the relationship between labour and the organization. It addresses grievances and settles disputes to maintain harmony.

Employee welfare – This mainly focuses on the working conditions of employees and includes factors like safety and health.

As human resource management plays a key role in recruiting and training professionals, an organization needs to grow its skills and expertise, for example through senior management courses. The growth of an organization depends on how well employees are assigned to their roles, and human resource management is responsible for this. A well-organized and skilled human resource management department ensures an organization’s steady and consistent growth.

What Benefits Does the Cloud Bring to HR Management?

There are a lot of great reasons companies are investing in cloud technology for HR management. Here are some of the biggest benefits:

You can take advantage of SaaS human resource management tools that are only available in the cloud. Many cloud-based tools help with payroll administration, recruiting, benefits administration, and talent management.

You can track the performance of employees more easily by storing records of their work in the cloud.

You can use cloud-based tools to automate many workflows.

The benefits of using cloud computing in human resource management cannot be overstated. A report by Deloitte highlights some of the most pressing benefits of cloud HR.

Source: SmartData Collective

Roles of Python Developer in Data Science Teams

Data science is a very complex field that requires the insights of professionals from many different disciplines. One group of professionals that is especially important for data science projects is Python developers.

What is the Python programming language? Why is it so important in the data science profession?

What Is Python?

Python is a powerful programming language that is widely used in many different industries today. There are 8.2 million Python developers in the world today! That figure is growing as more teams need them to work on projects involving data analytics, AI and similar technologies.

Python developers are in high demand, and as a recruiter, knowing the roles and responsibilities of a Python developer is essential to finding the best candidates for your open positions. You will have a better understanding of the importance of using Python to create data science applications, which will make it easier to hire the right candidates.

In this blog post, we will outline the key roles and responsibilities of a Python developer and provide tips for recruiting them. So, if you’re looking to add a Python developer to your team, read on!

Python is a versatile scripting language that was first released in 1991. Python is used in many different fields today, including web development, software development, scientific computing, artificial intelligence, and more. Python is known for being easy to read and write, as well as being very reliable. Due to these benefits, it is an ideal programming language for the data science profession.
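As a tiny illustration of that readability, here is a hypothetical snippet that summarizes a small list of records; the field names and values are made up.

```python
# Summarize a small, made-up dataset with plain Python.
orders = [{"amount": 120.50}, {"amount": 80.00}, {"amount": 199.99}]

total = sum(order["amount"] for order in orders)
average = total / len(orders)

print(f"{len(orders)} orders, total={total:.2f}, average={average:.2f}")
```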

What Does a Python Developer Do?

A Python developer is responsible for writing code in the Python programming language. They may work on web applications, desktop applications, or back-end systems. Python developers typically work in a team of developers, and their job may also include working with databases, debugging code, and providing support to end users.

Python Developer Roles and Responsibilities

Let’s not waste any more of your time and get straight to some of the most common Python developer roles and responsibilities.

Common roles and responsibilities of a Python developer include:

Developing back-end components for data science applications

Connecting applications with third-party web services

Creating scalable, testable, and efficient code, which is necessary for handling programs that compile large datasets

Identifying and fixing bugs and performance issues

Writing documentation

Coordinating with other developers and data scientists

You can probably understand how these functions make Python the perfect programming language for creating AI and big data applications.

What are some of the requirements a Python developer working on big data applications should have? Here are the most common ones:

Strong experience with Python programming and an understanding of the big data frameworks they will work with

Experience with popular Python frameworks (Django, Flask, etc.)

Experience with object-oriented programming

Strong problem-solving skills

Excellent communication and collaboration skills

Experience with version control systems (Git, Mercurial, etc.)

Python Developer Interview Questions for Data Science Teams

Data science projects are very complex. You can’t afford to hire the wrong team members. Therefore, you have to interview your candidates carefully.

What should you ask a Python developer during an interview? We have collected a list of technical and cultural interview questions to ask your Python developer.

Python Developer: Technical Interview Questions

What is Python?

What are the benefits of using Python?

What is your background with big data applications?

What are some of the key features of Python?

What is your experience with Python?

What are some of the most popular Python frameworks?

What is your experience with object-oriented programming in Python?

Python Developer: Cultural Interview Questions

Tell me about a time when you had to solve a difficult problem.

What is your approach to problem-solving?

Tell me about a time when you had to work with a difficult codebase.

What is your experience with writing documentation?

Tell me about a time when…

That’s it for this article! We hope we’ve helped you figure out some of the common roles and responsibilities of a Python developer helping to create big data projects. Good luck in hiring the best candidate!

Source: SmartData Collective

Data governance building blocks on Google Cloud for financial services

Data governance includes people, processes, and technology. Together, these pillars enable organizations to validate and manage their data across dimensions such as:

Data management, including data and pipelines lifecycle management and master data management.

Data protection, spanning data access management, data masking and encryption, along with audit and compliance.  

Data discoverability, including data cataloging, data quality assurance, and data lineage registration and administration.

Data accountability, with data user identification and policies management requirements.

While enterprises benefit from prioritizing investment in their people (to achieve the desired cultural transformation) and in their processes (to increase operational effectiveness and efficiency), the technology pillar is the critical enabler that lets people interact with data and lets organizations truly govern their data initiatives.

Financial services organizations face particularly stringent data governance requirements regarding security, regulatory compliance, and general robustness. Once people are aligned and processes are defined, the challenge for technology comes into the picture: solutions should be flexible enough to complement existing governance processes and cohesive across data assets to help make data management simpler.

In the following sections, starting with standard requirements for data governance implementations in financial services, we will cover how these correspond to Google Cloud services, open-source resources, and third-party offerings. We will share an architecture capable of supporting the entire data lifecycle, based on our experience implementing data governance solutions with world-class financial services organizations.

Data management

Looking first at the data management dimension, we have compiled some of the most common requirements, along with the relevant Google Cloud services and capabilities from the technology perspective.

Data and pipelines lifecycle management

Requirements: batch ingestion (data pipeline management, scheduling, and data pipeline processing logging); streaming pipelines (metadata); data lifecycle management; operational metadata, including both state and statistical metadata.

Services & capabilities: a comprehensive end-to-end data platform; Cloud Storage (GCS) object lifecycle management; BigQuery data lifecycle management; Data Fusion pipeline lifecycle management, orchestration, coordination, and metadata management; Dataplex intelligent automation for data lifecycle management; Cloud Logging and Cloud Monitoring; Informatica Axon Data Governance.

Compliance

Requirements: facilitate regulatory compliance requirements.

Services & capabilities: easily expandable to help comply with CCPA, HIPAA, PCI, SOX, and GDPR through security controls implemented using IAM, CMEKs, BigQuery column-level access control, BigQuery Table ACLs, data masking, authorized views, DLP PII data identification, and policy tags; the DCAM data and analytics assessment framework; CDMC best-practice assessment and certification.

Master Data Management

Requirements: duplicate-suspect processing rules; solution and department scope.

Services & capabilities: Enterprise Knowledge Graph; knowledge graph entity resolution/reconciliation and financial crime record matching (MDM + ML); Tamr cloud-native master data management.

Site Reliability

Requirements: data pipeline SLAs; data-at-rest SLAs.

Services & capabilities: SLAs applied to data pipelines; SLAs applied to services managing data; DR strategies for data.

Registering, creating, and scheduling data pipelines is a recurring challenge that organizations face. Similarly, data lifecycle management is a key part of a comprehensive data governance strategy.

This is where Google Cloud can help, offering multiple data processing engines and data storage options tailored to each need, all of which are integrated to make orchestration and cataloging easy.

Data protection

Financial organizations demand world-class data protection services and capabilities to support their defined internal processes and help meet regulatory compliance requirements.

Data Access Management

Requirements: definition of access policies; multi-cloud approval workflow integration*; access approvals.

Services & capabilities: IAM and ACLs, with fine-grained Cloud Storage access; row-level and column-level permissions; BigQuery security; hierarchical resources and policies; users, authentication, security (2FA), and authorization; resources, separation boundaries, organization policies, billing and quota, networking, and monitoring; Event Threat Detection; multi-cloud approval workflow by a third party (Collibra)*.

Data Audit & Compliance

Requirements: capture of operational metadata logs; alerting on failing processes and root cause identification.

Services & capabilities: Cloud Audit Logs; Security Command Center; Access Transparency and Access Approval; Cloud Logging (formerly Stackdriver Logging); Collibra audit logging.

Security Health

Requirements: data vulnerability identification; security health checks.

Services & capabilities: Security Health Analytics.

Data Masking and Encryption

Requirements: storage-level encryption metadata; application-level encryption metadata; PII data identification and tagging.

Services & capabilities: encryption at rest, encryption in transit, and KMS; Cloud DLP transformations and de-identification.

Access management, along with data and pipeline audit, is a common requirement that should be managed across the board for all data assets. These security requirements are usually supported by security health checks and automatic remediation processes.

Specifically on data protection, capabilities like data masking, data encryption, or PII data management should be available as an integral part of processing pipelines, and be defined and managed as policies.

Data discoverability

Data describes what an organization does, how it relates to its users, competitors, and regulatory institutions. This is why data discoverability capabilities are crucial for financial organizations.

Data Cataloging

Requirements: data catalog storage; association of metadata tags with fields; data classification metadata registration; schema version control; schema definition before data loading.

Services & capabilities: Data Catalog; column-level tags; Dataplex logical aggregations (lakes, zones, and assets); DLP; Collibra Catalog; Collibra asset version control; Collibra asset type creation and asset pre-registration; Alation Data Catalog; Informatica Enterprise Data Catalog.

Data Quality

Requirements: definition of data quality rules on ingestion (such as regex validations for each column); issue remediation lifecycle management.

Services & capabilities: BigQuery DQ; Dataplex; data quality with Dataprep; Collibra DQ; Alation Data Quality; CloudDQ declarative data quality validation (CLI)*; Informatica Data Quality.

Data Lineage

Requirements: storage- and attribute-level data lineage; multi-cloud/on-premises lineage.

Services & capabilities: Cloud Data Fusion data lineage (understand the flow, granular visibility into the flow of data, an operational view, and openness to share lineage); Data Catalog and BigQuery; Collibra lineage (multi-cloud/on-premises management); Alation Data Lineage.

Data Classification

Requirements: data discovery and data classification metadata registration.

Services & capabilities: DLP discovery and classification, with 90+ built-in classifiers (including PII) and custom classifiers.

A data catalog is the foundation on which a large part of a data governance strategy is built. You need automatic classification options and data lineage registration and administration capabilities to make data discoverable. Dataplex is a fully managed data discovery and metadata management service that offers unified data discovery of all data assets, spread across multiple storage targets. Dataplex empowers users to annotate business metadata, providing the necessary data governance foundation within Google Cloud, along with metadata that can later be integrated with external metadata by a multi-cloud or enterprise-level catalog. The Collibra Catalog is an example of an enterprise data catalog on Google Cloud that complements Dataplex by providing enterprise functionality such as an operating model that includes the business and logical layers of governance, federation, and the ability to catalog across multi-cloud and on-premises environments.
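As a minimal sketch of annotating business metadata, the snippet below uses the Data Catalog Python client to create a simple tag template and attach a tag to an existing BigQuery table; the project, location, dataset, table, and field names are all placeholders.

```python
from google.cloud import datacatalog_v1

datacatalog = datacatalog_v1.DataCatalogClient()
project_id, location = "my-project", "us-central1"  # placeholders

# Define a simple tag template with a single "data_owner" field.
template = datacatalog_v1.TagTemplate()
template.display_name = "Governance metadata"
template.fields["data_owner"] = datacatalog_v1.TagTemplateField()
template.fields["data_owner"].display_name = "Data owner"
template.fields["data_owner"].type_.primitive_type = (
    datacatalog_v1.FieldType.PrimitiveType.STRING
)
template = datacatalog.create_tag_template(
    parent=f"projects/{project_id}/locations/{location}",
    tag_template_id="governance_metadata",
    tag_template=template,
)

# Look up the catalog entry for an existing BigQuery table and attach a tag.
entry = datacatalog.lookup_entry(
    request={
        "linked_resource": (
            f"//bigquery.googleapis.com/projects/{project_id}"
            "/datasets/finance/tables/transactions"  # placeholder table
        )
    }
)
tag = datacatalog_v1.Tag()
tag.template = template.name
tag.fields["data_owner"] = datacatalog_v1.TagField()
tag.fields["data_owner"].string_value = "risk-team@example.com"
datacatalog.create_tag(parent=entry.name, tag=tag)
```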

Data quality assurance and automation is the second foundation of data discoverability. To help with that effort, Dataprep is another tool for assessing, remediating, and validating processes, and it can be used in conjunction with customized data quality libraries like the Cloud Data Quality Engine, a declarative and scalable data quality validation command-line interface. Collibra DQ is another data quality assurance tool; it uses machine learning to identify data quality issues, recommend data quality rules, and allow for enhanced discoverability.

Data accountability

Identifying data owners, controllers, stewards, or users, and effectively managing the related metadata, provides organizations with a way to ensure trusted and secure use of the data. Here we have the most commonly identified data accountability requirements and some tools and services you can use to meet them.

Data User Identification

Requirements: registration of data owners linked to datasets; registration of data stewards linked to datasets; role-based logging of users’ data usage.

Services & capabilities: Dataplex; Data Catalog; Analytics Hub; Collibra Data Stewardship; Alation Data Stewardship.

Policies Management

Requirements: domain-based policy management; column-level policy management.

Services & capabilities: Cloud DLP; Dataplex; policy tags; BigQuery column-level security; Collibra Policy Management.

Domain-Based Accountability

Requirements: governed data sharing.

Services & capabilities: IAM and ACL role-based access; Analytics Hub.

Having a centralized identity and access management solution across the data landscape is a key accelerator for defining a data security strategy. Core capabilities should include user identification, role- and domain-based access policy management, and policy-managed data access authorization workflows.

Data governance building blocks to meet industry standards 

Given these capabilities, we provide a reference architecture for a multi-cloud and centralized governance environment that enables a financial services organization to meet its requirements. While here we focus on the technology pillar of data governance, it is essential that people and processes are also aligned and well-defined.

The following architecture is not intended to cover each and every requirement presented above, but it provides core building blocks for a data governance implementation that meets industry standards, as far as the technology pillar is concerned, at the time of writing this blog.

1. Data cataloging is a central piece in any data governance technology journey. Finance enterprises often need to deal with several storage systems residing in multiple cloud providers and also on-premises. As such, an enterprise-level catalog, a “catalog of catalogs” that centralizes and makes discoverable all the data assets in the organization, is a helpful capability for helping the business get the most from its data, wherever it sits.

Even though Google Data Catalog supports non-Google Cloud data assets through open-source connectors, a third-party cataloging solution (such as Collibra) may be well suited to help with this, providing connection capabilities to several storage systems and additional layers of metadata administration. For example, this could provide the ability to pre-register data assets even before they are available in storage, and to integrate them once the actual tables or filesets are created, including schema evolution tracking.

2. From a Google Cloud perspective, data to be discovered, cataloged, or protected can reside in a data lake or a landing zone in Cloud Storage, an enterprise data warehouse in BigQuery, a high-throughput, low-latency datastore like Bigtable, or even in relational or NoSQL databases supported by Spanner, Cloud SQL, or Firestore, for example.

Gathering Cloud Data Catalog metadata such as tags is a multi-step process. Financial enterprises should standardize and automate as much as possible to have reliable and complete metadata. To populate the Data Catalog with labels, the Cloud Data Loss Prevention API (DLP) is a key player. DLP inspection templates and inspection jobs can be used to standardize tagging, sampling, and discovering data, and finally to tag tables and filesets. 
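As a small illustration of the DLP building block, the sketch below calls the DLP API on inline content; in practice you would use stored inspection templates and scheduled inspection jobs over Cloud Storage or BigQuery as described above, and the project ID and info types here are just examples.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project"  # placeholder project ID

# Example inspection config; production pipelines would typically reference
# a stored inspection template and run scheduled inspection jobs instead.
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
    "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
    "include_quote": True,
}
item = {"value": "Contact jane.doe@example.com or +1 555-0100"}

response = dlp.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood, finding.quote)
```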

Security and access control is another big concern for finance organizations given the sensitivity of the data they handle. Several encryption and masking layers are usually applied to the data. In these scenarios, sampling and reading data to determine which labels to add is a slightly more complex process, requiring decryption along the way.

In order to be able to do things like apply column-level policy tags to BigQuery, the DLP inspection job findings need to be published to an intermediate storage location accessible to a tagging job using Cloud Data Catalog. In these contexts, a Dataflow job could help handle the required decryption and tagging. There is a step-by-step community tutorial on that here.

Ensuring that the right people access the right data across numerous datasets can be challenging. Policy taxonomy tags, in conjunction with IAM access management, cover that need.
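For illustration, the sketch below creates a BigQuery table whose customer_email column is protected by a Data Catalog policy tag; the project, dataset, table, and policy tag resource name are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# Resource name of a policy tag from a Data Catalog taxonomy (placeholder).
pii_policy_tag = (
    "projects/my-project/locations/us/taxonomies/1234567890/policyTags/111"
)

schema = [
    bigquery.SchemaField("transaction_id", "STRING"),
    bigquery.SchemaField("amount", "NUMERIC"),
    bigquery.SchemaField(
        "customer_email",
        "STRING",
        policy_tags=bigquery.PolicyTagList(names=[pii_policy_tag]),
    ),
]

table = bigquery.Table("my-project.finance.transactions", schema=schema)
table = client.create_table(table)  # column-level access now follows the tag
```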

Google Cloud’s Dataplex service (discussed more below) will also help to automate data discovery and classification using dynamic schema detection, such that metadata can be automatically registered in a Dataproc Metastore or in BigQuery before finally being used by Data Catalog.

3. To understand the origin, movement, and transformation of data over time, data lineage systems are fundamental. These allow users to store and access lineage records and provide reliable traceability to identify data pipeline errors. Given the large volume of data in a finance enterprise data warehouse environment, an automated data lineage recording system can simplify data governance for users.

Finance organizations have to meet compliance and auditability standards, enforce access policies, and perform root cause analysis on poor data or failing pipelines. To do that, Cloud Data Catalog Lineage and Cloud Data Fusion Lineage provide traceability capabilities that can help.

4. Dataplex is a fundamental part of Google Cloud’s vision for data governance. Dataplex is an intelligent data fabric that unifies and automates data management and allows easy and graphical control for analytics processing jobs. This helps financial organizations meet the complex requirements for data and pipeline lifecycle management.

Dataplex also provides a way to organize data into logical aggregations called lakes, zones, and assets. Assets are directly related to Cloud Storage files or tables in BigQuery. Those assets are logically grouped into zones. Zones can be typical data lake implementation zones, like raw, refined, or analytics zones, or can be based on business domains like sales or finance. On top of that logical organization, users can define security policies across their data assets, including granular access control. This way, data owners can grant permissions while data managers monitor and audit the access granted.
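As a rough sketch (assuming the google-cloud-dataplex client library; exact field and enum names should be checked against its reference), the snippet below creates a lake, a raw zone, and a Cloud Storage asset. All resource names are placeholders.

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1"  # placeholder

# Create a lake (long-running operation).
lake = client.create_lake(
    parent=parent,
    lake_id="finance-lake",
    lake=dataplex_v1.Lake(display_name="Finance lake"),
).result()

# Create a raw zone inside the lake.
zone = client.create_zone(
    parent=lake.name,
    zone_id="raw-zone",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.RAW,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    ),
).result()

# Attach an existing Cloud Storage bucket to the zone as an asset.
asset = client.create_asset(
    parent=zone.name,
    asset_id="landing-bucket",
    asset=dataplex_v1.Asset(
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            type_=dataplex_v1.Asset.ResourceSpec.Type.STORAGE_BUCKET,
            name="projects/my-project/buckets/finance-landing",  # placeholder
        )
    ),
).result()
```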

Build a data governance strategy in the cloud

For financial organizations to have trust in their data and meet regulatory compliance requirements, their data governance implementations must have a solid and flexible technology pillar from which to build processes and align people. Google Cloud can help you build that comprehensive data governance strategy, while allowing you to add third-party capabilities to meet specific industry needs.

To learn more: 

Listen to this podcast with Googlers Jessi Ashdown and Uri Gilad

See how Dataplex and Data Catalog can become key pieces in your data governance strategy

Meet the authors of Data Governance – The Definitive Guide 

Review the principles and best practices for data governance in the cloud in this white paper.

Source: Data Analytics

Creative Ways to Leverage Big Data for an Optimal Marketing Plan

Big data technology is becoming more important than ever for modern business owners. One study by the McKinsey Institute shows that data-driven organizations are 19 times more likely to be profitable.

There are many benefits of using big data to run a business. One of the most important advantages is that big data can help with marketing.

Big Data is Essential for Modern Marketing Strategies

Running a business isn’t easy, especially when it comes to marketing. However, if you want to continue to draw in new customers and clients, continuous marketing is a must. The good news is that big data can help with this. The McKinsey Institute report showed that data-driven businesses are 23 times more likely to acquire customers.

The good news is there are ways to use big data to simplify and boost your efforts to guarantee success. If you are looking to boost your marketing efforts as a data-driven organization, then you should follow these crucial tips.

Big data has revolutionized marketing. Giving you insights into how your current methods are working, your customers, and increasing brand awareness, big data can play a crucial role in your success.

The main types of big data you’ll want to capture include:

Customer dataOperational dataFinancial data

By collecting and analysing customer data, you’ll get a much better idea of who your target audience is. This can help you figure out the best places to advertise and market your services, as well as determine your brand’s tone of voice. Having a strong understanding of your target audience is crucial in marketing. After all, if you don’t know who you are marketing to, how can you expect to see results?

Operational data refers to the way the business runs, including shipping and logistics, and customer relationship management. Data has become very important for improving customer service. When you have a clear picture of the way the business is run, improvements can be made to improve performance. This in turn will boost customer satisfaction, leading to more word-of-mouth referrals.

Financial data, such as pricing, sales, and margins, helps you to budget more effectively. You will also see where your budget is being wasted, allowing you to switch to more profitable marketing methods.

The more data you collect and analyse, the more targeted and effective your marketing will become.

Update Your Aesthetics

In business, it’s important to make a great first impression. This is difficult to do if the aesthetics of your brand aren’t on point.

Start with your digital aesthetics, such as your logo, website, and social media presence. Does your branding match your business? Having clean, clear aesthetics can help you appear more authoritative and professional.

It isn’t just your digital presence that you need to worry about. How your physical premises are laid out will also make a difference to your marketing efforts. Firstly, it determines a customer or client’s opinion of the business if they visit your premises. Secondly, the aesthetics of your business can impact morale, motivation, and productivity.

Everything from the type of flooring you have installed to how much light enters the premises can make a difference. When it comes to the flooring of your business it should be comfortable, practical, and aesthetically pleasing. It doesn’t have to cost a fortune to update the flooring in your business. There are companies that offer up to 65% off commercial flooring.

These are just some of the ways aesthetics matter in business. If you want to make a good impression, start by giving your online or offline presence a makeover.

Leverage big data for local community engagement

Giving back to your local community is a great way to boost your marketing efforts. Customers and clients generally love brands who use their profits for good.

It could be sponsoring a local sports team, organizing a charity fundraiser, or planting trees and greenery to help improve air quality and aesthetics. Don’t forget to advertise the ways you give back to the community on your social media platforms. Getting involved in your local community could help you to attract a lot of new customers, as well as keep existing ones coming back for more.

It might seem like big data wouldn’t help much with local community engagement. However, there are creative ways to tap data to learn more about your target consumers. This allows you to focus on identifying charities and engagement opportunities that allow you to be seen by your target customers.

Use Big Data for Reputation Management

You need to use data mining to improve reputation management. You can use data scraper tools to find positive statements customers and experts have made about your company. Then, you can showcase these testimonials on your website.

Do you have glowing testimonials you can show off to potential clients? These days consumers need to trust a business before they buy from them. Testimonials and positive reviews can help to put their mind at ease, making them more likely to make a purchase.

You should showcase your testimonials wherever you can, including on your website, social media pages, and in email signatures. Don’t forget to encourage your customers to leave them too. Having a constant stream of positive reviews will do wonders for your brand.

Use Data-Driven SEO

To continuously attract new clients and customers, you need to work on your SEO. Making it easier for you to be found by search engines, the right SEO tactics can boost website traffic, convert more leads into customers, and improve your bottom line.

It can take a lot of work to develop and implement a successful SEO strategy. If you need to, bring in the professionals. SEO companies and freelancers can help you to achieve better rankings with minimal effort on your part.

There are a lot of benefits of using big data in SEO. You can use data-mining tools to identify keywords that are likely to appeal to your target demographic. There are also data-mining tools that can help you identify links to competitor websites, so you can reverse engineer their link-building strategies.

Offer competitions and giveaways

Giveaways and competitions tend to attract a lot of attention. If you are trying to bolster your marketing efforts, think of a giveaway or competition that your audience will love.

Advertise your offer on social media, asking participants to like, share, and comment on your post. This will boost its visibility to others, making it easier for customers to find you. Limited time giveaways and competitions work best, and you can offer everything from discounts to free products.

Make the most of social media

Social media provides a ton of opportunities for marketing your brand. However, it’s important to focus on just one platform at a time when you are just getting started.

Find out where your ideal audience hangs out, then focus on marketing your business on that channel. With social media you can run paid ads, post valuable content, gather fans and followers, and boost brand awareness.

There are over a billion people using social media sites, giving you access to a huge audience. If your business doesn’t yet have a strong social media presence, now is the time to build one up.

You can’t take an ad hoc approach to social media, though. You are going to need to invest in social media analytics tools that will help you make more nuanced insights. You can use your data to guide your decision-making process, so you can create the best content, post at the right times, and engage with the right networks.

Use Big Data to Think Outside of the Box

Like everything in business, the best results often come from thinking outside of the box. Big data technology will make this a lot easier. You need to use analytics tools to make observations that you can use to make more informed decisions. When you invest in big data, you can come up with innovative ways to market your business. Look at what your competitors are doing and identify ways to improve on their strategies.

There are tons of ways to boost your marketing efforts with big data technology. The above methods are some of the most effective things you can try out to start seeing bigger, better results. Consistency and continually tracking your efforts with data analytics tools are key to your success.

Source: SmartData Collective

Expanding the Google Cloud Ready – Sustainability initiative with 12 new partners

We introduced the Google Cloud Ready – Sustainability designation earlier this year to showcase those partners committed to help global businesses and governments accelerate their sustainability programs. These partners build solutions that enhance the capabilities and ease the adoption of powerful Google Cloud technologies, such as Google Earth Engine and BigQuery, allowing customers to leverage data-rich solutions that help reduce their carbon footprints.

Today, we’re pleased to announce growth of the Google Cloud Ready – Sustainability program, with 12 new partners joining the initiative and bringing their climate, ESG, and sustainability platforms to Google Cloud. These partners include: 

Aclima is pioneering an entirely new way to diagnose the health of our air and track climate-changing pollution. Powered by its network of roving and stationary sensors, Aclima measures air pollution and greenhouse gasses at unprecedented scales and with block-by-block resolution.

Sustainability at Airbus means uniting and safeguarding the world in a safe, ethical, and socially and environmentally responsible way. Airbus has a comprehensive sustainability strategy built on four core commitments, which guide the company’s approach to the way it does business and how it designs its products and services: Lead the journey toward clean aerospace, respect human rights and foster inclusion, build the business on the foundation of safety and quality, and exemplify business integrity.

Atlas AI is a predictive analytics platform that analyzes, monitors, and forecasts regions of growth, vulnerability, and opportunity around the world to offer insight into where organizations can grow most successfully, and where investment can boost historically underserved communities. Atlas AI’s platform has been used to expand water and sanitation infrastructure, promote new electrification, target community health services, and broaden internet access in countries across Sub-Saharan Africa and South Asia.

BlueSky Resources makes sense of sensors from both public and private sources by harmonizing ground, aerial, and space-based inputs. Expertise in atmospheric science and cloud technology allows BlueSky to provide understanding and insights related to the correlation of emissions insights to assets and activities. This powerful combination of data, climate science, and delivery of insights is enabling focused sustainability impact across clients in various industries, including energy, waste management, industry, and natural resource management.

Electricity Maps provides companies with actionable data quantifying the carbon intensity and origin of electricity. This data is available on an hourly basis across 50+ countries and more than 160 regions. Electricity Maps’ mission is to organize the world’s electricity data to drive the transition toward a truly decarbonized electricity system.

FlexiDAO is a global climate tech company based in the Netherlands and Spain. The company works closely with other critical stakeholders to co-create the international standard around energy-related emissions compliance. Thanks to FlexiDAO’s end-to-end 24/7 Carbon-free Energy platform, companies can quantify and confidently showcase their contribution to society’s decarbonization. 

LevelTen Energy helps organizations achieve carbon-free energy usage targets (on an annual and 24/7 basis) by delivering access to the world’s largest clean energy marketplace, and the software, data, analytics, and expertise required for efficient transactions. The LevelTen Platform connects energy buyers and over 40 sustainability advisors with more than 1,800 carbon-free energy projects in 24 countries across North America and Europe.

Ren is a SaaS platform built on Google Cloud that enables companies with global supply chains to source the cleanest energy possible. Despite using country-sized amounts of energy, most companies have no idea how to transition to renewables due to complex financial, technical, and logistical challenges. Ren unlocks cost savings, provides the cleanest energy possible, and ensures companies meet their carbon commitments on time.

Sidewalk Labs, an urban innovation unit in Google, builds products to radically improve quality of life in cities for all. Delve is a product that helps real estate teams design more sustainable buildings and neighborhood blocks, faster. Mesa automates building controls to deliver savings and comfort to commercial building owners and tenants. With these products and others, Sidewalk Labs helps commercial real estate developers, building owners, and city planners make more sustainable choices for the built environment that are better for communities and the planet.

Tomorrow.io is The World’s Weather and Climate Security Platform, helping countries, businesses, and individuals manage their weather and climate security challenges. The platform is fully customizable to any industry impacted by the weather. Customers around the world use Tomorrow.io to dramatically improve operational efficiency. Tomorrow.io was built from the ground up to help teams prepare for the business impact of weather by automating decision-making and enabling climate adaptation at scale. 

UP42 is a geospatial developer platform and marketplace bringing together industry-leading data and ready-to-use processing algorithms. The platform enables organizations to build, run, and scale geospatial products. With the ability to choose from a wide range of high-resolution commercial and open satellite data, aerial, weather, and others, solution providers can apply best-in-class machine learning and/or processing modules to gain valuable geospatial insights and streamline their processes.

Woza is a sustainable innovation platform that leverages deep geospatial knowledge and existing best-in-class technologies to develop a new generation of streamlined analytics workflows focused on sustainability. Companies in agri-food, energy, and public sector are partnering with Woza to accelerate their journey to Industry 4.0.

Adding expertise to accelerate sustainability use cases

New partners in the initiative join our existing Google Cloud Ready – Sustainability partners like Carto, Climate Engine, Geotab, NGIS, and Planet Labs PBC, bringing a wealth of industry knowledge and offering solutions for sustainability challenges ranging from first-mile sustainable sourcing and spatial finance to fleet electrification and rich geospatial visualizations. 

CARTO is the world’s leading Location Intelligence platform, enabling organizations to use spatial data and analysis for more efficient delivery routes, better behavioral marketing, strategic store placements, and much more. The company’s solutions extend the geospatial capabilities available in BigQuery, while leveraging the near limitless scalability that Google Cloud provides. When it comes to sustainability, CARTO’s platform is trusted by a wide range of organizations, including Greenpeace, Vizzuality, Litterati, Indigo, WWF, the Marine Conservation Institute, The World Bank, and the Institute for Sustainable Cities.

Climate Engine leverages data from Google Earth Engine and other ecosystem partners to help organizations improve their climate change-related risk planning in areas such as water use, agriculture, storm risk, and wildfire spread. By linking the economy and the environment, organizations can understand how environmental risks are affecting their markets and discover opportunities to reduce their emissions and potential supply chain or operational disruptions from climate-related events. 

Geotab is advancing security, connecting commercial vehicles to the cloud, and providing data-driven analytics to help customers better manage their fleets. Processing billions of data points daily, Geotab helps businesses improve and optimize fleet productivity, enhance safety, and achieve sustainability goals and stronger compliance.

Geospatial solutions provider NGIS built a SaaS-based first-mile sustainable sourcing solution called TraceMark using Google Cloud’s geospatial platform and technologies from other ecosystem partners. Several global CPG firms have already used TraceMark to modernize their geospatial workflows and help facilitate the use of space-based data for supply chain sustainability transformation. 

Planet Labs PBC operates the largest fleet of Earth imaging satellites in history, with approximately 200 satellites in orbit. Planet’s mission is to image the whole earth’s landmass every day to make global change visible, accessible, and actionable. As a Public Benefit Corporation, Planet’s Public Benefit Purpose is to accelerate humanity toward a more sustainable, secure, and prosperous world by illuminating environmental and social change. 

How the Google Cloud Ready – Sustainability program works

If you are a Google Cloud partner with sustainability solutions and expertise to share, the Google Cloud Ready – Sustainability program is open for applications. Entry into the program requires that the partner solution delivers quantifiable results for climate mitigation, adaptation, or reporting needs. To apply for the Google Cloud Ready – Sustainability designation, the solution must: 

Be available on Google Cloud

Address ESG risk, and assist customers in achieving ESG targets and/or support typical ESG goal frameworks, such as the United Nations’ SDGs

Demonstrate repeatability

Meet minimum Google Cloud application development best practices, including security, performance, scalability, availability, and carbon footprint reporting for available services

Have Google Cloud Carbon Footprint Reporting enabled

Have at least one public customer case study available. 

The selection process begins with an evaluation of the solution. If a partner meets the above criteria, Google Cloud provides a suggested roadmap for tier progression within the program and then issues a formal acknowledgement of participation in the Google Cloud Ready – Sustainability program. Together, Google Cloud sustainability partners can deliver platforms that are helping businesses and governments accelerate progress aligned to their environmental goals. 

Google Cloud will showcase the validated solutions on the Google Cloud Partner Directory Listing, Google Cloud Ready Sustainability Partner Advantage page, and — if applicable — via the Google Cloud Marketplace. We hope to help customers better understand how these technologies can help them meet their ESG goals, find the right solution for their particular challenge, and implement a solution faster. 

Prospective partners can visit the Partner Portal to learn more about the Google Cloud Ready – Sustainability program or complete an application.

Related Article

Google Cloud announces new products, partners and programs to accelerate sustainable transformations

In advance of the Google Cloud Sustainability Summit, we announced new programs and tools to help drive sustainable digital transformation.


Source : Data Analytics Read More

Integrating ML models into production pipelines with Dataflow

Integrating ML models into production pipelines with Dataflow

Google Cloud’s Dataflow recently announced General Availability support for Apache Beam’s generic machine learning prediction and inference transform, RunInference. In this blog, we will take a deeper dive into the transform, including:

Showing the RunInference transform used with a simple model as an example, in both batch and streaming mode.

Using the transform with multiple models in an ensemble.

Providing an end-to-end pipeline example that makes use of an open source model from Torchvision. 

In the past, Apache Beam developers who wanted to make use of a machine learning model locally, in a production pipeline, had to hand-code the call to the model within a user defined function (DoFn), taking on the technical debt for layers of boilerplate code. Let’s have a look at what would have been needed:

Load the model from a common location using the framework’s load method.

Ensure that the model is shared amongst the DoFns, either by hand or via the shared class utility in Beam.

Batch the data before the model is invoked to improve model efficiency. The developer would set this up either by hand or via one of Beam’s batching utilities, such as GroupIntoBatches.

Provide a set of metrics from the transform.

Provide production grade logging and exception handling with clean messages to help that SRE out at 2 in the morning! 

Pass specific parameters to the models, or start to build a generic transform that allows the configuration to determine information within the model. 

And of course these days, companies need to deploy many models, so the data engineer begins to do what all good data engineers do and builds out an abstraction for the models. Basically, each company is building out their own RunInference transform!  
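To make that boilerplate concrete, here is a minimal sketch, not taken from the codelab, of the kind of hand-rolled prediction DoFn described above. It assumes a PyTorch state_dict saved to disk and a LinearRegression model class like the one introduced later in this post; RunInference replaces all of this with a single transform plus a ModelHandler.

import apache_beam as beam
import torch

class HandRolledPredictDoFn(beam.DoFn):
    # Illustrative only: this is the kind of boilerplate RunInference now handles.

    def __init__(self, state_dict_path):
        self._state_dict_path = state_dict_path
        self._model = None

    def setup(self):
        # Load the model once per worker process rather than once per element.
        self._model = LinearRegression(input_dim=1, output_dim=1)  # assumed model class
        self._model.load_state_dict(torch.load(self._state_dict_path))
        self._model.eval()

    def process(self, batch):
        # 'batch' is assumed to be a list of floats; the batching itself (for
        # example with GroupIntoBatches), plus metrics, logging and exception
        # handling, would all still have to be hand-written.
        with torch.no_grad():
            yield self._model(torch.Tensor(batch).reshape(-1, 1))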

Recognizing that all of this activity is mostly boilerplate regardless of the model, the RunInference API was created. The inspiration for this API comes from the tfx_bsl.RunInference transform that the good folks over at TensorFlow Extended built to help with exactly the issues described above. tfx_bsl.RunInference was built around TensorFlow models. The new Apache Beam RunInference transform is designed to be framework agnostic and easily composable in the Beam pipeline. 

The signature for RunInference takes the form of RunInference(model_handler), where the framework-specific configuration and implementation is dealt with in the model_handler configuration object. 

This creates a clean developer experience and allows new frameworks to be easily supported within the production machine learning pipeline, without disrupting the developer workflow. For example, NVIDIA is contributing to the Apache Beam project to integrate NVIDIA TensorRT™, an SDK that optimizes trained models for deployment with high throughput and low latency on NVIDIA GPUs, within Google Dataflow (pull request).  

Beam Inference also allows developers to make full use of the versatility of Apache Beam’s pipeline model, making it easier to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for activities like A/B testing and building out ensembles. For example, doing natural language processing (NLP) analysis of text and then using the results within a domain specific model to drive a customer recommendation. 

In the next section, we start to explore the API using code from the public codelab; the accompanying notebook is available at github.com/apache/beam/examples/notebooks/beam-ml.

Using the Beam Inference API

Before we get into the API, for those who are unfamiliar with Apache Beam, let’s put together a small pipeline that reads data from some CSV files to get us warmed up on the syntax.

import apache_beam as beam

with beam.Pipeline() as p:
    data = p | beam.io.ReadFromText('./file.csv')
    data | beam.Map(print)

In that pipeline, we used the ReadFromText source to consume the data from the CSV file into a Parallel Collection, referred to as a PCollection in Apache Beam. In Apache Beam syntax, the pipe ‘|’ operator essentially means “apply”, so the first line applies the ReadFromText transform. In the next line, we use a beam.Map() to do element-wise processing of the data; in this case, the data is just being sent to the print function.

Next, we make use of a very simple model to show how we can configure RunInference with different frameworks. The model is a single-layer linear regression that has been trained on y = 5x data (yup, it has learned its five times table). To build this model, follow the steps in the codelab.

The RunInference transform has the following signature: RunInference(ModelHandler). The ModelHandler is a configuration that informs RunInference about the model details and that provides type information for the output. In the codelab, the PyTorch saved model file is named ‘five_times_table_torch.pt’ and is output as a result of the call to torch.save() on the model’s state_dict. Let’s create a ModelHandler that we can pass to RunInference for this model:

from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

my_handler = PytorchModelHandlerTensor(
    state_dict_path='./five_times_table_torch.pt',
    model_class=LinearRegression,
    model_params={'input_dim': 1,
                  'output_dim': 1})

The model_class is the class of the PyTorch model that defines the model architecture as a subclass of torch.nn.Module. The model_params are the ones that are defined by the constructor of the model_class. In this example, they are used in the notebook LinearRegression class definition:

class LinearRegression(torch.nn.Module):
    def __init__(self, input_dim=1, output_dim=1):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out

The ModelHandler that is used also provides the transform information about the input type to the model, with PytorchModelHandlerTensor expecting torch.Tensor elements.

To make use of this configuration, we update our pipeline. We also do the pre-processing needed to get the data into the right shape and type for the model. The model expects a torch.Tensor of shape [-1,1], and the data in our CSV file is in the format 20,30,40.

import numpy
import torch
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as p:
    raw_data = p | beam.io.ReadFromText('./file.csv')
    shaped_data = raw_data | beam.FlatMap(
        lambda x: [numpy.float32(y).reshape(-1, 1) for y in x.split(',')])
    results = shaped_data | beam.Map(torch.Tensor) | RunInference(my_handler)
    results | beam.Map(print)

This pipeline will read the CSV file, get the data into shape for the model, and run the inference for us. The result of the print statement can be seen here:

PredictionResult(example=tensor([20.]), inference=tensor([100.0047], grad_fn=<UnbindBackward0>))

The PredictionResult object contains both the example and the result; in this case, 100.0047 for an input of 20. 

Next, we look at how composing multiple RunInference transforms within a single pipeline gives us the ability to build out complex ensembles with a few lines of code. After that, we will look at a real model example with TorchVision.

Multi-model pipelines

In the previous example, we had one model, a source, and an output. That pattern will be used by many pipelines. However, business needs also require ensembles of models, where some models pre-process the data and others handle the domain-specific tasks. For example, converting speech to text before passing it to an NLP model. Though such a flow can look complex, there are actually three primary patterns. 

1- Data is flowing down the graph.

2- Data can branch after a stage, for example after ‘Language Understanding’.

3- Data can flow from one model into another.

Item 1 means that this is a good fit for building into a single Beam pipeline because it’s acyclic. For items 2 and 3, the Beam SDK can express the code very simply. Let’s take a look at these.

Branching Pattern:

In this pattern, data is branched to two models. To send all the data to both models, the code is in the form:

model_a_predictions = shaped_data | RunInference(configuration_model_a)
model_b_predictions = shaped_data | RunInference(configuration_model_b)

Models in Sequence:

In this pattern, the output of the first model is sent to the next model. Some form of post processing normally occurs between these stages. To get the data in the right shape for the next step, the code is in the form:

model_a_predictions = shaped_data | RunInference(configuration_model_a)
model_b_predictions = (model_a_predictions | beam.Map(postprocess)
                       | RunInference(configuration_model_b))

With those two simple patterns (branching and models in sequence) as building blocks, it’s possible to build complex ensembles of models. You can also make use of other Apache Beam tools to enrich the data at various stages in these pipelines. For example, in a sequential model, you may want to join the output of model A with data from a database before passing it to model B; this is bread-and-butter work for Beam, as sketched below. 
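As a hedged sketch of that enrichment step, not taken from the original examples, a side input can carry the reference data between the two RunInference calls. The lookup_data PCollection, the 'adjustment' key, the enrich logic, and both model configurations are placeholders for illustration:

import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.pvalue import AsDict

def enrich(prediction, lookup):
    # Hypothetical post-processing: combine model A's prediction with a value
    # from a lookup table, then reshape it into the tensor model B expects.
    score = prediction.inference.item()
    adjustment = lookup.get('adjustment', 1.0)  # assumed lookup key
    return torch.Tensor([score * adjustment]).reshape(-1, 1)

model_a_predictions = shaped_data | RunInference(configuration_model_a)
enriched = model_a_predictions | beam.Map(enrich, lookup=AsDict(lookup_data))
model_b_predictions = enriched | RunInference(configuration_model_b)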

Using an open source model

In the first example, we used a toy model that was available in the codelab. In this section, we walk through how you could use an open source model and output the model data to a Data Warehouse (Google Cloud BigQuery) to show a more complete end-to-end pipeline.

Note that the code in this section is self-contained and not part of the codelab used in the previous section. 

The PyTorch model we will use to demonstrate this is maskrcnn_resnet50_fpn, which comes with Torchvision v0.12.0. This model addresses the image segmentation task: given an image, it detects and delineates each distinct object appearing in that image with a bounding box.

In general, libraries like Torchvision download pretrained models directly into memory. To run the model with RunInference, we need a different setup, because RunInference loads the model once per Python process and shares it amongst many threads. So if we want to use a pre-trained model from these types of libraries, we have a little bit of setup to do. For this PyTorch model, we need to:

1- Download the state dictionary and make it available independently of the library to Beam.

2- Determine the model class file and provide it to our ModelHandler, ensuring that we disable the class’s ‘autoload’ features.

When looking at the signature for this model with version 0.12.0, note that there are two parameters that initiate an auto-download: pretrained and pretrained_backbone. Ensure these are both set to False to make sure that the model class does not load the model files:

model_params = {‘pretrained’: False, ‘pretrained_backbone’: False}

Step 1 – 

Download the state dictionary. The location can be found in the maskrcnn_resnet50_fpn source code:

%pip install apache-beam[gcp] torch==1.11.0 torchvision==0.12.0
import os, io
from PIL import Image
from typing import Tuple, Any
import torch, torchvision
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.io.gcp.internal.clients import bigquery
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from apache_beam.ml.inference.base import KeyedModelHandler
from apache_beam.ml.inference.base import PredictionResult
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
# Download the state_dict using the torch hub utility to a local models directory
torch.hub.load_state_dict_from_url(
    'https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth',
    'models/')

Next, push this model from the local directory where it was downloaded to a common area accessible to workers. You can use utilities like gsutil if using Google Cloud Storage (GCS) as your object store:

model_path = f'gs://{bucket}/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth'

Step 2 – 

For our ModelHandler, we need to use the model_class, which in our case is torchvision.models.detection.maskrcnn_resnet50_fpn. 

We can now build our ModelHandler. Note that in this case, we are making a KeyedModelHandler, which is different from the simple example we used above. The KeyedModelHandler is used to indicate that the values coming into the RunInference API are a tuple, where the first value is a key and the second is the tensor that will be used by the model. This allows us to keep a reference of which image the inference is associated with, and it is used in our post processing step.

my_cloud_model_handler = PytorchModelHandlerTensor(
    state_dict_path=model_path,
    model_class=torchvision.models.detection.maskrcnn_resnet50_fpn,
    model_params={'pretrained': False, 'pretrained_backbone': False})

my_keyed_cloud_model_handler = KeyedModelHandler(my_cloud_model_handler)

All models need some level of pre-processing. Here we create a preprocessing function ready for our pipeline. One important note: when batching, the PyTorch ModelHandler will need the size of the tensor to be the same across the batch, so here we set the image_size as part of the pre-processing step. Also note that this function accepts a tuple with the first element being a string. This will be the ‘key’, and in the pipeline code, we will use the filename as the key.

# In this function we can carry out any pre-processing steps needed for the model

def preprocess_image(data: Tuple[str, Image.Image]) -> Tuple[str, torch.Tensor]:
    import torch
    import torchvision.transforms as transforms
    # Note: RunInference will by default auto-batch inputs for Torch models.
    # An alternative is to create a wrapper class and override the
    # batch_elements_kwargs function to return {'max_batch_size': 1}.
    image_size = (224, 224)
    transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor(),
    ])
    return data[0], transform(data[1])

The output of the model needs some post-processing before being sent to BigQuery. Here we map the numeric label to its actual name, for example 'person', and zip it up with the bounding box and score outputs:

# The inference result is a PredictionResult object, which has two components: the example and the inference
def post_process(kv: Tuple[str, PredictionResult]):
    # We will need the coco labels to translate the output from the model
    coco_names = ['unlabeled', 'person', 'bicycle', 'car', 'motorcycle',
                  'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
                  'fire hydrant', 'street sign', 'stop sign', 'parking meter',
                  'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
                  'elephant', 'bear', 'zebra', 'giraffe', 'hat', 'backpack',
                  'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie', 'suitcase',
                  'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
                  'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
                  'tennis racket', 'bottle', 'plate', 'wine glass', 'cup', 'fork',
                  'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
                  'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
                  'cake', 'chair', 'couch', 'potted plant', 'bed', 'mirror',
                  'dining table', 'window', 'desk', 'toilet', 'door', 'tv',
                  'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
                  'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
                  'blender', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
                  'hair drier', 'toothbrush']
    # Extract the output
    output = kv[1].inference
    # The model outputs labels, boxes and scores; we pull these out, map each
    # label to its coco_names entry, and convert the tensors
    return {'file': kv[0], 'inference': [
        {'label': coco_names[x],
         'box': y.detach().numpy().tolist(),
         'score': z.item()}
        for x, y, z in zip(output['labels'],
                           output['boxes'],
                           output['scores'])]}

Let’s now run this pipeline with the direct runner, which will read the images from GCS, run them through the model, and output the results to BigQuery. We will need to pass in the BigQuery schema that we want to use, which should match the dict that we created in our post-processing. The WriteToBigQuery transform takes the destination table as a table_spec object and the schema as table_schema, which represents the following structure:

The schema has a file string, which is the key from our output tuple. Because each image’s prediction will have a List of (labels, score, and bounding box points), a RECORD type is used to represent the data in BigQuery.
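The snippet that defines table_spec and table_schema is not reproduced here; as a rough sketch matching the description above, and assuming placeholder dataset and table names, they could look like this:

# Hypothetical destination table and schema matching the post_process output;
# 'project' is assumed to be defined, and the dataset/table names are placeholders.
from apache_beam.io.gcp.internal.clients import bigquery

table_spec = bigquery.TableReference(
    projectId=project,
    datasetId='image_predictions',
    tableId='maskrcnn_results')

table_schema = {
    'fields': [
        {'name': 'file', 'type': 'STRING', 'mode': 'NULLABLE'},
        {'name': 'inference', 'type': 'RECORD', 'mode': 'REPEATED', 'fields': [
            {'name': 'label', 'type': 'STRING', 'mode': 'NULLABLE'},
            {'name': 'box', 'type': 'FLOAT', 'mode': 'REPEATED'},
            {'name': 'score', 'type': 'FLOAT', 'mode': 'NULLABLE'},
        ]},
    ]
}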

Next, let’s create the pipeline using pipeline options, which will use the local runner to process an image from the bucket and push it to BigQuery. Because we need access to a project for the BigQuery calls, we will pass in project information via the options:

pipeline_options = PipelineOptions().from_dictionary({
    'temp_location': f'gs://{bucket}/tmp',
    'project': project})

Next, we will see the pipeline put together with pre- and post-processing steps. 

The Beam transform MatchFiles matches all of the files found with the glob pattern provided. These matches are sent to the ReadMatches transform, which outputs a PCollection of ReadableFile objects. These have the metadata.path information and a read() function that can be invoked to get the file's bytes. The results are then sent to the preprocessing step.

pipeline_options = PipelineOptions().from_dictionary({
    'temp_location': f'gs://{bucket}/tmp',
    'project': project})

# This function is a workaround for a dependency issue caused by usage of PIL
# within a lambda from a notebook
def open_image(readable_file):
    import io
    from PIL import Image
    return readable_file.metadata.path, Image.open(io.BytesIO(readable_file.read()))

pipeline_options.view_as(SetupOptions).save_main_session = True

with beam.Pipeline(options=pipeline_options) as p:
    (p
     | "ReadInputData" >> beam.io.fileio.MatchFiles(f'gs://{bucket}/images/*')
     | "FileToBytes" >> beam.io.fileio.ReadMatches()
     | "ImageToTensor" >> beam.Map(open_image)
     | "PreProcess" >> beam.Map(preprocess_image)
     | "RunInferenceTorch" >> beam.ml.inference.RunInference(my_keyed_cloud_model_handler)
     | beam.Map(post_process)
     | beam.io.WriteToBigQuery(table_spec,
                               schema=table_schema,
                               write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
    )

After running this pipeline, the BigQuery table will be populated with the results of the prediction.

In order to run this pipeline on the cloud, for example if we had a bucket with tens of thousands of images, we simply need to update the pipeline options and provide Dataflow with dependency information:

Create a requirements.txt file for the dependencies:

!echo -e "apache-beam[gcp]\ntorch==1.11.0\ntorchvision==0.12.0" > requirements.txt

Create the right pipeline options:

pipeline_options = PipelineOptions().from_dictionary({
    'runner': 'DataflowRunner',
    'region': 'us-central1',
    'requirements_file': './requirements.txt',
    'temp_location': f'gs://{bucket}/tmp',
    'project': project})

Conclusion 

The new Apache Beam apache_beam.ml.RunInference transform removes large chunks of boilerplate from data pipelines that incorporate machine learning models. Pipelines that make use of this transform can also take full advantage of the expressiveness of Apache Beam to handle pre- and post-processing of the data and to build complex multi-model pipelines with minimal code.

Source : Data Analytics Read More

7 Enterprise Applications for Companies Using Cloud Technology

7 Enterprise Applications for Companies Using Cloud Technology

The market for cloud technology is booming. Companies spent over $405 billion on cloud services last year. The sudden growth is not surprising, because the benefits of the cloud are incredible.

Enterprise cloud applications are fast becoming the industry standard for corporations. Although cloud computing is still a relatively new concept for many businesses, it has already found its way into a wide range of business scenarios.

Here’s how enterprises use cloud technologies to achieve a competitive advantage in their essential business applications.

Data streaming

Information moves faster today than ever before. Companies must take advantage of the information they hold about their customers and respond in real time to support quick decision-making.

Cloud computing (https://www.striim.com/product/striim-cloud/) can be used to support real-time data streams for better business decision-making. With cloud computing, companies can scale compute and storage as needed, with little overhead from extra hardware and software.

Cloud technology results in lower costs, quicker service delivery, and faster network data streaming. It also allows companies to offload large amounts of data from their networks by hosting it on remote servers anywhere on the globe.

Multi-cloud computing

Multi-cloud computing allows companies to store and manage their data across multiple servers in a distributed fashion. The model enables easy transfer of cloud services between different geographic regions, either onshore or offshore.

Companies have the flexibility to choose the location where they have the best infrastructure to deploy their businesses. Cloud computing has provided additional advantages like interoperability with traditional enterprise systems via APIs.

Testing new programs

With cloud computing, companies can test new programs and software applications from the public cloud. Parameters can be changed, updated, or performance enhanced without the time-consuming installation of new hardware and software.

Cloud technology allows companies to test many programs and decide which ones to launch for consumers quickly. The testing helps reduce the overall cost, time, and risk associated with building new hardware so companies can focus more on their core business functions.

Centralized data storage

Cloud technologies provide users with centralized storage for all information. For example, e-mail messages and documents are stored in the cloud, giving users access to their data from any location. 

Information is encrypted and protected by firewalls, redundancy, and many other security methods to ensure data safety. Data stored by one company can be accessed by another company across multiple clouds, decreasing or even eliminating the need for traditional onsite storage systems.

Disaster recovery and data backup

Cloud technology helps companies recover business processes after a natural disaster. Because information is shared and stored in remote data centers, operations can continue even when premises are damaged. Companies can also implement redundant systems and virtualized equipment to ensure the continuity of services.

Data backup is one of the most important benefits of cloud computing. The data backup solution makes it possible to recover your business operations when a system fails. The system eliminates the requirement to purchase expensive backup systems and other equipment.

Big data analytics

The amount of data in today’s world is growing exponentially, and cloud computing provides excellent tools that analyze large volumes of information and carry out marketing segmentation. Using data mining and advanced analytics, companies can better analyze customer behavior patterns to predict their needs and provide more customized products and services.

An organization can process vast amounts of unstructured data from social media networks and websites by running computationally intense workflows on these applications in a distributed network environment.

Provision of infrastructure as a service (IaaS) and platform as a service (PaaS)

Cloud technologies allow companies to rent servers and storage to provide IaaS. The IaaS model works well for new startups that don’t want the capital expense of building an IT infrastructure.

Companies that handle hardware, software, and security measures in a data center provide managed cloud hosting services. Companies can share resources and pay according to their usage with automatic billing, lowering costs while still enjoying the security of scalable IT offerings.

Companies use PaaS to rent access to a development environment with applications and storage. Users can develop new software or update existing programs without installing additional software. Cloud computing enables companies to quickly deploy new applications for employees without wasting time on installation in their IT networks.

Before you go

Cloud technology has proven to be an excellent model for large companies. Cloud computing allows companies to increase productivity at reduced cost and improve their business structures through better access to real-time information. Cloud technology also gives customers and employees new ways of connecting, sharing data, and creating customized services.

The post 7 Enterprise Applications for Companies Using Cloud Technology appeared first on SmartData Collective.

Source : SmartData Collective Read More

5 Ways B2B Companies Can Use Analytics for Pricing

5 Ways B2B Companies Can Use Analytics for Pricing

Analytics technology is very important for modern business. Companies spent over $240 billion on big data analytics last year. That figure is expected to grow as more businesses discover its benefits.

There are many important applications of data analytics technology. One of the most important is with helping companies set their prices correctly.

Analytics Can Be Essential for Helping Companies with their Pricing Strategies

We all know how difficult it can be to get the pricing right in B2B contexts. In today’s business world, pricing has become one of the most important parts of a company’s strategy. Prices must account for the company’s key value metric, cost structure, buyer personas, and other factors like competition.

Analytics technology can help companies optimize their prices more effectively. Last year, Tullika Tiwary addressed some of the reasons in her post in CustomerThink. Here are some ways companies can benefit from an analytics-driven pricing strategy:

Analytics helps companies segment their customers, so they can get a better understanding of their behavior. This helps them determine how different customer segments will behave in various situations, which helps them set their prices appropriately.

Analytics can use existing data to model scenarios where customers will respond to different prices.

Analytics technology helps companies make more nuanced insights about different products and the prices they should charge for them.

This article will walk you through 5 top B2B pricing models that you should consider when determining your own strategy. We will also talk about ways to incorporate data analytics into these models. We will also introduce methods to help you choose which model is right for your organization, as well as the implications of selecting a particular model.

Why Is It Important to Get B2B Pricing Right?

When you get the pricing right for your B2B business, you demonstrate your knowledge about buyer personas and their needs. You are proving that you understand your value-based metric and the dynamic factors in the marketplace, such as changes in the economy.

You are making buyers aware of how their competitors price their products and services so they can make informed decisions about what to pay for yours. That is why getting B2B pricing right is essential if you're going to create business value for your customers and profit from it.

Since pricing strategies are so important, it is essential to use all technology at your fingertips to make the best pricing decisions. Analytics technology can help you significantly in this regard.

5 Top B2B Pricing Models and Ways to Use Analytics with Them

The best way to get the pricing right in B2B contexts is to consider how customer personas value your product or service, how price affects the buyer (their buying process), and your company’s cost structure.

We’ll look at each of these factors in detail and discuss their implications for successful B2B pricing decisions and how to use analytics with them.

Cost-Plus Pricing

The cost-plus pricing model is often used by small businesses that don’t have a lot of experience in B2B pricing. In this model, you create a cost structure for your product, then add a required profit margin. You can use analytics tools to track costs of your inputs and set prices correctly.

Value-Based Pricing

In a value-based pricing model, your price is determined by the value you provide to your buyer. The seller usually determines the price based on their ideal solution for a particular task as well as their budget. The value-based model is appropriate for companies focused on adding high-value products and services to their product offerings or just starting.

Analytics technology can be very useful in this regard, especially when costs are not static. You can use analytics models to forecast future costs of your inputs and apply the right markups on your products.

Needs-Based Pricing

Needs-based pricing is the opposite of value-based pricing. You may read here about the main differences. It considers a solution’s costs and benefits rather than its value to buyers. These are usually strictly business decisions where there is no selling involved. This method will help when you’re making decisions about services that add value to your overall product offering – for example, a consulting arm of your business or extra features on your SaaS solution.

Again, analytics technology can be very helpful here, although the benefits are applied in reverse. You will use data mining tools to understand the value customers get from various products and services, and analytics technology will help you assess it. This will help you make more nuanced decisions.

Competition-Based Pricing

Competition-based pricing is a relatively straightforward approach that you can use for either new or existing B2B businesses. In this model, you look at your competitors’ prices and adjust yours appropriately to make sure your product or service is still profitable. You are interested in how the market reacts to your price and how consumers perceive it.

Analytics technology will help you better understand your competitors. You can use data mining tools to research pricing and sales volume of your competitors. This will help you understand your competitors and price your own products accordingly.

Dynamic Pricing

Dynamic pricing takes into account external factors that affect the buyer’s decision-making process. It can apply to any of the models we’ve discussed so far and is extremely useful when buyers are particularly sensitive about costs or when their cost structures change quickly over time.

This is one of the biggest reasons analytics is important. External variables that affect prices change quickly in many industries. You can use real-time data to stay up on these trends and take advantage of analytics to make the right decisions.

Choosing the Best Analytics-Driven Pricing Strategy

Once you’ve narrowed down the analytics-driven pricing models that could fit your B2B situation, you need to pick the one that will work best in your context. You can do this by considering your value metric and then deciding based on it.

Know Your Value Metric

Your value metric is the yardstick by which you measure the benefits you deliver to customers. It usually accounts for what your product or service gives customers (its value), as well as how long it lasts and how much of an impact it has on their lives.

Utilize Buyer Personas

Before choosing a pricing strategy for your B2B business, you need to know who your buyer persona is. Buyer personas are archetypes of real potential buyers that you create after analyzing the data about the type of person who would buy your solution. You should have complete knowledge about their age, profession, income level, and other information that will help you tailor your product or service to their needs.

Decide On A Pricing Model

With this information, you will be able to make more informed decisions regarding your pricing model and be able to make more money in the process. This works in both B2B and B2C contexts – whether you’re running a consulting business or launching a new SaaS solution that solves certain problems for companies in the industry they belong to, this guide will help you understand how to price your product or service properly.

Consider Buyer Expectations

There are many things that potential buyers expect and need from a product or service. For example, some buyers just want an easy way to access a service, while others may also want to make money from it, for example by reselling it to other businesses or customers. Understanding this will help you determine whether your product or service is suitable for those buyers and ensure that your B2B business is successful in the long run.

Take Into Account All Other Factors

Before deciding what type of pricing model will work best for your particular business, consider all other factors that affect how you price your product or service. Your company’s cost structure, its ability to work with clients, and market demand are just some examples.

Analytics is Vital in Pricing Strategies

Pricing strategies are different for B2B and B2C contexts, but the core principles of effective pricing strategies apply to both. The key to getting B2B pricing right is knowing how to read the data and information in your marketplace, your company’s cost structure, your value metric, and what buyers expect from a product or service. You will need the right analytics tools and the right pricing strategy to make the best decisions.

If you want to make more money with your business or just get the most out of it, use this guide today to get solid insights on how you can do it. It will help you determine the right pricing model for your needs and choose it so you can start using it immediately.

The post 5 Ways B2B Companies Can Use Analytics for Pricing appeared first on SmartData Collective.

Source : SmartData Collective Read More