
Streaming graph data with Confluent Cloud and Neo4j on Google Cloud

There are many ways to classify data. Data can be characterized as batch or streaming. Similarly, data can be characterized as tabular or connected. In this blog post, we're going to explore an architecture focused on a particular kind of data: connected data that is streaming.

Neo4j is the leading graph database. It stores data as nodes and relationships between those nodes. This allows users to uncover insights from connections in their connected data. Neo4j offers Neo4j Aura, a managed service for Neo4j.

Apache Kafka is the de facto tool today for creating streaming data pipelines. Confluent offers Confluent Cloud, a managed service for Apache Kafka. In addition, Confluent provides the tools needed to bring together real-time data streams to connect the whole business. Its data streaming platform turns events into outcomes, enables intelligent, real-time apps, and empowers teams and systems to act on data instantly.

Both these products are available on Google Cloud, through Google Cloud Marketplace. Used together, Neo4j Aura and Confluent Cloud provide a streaming architecture that can extract value from connected data. Some examples include:

Retail: Confluent Cloud can stream real-time buying data to Neo4j Aura. With this connected data in Aura, graph algorithms can be leveraged to understand buying patterns. This allows for real-time product recommendations and customer churn prediction. In supply chain management, use cases include finding alternate suppliers and demand forecasting.

Healthcare and Life Sciences: Streaming data into Neo4j Aura allows for real-time case prioritization and triaging of patients based on medical events and patterns. This architecture can capture patient journey data, including medical events for individuals. This allows for cohort-based analysis across events related to the medical conditions patients experience, the medical procedures they undergo, and the medications they take. This cohort journey can then be used to predict future outcomes or apply corrective actions.

Financial Services: Streaming transaction data with Confluent Cloud into Neo4j Aura allows for real time fraud detection. Previously unknown, benign-looking fraud-ring activities can be tracked in real-time and detected. This reduces the risk of financial losses and improves customer experience.

This post will take you through setting up a fully managed Kafka cluster running in Confluent Cloud and creating a streaming data pipeline that can ingest data into Neo4j Aura.

In this example, we generate a message manually in Confluent Cloud. For production implementations, messages are typically generated by upstream systems. On Google Cloud, this includes the myriad Google services that Confluent Cloud can connect to, such as Cloud Functions, Bigtable, and Cloud Run.

Prerequisites

So let’s start building this architecture. We’ll need to set up a few things:

Google Cloud Account: You can create one for free if you don't have one. You also get $300 in credits when you sign up.

Confluent Cloud: The easiest way to start with Confluent Cloud is to deploy through Google Cloud Marketplace. The relevant listing is here.

Neo4j Aura: To get started with Neo4j Aura, just deploy it via Google Cloud Marketplace here.

A VM: We need a terminal to execute Confluent CLI commands and run Docker. You can create a VM using Google Compute Engine (GCE).

Creating a Kafka topic

To start, we need to create a Kafka cluster in Confluent Cloud. Then we'll create a Kafka topic in that cluster. The steps below can be done via the Confluent Cloud UI, but let's do it via the command line so that the whole process is easier to automate.

First, open a bash terminal on your GCE VM. Then, let’s install the Confluent CLI tool.
curl -sL --http1.1 https://cnfl.io/cli | sh -s -- latest

Log in to your Confluent account:
confluent login --save

We have to create an environment and cluster to use. To create an environment:
confluent environment create test

To list the available environments, run:
confluent environment list

This command returns a table of environment IDs and names, including the newly created `test` environment. Use its environment ID to create all subsequent resources in the `test` environment. In my case, `env-3r2362` is the ID for the `test` environment.
confluent environment use env-3r2362

Using this environment, let's create a Kafka cluster in the GCP `us-central1` region.
confluent kafka cluster create test --cloud gcp --region us-central1

You can choose some other region from the list of supported regions:
confluent kafka region list --cloud gcp

You can obtain the cluster ID by executing:
confluent kafka cluster list

Now, let’s use the environment and cluster created above.
confluent environment use test
confluent kafka cluster use lkc-2r1rz1

An API key/secret pair is required to create a topic on your cluster. You also need it to produce/consume messages in a topic. If you don’t have one, you can create it using:
confluent api-key create --resource lkc-2r1rz1

Now, let’s create a topic to produce and consume in this cluster using:
confluent kafka topic create my-users

With these steps, our Kafka cluster is ready to produce and consume messages.

Creating a Connector instance

The Neo4j Connector for Apache Kafka can run self-managed in a container, for example on Google Kubernetes Engine. Here, let's create a `docker-compose.yml` and run a Kafka Connect instance locally.

In the docker-compose file, we create and orchestrate a Kafka Connect container, using `confluentinc/cp-kafka-connect-base` as the base image. The Connect worker will run and be exposed on port 8083.

version: '3'
services:
  kconnect-neo4j-confluent:
    image: confluentinc/cp-kafka-connect-base:7.3.1
    container_name: kconnect-neo4j-confluent
    ports:
      - 8083:8083

When the container starts, we install the Neo4j Sink Connector package via confluent-hub. Once the package is installed, we can create a Sink instance running within the container.

First, let’s set the environment variables that the base image expects. 

In the following snippet, replace the placeholders with your Kafka bootstrap server URL and port, which you can get from Confluent Cloud:
`<KAFKA_INSTANCE_URL>` with your Kafka URL
`<KAFKA_PORT>` with your Kafka port

We are creating topics specific to this connector for writing configuration, offset and status data. Since we are going to write JSON data, let’s use JsonConverter for `CONNECT_KEY_CONVERTER` and `CONNECT_VALUE_CONVERTER`.

Our Kafka cluster inside Confluent Cloud is protected and has to be accessed via an API key and secret.

The Kafka API key and secret created during setup have to be used to replace `<KAFKA_API_KEY>` and `<KAFKA_API_SECRET>` inside the CONNECT_SASL_JAAS_CONFIG and CONNECT_CONSUMER_SASL_JAAS_CONFIG variables.

environment:
  CONNECT_BOOTSTRAP_SERVERS: <KAFKA_INSTANCE_URL>:<KAFKA_PORT>
  CONNECT_REST_ADVERTISED_HOST_NAME: 'kconnect-neo4j-confluent'
  CONNECT_REST_PORT: 8083
  CONNECT_GROUP_ID: kconnect-neo4j-confluent
  CONNECT_CONFIG_STORAGE_TOPIC: _config-kconnect-neo4j-confluent
  CONNECT_OFFSET_STORAGE_TOPIC: _offsets-kconnect-neo4j-confluent
  CONNECT_STATUS_STORAGE_TOPIC: _status-kconnect-neo4j-confluent
  CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components/'
  CONNECT_REQUEST_TIMEOUT_MS: "20000"
  CONNECT_RETRY_BACKOFF_MS: "500"
  CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
  CONNECT_SASL_MECHANISM: "PLAIN"
  CONNECT_SECURITY_PROTOCOL: "SASL_SSL"
  CONNECT_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
  CONNECT_CONSUMER_SECURITY_PROTOCOL: "SASL_SSL"
  CONNECT_CONSUMER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
  CONNECT_CONSUMER_SASL_MECHANISM: "PLAIN"
  CONNECT_CONSUMER_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
  CONNECT_CONSUMER_REQUEST_TIMEOUT_MS: "20000"
  CONNECT_CONSUMER_RETRY_BACKOFF_MS: "500"

With all the Connect worker variables set, let's focus on installing and configuring the Neo4j Sink Connector. We have to install the binary via confluent-hub:
confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2

Sometimes, the above command might fail if there is any bandwidth or connection issue. Let’s keep trying until the command succeeds.

while [ $? -eq 1 ]
do
  echo "Failed to download the connector, will sleep and retry again"
  sleep 10
  confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2
done

Once the package is installed, we can use the REST API that Kafka Connect exposes to create and configure a Neo4j Sink instance. Before that, let's wait until the Connect worker is running:

echo "Start Self-managed Connect Worker..."
/etc/confluent/docker/run &
while : ; do
  curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
  echo -e $$(date) " Listener State : " $$curl_status " (waiting for 200)"
  if [ $$curl_status -eq 200 ] ; then
    break
  fi
  sleep 5
done

After the worker is up, we can use the REST API to create a new Neo4j Sink Connector instance that listens to our topic and writes the JSON data to Neo4j.

In the config below, we listen to the `my-users` topic ("topics": "my-users") and ingest the data via the Cypher statement "MERGE (p:Person{name: event.name, surname: event.surname})" defined in the "neo4j.topic.cypher.my-users" property. Here, we use a simple statement to create or update a Person node for each message on the topic.

Replace the <NEO4J_URL>, <NEO4J_PORT>, <NEO4J_USER>, and <NEO4J_PASSWORD> placeholders with appropriate values.

curl -i -X PUT -H "Accept:application/json" \
  -H "Content-Type:application/json" \
  http://localhost:8083/connectors/neo4j-sink/config \
  -d '{
    "topics": "my-users",
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "errors.retry.timeout": "-1",
    "errors.retry.delay.max.ms": "1000",
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "neo4j.server.uri": "neo4j+s://<NEO4J_URL>:<NEO4J_PORT>",
    "neo4j.authentication.basic.username": "<NEO4J_USER>",
    "neo4j.authentication.basic.password": "<NEO4J_PASSWORD>",
    "neo4j.topic.cypher.my-users": "MERGE (p:Person{name: event.name, surname: event.surname})"
  }'

Finally, let’s wait until this connector worker is up.

while : ; do
  curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors/neo4j-sink/status)
  echo -e $$(date) " Neo4j Sink Connector State : " $$curl_status " (waiting for 200)"
  if [ $$curl_status -eq 200 ] ; then
    break
  fi
  sleep 5
done

Below is the complete docker-compose.yml. Ensure that you replace all the placeholders mentioned above, then bring everything up with:
docker-compose up

---
version: '3'
services:
  kconnect-neo4j-confluent:
    image: confluentinc/cp-kafka-connect-base:7.3.1
    container_name: kconnect-neo4j-confluent
    ports:
      - 8083:8083
    environment:
      CONNECT_BOOTSTRAP_SERVERS: <KAFKA_INSTANCE_URL>:<KAFKA_PORT>
      CONNECT_REST_ADVERTISED_HOST_NAME: 'kconnect-neo4j-confluent'
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: kconnect-neo4j-confluent
      CONNECT_CONFIG_STORAGE_TOPIC: _config-kconnect-neo4j-confluent
      CONNECT_OFFSET_STORAGE_TOPIC: _offsets-kconnect-neo4j-confluent
      CONNECT_STATUS_STORAGE_TOPIC: _status-kconnect-neo4j-confluent
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components/'
      CONNECT_REQUEST_TIMEOUT_MS: "20000"
      CONNECT_RETRY_BACKOFF_MS: "500"
      CONNECT_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
      CONNECT_SASL_MECHANISM: "PLAIN"
      CONNECT_SECURITY_PROTOCOL: "SASL_SSL"
      CONNECT_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
      CONNECT_CONSUMER_SECURITY_PROTOCOL: "SASL_SSL"
      CONNECT_CONSUMER_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: "https"
      CONNECT_CONSUMER_SASL_MECHANISM: "PLAIN"
      CONNECT_CONSUMER_SASL_JAAS_CONFIG: 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<KAFKA_API_KEY>" password="<KAFKA_API_SECRET>";'
      CONNECT_CONSUMER_REQUEST_TIMEOUT_MS: "20000"
      CONNECT_CONSUMER_RETRY_BACKOFF_MS: "500"
    command:
      - bash
      - -c
      - |
        echo "Install Neo4j Sink Connector"
        confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2

        while [ $? -eq 1 ]
        do
          echo "Failed to download the connector, will sleep and retry again"
          sleep 10
          confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:5.0.2
        done

        echo "Start Self-managed Connect Worker..."
        /etc/confluent/docker/run &
        while : ; do
          curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
          echo -e $$(date) " Listener State : " $$curl_status " (waiting for 200)"
          if [ $$curl_status -eq 200 ] ; then
            break
          fi
          sleep 5
        done

        echo -e "\n--\n+> Create Neo4j Sink Connector"
        curl -i -X PUT -H "Accept:application/json" \
          -H "Content-Type:application/json" \
          http://localhost:8083/connectors/neo4j-sink/config \
          -d '{
            "topics": "my-users",
            "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            "errors.retry.timeout": "-1",
            "errors.retry.delay.max.ms": "1000",
            "errors.tolerance": "all",
            "errors.log.enable": "true",
            "errors.log.include.messages": "true",
            "neo4j.server.uri": "neo4j+s://<NEO4J_URL>:<NEO4J_PORT>",
            "neo4j.authentication.basic.username": "<NEO4J_USER>",
            "neo4j.authentication.basic.password": "<NEO4J_PASSWORD>",
            "neo4j.topic.cypher.my-users": "MERGE (p:Person{name: event.name, surname: event.surname})"
          }'

        echo "Checking the Status of Neo4j Sink Connector..."
        while : ; do
          curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors/neo4j-sink/status)
          echo -e $$(date) " Neo4j Sink Connector State : " $$curl_status " (waiting for 200)"
          if [ $$curl_status -eq 200 ] ; then
            break
          fi
          sleep 5
        done
        sleep infinity

Sending a message

Let's write some messages via the Confluent UI to test whether they get persisted in Neo4j. Go to your Confluent Cloud UI and click on your environment.

You will now see the clusters within the environment. Click the cluster you created previously.

From the sidebar on the left, click on the `Topics` section and the `my-users` topic we created previously.

From the messages tab, you can start producing messages to this topic by clicking on the `Produce a new message to this topic` button.

Click the `Produce` button once you are done.

Alternatively, you can write messages to the `my-users` topic via the command line.

The Confluent CLI provides commands to produce and consume messages from topics. Before using them, ensure that you are using an API key:
confluent api-key use <API_KEY> --resource lkc-2r1rz1

confluent kafka topic produce my-users --parse-key --delimiter ":"

Using the last command, we can add messages containing a key and a value separated by the ":" delimiter to the topic.
"event":{"name": "John", "surname": "Doe"}

Go to your Neo4j Browser and check for the new Person node created with name ‘John’ and surname ‘Doe’.
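
If you prefer the command line to the Neo4j Browser, a quick check might look like the following sketch. It assumes the cypher-shell utility is installed on your VM and reuses the same Aura placeholders as earlier:

# Assumes cypher-shell is installed; reuse the same <NEO4J_*> placeholders as above
echo "MATCH (p:Person {name: 'John', surname: 'Doe'}) RETURN p;" | \
  cypher-shell -a neo4j+s://<NEO4J_URL>:<NEO4J_PORT> -u <NEO4J_USER> -p <NEO4J_PASSWORD>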

Conclusion

In this blog post, we walked through setting up Confluent Cloud and Neo4j Aura on Google Cloud. We then used the Neo4j Connector for Apache Kafka to bridge between them. With that environment created, we tested sending a message through Confluent Cloud and capturing it in the Neo4j database. You can try this yourself with a Google Cloud account and the marketplace listings for Neo4j Aura and Confluent Cloud.

Confluent is a great data streaming platform for capturing high volumes of data in motion. Neo4j is a native graph platform that can sift through connected data to deliver highly contextual insights with low latency. In a highly connected world, real-time insights can add huge value to businesses. Customers across verticals are using Confluent Cloud and Neo4j to solve problems the moment they happen. Graph data science algorithms are leveraged to understand seemingly random networks, derive hidden insights, and predict and prescribe the next course of action.

To learn more about Neo4j and its use cases, reach out to ecosystem@neo4j.com.

Source: Data Analytics

Introducing partitioning and clustering recommendations for optimizing BigQuery usage

Do you have a lot of BigQuery tables? Do you find it hard to keep track of which ones are partitioned and clustered, and which ones could be? If so, we have good news. We’re launching a partitioning and clustering recommender that will do the work for you! The recommender analyzes all your organization’s workloads and tables and identifies potential cost optimization opportunities. And the best part is, it’s completely free!

"The BigQuery partitioning and clustering recommendations are awesome! They have helped our customers identify areas where they can reduce costs, improve performance, and optimize our BigQuery usage." - Sky, one of Europe's leading media and communications companies

How does the recommender work?

Partitioning divides a table into segments, while clustering sorts the table based on user-defined columns. Both methods can improve the performance of certain types of queries, such as queries that use filter clauses and queries that aggregate data.
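
As a rough illustration of what acting on such a recommendation can look like, the sketch below rebuilds a table with partitioning and clustering applied. The dataset, table, and column names are hypothetical:

# Hypothetical names: copy an unoptimized table into a partitioned, clustered one
bq query --use_legacy_sql=false '
CREATE TABLE mydataset.sales_optimized
PARTITION BY DATE(order_timestamp)
CLUSTER BY customer_id
AS SELECT * FROM mydataset.sales'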

BigQuery’s partitioning and clustering recommender analyzes each project’s workload execution over the past 30 days to look for suboptimal scans of the table data. The recommender then uses machine learning to estimate the potential savings and generate final recommendations. The process has four key steps: Candidate Generation, Read Pattern Analyzer, Write Pattern Analyzer, and Generate Recommendations.

Candidate Generation is the first step in the process, where tables and columns are selected based on specific criteria. For partitioning, tables larger than 100 GB are chosen; for clustering, tables larger than 10 GB are chosen. Smaller tables are filtered out because the optimization benefit is smaller and less predictable. Then we identify columns that meet BigQuery's partitioning and clustering requirements.

In the Read Pattern Analyzer step, the recommender analyzes the logs of queries that filter on the selected columns to determine their potential for cost savings through partitioning or clustering. Several metrics, such as filter selectivity, potential file pruning, and runtime, are considered, and machine learning is used to estimate the potential slot time saved if partitioning or clustering is applied.

The Write Pattern Analyzer step is then used to estimate the cost that partitioning or clustering may introduce during write time. Write patterns and table schema are analyzed to determine the net savings from partitioning or clustering for each column.

Finally, in Generate Recommendations, the output from both the Read Pattern Analyzer and Write Pattern Analyzer is used to determine the net savings from partitioning or clustering for each column. If the net savings are positive and meaningful, the recommendations are uploaded to the Recommender API with proper IAM permissions.

Discovering BigQuery partitioning and clustering recommendations

You can access these recommendations via a few different channels:

Via the lightbulb or idea icon in the top right of BigQuery’s UI page

On our console via the Recommendation Hub

Via our Recommender API
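
For example, a minimal sketch of pulling these recommendations with the gcloud CLI might look like the following. The recommender ID and location shown here are assumptions; verify the exact values in the public documentation before relying on them.

# Assumption: the recommender ID and location may differ; check the documentation
gcloud recommender recommendations list \
  --project=<PROJECT_ID> \
  --location=us \
  --recommender=google.bigquery.table.PartitionClusterRecommender \
  --format=json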

You can also export the recommendations to BigQuery using BigQuery Export.

To learn more about the recommender, please see the public documentation.

We hope you use BigQuery partitioning and clustering recommendations to optimize your BigQuery tables, and can’t wait to hear your feedback and thoughts about this feature. Please feel free to reach us at active-assist-feedback@google.com.

Source: Data Analytics

Scaling reaction-based enumeration for next-gen drug discovery using Google Cloud

Discovering new drugs is at the heart of modern medicine, yet finding a “needle in the haystack” is immensely challenging due to the enormous number of possible drug-like compounds (estimated at 10^60 or more). To increase our chances of finding breakthrough medicines for patients with unmet medical needs, we need to explore the vast universe of chemical compounds and use predictive in silico methods to select the best compounds for lab-based experiments. Enter reaction-based enumeration, a powerful technique that generates novel, synthetically accessible molecules. Our team at Psivant has been pushing the boundaries of this process to an unprecedented scale, implementing reaction-based enumeration on Google Cloud. By tapping into Google Cloud’s robust infrastructure and scalability, we’re unlocking the potential of this technique to uncover new chemical entities, leading to groundbreaking advancements and life-altering therapeutics.

Our journey began with a Python-based prototype, leveraging RDKit for chemistry and Ray for distributed computing. Despite initial progress, we encountered a roadblock: our on-premises computing resources were limited, holding back our prototype’s potential. While we could explore millions of compounds, our ambition was to explore billions and beyond. To address this limitation, we sought a solution that offered greater flexibility and scalability, leading us to the powerful ecosystem provided by Google Cloud.

Leveraging Google Cloud infrastructure

Google Cloud’s technologies allowed us to supercharge our pipelines and conduct chemical compound exploration at scale. By integrating Dataflow, Google Workflows, and Compute Engine, we built a sophisticated, high-performance system that is both flexible and resilient. 

Dataflow is a managed batch and streaming system that provides real-time, fault-tolerant, and parallel processing capabilities to manage and manipulate massive datasets effectively. Google Workflows orchestrates the complex, multi-stage processes involved in enumeration, ensuring smooth transitions and error handling across various tasks. Finally, Compute Engine provides us with scalable, customizable infrastructure to run our demanding computational workloads, ensuring optimal performance and cost-effectiveness. Together, these technologies laid the foundation for our cutting-edge solution to explore the endless possibilities of reaction-based enumeration.

We built a cloud-native solution to achieve the scalability we sought, taking advantage of Dataflow, which relies on Apache Beam, a versatile programming model with its own data structures, such as the PCollection, a distributed, immutable collection of elements designed for efficient parallel processing.

Enter Dataflow 

Balancing performance and cost-efficiency was crucial during pipeline development. That is where Dataflow came in, allowing us to optimize resource utilization without compromising performance, paving the way for optimal resource allocation and cost control.

Our pipeline required a deep understanding of the chemistry libraries and Google Cloud ecosystem. We built a simple, highly distributed enumeration pipeline, then added various chemistry operations while ensuring scalability and performance at every step. Google Cloud’s team played a pivotal role in our success, providing expert guidance and troubleshooting support.

To 100 billion and beyond

Our journey implementing reaction-based enumeration at scale on Google Cloud has been an inspiring testament to the collaborative spirit, relentless innovation, and unwavering pursuit of excellence. With smart cloud-native engineering and cutting-edge technologies, our workflow achieves rapid scalability, capable of deploying thousands of workers within minutes, enabling us to explore an astounding 100 billion compounds in under a day. Looking ahead, we’re excited to integrate Vertex AI into our workflow as our go-to MLOps solution, and to supercharge our high-throughput virtual screening experiments with the robust capabilities of Batch, further enhancing our capacity to innovate.

We’d like to extend our heartfelt thanks to Javier Tordable for his guidance in distributed computing, enriching our understanding of building a massively scalable pipeline.

As we persistently push the boundaries of computational chemistry and drug discovery, we are continuously motivated by the immense potential of reaction-based enumeration. This potential is driven by the powerful and flexible infrastructure of Google Cloud, combined with the comprehensive capabilities of Psivant’s QUAISAR platform. Together, they empower us to design the next generation of groundbreaking medicines to combat the most challenging diseases.

Source: Data Analytics

Transform your unstructured data with AI using BigQuery object tables, now GA

Today, the vast majority of data that gets generated in the world is unstructured (text, audio, images), but only a fraction of it ever gets analyzed. The AI pipelines required to unlock the value of this data are siloed from mainstream analytic systems, requiring engineers to build custom data infrastructure to integrate structured and unstructured data insights. 

Our goal is to help you realize the potential of all your data, whatever its type and format. To make this easier, we launched the preview of BigQuery object tables at Google Cloud Next 2022. Powered by BigLake, object tables provide BigQuery users a structured record interface for unstructured data stored in Cloud Storage. With it, you can use existing BigQuery frameworks to process and manage this data using object tables in a secure and governed manner. 
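
For illustration, creating an object table over a Cloud Storage bucket is a single DDL statement. The sketch below uses hypothetical dataset, connection, and bucket names and assumes a BigQuery connection to Cloud Storage already exists:

# Hypothetical names; assumes the connection us.my_gcs_connection already exists
bq query --use_legacy_sql=false '
CREATE EXTERNAL TABLE mydataset.image_objects
WITH CONNECTION `us.my_gcs_connection`
OPTIONS (
  object_metadata = "SIMPLE",
  uris = ["gs://my-bucket/images/*"]
)'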

Since we launched the preview, we have seen customers use object tables for many use cases and are excited to announce that object tables are now generally available.

Analyzing unstructured data with BigQuery object tables

Object tables let you leverage the simplicity of SQL to run a wide range of AI models on your unstructured data. There are three key mechanisms for using AI models, all enabled through the BigQuery inference engine.

First, you can import your models and run queries on the object table to process the data within BigQuery. This approach works well for customers looking for an integrated BigQuery solution that allows them to utilize their existing BigQuery resources. Since the preview, we’ve expanded support beyond TensorFlow models with TF-Lite and ONNX models and introduced new scalar functions to pre-process images. We also added support for saving pre-processed tensors to allow for efficient multi-model use of tensors to help you reduce slot usage. 

Second, you can choose from various pre-trained Google models such as Cloud Vision API, Cloud Natural Language API, and Cloud Translation API, for which we have added pre-defined SQL table valued functions that invoke when querying an object table. The results of the inference are stored as a BigQuery table. 

Third, you can integrate customer-hosted AI models or custom models built through Vertex AI using remote functions. You can call these remote functions from BigQuery SQL to serve objects to models, and the results are returned as BigQuery tables. This option is well suited if you run your own model infrastructure such as GPUs, or have externally maintained models. 

During the preview, customers used a mix of these integration mechanisms to unify their AI workloads with data already present in BigQuery. For example, Semios, an agro-tech company, uses imported and remote image processing models to serve precision agriculture use cases. 

“With the new imported model capability with object table, we are able to import state-of-the-art Pytorch vision models to process image data and improve in-orchard temperature prediction using BigQuery. And with the new remote model capability, we can greatly simplify our pipelines and improve maintainability.” – Semios

Storage insights, fine-grained security, sharing and more 

Beyond processing with AI models, customers are extending existing data management frameworks to unstructured data, resulting in several novel use cases, such as:

Cloud Storage insights – Object tables provide a SQL interface to Cloud Storage metadata (e.g., storage class), making it easy to build analytics on Cloud Storage usage, understand growth, optimize costs, and inform decisions to better manage data.

Fine-grained access control at scale – Object tables are built on BigLake’s unified lakehouse infrastructure and support row- and column-level access controls. You can use it to secure specific objects with governed signed URLs. Fine-grained access control has broad applicability for augmenting unstructured data use cases, for example securing specific documents or images based on PII inferences returned by the AI model.  

Sharing with Analytics Hub – You can share object tables, similar to BigLake tables, via Analytics Hub, expanding the set of sharing use cases for unstructured data. Instead of sharing buckets, you now get finer control over the objects you wish to share with partners, customers, or suppliers.  

Run generative AI workloads using object tables (Preview)

Members of Google Cloud AI's trusted tester program can use a wide range of generative AI models available in Model Garden to run on object tables. You can use Generative AI Studio to choose a foundation model, or fine-tune one and deploy a custom API endpoint. You can then call this API from BigQuery using the remote function integration to pass prompts and inputs and return the text results from large language models (LLMs) in a BigQuery table. In the coming months, we will enable SQL functions through the BigQuery inference engine to call LLMs directly, further simplifying these workloads.
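
As a rough sketch of that remote function pattern, the statements below register a hypothetical endpoint that wraps the model (for example, a Cloud Function or Cloud Run service) and then call it from SQL. The function name, connection, and endpoint URL are placeholders, not a definitive implementation:

# Hypothetical function, connection, and endpoint; adapt to your own deployment
bq query --use_legacy_sql=false '
CREATE OR REPLACE FUNCTION mydataset.generate_text(prompt STRING) RETURNS STRING
REMOTE WITH CONNECTION `us.my_vertex_connection`
OPTIONS (endpoint = "https://<REGION>-<PROJECT_ID>.cloudfunctions.net/call-llm")'

# Pass a prompt and get the model output back as a BigQuery result
bq query --use_legacy_sql=false '
SELECT mydataset.generate_text("Write a short product description for a trail running shoe") AS completion'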

Getting started

To get started, follow along with a guided lab or tutorials to run your first unstructured data analysis in BigQuery. Learn more by referring to our documentation.

Acknowledgments: Abhinav Khushraj, Amir Hormati, Anoop Johnson, Bo Yang, Eric Hao, Gaurangi Saxena, Jeff Nelson, Jian Guo, Jiashang Liu, Justin Levandoski, Mingge Deng, Mujie Zhang, Oliver Zhuang, Yuri Volobuev and rest of the BigQuery engineering team who contributed to this launch.

Source: Data Analytics

How AI is Boosting the Customer Support Game

Great customer support plays a central role in a company’s success and profitability. Businesses with an excellent reputation for customer service tend to do better overall and stay in business longer. Companies that miss the mark often lose customers and generate negative reviews online. Unfortunately, bad reviews can deter new customers.

What makes some businesses fail to provide outstanding customer support? Usually, it’s because they haven’t employed the right technology. The reasons for this vary. Some business owners try to get by with the bare minimum, don’t know what they need, or can’t afford to hire programmers. This is where AI can help; with the right tools, there’s no need to hire developers.

AI greatly enhances customer support

Of all the advances in technology, AI has the power to enhance customer support like never before. Conversational AI, for example, completely transforms the interactions between customers and the support team.

The bottom line is that the majority of issues get resolved quickly, without involving a human agent. The business benefits by reducing agent workloads for repetitive issues that can be resolved with simple self-help solutions.

Through a conversational AI platform, users can engage with a chatbot and have a humanlike conversation instead of getting robotic, pre-programmed responses that don’t always help. Powered by generative AI and natural language processing (NLP), this technology can create customized, personalized responses that are relevant to the nuances of a user’s inquiry.

This technology isn’t just for customers – it can also be used to power internal help desk solutions, like the conversational AI platform from Aisera. When a business uses this technology for employees, they get answers and solutions fast, which makes them far more productive. When paired with ChatGPT, this technology gives automated support systems an even bigger boost.

Machine learning continually improves performance

Perhaps the best part of AI is machine learning. Companies that provide AI customer support solutions generally analyze mass amounts of user input, including tickets, to train the algorithm to respond appropriately.

This is an ongoing process, which means the effectiveness of an AI customer service tool is constantly improving. Compared to traditional tools, like rule-based chatbots that don’t improve unless they’re reprogrammed, conversational AI won’t fall behind.

Here are just some of the benefits businesses can get from using this automated AI technology:

Better customer service. Customers will always have questions, and AI technology helps them get the answers they seek.
Self-service options. A conversational AI chatbot is ideal for providing users with self-service options, which happens to be what people want. According to data sourced by Customer Gauge, 67% of people prefer self-help over speaking to a human for support, and 40% of people who call in have already searched for answers on their own.
Fast response times. Responses from an AI system are almost always faster than engaging a human.
Cost-effective. Great customer service is priceless because it’s what keeps a business going. Using AI reduces expenses, like wasted payroll dollars for agents who spend all day answering simple questions users can handle on their own with the right resource.
Less stress on agents. Since AI chatbots can resolve most customer issues automatically, support agents will have fewer tickets to manage. A reduced workload results in less stress and more time to focus on urgent matters.
24/7 support without live agents. Customers won’t have to wait to get support when you have a round-the-clock AI-powered chatbot available.

AI enhances customer service

Some people are worried that AI is going to obliterate the need for customer support agents, but that’s not true. There will always be some elements that require a human touch. This technology will transform jobs, but it’s only going to enhance – not replace – customer service teams.

Completely automating your entire customer service team won’t turn out well. However, you can use AI to support your teams by making them more productive and relieving them of the repetitive tasks that lead to frustration and job dissatisfaction.

The ideal solution is to use a blend of humans and AI-powered technology for customer service solutions.

AI chatbots empower businesses to succeed

When great customer service is the goal, AI delivers outstanding results through a variety of automated tools powered by NLP, machine learning, generative AI, and conversational AI. It’s even used in omnichannel marketing, which further supports this goal. For businesses that want to continuously improve customer service and reduce the strain on their agents, using AI technology is a must.

Source: SmartData Collective

How an open data cloud is enabling Airports of Thailand and EVme to reshape the future of travel

Aviation and accommodation play a big role in impacting the tourism economy, but analysis of recent data also highlights tourism’s impact on other sectors, from financial services to healthcare, to retail and transportation. 

With travel recovery in full swing post pandemic, Google search queries related to “travel insurance” and “medical tourism” in Thailand have increased by more than 900% and 500% respectively. Financial institutions and healthcare providers must therefore find ways to deliver tailored offerings to travelers who are seeking peace of mind from unexpected changes or visiting the country to receive specialized medical treatment.

Interest in visiting Thailand for “gastronomy tourism” is also growing, with online searches increasing by more than 110% year-on-year.  Players in the food and beverage industry should therefore be looking at ways to better engage tourists keen on authentic Thai cuisine.

Most importantly, digital services will play an integral role in travel recovery. More than one in two consumers in Thailand are already using online travel services, with this category expected to grow 22% year-on-year and contribute US$9 billion to Thailand’s digital economy by 2025. To seize growth opportunities amidst the country’s tourism rebound, businesses cannot afford to overlook the importance of offering always-on, simple, personalized, and secure digital services.

That is why Airports of Thailand (AOT), SKY ICT (SKY) and EVME PLUS (EVme) are adopting Google Cloud’s open data cloud to deliver sustainable, digital-first travel experiences.

Improving the passenger experience in the cloud

With Thailand reopening its borders, there has been an upturn in both inbound and outbound air travel. To accommodate these spikes in passenger traffic across its six international airports, AOT migrated its entire IT footprint to Google Cloud, which offers an open, scalable, and secure data platform, with implementation support from its partner SKY, an aviation technology solutions provider.

Tapping on Google Cloud’s dynamic autoscaling capabilities, the IT systems underpinning AOT’s ground aviation services and the SAWASDEE by AOT app can now accommodate up to 10 times their usual workloads. AOT can also automatically scale down its resources to reduce costs when they are no longer in use. Using the database management services of Google Cloud to eliminate data silos, the organization is able to enhance its capacity to deliver real-time airport and flight information to millions of passengers. As a result, travelers enjoy a smoother passenger experience, from check-in to baggage collection.

At the same time, SKY uses Google Kubernetes Engine (GKE) to transform SAWASDEE by AOT into an essential, all-in-one travel app that offers a full range of tourism-related services. GKE allows AOT to automate application deployment and upgrades without causing downtime. This frees up time for the tech team to accelerate the launch of new in-app features, such as a baggage tracker service, airport loyalty programs, curated travel recommendations, an e-payment system, and more.

EVme drives sustainable travel with data

Being able to travel more efficiently is only one part of the future of travel. More than ever, sustainability is becoming a priority for consumers when they plan their travel itineraries. For instance, search queries related to “sustainable tourism” in Thailand have increased by more than 200% in the past year, with close to four in 10 consumers sharing that they are willing to pay more for a sustainable product or service.

To meet this increasing demand and support Thailand’s national efforts to become a low-carbon society, EVme, a subsidiary of PTT Group, is building its electric vehicle lifestyle app on Google Cloud, the industry’s cleanest cloud. It has also deployed the advanced analytics and business intelligence tools of Google Cloud to offer its employees improved access to data-driven insights, which helps them better understand customer needs and deliver personalized interactions. These insights have helped EVme determine the range of electric vehicle models it offers for rental via its app, so as to cater to different preferences. At the same time, the app can also share crucial information, such as the availability of public electric vehicle charging stations, while providing timely support and 24-hour emergency assistance to customers.

As we empower organizations across industries with intelligent, data-driven capabilities to make smarter business decisions and be part of an integrated ecosystem that delivers world-class visitor experiences, our collaborations with AOT, SKY, and EVme will enhance their ability to serve travelers with personalized, digital-first offerings powered by our secure and scalable open data cloud.

Source: Data Analytics

AI-Based Analytics Are Changing the Future of Credit Cards

Few industries have been untouched by changes in artificial intelligence technology. However, the financial industry has been affected more than most others. Therefore, it should not be surprising to hear that the global market for AI in the financial services sector was worth $9.45 billion in 2021 and is growing at a rate of 16.5% a year.

AI is revolutionizing the financial industry by automating many processes and providing new, meaningful insights that were once impossible. From automated trading to fraud detection, AI has become a powerful tool for financial institutions trying to increase efficiency and improve their bottom lines. There are also many benefits for customers since AI helps financial institutions lower their fees, improve their product and service offerings, and offer their services to a broader range of consumers, such as approving a higher percentage of loans to reliable borrowers by improving actuarial decision-making.

AI also allows credit card companies to take advantage of predictive analytics capabilities, which can help make better decisions and identify trends in the market. With its ability to quickly process large amounts of data, AI is becoming increasingly important in the financial industry. It can help banks reduce costs while improving customer service and accuracy. As such, it is changing the way we interact with our finances daily.

The credit card industry is one of the financial sectors most affected by advances in artificial intelligence. AI technology has significantly improved analytics capabilities, which has helped solve many problems in the credit card sector. It can lead to lower interest rates and unique product offerings, such as great new student credit cards.

AI Creates New Analytics Capabilities for Credit Card Providers and Customers

Disha Singha of Analytics Insights reports that AI technology has significantly changed the state of the credit card industry. She reports that AI technology can improve credit systems and increase the number of people using them. Banks worldwide offer credit cards with varying interest rates, deals, and rewards, and do everything they can to make it easy for customers to manage their balances. Credit cards are becoming the most common payment method for many products and services due to their convenience and the fact that they are more secure than cash.

Singha reports that marketing analysts valued the global credit card market at $103.06 billion in 2021, growing at an annual rate of 3%. Meanwhile, the global market for artificial intelligence technology is expected to be worth $228.3 billion in 2026, growing at a yearly rate of 32.7%. Therefore, leveraging AI technology can help credit card companies offer higher-quality services and increase their growth targets.

AI has been used by banking and fintech firms to improve fraud detection on credit and debit cards. It also analyzes patterns of defaulters and cautions users against overspending. Predictive analytics is now being used to enhance how credit and debit cards are used in real time, and some companies have already started adopting this technology.

This has many incredible benefits for both credit card providers and their consumers. Credit card fraud is a growing problem that can be costly for both consumers and businesses. BankRate reports that credit card fraud costs nearly $6 billion a year, which requires card providers to charge higher interest rates for customers.

However, advancements in technology have enabled companies to reduce the amount of credit card fraud, which can help lower interest rates for consumers. By using AI to detect fraudulent activity, businesses can protect their customers from being victims of scams and other fraudulent activities. This will result in lower interest rates for consumers since credit card providers will no longer need to cover the costs associated with fraudulent activity. Additionally, companies can use AI to create better customer experiences by providing personalized services that reduce the risk of fraud.

While AI analytics is most useful for fraud prevention, it can be utilized by credit card companies to solve various other business challenges. For example, AI software can be used by credit card companies and financial institutions to enhance customer service and create targeted marketing campaigns for customers. One example of using AI for customer service is the AI chatbot, which is integral to many modern marketing strategies.

How Credit Card Companies Are Using AI Technology to Offer Better Service and Improve Profitability

In recent years, the credit card industry has become much more reliant on artificial intelligence. Credit card companies are using AI to offer better service and improve profitability. Customers benefit from lower rates as well. Overall, high technology has been highly beneficial for the sector, which means credit card companies are likely to dedicate even more resources to it in the near future. 

Source: SmartData Collective

Faster together: How Dun & Bradstreet datasets accelerate your real-time insights

At the third annual Google Data Cloud and AI Summit, we shared how data analytics and insights continue to be a key focus area for our customers and how we’re accelerating their data journeys through new product innovations and partner offerings. 

A big part of that is helping customers turn their data into insights faster using differentiated datasets from partners and integrating them into their AI/ML workflows. We recently announced our partnership with Crux to add over 1,000 new datasets on Analytics Hub to provide customers with access to a rich ecosystem of data to enrich first-party data assets and accelerate time to value and scalability with real-time insights. There will be an initial focus on Financial Services, ESG, and Supply Chain, but we plan to increase this to 2,000 datasets later this year. These datasets are critical to our customers who execute highly process-intensive analytics workloads for trading, planning, and risk calculations.

An industry leader, Dun & Bradstreet, will also make much of its catalog available on Analytics Hub and listed on the Google Cloud Marketplace. This will enable customers to achieve the same benefits they receive for SaaS purchases in the Marketplace, including simplified procurement, consolidated billing, and financial incentives. 

“We are excited to build upon our ten-year relationship with Google Cloud and both companies’ commitments to deliver innovative opportunities to our mutual customers,” said Ginny Gomez, President, Dun & Bradstreet North America. “By making D&B datasets and products available in the Google Cloud Marketplace, we are making it easier for our customers to access and utilize this critical data, while also helping to provide a frictionless procurement process for customers to use their committed Google Cloud spend.”

When you purchase and subscribe to a dataset in the Google Cloud Marketplace, the data is immediately accessible via your BigQuery environment via Analytics Hub, without ingress, storage charges, or wait times. This allows your project teams to leverage Google Cloud AI/ML, BigQuery, and other third-party innovations to get valuable insights from datasets with ease. This is a commercial expansion on the hundreds of public and free datasets already listed in the Google Cloud Marketplace.
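
As an illustration, once a listing is linked into your project, querying it looks like querying any other dataset. The sketch below joins a hypothetical first-party table with a hypothetical shared Dun & Bradstreet table; all dataset, table, and column names are made up:

# Hypothetical dataset, table, and column names
bq query --use_legacy_sql=false '
SELECT c.customer_id, d.company_name, d.industry_code
FROM mydataset.customers AS c
JOIN dnb_linked_dataset.firmographics AS d
  ON c.duns_number = d.duns_number
LIMIT 100'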

Analytics Hub is built on a decade of data sharing in BigQuery. Since 2010, BigQuery has supported always-live, in-place data sharing within an organization’s security perimeter, as well as data sharing across boundaries to external organizations. Analytics Hub makes the administration of sharing assets across boundaries even easier and more scalable, while retaining access to key capabilities of BigQuery like its time-tested sharing infrastructure, and built-in ML, real-time and geospatial analytics. 

These datasets on Marketplace also benefit from BigQuery’s advantages:

Scale: BigQuery is an exabyte-scale data warehouse that can handle even the most demanding data sharing needs. It grows with your data needs including auto scaling capabilities.

Security: BigQuery is built on Google’s secure infrastructure and offers various security features to protect your data. Data is always encrypted and PII data discovery services can be directly used to improve the security of the data.

Freshness: BigQuery data can be shared without moving it. This means you can join shared data with your own data, with no need to implement expensive ETL pipelines to bring in the data from providers.

Cost-effectiveness: BigQuery provides different billing models, so each workload can make use of the data with the best price/performance.

At Google Cloud, we believe data and AI have the power to transform businesses and unlock the next wave of innovation. We are excited to share that customers can now procure new data assets on the Google Cloud Marketplace to accelerate their business decisions and drive new innovations. Customers interested in these datasets, can request a custom quote, or more information by clicking Contact Sales on the Marketplace product page and completing the inquiry form.

Source: Data Analytics

AI In Marketing: Is It Worth the Hype?

AI has been the latest controversy, as it may seem to eliminate the need for human workers in most professions. Artists, especially, were vocal about their concerns about the technology, considering the system's way of merging existing works to create art. ChatGPT doesn't seem to consider the importance of artists, telling a writer that they don't exist even while citing their previous work.

In most industries, AI is hyped, especially by companies looking to cut costs and minimise risks linked with occupational hazards. And although artificial intelligence is a useful tool that can extend one's work and capabilities, it shouldn't be relied on too heavily in certain sectors, such as marketing.

AI should be used with caution and consideration for people's hard work. At the same time, it is more powerful alongside people's contributions, which is why we'll discuss its pros and cons and analyse the hype.

What can AI bring to marketing? 

Marketing is always changing its strategies and methods in step with current world trends. Marketing is a mix of technology and human-based interactions, with customer support being the most important part of the advertisement and promotion process. Along the way, marketers come to recognise the number of repetitive actions that are time-consuming and don't leave enough space for innovation and creativity.

Here is where AI technology comes in: its programmable systems are capable of automating these tasks at whichever stage of the marketing process they occur, be it data analysis or campaign personalisation. Artificial intelligence can indeed minimise human errors, such as grammar mistakes or incorrect data entry.

However, AI has some limitations in marketing

Of course, you can have AI run your whole marketing campaign, but customers will likely notice this change and might not like it. In the marketing world, there's nothing like human interaction to boost the company's brand into fame, which is why services like Savanta in Europe offer personalised research and consulting for B2B and B2C markets to create custom plans for each type of business.

Customers require human connection

Despite clients appreciating customer service chatbots for rapid inquiries, their disadvantage is the lack of emotion and empathy for people's concerns. It seems that customers avoid using the services or products of a company whose chatbot experience was unpleasant. Of course, that doesn't mean not using AI in customer support, but your business shouldn't rely on it fully, because no technology can replace human interaction.

Even AI can be wrong sometimes

Although AI is believed to be always right, fast, and efficient in providing answers and guidance, the truth gives us another perspective on its performance. For example, AI cannot fully pick up on the different tones in people's emotions, which is why performing reliable sentiment analysis is challenging for the system. Intentions are also difficult for technology to grasp, and this lack of human insight can lead to inaccurate predictions and calculations, affecting data-driven marketing.

AI requires huge sets of data for proper intervention

AI needs a lot of human intervention and considerable sets of data in order to contain all the knowledge required to perform on its own. The tool can only carry out tasks as long as it has the proper set of data with which it can make decisions; otherwise, its efficiency is little to nothing. Therefore, your company needs someone who is trained in guiding AI by uploading the required data sets, which can also be a tedious process.

AI can’t compete with human creativity 

No matter how advanced this technology is, it can't beat human creativity. AI merges human-made music and art to create new pieces, so its output is still only possible with the help of previous human genius. If you base your marketing strategy on AI too much, the content you publish may seem disconnected from reality, which customers won't appreciate. After all, your target audience consists only of human customers, so why take a fully robotic approach?

How to blend AI and Human resources

Your business can be successful by implementing a fusion between artificial intelligence and employees. This will ensure enough support for your marketing campaign while keeping up with trends and customer demands.

The first thing you can do is conduct an audit and find the areas in your business where AI can bring the most benefits. You can involve your employees or team in this process to brainstorm ideas on the possible contribution of AI to your company. AI can do many repetitive and tiring tasks, so pinpoint them and then focus your attention on the creative process your team can do.

The following step requires you to collect relevant data, because AI needs a lot of it to work efficiently. Introduce your AI system to the information you have managed to gather, but be prepared to work at this continuously, because both AI and human tasks need to be optimised.

Your staff shouldn't worry about being replaced, because AI can't be reliable without human support and training. AI is a big help for automating tasks and collecting data, which are arguably the best uses for the technology.

AI shouldn't be used merely to chase profit but to enhance people's creativity. In marketing and similar sectors, where employees are expected to use their full capabilities, the technology is most valuable when it helps a company flourish with all of its employees contributing.

Final take: what’s your opinion on using AI in marketing? 

AI remains a controversial topic among professionals, and only a few have had the courage to take control of the tool and use it purely to strengthen creative outcomes. Others feel threatened, fearing that technological advances will eliminate the need for human work in most domains. In practice, however, AI is best used alongside people's contributions, which is why companies should adopt an inclusive approach and blend the two to grow and usher in a new era.

Source: SmartData Collective

Tips to Protect Office 365 Systems from Data Breaches

Data breaches have become frighteningly common in recent years. In 2022, over 422 million individuals were affected by over 1,800 data breaches.

According to the Identity Theft Resource Center's 2022 Data Breach Report, the number of data breaches fell slightly in 2022 compared with 2021. However, this was largely due to a lull at the beginning of the year, and the total number of Americans affected actually surged.

The number of Americans affected increased by 42% compared to the previous year. Reports were significantly lower in the first half of 2022, possibly due to the war in Ukraine and volatile cryptocurrency prices. However, the number increased in the second half of the year. Most of the compromises in both years were classified as data breaches, but in a few cases, data was exposed in other ways that didn’t involve a breach of computer systems.

All organizations need to take the right steps to protect against data breaches. These attacks will only become more common now that many hackers are using AI to launch them, so shielding your applications is essential.

Since so many companies use Office 365, they need to make sure the documents stored on it are safe from hackers.

Office 365 Products Need to Be Protected from Hackers

Office 365 has become the go-to productivity suite for businesses of all sizes. Moreover, cloud-based productivity software is defining the way modern companies do business. Not only does this software digitize the workplace, but it improves collaboration, productivity, and connectivity with customers and coworkers.

But as with any other cloud-based service, users must learn how to protect Office 365 to prevent data breaches and data loss. Luckily, Office 365 offers a myriad of data protection features, and to understand them fully we need to walk through the available data and threat protection capabilities.

This guide will offer a deep dive into the best Office 365 protection tips to give you an overview of the suite’s capabilities and why implementation is essential for business continuity. But first, does Office 365 offer enough protection?

Does Office 365 Offer Enough Protection?

While Office 365 offers built-in security features, it is essential to understand that these measures alone may not be sufficient to safeguard your business data. Microsoft provides a secure foundation, but it’s up to individual organizations to implement additional security measures to ensure comprehensive protection.

Because of that, and despite Microsoft's attention to detail and professionalism, the native features aren't enough for full-scale protection. This is why many organizations turn to third-party Office 365 protection software that specializes in data security.

With that said, the tips in our guide will enhance the security of your Office 365 environment and minimize the risk of data breaches or unauthorized access.

6 Tips for Office 365 Security

Below are six tips we recommend for securing the Office 365 suite against threat actors, accidental deletion, and data loss.

Enable Multi-Factor Authentication (MFA)

An efficient and easy way to enhance the security of each Office 365 tenant is to enable MFA. MFA adds an additional layer of protection by requiring users to provide a second form of verification, usually a code sent to their mobile device, in addition to their password.

With MFA in place, even if an attacker obtains a user's password, they would still need physical access to the user's secondary authentication device, significantly reducing the risk of unauthorized access. As a result, MFA is a standout feature of most modern cloud services and one that Microsoft actively recommends you enable.

Log in to the Microsoft 365 admin center to disable legacy authentication and enable MFA. If you prefer to script the change, one hedged way to do it through the Microsoft Graph API is sketched below.
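The following Python sketch is a rough illustration only, not part of the original article: it creates a report-only conditional access policy that requires MFA for all users via the Microsoft Graph API, assuming you already hold an access token with the Policy.ReadWrite.ConditionalAccess permission. The ACCESS_TOKEN value and the policy name are placeholders.

```python
# Minimal sketch: require MFA for all users via a Microsoft Graph
# conditional access policy. Assumes ACCESS_TOKEN holds a token with
# Policy.ReadWrite.ConditionalAccess; adapt before using in production.
import requests

ACCESS_TOKEN = "<token acquired via your OAuth flow>"  # placeholder

policy = {
    "displayName": "Require MFA for all users",        # hypothetical name
    "state": "enabledForReportingButNotEnforced",      # report-only first
    "conditions": {
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
    },
    "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
}

resp = requests.post(
    "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=policy,
)
resp.raise_for_status()
print("Created policy:", resp.json().get("id"))
```

Creating the policy in report-only mode first lets you review its impact in the sign-in logs before changing the state to "enabled".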

Regularly Update and Patch Applications

Keeping your Office 365 applications up to date is crucial for maintaining security. Unfortunately, hackers will often target employees with outdated software in hopes of breaching the Office 365 suite. Luckily, Microsoft frequently releases security patches and updates to address vulnerabilities and protect against emerging threats.

Unfortunately, many employees don’t go ahead with updates and simply click the “remind me later” button whenever they log in to Office 365. That’s why you must encourage them to update when a new patch is available.

On the other hand, Office administrators can schedule these updates and remove the need for employees to update Office 365 and its many applications manually. Establish a process to regularly review and apply these updates so that your environment is protected by the latest security fixes.

Implement Data Loss Prevention (DLP) Policies

Office 365 offers powerful data loss prevention features that allow you to define policies to prevent sensitive information from being shared or leaked outside your organization. These policies alert users whenever they send confidential or sensitive data through email or other communication channels.

By configuring DLP policies, you can detect and mitigate potential data breaches, ensuring that confidential data remains secure. Subsequently, you could pair these policies with the powerful encryption features in Office 365. For example, one way to prevent sensitive data from being stolen or lost through email is to encrypt email communications.

Enforce Identity and Access Management

Office 365 Identity and Access Management (IAM) is crucial to securing and controlling access to Microsoft Office 365 resources and services. It encompasses a set of tools, policies, and practices that enable organizations to manage user identities, control access and authorization processes, and enforce security policies within their Office 365 environment through Azure Active Directory.

The primary goal of Office 365 IAM is to ensure that the right individuals have appropriate access to the right resources while maintaining data confidentiality, integrity, and availability. In addition, IAM helps organizations prevent unauthorized access, protect sensitive information, and manage user identities efficiently to strengthen security capabilities and management.
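To make the auditing side of IAM concrete, here is a minimal Python sketch, assuming an access token with the Directory.Read.All permission, that lists the members of each activated Azure Active Directory role so you can review who holds privileged access. The token variable is a placeholder, not something from the original article.

```python
# Sketch: list Azure AD directory roles and their members to audit
# privileged access. Assumes ACCESS_TOKEN carries Directory.Read.All.
import requests

ACCESS_TOKEN = "<token with Directory.Read.All>"  # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
GRAPH = "https://graph.microsoft.com/v1.0"

roles = requests.get(f"{GRAPH}/directoryRoles", headers=HEADERS)
roles.raise_for_status()

for role in roles.json().get("value", []):
    members = requests.get(
        f"{GRAPH}/directoryRoles/{role['id']}/members", headers=HEADERS
    )
    members.raise_for_status()
    names = [
        m.get("userPrincipalName") or m.get("displayName")
        for m in members.json().get("value", [])
    ]
    print(f"{role['displayName']}: {names}")
```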

Monitor and Analyze User Activity

Similarly to IAM, we can analyze and monitor user activity within Office 365 to see first-hand which data employees access. The monitoring features of Office 365 will allow you to track user actions, detect suspicious activity, and identify potential security incidents.

You can also enable conditional access to specific types of data and apps to prevent unauthorized users from reaching them. Monitoring user activity lets you spot security threats early and take appropriate action to mitigate risks.
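As one hedged example of such monitoring, the Python sketch below pulls recent Azure AD sign-in events from Microsoft Graph and flags accounts with repeated failures. It assumes a token with the AuditLog.Read.All permission; the threshold and variable names are illustrative only.

```python
# Sketch: flag users with repeated failed sign-ins from the Azure AD
# sign-in logs. Assumes ACCESS_TOKEN carries AuditLog.Read.All.
from collections import Counter
import requests

ACCESS_TOKEN = "<token with AuditLog.Read.All>"  # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

resp = requests.get(
    "https://graph.microsoft.com/v1.0/auditLogs/signIns",
    headers=HEADERS,
    params={"$top": 500},  # sample of recent sign-ins
)
resp.raise_for_status()

failures = Counter(
    s["userPrincipalName"]
    for s in resp.json().get("value", [])
    if s.get("status", {}).get("errorCode", 0) != 0  # non-zero = failed sign-in
)

THRESHOLD = 5  # illustrative threshold
for user, count in failures.items():
    if count >= THRESHOLD:
        print(f"Investigate {user}: {count} failed sign-ins in the sample")
```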

Back Up Your Office 365 Data

Lastly, Office 365 offers several data backup capabilities to protect your entire digital workspace from accidental deletion, malicious attacks, and software errors. Any of these events can result in permanent data loss, which could have serious consequences for your business.

However, Office 365 offers few automation tools for backing up sensitive data, which pushes many companies toward dedicated third-party backup tools. The trade-off is that those tools cost money, while the features Office 365 does offer are free of charge.
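For illustration, here is a minimal Python sketch of a scripted backup, assuming a Graph access token with the Files.Read.All permission and a hypothetical local backups/ folder: it copies the files in one user's OneDrive root to disk. A real backup tool would also cover folders, versions, SharePoint sites, and mailboxes.

```python
# Sketch: copy the files in one user's OneDrive root folder to ./backups.
# Assumes ACCESS_TOKEN carries Files.Read.All; subfolders are skipped.
import os
import requests

ACCESS_TOKEN = "<token with Files.Read.All>"  # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
USER = "someone@example.com"  # hypothetical user

items = requests.get(
    f"https://graph.microsoft.com/v1.0/users/{USER}/drive/root/children",
    headers=HEADERS,
)
items.raise_for_status()

os.makedirs("backups", exist_ok=True)
for item in items.json().get("value", []):
    url = item.get("@microsoft.graph.downloadUrl")  # only files carry this
    if not url:
        continue  # skip folders
    data = requests.get(url)  # download URL is pre-authenticated
    data.raise_for_status()
    with open(os.path.join("backups", item["name"]), "wb") as f:
        f.write(data.content)
    print("Backed up", item["name"])
```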

Office 365 Applications Need to Be Secured Against Data Breaches

Securing your Office 365 environment is paramount to protecting your digital workplace and valuable business data against the growing threat of data breaches. While Office 365 provides a solid foundation, it is crucial to implement additional security measures tailored to your organization's needs. By following the tips mentioned in our guide, you can significantly enhance the security of each Office 365 tenant and minimize the risk of data breaches or unauthorized access.

Source: SmartData Collective