Managing the Looker ecosystem at scale with SRE and DevOps practices

Many organizations struggle to create data-driven cultures where each employee is empowered to make decisions based on data. This is especially true for enterprises with a variety of systems and tools in use across different teams. If you are a leader, manager, or executive focused on how your team can leverage Google’s SRE practices or wider DevOps practices, you are definitely in the right place!

What do today’s enterprises or mature start-ups look like?

Today, large organizations are often segmented into hundreds of small teams, each working around data that can run to several petabytes and arrive in a wide variety of raw forms. ‘Working around data’ could mean any of the following: generating, facilitating, consuming, processing, visualizing or feeding it back into the system. Because responsibilities vary so widely, so do skill sets. Numerous people and teams work with data, with jobs that span the entire data ecosystem:

Centralizing data from raw sources and systems
Maintaining and transforming data in a warehouse
Managing access controls and permissions for the data
Modeling data
Doing ad-hoc data analysis and exploration
Building visualizations and reports

Nevertheless, a common goal across all these teams is keeping services running and downstream customers happy. In other words, the organization might be divided internally, but every team shares the mission of leveraging data to make better business decisions. Despite silos and different subgoals, the destinies of these teams are intertwined if the organization is to thrive. To support such a diverse set of data sources and the teams behind them, Looker supports over 60 dialects (input from a data source) and over 35 destinations (output to a new data source).

Below is a simplified* picture of how the Looker ecosystem is central to a data-rich organization.

Simplified* Looker ecosystem in a data-rich environment

*The picture hides the complexity of team(s) accountable for each data source. It also hides how a data source may have dependencies on other sources. Looker Marketplace can also play an important role in your ecosystem.

What role can DevOps and SRE practices play?

In the ideal state, all these teams would be in harmony as a single-threaded organization, with internal processes so smooth that everyone is empowered to experiment (i.e., fail, learn, iterate and repeat all the time). With increasing organizational complexity, it is incredibly challenging to achieve such a state because there will be overhead and misaligned priorities. This is where we look to the guiding principles of DevOps and SRE practices. In case you are not familiar with Google SRE practices, here is a starting point. The core of DevOps and SRE practices is mature communication and collaboration.

Let’s focus on the best practices which could help us with our Looker ecosystem.

Have joint goals. There should be goals that are a shared responsibility across two or more teams. This helps establish a culture of psychological safety and transparency across teams.

Visualize how the data flows across the organization. This builds an understanding of how each team plays its role and how to work with them better.

Agree on the Golden Signals (aka core metrics). These could include data freshness, data accuracy, latency on centralized dashboards, etc. These signals allow teams to set their SLIs and error budgets (a worked example follows this list).

Agree on communication and collaboration methods that work across teams. 

Regular bidirectional communication modes, such as shared Google Chat spaces or Slack channels.

Focus on artifacts such as jointly owned documentation pages, shared roadmap items, reusable tooling, etc. For example, System Activity dashboards could be made available to all the relevant stakeholders and supplemented with notes tailored to your organization.

Set up regular forums where commonly discussed agenda items include major changes, expected downtime and postmortems around the core metrics. Among other agenda items, you could define/refine a common set of standards, for example centrally defined labels, group_labels, descriptions, etc. in the LookML to ensure there is a single terminology across the board.

Promote informal sharing opportunities such as lessons learned, TGIFs, Brown bag sessions, and shadowing opportunities. Learning and teaching have an immense impact on how teams evolve. Teams often become closer with side projects that are slightly outside of their usual day-to-day duties.

Have mutually agreed upon change management practices. Each team has dependencies, so making changes may have an impact on other teams. Why not plan those changes systematically? For example, agree on common standards for using Advanced deploy mode.

Promote continuous improvements. Keep looking for better, faster, cost-optimized versions of something important to the teams.

Revisit your data flow. After every major reorganization, ensure that organizational change has not broken the established mechanisms.
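As promised above, here is a minimal, illustrative sketch of how an agreed golden signal turns into an SLO and an error budget. The latency threshold, 28-day window and counts below are assumptions for illustration, not Looker defaults.

# Illustrative only: turning an agreed golden signal (dashboard load latency)
# into an SLI, an SLO, and an error budget over a 28-day window.
slo_target = 0.99          # agreed SLO: 99% of dashboard loads finish within 10 seconds
total_loads = 120_000      # dashboard loads observed in the window (hypothetical)
bad_loads = 900            # loads that breached the 10-second threshold (hypothetical)

sli = 1 - bad_loads / total_loads                # measured service level indicator
error_budget = (1 - slo_target) * total_loads    # bad loads the SLO allows in this window
budget_spent = bad_loads / error_budget          # fraction of the budget already burned

print(f"SLI: {sli:.3%}, error budget: {error_budget:.0f} loads, spent: {budget_spent:.0%}")

If the budget is nearly spent, the teams that share the signal agree to prioritize reliability work over new changes until it recovers.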

Are you over-engineering?

There is a possibility that, in the process of maturing the ecosystem, we end up with an over-engineered system and unintentionally add toil to the environment. Below are examples of toil that often stem from communication gaps.

Meetings with no outcomes/action plans – This is among the most common forms of toil, where the original intention of a meeting is no longer valid but the forum has not made the effort to revisit its purpose.

Unnecessary approvals – Being a single-threaded team can often create unnecessary dependencies, and your teams may lose the ability to make changes.

Unaligned maintenance windows – Changes across multiple teams may not be mutually exclusive, so if maintenance windows are misaligned, they may create unforeseen impacts on the end user.

Fancy, but unnecessary tooling – Side projects, if not governed, may create unnecessary tooling that is not used by the business. Collaborations are great when they solve real business problems, so it is worth regularly checking that priorities are set right.

Gray areas – With a shared responsibility model, you may also end up with gray areas, which are often gaps with no owner. This can lead to increased complexity in the long run. For example, the flexibility to schedule content delivery still requires collaboration to reduce failed jobs, because they can impact the performance of your Looker instance.

Contradicting metrics – Pay special attention to how teams are rewarded for internal metrics. For example, if one team focuses on data accuracy and another on data freshness, at scale those incentives may not align with one another.

Conclusion

To summarize, we learned how data is handled in large organizations with Looker at the heart, providing a universal semantic model. To handle large amounts of diverse data, teams need to start with aligned goals and commit to strong collaboration. We also learned how DevOps and SRE practices can guide us through these complexities. Lastly, we looked at some side effects of excessively structured systems. To go forward from here, start with an analysis of how data flows under your scope and how mature the collaboration is across the teams involved.

Further reading and resources

Getting to know Looker – common use cases

Enterprise DevOps Guidebook

Know thy enemy: how to prioritize and communicate risks—CRE life lessons

How to get started with site reliability engineering (SRE)

Bring governance and trust to everyone with Looker’s universal semantic model


Source: Data Analytics

Top 5 Takeaways from Google Cloud’s Data Engineer Spotlight

In the past decade, we have experienced unprecedented growth in the volume of data that can be captured, recorded and stored. In addition, this data comes in all shapes and forms, speeds and sources, which makes data accessibility, accuracy, compatibility, and quality more complex than ever before. That is why, at this year’s Data Engineer Spotlight, we wanted to bring together the data engineering community to share important learning sessions and the newest innovations in Google Cloud.

Did you miss out on the live sessions? Not to worry: all the content is available on demand.

Interested in running a proof of concept using your own data? Sign up here for hands-on workshop opportunities.

Here are the five biggest areas to catch up on from Data Engineer Spotlight, with the first four takeaways written by a loyal member of our data community: Francisco Garcia, Founder of Direcly, a Google Cloud Partner.

#1: The next generation of Dataflow was announced, including Dataflow Go (allowing engineers to write core Beam pipelines in Go, data scientists to contribute Python transforms, and data engineers to import standard Java I/O connectors; the best part is that it all works together in a single pipeline), Dataflow ML (deploy ML models built with PyTorch, TensorFlow, or scikit-learn to an application in real time), and Dataflow Prime (which removes the complexities of sizing and tuning so you don’t have to worry about machine types, enabling developers to be more productive).

Read on the Google Cloud Blog: The next generation of Dataflow: Dataflow Prime, Dataflow Go, and Dataflow ML

Watch on Google Cloud YouTube: Build unified batch and streaming pipelines on popular ML frameworks 

#2: Dataform Preview was announced (Q3 2022), which helps build and operationalize scalable SQL pipelines in BigQuery. My personal favorite part is that it follows software engineering best practices (version control, testing, and documentation) when managing SQL. Also, no other skills beyond SQL are required. 

Dataform is now in private preview. Join the waitlist 

Watch on Google Cloud YouTube: Manage complex SQL workflows in BigQuery using Dataform CLI 

#3: Data Catalog is now part of Dataplex, centralizing security and unifying data governance across distributed data for intelligent data management, which can help governance at scale. Another great feature is that it has built-in AI-driven intelligence with data classification, quality, lineage, and lifecycle management.  

Read on the Google Cloud Blog: Streamline data management and governance with the unification of Data Catalog and Dataplex 

Watch on Google Cloud YouTube: Manage and govern distributed data with Dataplex

#4: A how-to on BigQuery Migration Services was covered, which offers end-to-end migrations to BigQuery, simplifying the process of moving data into the cloud and providing tools to help with key decisions. Organizations are now able to break down their data silos. One great feature is the ability to accelerate migrations with intelligent automated SQL translations.  

Read More on the Google Cloud Blog: How to migrate an on-premises data warehouse to BigQuery on Google Cloud 

Watch on Google Cloud YouTube: Data Warehouse migrations to BigQuery made easy with BigQuery Migration Service 

#5: The Google Cloud Hero Game was a gamified, three-hour Google Cloud training experience using hands-on labs to gain skills through interactive learning in a fun and educational environment. During the Data Engineer Spotlight, 50+ participants joined a live Google Meet call to play the Cloud Hero BigQuery Skills game, with the top 10 winners earning a copy of Visualizing Google Cloud by Priyanka Vergadia.

If you missed the Cloud Hero game but still want to accelerate your Data Engineer career, get started toward becoming a Google Cloud certified Data Engineer with 30-days of free learning on Google Cloud Skills Boost. 

What was your biggest learning/takeaway from playing this Cloud Hero game?

It was brilliantly organized by the Cloud Analytics team at Google. The game day started off with the introduction and then from there we were introduced to the skills game. It takes a lot more than hands on to understand the concepts of BigQuery/SQL engine and I understood a lot more by doing labs multiple times. Top 10 winners receiving the Visualizing Google Cloud book was a bonus. – Shirish Kamath

Copy and pasting snippets of codes wins you competition. Just kidding. My biggest takeaway is that I get to explore capabilities of BigQuery that I may have not thought about before. – Ivan Yudhi

Would you recommend this game to your friends? If so, who would you recommend it to and why would you recommend it? 

Definitely, there is so much need for learning and awareness of such events and games around the world, as the need for Data Analysis through the cloud is increasing. A lot of my friends want to upskill themselves and these kinds of games can bring a lot of new opportunities for them. – Karan Kukreja

What was your favorite part about the Cloud Hero BigQuery Skills game? How did winning the Cloud Hero BigQuery Skills game make you feel?

The favorite part was working on BigQuery Labs enthusiastically to reach the expected results and meet the goals. Each lab of the game has different tasks and learning, so each next lab was giving me confidence for the next challenge. To finish at the top of the leaderboard in this game makes me feel very fortunate. It was like one of the biggest milestones I have achieved in 2022. – Sneha Kukreja

Source: Data Analytics

Can Predictive Analytics Help Traders Navigate Bitcoin’s Volatility?

Bitcoin has experienced tremendous price volatility in recent months. Traders are struggling to make sense of these patterns. Fortunately, new predictive analytics algorithms can make this easier.

The financial industry is becoming more dependent on machine learning technology with each passing day. Last summer, a report by Deloitte showed that more CFOs are using predictive analytics technology. Machine learning has helped reduce man-hours, increase accuracy and minimize human bias.

One of the biggest reasons people in the financial profession are investing in predictive analytics is to anticipate future prices of financial assets, such as stocks and bonds. The evidence demonstrating the effectiveness of predictive analytics for forecasting prices of these securities has been relatively mixed. However, the same principles can be applied to nontraditional assets more effectively, because they are in less efficient markets.

Many experts are using predictive analytics technology to forecast the future value of bitcoin. This is becoming a more popular idea as bitcoin becomes more volatile.

Can Predictive Analytics Really Help with Forecasting Bitcoin Price Movements Amidst Huge Market Volatility?

Bitcoin’s price is notoriously volatile. In the past, the value of a single Bitcoin has swung wildly by as much as $1,000 in a matter of days. As the market matures and more investors enter the space, we are beginning to see increased stability in prices. However, given the nature of cryptocurrency markets, it is still quite possible for prices to fluctuate rapidly. The good news is that predictive analytics technology can reduce risk exposure for these investors.

Predictive analytics algorithms are more effective at anticipating price patterns when they are designed with the right variables. There are a number of factors that can contribute to sudden changes in Bitcoin’s price that machine learning developers need to incorporate into their pricing models. These include:

News events: Positive or negative news about Bitcoin can have a significant impact on its price. For example, when China announced crackdowns on cryptocurrency exchanges in 2017, the price of Bitcoin fell sharply.

Market sentiment: Investor sentiment can also drive price movements. When investors are bullish on Bitcoin, prices tend to rise. Conversely, when sentiment is bearish, prices tend to fall.

Technical factors: Technical factors such as changes in trading volume, or the introduction of new trading platforms can also impact prices.

Predictive analytics technology helps traders assess these factors. Chhaya Vankhede, a machine learning expert and author on Medium, developed a predictive analytics algorithm to forecast bitcoin prices using an LSTM network. The algorithm proved surprisingly effective at forecasting bitcoin prices. However, its predictions were far from perfect, and she notes that more improvements need to be made.
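Her exact model is not reproduced here, but a minimal sketch of the general approach (an LSTM trained on a sliding window of past closing prices, written in Python with Keras; the CSV file and column name are hypothetical) looks roughly like this:

import numpy as np
import pandas as pd
from tensorflow import keras

# Build supervised samples: the previous `lookback` closing prices predict the next one.
def make_windows(series, lookback=30):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., np.newaxis], np.array(y)

prices = pd.read_csv("btc_daily.csv")["close"].to_numpy()  # hypothetical price history
scale = prices.max()
X, y = make_windows(prices / scale)

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(X.shape[1], 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)

next_close = model.predict(X[-1:])[0, 0] * scale  # forecast of the next closing price

A model like this captures short-term momentum at best; it knows nothing about news events or sentiment, which is why its forecasts should only ever be one input into a trading decision.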

Vankhede isn’t the only one who has developed predictive analytics models to predict bitcoin prices. Pratikkumar Prajapati of Cornell University published a study demonstrating the opportunity to forecast prices based on social media and news stories. This can be used to create more effective machine learning algorithms for traders.

Of course, it’s important to remember that Bitcoin is still a relatively new asset, and its price is subject to significant volatility. Therefore, predictive analytics is still an imperfect tool for projecting prices. In the long run, however, many believe that Bitcoin will become more stable as it continues to gain mainstream adoption.

Bitcoin’s price volatility has been a major source of concern for investors and observers alike. While the digital currency has seen its fair share of ups and downs, its overall trend has been positive, with prices steadily climbing since its inception. However, this doesn’t mean that there isn’t room for improvement.

There are a few key factors that contribute to Bitcoin’s volatility. Firstly, it is still a relatively new asset class, meaning that there is less data to work with when trying to predict future price movements. Secondly, the majority of Bitcoin users are speculators, rather than people using it as a currency to buy goods and services. This means that they are more likely to sell when prices rise, in order to cash in on their profits, leading to sharp price declines.

Finally, there is the question of trust. While the underlying technology of Bitcoin is sound, there have been a number of high-profile hacks and scams involving exchanges and wallets. This has led to some people losing faith in the digital currency, causing them to sell their holdings, leading to further price drops.

Despite these concerns, it is important to remember that Bitcoin is still in its early days. As more people adopt it and use it for everyday transactions, its price is likely to become more stable. In the meantime, investors should be prepared for periods of volatility. They can still minimize the risks by using predictive analytics strategically.

Positive Impacts of Bitcoin’s Price Volatility

Increased global awareness and media coverage
More people are interested in buying Bitcoin
The price of Bitcoin becomes more stable over time
More merchants start to accept Bitcoin as a payment method
Governmental and financial institutions take notice of Bitcoin
The value of Bitcoin increases

Negative Impacts of Bitcoin’s Price Volatility

People may lose interest in Bitcoin if the price is too volatile
Merchants may be hesitant to accept Bitcoin if the price is volatile
Governmental and financial institutions may be reluctant to use Bitcoin if the price is unstable
The value of Bitcoin may decrease if the price is too volatile
Investors may be hesitant to invest in Bitcoin if the price is volatile
Speculators may take advantage of Bitcoin’s price volatility

Bitcoin’s price is notoriously volatile, and this has caused many to wonder about the future of digital currency. Some have even called for it to be regulated in order to stabilize its value. However, others believe that Bitcoin’s volatility is actually a good thing, as it allows the market to correct itself and find true price discovery.

Bitcoin’s price is highly volatile compared to other asset classes. This means that its price can fluctuate rapidly in response to news and events. For example, the price of bitcoin fell sharply following the Mt. Gox hack in 2014 and the collapse of the Silk Road marketplace in 2013.

Investors must be aware of this risk when considering investing in bitcoin. While the potential for large gains is there, so is the potential for large losses. Bitcoin should only be a small part of an investment portfolio.

Predictive Analytics Technology is Necessary for Bitcoin Traders Trying to Minimize their Risk

Predictive analytics technology is a gamechanger in the financial sector. Nontraditional investors such as bitcoin traders can use this technology to mitigate their risks and maximize returns.

The post Can Predictive Analytics Help Traders Navigate Bitcoin’s Volatility? appeared first on SmartData Collective.

Source: SmartData Collective

The Huge Impact of Blockchain & Bitcoin Mining on the Planet

Blockchain technology has changed our world in countless ways. Some of these changes have been beneficial, while others have been less helpful. For better or worse, we have to understand the impact it has had. One of the biggest changes the blockchain has created has been due to bitcoin mining.

Bitcoin Mining and the Blockchain Are Shaping Our World in Surprising Ways

The blockchain is having a huge impact on the global economy. One study predicts it will increase global GDP by nearly $1.8 trillion.

There are many important applications of blockchain technology. One of the most significant has been bitcoin mining.

Bitcoin mining is a process of verifying and adding transaction records to the public ledger called the blockchain. The blockchain is a distributed database that contains a record of all Bitcoin transactions that have ever been made. Every time a new transaction is made, it is added to the blockchain and verified by miners.

Miners are people or groups of people who use powerful computers to verify transactions and add them to the blockchain. Bitcoin miners are rewarded with newly created bitcoins and transaction fees for their work.

Bitcoin mining has become increasingly popular over the years as the value of Bitcoin has surged. This wouldn’t have been possible without the blockchain. The blockchain plays a very important role in helping people buy bitcoin. As more people have started mining, the difficulty of finding new blocks has increased, making it more difficult for individual miners to earn rewards. However, large-scale miners have been able to find ways to keep their costs down and continue to profit from Bitcoin mining.

Bitcoin mining has had a large impact on the global economy. It has been estimated that the total energy consumption of Bitcoin mining could be as high as 7 gigawatts, which is equivalent to 0.21% of the world’s electricity consumption. This is because the blockchain is unfortunately not at all energy efficient. This estimate is based on a study that looked at the energy usage of different types of cryptocurrency mining.

The study found that Bitcoin mining is more energy-intensive than gold mining, and this difference is even larger when compared to other activities such as aluminum production or reserve banking. The large-scale nature of Bitcoin mining has led some experts to suggest that it could have a significant impact on the environment.

A recent report by the World Economic Forum estimated that the electricity used for Bitcoin mining could power all of the homes in the United Kingdom. This is based on the current rate of energy consumption and the number of homes in the country. The report also suggested that if the trend continues, Bitcoin mining could eventually use more electricity than is currently produced by renewable energy sources. The blockchain is unlikely to become more energy efficient without some major improvements. This can be a big problem as AI technology makes bitcoin even more popular in the UK.

The impact of Bitcoin mining on the environment has been a controversial topic. Some argue that it is a necessary evil that is needed to power the global economy, while others believe that it is a wasteful activity that should be banned. However, there is no denying that Bitcoin mining has had a significant impact on the world’s energy consumption and carbon footprint.

Bitcoin mining is a process that helps the Bitcoin network secure and validate transactions. It also creates new bitcoins in each block, similar to how a central bank prints new money. Miners are rewarded with bitcoin for their work verifying and committing transactions to the blockchain.

Bitcoin mining has become increasingly competitive as more people look to get involved in the cryptocurrency market. As a result, miners have had to invest more money in hardware and electricity costs in order to keep up with the competition.

This has led to some concerns about the environmental impact of Bitcoin mining, as the process requires a lot of energy. In particular, critics have pointed to the fact that most Bitcoin mining takes place in China, which relies heavily on coal-fired power plants.

However, it is worth noting that the vast majority of Bitcoin miners are using renewable energy sources. In fact, a recent study found that 78.79% of Bitcoin mining is powered by renewable energy.

This indicates that the environmental impact of Bitcoin mining is not as significant as some critics have claimed. Nevertheless, it is still important to keep an eye on the energy consumption of the Bitcoin network and ensure that steps are taken to improve efficiency where possible.

The 21st century has seen some incredible technological advances, and none more so than in the world of finance. The rise of digital currencies like Bitcoin has been nothing short of meteoric, and it doesn’t show any signs of slowing down. Bitcoin mining is the process by which new Bitcoins are created and transactions are verified on the blockchain. It’s a critical part of the Bitcoin ecosystem, but it comes with an environmental cost.

Bitcoin mining consumes a lot of energy. The exact amount is unknown, but it’s estimated that it could be as high as 7 gigawatts, which is about as much as the entire country of Bulgaria consumes. This electricity consumption is contributing to climate change and damaging our planet.

Blockchain and Bitcoin Mining Have a Huge Impact on the Environment

There are a few ways to reduce the environmental impact of blockchain and Bitcoin mining. One is to use renewable energy sources, such as solar or wind power. Another is to use more efficient mining hardware. But the most important thing we can do is to raise awareness of the issue and work together to find a solution.

The post The Huge Impact of Blockchain & Bitcoin Mining on the Planet appeared first on SmartData Collective.

Source: SmartData Collective

No pipelines needed. Stream data with Pub/Sub direct to BigQuery

Pub/Sub’s ingestion of data into BigQuery can be critical to making your latest business data immediately available for analysis. Until today, you had to create intermediate Dataflow jobs before your data could be ingested into BigQuery with the proper schema. While Dataflow pipelines (including ones built with Dataflow Templates) get the job done well, sometimes they can be more than what is needed for use cases that simply require raw data with no transformation to be exported to BigQuery.

Starting today, you no longer have to write or run your own pipelines for data ingestion from Pub/Sub into BigQuery. We are introducing a new type of Pub/Sub subscription called a “BigQuery subscription” that writes directly from Cloud Pub/Sub to BigQuery. This new extract, load, and transform (ELT) path will be able to simplify your event-driven architecture. For Pub/Sub messages where advanced preload transformations or data processing before landing data in BigQuery (such as masking PII) is necessary, we still recommend going through Dataflow.

Get started by creating a new BigQuery subscription that is associated with a Pub/Sub topic. You will need to designate an existing BigQuery table for this subscription. Note that the table schema must adhere to certain compatibility requirements. By taking advantage of Pub/Sub topic schemas, you have the option of writing Pub/Sub messages to BigQuery tables with compatible schemas. If schema is not enabled for your topic, messages will be written to BigQuery as bytes or strings. After the creation of the BigQuery subscription, messages will now be directly ingested into BigQuery.

Better yet, you no longer need to pay for data ingestion into BigQuery when using this new direct method. You only pay for the Pub/Sub you use. Ingestion from Pub/Sub’s BigQuery subscription into BigQuery costs $50/TiB based on read (subscribe throughput) from the subscription. This is a simpler and cheaper billing experience compared to the alternative path via Dataflow pipeline where you would be paying for the Pub/Sub read, Dataflow job, and BigQuery data ingestion. See the pricing page for details. 

To get started, you can read more about Pub/Sub’s BigQuery subscription or simply create a new BigQuery subscription for a topic using Cloud Console or the gcloud CLI.
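For reference, a minimal sketch of creating a BigQuery subscription with the Python client library is shown below. It assumes a recent google-cloud-pubsub release (one that includes the BigQueryConfig type) and that the topic and the destination BigQuery table already exist; the project, topic, subscription and table names are placeholders.

from google.cloud import pubsub_v1

project_id = "my-project"  # placeholder
topic_path = f"projects/{project_id}/topics/my-topic"
subscription_path = f"projects/{project_id}/subscriptions/my-bq-subscription"

# Point the subscription at an existing BigQuery table with a compatible schema.
bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=f"{project_id}.my_dataset.my_table",
    write_metadata=True,  # also store message metadata columns alongside the payload
)

with pubsub_v1.SubscriberClient() as subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
    print(f"Created BigQuery subscription: {subscription.name}")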

Source: Data Analytics

Use R to train and deploy machine learning models on Vertex AI

R is one of the most widely used programming languages for statistical computing and machine learning. Many data scientists love it, especially for the rich world of packages from the tidyverse, an opinionated collection of R packages for data science. Besides the tidyverse, there are over 18,000 open-source packages on CRAN, the package repository for R. RStudio, available as a desktop version or on the Google Cloud Marketplace, is a popular integrated development environment (IDE) used by data professionals for visualization and machine learning model development.

Once a model has been built successfully, a recurring question among data scientists is: “How do I deploy models written in the R language to production in a scalable, reliable and low-maintenance way?”

In this blog post, you will walk through how to use Google Vertex AI to train and deploy  enterprise-grade machine learning models built with R. 

Overview

Managing machine learning models on Vertex AI can be done in a variety of ways, including using the user interface of the Google Cloud Console, API calls, or the Vertex AI SDK for Python.

Since many R users prefer to interact with Vertex AI from RStudio programmatically, you will interact with Vertex AI through the Vertex AI SDK via the reticulate package. 

Vertex AI provides pre-built Docker containers for model training and serving predictions for models written in TensorFlow, scikit-learn and XGBoost. For R, you build a container yourself, derived from Google Cloud Deep Learning Containers for R.

Models on Vertex AI can be created in two ways:

Train a model locally and import it as a custom model into Vertex AI Model Registry, from where it can be deployed to an endpoint for serving predictions.

Create a TrainingPipeline that runs a CustomJob and imports the resulting artifacts as a Model.

In this blog post, you will use the second method and train a model directly in Vertex AI, since this allows you to automate the model creation process at a later stage while also supporting distributed hyperparameter optimization.

The process of creating and managing R models in Vertex AI comprises the following steps:

Enable Google Cloud Platform (GCP) APIs and set up the local environment

Create custom R scripts for training and serving

Create a Docker container that supports training and serving R models with Cloud Build and Container Registry 

Train a model using Vertex AI Training and upload the artifact to Google Cloud Storage

Create a model endpoint on Vertex AI Prediction Endpoint and deploy the model to serve online prediction requests

Make online prediction

Fig 1.0 (source)

Dataset

To showcase this process, you train a simple Random Forest model to predict housing prices on the California housing data set. The data contains information from the 1990 California census. The data set is publicly available from Google Cloud Storage at gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv

The Random Forest regressor model will predict a median housing price, given a longitude and latitude along with data from the corresponding census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

Environment Setup

This blog post assumes that you are either using Vertex AI Workbench with an R kernel or RStudio. Your environment should include the following requirements:

The Google Cloud SDK

Git

R

Python 3

Virtualenv

To execute shell commands, define a helper function:

library(glue)
library(IRdisplay)

sh <- function(cmd, args = c(), intern = FALSE) {
  if (is.null(args)) {
    cmd <- glue(cmd)
    s <- strsplit(cmd, " ")[[1]]
    cmd <- s[1]
    args <- s[2:length(s)]
  }
  ret <- system2(cmd, args, stdout = TRUE, stderr = TRUE)
  if ("errmsg" %in% attributes(attributes(ret))$names) cat(attr(ret, "errmsg"), "\n")
  if (intern) return(ret) else cat(paste(ret, collapse = "\n"))
}

You should also install a few R packages and update the SDK for Vertex AI:

install.packages(c("reticulate", "glue"))
sh("pip install --upgrade google-cloud-aiplatform")

Next, you define variables to support the training and deployment process, namely:

PROJECT_ID: Your Google Cloud Platform Project ID

REGION: Currently, the regions us-central1, europe-west4, and asia-east1 are supported for Vertex AI; it is recommended that you choose the region closest to you

BUCKET_URI: The staging bucket where all the data associated with your dataset and model resources are stored

DOCKER_REPO: The Docker repository name to store container artifacts

IMAGE_NAME: The name of the container image

IMAGE_TAG: The image tag that Vertex AI will use

IMAGE_URI: The complete URI of the container image

PROJECT_ID <- "YOUR_PROJECT_ID"
REGION <- "us-central1"
BUCKET_URI <- glue("gs://{PROJECT_ID}-vertex-r")
DOCKER_REPO <- "vertex-r"
IMAGE_NAME <- "vertex-r"
IMAGE_TAG <- "latest"
IMAGE_URI <- glue("{REGION}-docker.pkg.dev/{PROJECT_ID}/{DOCKER_REPO}/{IMAGE_NAME}:{IMAGE_TAG}")

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

sh("gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}")

Next, you import and initialize the reticulate R package to interface with the Vertex AI SDK, which is written in Python.

library(reticulate)
library(glue)
use_python(Sys.which("python3"))

aiplatform <- import("google.cloud.aiplatform")
aiplatform$init(project = PROJECT_ID, location = REGION, staging_bucket = BUCKET_URI)

Create Docker container image for training and serving R models

The Dockerfile for your custom container is built on top of the Deep Learning container, the same container that is also used for Vertex AI Workbench. In addition, you add two R scripts for model training and serving, respectively.

Before creating such a container, you enable Artifact Registry and configure Docker to authenticate requests to it in your region.

sh("gcloud artifacts repositories create {DOCKER_REPO} --repository-format=docker --location={REGION} --description=\"Docker repository\"")
sh("gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet")

Next, create a Dockerfile.

# filename: Dockerfile - container specifications for using R in Vertex AI
FROM gcr.io/deeplearning-platform-release/r-cpu.4-1:latest

WORKDIR /root

COPY train.R /root/train.R
COPY serve.R /root/serve.R

# Install Fortran
RUN apt-get update
RUN apt-get install gfortran -yy

# Install R packages
RUN Rscript -e "install.packages('plumber')"
RUN Rscript -e "install.packages('randomForest')"

EXPOSE 8080

Next, create the file train.R, which is used to train your R model. The script trains a randomForest model on the California Housing dataset. Vertex AI sets environment variables that you can utilize, and since this script uses a Vertex AI managed dataset, data splits are performed by Vertex AI and the script receives environment variables pointing to the training, test, and validation sets. The trained model artifacts are then stored in your Cloud Storage bucket.

#!/usr/bin/env Rscript
# filename: train.R - train a Random Forest model on a Vertex AI Managed Dataset
library(tidyverse)
library(data.table)
library(randomForest)
Sys.getenv()

# The GCP Project ID
project_id <- Sys.getenv("CLOUD_ML_PROJECT_ID")

# The GCP Region
location <- Sys.getenv("CLOUD_ML_REGION")

# The Cloud Storage URI to upload the trained model artifact to
model_dir <- Sys.getenv("AIP_MODEL_DIR")

# Next, you create directories to download our training, validation, and test set into.
dir.create("training")
dir.create("validation")
dir.create("test")

# You download the Vertex AI managed data sets into the container environment locally.
system2("gsutil", c("cp", Sys.getenv("AIP_TRAINING_DATA_URI"), "training/"))
system2("gsutil", c("cp", Sys.getenv("AIP_VALIDATION_DATA_URI"), "validation/"))
system2("gsutil", c("cp", Sys.getenv("AIP_TEST_DATA_URI"), "test/"))

# For each data set, you may receive one or more CSV files that you will read into data frames.
training_df <- list.files("training", full.names = TRUE) %>% map_df(~fread(.))
validation_df <- list.files("validation", full.names = TRUE) %>% map_df(~fread(.))
test_df <- list.files("test", full.names = TRUE) %>% map_df(~fread(.))

print("Starting Model Training")
rf <- randomForest(median_house_value ~ ., data = training_df, ntree = 100)
rf

saveRDS(rf, "rf.rds")
system2("gsutil", c("cp", "rf.rds", model_dir))

Next, create the file serve.R, which is used for serving your R model. The script downloads the model artifact from Cloud Storage, loads the model artifacts, and listens for prediction requests on port 8080. You have several environment variables for the prediction service at your disposal, including:

AIP_HEALTH_ROUTE: HTTP path on the container that AI Platform Prediction sends health checks to.

AIP_PREDICT_ROUTE: HTTP path on the container that AI Platform Prediction forwards prediction requests to.

#!/usr/bin/env Rscript
# filename: serve.R - serve predictions from a Random Forest model
Sys.getenv()
library(plumber)

system2("gsutil", c("cp", "-r", Sys.getenv("AIP_STORAGE_URI"), "."))
system("du -a .")

rf <- readRDS("artifacts/rf.rds")
library(randomForest)

predict_route <- function(req, res) {
  print("Handling prediction request")
  df <- as.data.frame(req$body$instances)
  preds <- predict(rf, df)
  return(list(predictions = preds))
}

print("Starting Serving")

pr() %>%
  pr_get(Sys.getenv("AIP_HEALTH_ROUTE"), function() "OK") %>%
  pr_post(Sys.getenv("AIP_PREDICT_ROUTE"), predict_route) %>%
  pr_run(host = "0.0.0.0", port = as.integer(Sys.getenv("AIP_HTTP_PORT", 8080)))

Next, you build the Docker container image on Cloud Build, the serverless CI/CD platform. Building the Docker container image may take 10 to 15 minutes.

sh("gcloud builds submit --region={REGION} --tag={IMAGE_URI} --timeout=1h")

Create Vertex AI Managed Dataset

You create a Vertex AI Managed Dataset to have Vertex AI take care of the data set split. This is optional, and alternatively you may want to pass the URI to the data set via environment variables.

data_uri <- "gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv"

dataset <- aiplatform$TabularDataset$create(
  display_name = "California Housing Dataset",
  gcs_source = data_uri
)

The next screenshot shows the newly created Vertex AI Managed dataset in Cloud Console.

Train R Model on Vertex AI

The custom training job wraps the training process by creating an instance of your container image and executing train.R for model training and serve.R for model serving.

Note: You use the same custom container for both training and serving.

job <- aiplatform$CustomContainerTrainingJob(
  display_name = "vertex-r",
  container_uri = IMAGE_URI,
  command = c("Rscript", "train.R"),
  model_serving_container_command = c("Rscript", "serve.R"),
  model_serving_container_image_uri = IMAGE_URI
)

To train the model, you call the method run(), with a machine type that is sufficient in resources to train a machine learning model on your dataset. For this tutorial, you use a n1-standard-4 VM instance.

model <- job$run(
  dataset = dataset,
  model_display_name = "vertex-r-model",
  machine_type = "n1-standard-4"
)

model$display_name
model$resource_name
model$uri

The model is now being trained, and you can watch the progress in the Vertex AI Console.

Provision an Endpoint resource and deploy a Model

You create an Endpoint resource using the Endpoint.create() method. At a minimum, you specify the display name for the endpoint. Optionally, you can specify the project and location (region); otherwise the settings are inherited by the values you set when you initialized the Vertex AI SDK with the init() method.

In this example, the following parameters are specified:

display_name: A human readable name for the Endpoint resource.

project: Your project ID.

location: Your region.

labels: (optional) User defined metadata for the Endpoint in the form of key/value pairs.

This method returns an Endpoint object.

endpoint <- aiplatform$Endpoint$create(
  display_name = "California Housing Endpoint",
  project = PROJECT_ID,
  location = REGION
)

You can deploy one or more Vertex AI Model resource instances to the same endpoint. Each Vertex AI Model resource that is deployed will have its own deployment container for the serving binary.

Next, you deploy the Vertex AI Model resource to a Vertex AI Endpoint resource. The Vertex AI Model resource already has defined for it the deployment container image. To deploy, you specify the following additional configuration settings:

The machine type.

The (if any) type and number of GPUs.

Static, manual or auto-scaling of VM instances.

In this example, you deploy the model with the minimal amount of specified parameters, as follows:

model: The Model resource.

deployed_model_display_name: The human readable name for the deployed model instance.

machine_type: The machine type for each VM instance.

Due to the requirements to provision the resource, this may take up to a few minutes.

Note: For this example, you specified the R deployment container in the previous step of uploading the model artifacts to a Vertex AI Model resource.

model$deploy(endpoint = endpoint, machine_type = "n1-standard-4")

The model is now being deployed to the endpoint, and you can see the result in the Vertex AI Console.

Make predictions using newly created Endpoint

Finally, you create some example data to test making a prediction request to your deployed model. You use five JSON-encoded example data points (without the label median_house_value) from the original data file in data_uri. Then you make a prediction request with your example data, using the REST API (e.g., curl).

library(jsonlite)
df <- read.csv(text = sh("gsutil cat {data_uri}", intern = TRUE))
head(df, 5)

instances <- list(instances = head(df[, names(df) != "median_house_value"], 5))
instances

json_instances <- toJSON(instances)
url <- glue("https://{REGION}-aiplatform.googleapis.com/v1/{endpoint$resource_name}:predict")
access_token <- sh("gcloud auth print-access-token", intern = TRUE)

sh(
  "curl",
  c("--tr-encoding",
    "-s",
    "-X POST",
    glue("-H 'Authorization: Bearer {access_token}'"),
    "-H 'Content-Type: application/json'",
    url,
    glue("-d {json_instances}")
  )
)

The endpoint now returns five predictions in the same order the examples were sent.

Cleanup

To clean up all Google Cloud resources used in this project, you can delete the Google Cloud project you used for the tutorial or delete the created resources.

endpoint$undeploy_all()
endpoint$delete()
dataset$delete()
model$delete()
job$delete()

Summary

In this blog post, you have gone through the necessary steps to train and deploy an R model to Vertex AI. For easier reproducibility, you can refer to this Notebook on GitHub.

Acknowledgements

This blog post received contributions from various people. In particular, we would like to thank  Rajesh Thallam for strategic and technical oversight, Andrew Ferlitsch for technical guidance, explanations, and code reviews, and Yuriy Babenko for reviews.

Source: Data Analytics

5 Setmore Alternatives that Use Big Data to Manage Appointments

Big data technology has helped businesses improve efficiency in many important ways. Many companies are using big data to streamline different aspects of their business, and they use data analytics tools to improve functions such as financial management.

One of the ways that many companies are using big data is to improve the way that they manage appointments. They can use data-driven appointment management tools to make this process easier than ever. This is one of the biggest benefits of big data for customer service.

Data-Driven Booking Tools Make Appointment Management Easier than Ever

Nowadays, online appointment scheduling systems are becoming more popular than many companies can handle. The Covid-19 situation changed the way many people interact with businesses, so digital platforms have become a big deal. Businesses rely on these digital platforms to complete certain tasks with minimal frustration, which is why online appointment scheduling platforms have become an important part of today’s world.

Setmore is a well-known example, but there are other alternatives that work in a similar way. Here are the five best Setmore alternatives that use big data technology to aid the booking process.

1.      Book Like a Boss

Book Like a Boss is one of the best online appointment scheduling tools and works as an all-in-one system. It works very well thanks to the highly sophisticated data analytics and data storage algorithms it depends on. You can save time with this application and organize meetings without any hassle. With the help of this software, you can send prospects links to your BLAB page so they can book available slots and you can get paid.

If you use Book Like a Boss, you can import all your calendar entries and even submit entries directly through the software to the calendar. It can store substantial amounts of data on your customers and events, and the app supports messaging between customers and users.

2.      Vyte.in

If you are searching for a cloud-based, data-driven alternative to Setmore, there is no better application than Vyte.in. With Vyte.in, you can plan group meetings or one-to-one meetings, whether you are part of a small or medium-sized organization.

It connects with various IT services, digital learning tools, and marketing platforms. It uses sophisticated data analytics algorithms to offer valuable features such as double-booking rejection and automatic time zone synchronization. Vyte.in also integrates with various calendar programs such as Google Calendar and Microsoft Outlook.

3.      Traft

If you are looking for software that offers great business-to-business features, there is nothing better than Traft. With Traft, you can complete tasks very easily thanks to its data analytics and storage capabilities. Your clients can schedule appointments without ever speaking to any of your workers.

The application is aimed at customer-facing businesses such as sports arenas, hairdressers, and gyms. With Traft, you can even hire freelancers from different fields, such as photographers and housekeepers. Traft claims that the application can also be used to enhance retention rates and conversions.

4.      Sidekick Ai

Have you been searching for an AI-based application? Then say no more. Sidekick Ai is one of the best AI-based applications, with some of the most powerful data analytics features available. It syncs with Gmail and Outlook to take the hassle out of back-and-forth scheduling. It reduces your workload, allowing you to sit back and relax; you can think of Sidekick Ai as your personal assistant.

If you forward a meeting request to Sidekick Ai, it will book the meeting for you without any hassle. It is a straightforward procedure. Customization is also available: you can prioritize contacts and even meet in your favorite places. One thing to be aware of is that invitees are offered only a limited set of slots rather than your full schedule.

5.      TuCalendi

With TuCalendi, you can integrate your calendar into your website and plan various events and appointments in no time. You can customize booking forms and even interactive widgets, and there is no need to worry about languages: this data-driven software application supports many of them.

Use Data Analytics Tools to Optimize Your Scheduling and Booking Tasks

These days, appointment scheduling platforms have become a significant part of managing businesses. Data analytics tools have made them more effective than ever.

They have not only eased the struggles of many business owners but have also made their work more manageable and less time-consuming. So, if you have still not used one of these powerful data-driven booking applications, now is the time! Get started today!

The post 5 Setmore Alternatives that Use Big Data to Manage Appointments appeared first on SmartData Collective.

Source: SmartData Collective

5 Tips to Improve the Data Security of Software Applications

In today’s world, data is increasingly being shared and stored electronically. Therefore, the need to protect data from unauthorized access or theft is more important than ever.

The threat of data breaches cannot be overstated. Over 440 million data records were exposed in data breaches in 2018 alone. This figure is growing as more people work from home and don’t take adequate precautions.

Data security of a software application is the set of security measures implemented to prevent unauthorized access while protecting the data from being lost or corrupted.

Here are a few tips for ensuring data security in software applications.

Using a Strong Password

According to a recent study, most users are not aware that their credentials have been leaked and misused. This can be a severe problem: if a hacker gets hold of your password, your sensitive data can be easily accessed and used to commit fraud or identity theft. Hence, it is essential to use a strong password for the data security of a software application.

A weak password can easily be guessed or cracked, leaving your data vulnerable to attack. A strong password is a critical part of keeping your data secure.

Creating a strong password is simple but requires some thought and effort. Use a mix of upper and lowercase letters, numbers and symbols in your password. It is also essential to keep your password confidential and not to share it with anyone. You can also use two-factor authentication to provide an extra layer of security.
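On the application side, passwords should never be stored in plain text. Below is a minimal sketch of salted password hashing and verification in Python; it assumes the third-party bcrypt package is installed (pip install bcrypt), and the example password is obviously a placeholder.

import bcrypt

def hash_password(password: str) -> bytes:
    # Store only the salted hash, never the password itself.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def verify_password(password: str, stored_hash: bytes) -> bool:
    # Re-hash the candidate password and compare against the stored hash.
    return bcrypt.checkpw(password.encode("utf-8"), stored_hash)

stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("guess123", stored))                      # False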

Storing Data Securely on Servers

With each passing day, data security standards are evolving with the software’s needs and requirements. As software increasingly relies on data stored on servers, the issue of data security becomes more critical.

Server-based data storage can be more convenient than storing data locally, as you can access your data from anywhere in the world. However, this also means you rely on the server’s security. If the server is hacked, your data could be compromised.

To secure data on servers, businesses should first choose a reputable and reliable hosting provider. The hosting provider should have robust security measures to protect the servers from external threats. Second, businesses should encrypt their data before storing it on the server. This will make it hard for hackers to read the data even if they gain access to the server.

Hence, if you store sensitive data on a server, you must take extra precautions to protect it.
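As an illustration of that second point, here is a minimal sketch of encrypting a file client-side before it is uploaded to a server. It assumes the third-party cryptography package, the file name is a placeholder, and key management is deliberately simplified; in practice the key would live in a secrets manager, never next to the data.

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # keep this in a secrets manager, not with the data
fernet = Fernet(key)

with open("customers.csv", "rb") as f:       # hypothetical local file
    ciphertext = fernet.encrypt(f.read())

with open("customers.csv.enc", "wb") as f:   # this encrypted copy is what gets uploaded
    f.write(ciphertext)

# Later, after downloading the encrypted copy back from the server:
plaintext = fernet.decrypt(ciphertext)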

Getting an Application Security Audit

One way of ensuring data security is to perform code security auditing. It examines an application’s source code to identify potential security risks. In this regard, automated tools can provide a more comprehensive code analysis.

When auditing code, security experts will look for potential vulnerabilities that attackers could exploit. These vulnerabilities can come from various sources, including coding errors, poor design choices, or third-party components with known security issues. Once potential vulnerabilities are identified, they are mitigated by making changes to the code or by using security controls such as access control mechanisms or encryption.

Hence, regular code security audits ensure data security in software applications. By identifying and addressing potential security risks early, organizations can reduce the likelihood of attacks and minimize the impact of an attack.
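As one possible way to automate part of such an audit, the sketch below runs the open-source Bandit scanner over a Python codebase and summarizes its findings. The src/ path is a placeholder, and this assumes Bandit is installed and the project is written in Python; other languages would need a different scanner.

```python
# Hedged example: run Bandit over a project directory and summarize results.
# Assumes `pip install bandit` and a Python codebase under src/.
import json
import subprocess

result = subprocess.run(
    ["bandit", "-r", "src/", "-f", "json"],
    capture_output=True, text=True,
)
report = json.loads(result.stdout)
print(f"Bandit flagged {len(report['results'])} potential issues")
for issue in report["results"]:
    print(issue["issue_severity"], issue["filename"], issue["issue_text"])
```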

Backing Up Your Data

A significant percentage of people now work remotely, which makes it more important than ever to ensure that your data is secure. As a software application developer, you also need to shield your data against hardware failures, software crashes and even malicious attacks.

We can never be too careful regarding our most important files and documents. Backing up our data is one of the best ways to protect ourselves from losing everything in the event of a computer crash or other unforeseen disaster.

You can back up your data using different cloud-based storage services. These services will allow you to store your files online and access them anywhere. Another option is to store data on an external hard drive or USB flash drive. This method will require you to physically connect the storage device to your computer.

Whichever method you choose, back up your data on a regular schedule. If something happens to your computer or software, a recent backup ensures you still have a copy of all your files.
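A minimal sketch of such a scheduled backup is shown below: it archives a data directory into a timestamped zip file. The directory names are placeholder assumptions, and in practice the archive would also be copied to cloud or off-site storage and the script run from a scheduler.

```python
# Minimal local backup sketch: archive a data directory into a
# timestamped zip file. Paths are illustrative placeholders.
import shutil
from datetime import datetime
from pathlib import Path

def backup(data_dir: str = "app_data", backup_dir: str = "backups") -> Path:
    Path(backup_dir).mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = shutil.make_archive(f"{backup_dir}/backup-{stamp}", "zip", data_dir)
    return Path(archive)

if __name__ == "__main__":
    print(f"Backup written to {backup()}")
```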

Monitoring Activity

Monitoring activity is vital for the data security of the application for numerous reasons. First, understanding what users are doing within the application makes it possible to identify potential misuse or malicious activity. Secondly, tracking user activity can help to prevent data breaches by quickly identifying and responding to unauthorized access. Finally, monitoring activity can also assist in troubleshooting issues with the application or identifying areas for improvement.

To properly implement real-time security monitoring and protection, it is crucial to understand how these systems work. Security professionals typically use a combination of hardware and software to monitor for potential threats. The hardware components of a security system can include sensors, cameras and other devices designed to detect suspicious activity. The software components usually have a database of known threats and an analytics engine to identify patterns that may indicate a potential attack.

Various commercial and open-source monitoring tools are available, each with strengths and weaknesses. It is essential to select one according to your requirements. Some factors include the size of your application, the type of data you need to protect and your budget. Once you have chosen a tool, you must set up some basic parameters. For example, you must decide how often you want the tool to scan your application for potential threats.
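To illustrate the monitoring idea in its simplest form, here is a sketch that logs user actions and flags accounts with repeated failed logins within a short window. The threshold, time window and in-memory store are illustrative assumptions; a real deployment would use a dedicated monitoring tool and persistent storage.

```python
# Simple activity-monitoring illustration: log events and flag accounts
# with many failed logins in the last five minutes.
import logging
import time
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
failed_logins = defaultdict(deque)  # user -> timestamps of recent failures

def record_event(user: str, action: str, success: bool) -> None:
    logging.info("user=%s action=%s success=%s", user, action, success)
    if action == "login" and not success:
        window = failed_logins[user]
        window.append(time.time())
        # Drop failures older than 5 minutes, then check the threshold.
        while window and time.time() - window[0] > 300:
            window.popleft()
        if len(window) >= 5:
            logging.warning("possible brute-force attempt on user=%s", user)

record_event("alice", "login", success=False)
```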

Conclusion

The importance of data security for software applications cannot be overstated. There are many potential threats to the data stored in a software application, and they can come from both inside and outside the organization. Organizations must protect their data from these threats by implementing security controls and by adopting monitoring solutions that fit their specific needs.

The post 5 Tips to Improve the Data Security of Software Applications appeared first on SmartData Collective.

Source : SmartData Collective Read More

There Are Many Amazing Benefits of VR in Education

There Are Many Amazing Benefits of VR in Education

Virtual reality is a powerful technology that is changing the future of our world. Research from eMarketer shows that there are 57.4 million VR users in the United States.

One of the fields that is being shaped by VR is education. VR technology provides rich simulations that mimic the sights and sounds of real situations, which can be invaluable in the classroom. Though most prominent as a gaming apparatus, it has quickly emerged as a means of training professionals and educating students in an environment that is at once extremely stimulating and very informative.

While this tech may need to wait before it finds its way into classrooms everywhere, it does hold significant potential even for the students of today.

Benefits of VR in Education

Virtual reality is driving a number of disruptive changes. In this article, we take a look at the benefits of VR in teaching.

Difficult Training

Nursing and medical school are good examples of disciplines that can benefit enormously from the introduction of virtual or augmented reality in the classroom. While most training for future medical caregivers happens in the hospital, VR provides a low-stakes environment in which emergency scenarios can play out.

Nursing students already have a safety net while they are being educated: a floor of other doctors and nurses who can typically help them in their initial patient interactions. VR probably will not replace these experiences, but it can be used to better prepare future nurses for them.

Similar VR applications could be used for police training, or for any classroom that needs to simulate high-stakes situations with great immediacy.

Important Context

Virtual reality can be used to provide valuable context to traditional school lessons. Students who are learning about marine biology can use VR to get a convincing look at ocean scenes. Students who are learning about life in a different country could use VR to take a walk through its market.

It’s well documented that students learn well through multiple forms of media. Lessons are reinforced through repetition, particularly when points can be made in different ways.

By combining written lessons with captivating sounds and visuals, students will be more likely to remember what they have been taught. They may even take a deeper interest in learning more.

Distraction-Free Learning

VR is so immersive that it doesn’t allow for the same distractions that would occur during a typical classroom lecture. Billy can’t flick the back of Samantha’s neck. Tommy can’t check his phone. Alex can’t stare out the window and think about how she would rather be playing soccer.

VR benefits from the same immediacy as practical experience. While students have the headset on, they have no choice but to focus only on what is right in front of them.

It’s Exciting

Finally, VR also has the benefit of being exciting enough to get kids eager to learn. Student engagement is one of the most important metrics for predicting academic success. Even more relevant than native ability, a student’s interest in learning will vastly inform their educational outcomes.

Excitement is one thing VR has in abundance. Who wouldn’t want to take a visit to Pompeii at the end of a lesson plan on volcanic eruptions?

Kids are used to environments of constant stimulation, to the point that it has whittled down their natural attention span.

It’s a problem that has been caused by technology, yes, but it is also one that can potentially be remedied by the right technological solutions and exposure.

Certainly, VR should not be applied constantly, nor as a substitution for reading. It’s a supplementary tool that can be used to improve engagement and make lesson plans very literally come to life before students’ eyes.

Obstacles

Accessibility is perhaps the primary hurdle between classrooms and VR technology. In a world of underfunded schools, how can any district justify spending many thousands of dollars outfitting classrooms with glorified gaming headsets?

The extent to which classroom VR is feasible will largely hinge on the resources that can be dedicated to it. We may be many years away from the day when VR is as widespread in classrooms as tablets are today.

For now, school districts may consider VR along the same lines as other STEM-related acquisitions, outfitting each school with several headsets that can be shared across multiple classrooms.

Of course, schools will also need software that aligns with their VR-related educational goals. While there may not yet be a simulation for every mainstream lesson plan, this gap is shrinking quickly as VR and AR content libraries grow.

In the years to come, education-related VR programs are only expected to grow.

The post There Are Many Amazing Benefits of VR in Education appeared first on SmartData Collective.

Source : SmartData Collective Read More

Cloud Composer at Deutsche Bank: workload automation for financial services

Cloud Composer at Deutsche Bank: workload automation for financial services

Running time-based, scheduled workflows to implement business processes is regular practice at many financial services companies. This is true for Deutsche Bank, where the execution of workflows is fundamental for many applications across its various business divisions, including the Private Bank, Investment and Corporate Bank as well as internal functions like Risk, Finance and Treasury. These workflows often execute scripts on relational databases, run application code in various languages (for example Java), and move data between different storage systems. The bank also uses big data technologies to gain insights from large amounts of data, where Extract, Transform and Load (ETL) workflows running on Hive, Impala and Spark play a key role.

Historically, Deutsche Bank used both third-party workflow orchestration products and open-source tools to orchestrate these workflows. But using multiple tools increases complexity and introduces operational overhead for managing underlying infrastructure and workflow tools themselves.

Cloud Composer, on the other hand, is a fully managed offering that allows customers to orchestrate all these workflows with a single product. Deutsche Bank recently began introducing Cloud Composer into its application landscape, and continues to use it in more and more parts of the business.

“Cloud Composer is our strategic workload automation (WLA) tool. It enables us to further drive an engineering culture and represents an intentional move away from the operations-heavy focus that is commonplace in traditional banks with traditional technology solutions. The result is engineering for all production scenarios up front, which reduces risk for our platforms that can suffer from reactionary manual interventions in their flows. Cloud Composer is built on open-source Apache Airflow, which brings with it the promise of portability for a hybrid multi-cloud future, a consistent engineering experience for both on-prem and cloud-based applications, and a reduced cost basis. 

We have enjoyed a great relationship with the Google team that has resulted in the successful migration of many of our scheduled applications onto Google Cloud using Cloud Composer in production.” -Richard Manthorpe, Director Workload Automation, Deutsche Bank

Why use Cloud Composer in financial services

Financial services companies want to focus on implementing their business processes, not on managing infrastructure and orchestration tools. In addition to consolidating multiple workflow orchestration technologies into one and thus reducing complexity, there are a number of other reasons companies choose Cloud Composer as a strategic workflow orchestration product.

First of all, Cloud Composer is significantly more cost-effective than traditional workflow management and orchestration solutions. Because it is a managed service, Google takes care of all environment configuration and maintenance activities. Cloud Composer version 2 introduces autoscaling, which allows for optimized resource utilization and improved cost control, since customers only pay for the resources used by their workflows. And because Cloud Composer is based on open source Apache Airflow, there are no license fees; customers only pay for the environment it runs on, adjusting usage to current business needs.

Highly regulated industries like financial services must comply with domain-specific security and governance tools and policies. For example, Customer-Managed Encryption Keys ensure that data won’t be accessed without the organization’s consent, while Virtual Private Network Service Controls mitigate the risk of data exfiltration. Cloud Composer supports these and many other security and governance controls out-of-the box, making it easy for customers in regulated industries to use the service without having to implement these policies on their own. 

The ability to orchestrate both native Google Cloud and on-prem workflows is another reason that Deutsche Bank chose Cloud Composer. Cloud Composer uses Airflow Operators (connectors for interacting with outside systems) to integrate with Google Cloud services like BigQuery, Dataproc, Dataflow, Cloud Functions and others, and to build hybrid and multi-cloud workflows. Operators contributed by Airflow’s strong open-source community also integrate with Oracle databases, on-prem VMs, SFTP file servers and many other systems.
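For illustration, here is a minimal Airflow DAG of the kind that could run on Cloud Composer, using the out-of-the-box BigQueryInsertJobOperator from the Google provider package. The project, dataset and SQL are placeholder assumptions, not details of Deutsche Bank’s setup.

```python
# Minimal Airflow DAG sketch: run a BigQuery query on a daily schedule
# using a standard Google Cloud operator (Airflow 2.x).
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_reporting",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    aggregate = BigQueryInsertJobOperator(
        task_id="aggregate_trades",
        configuration={
            "query": {
                # Placeholder project, dataset and query.
                "query": "SELECT desk, SUM(notional) AS total "
                         "FROM `my-project.trading.trades` GROUP BY desk",
                "useLegacySql": False,
            }
        },
    )
```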

And while Cloud Composer lets customers consolidate multiple workflow orchestration tools into one, there are some use cases where it’s just not the right fit. For example, if customers have just a single job that executes once a day on a fixed schedule, Cloud Scheduler, Google Cloud’s managed service for Cron jobs, might be a better fit. Cloud Composer in turn excels for more advanced workflow orchestration scenarios. 

Finally, products built on open-source technologies also provide a simple exit strategy from the cloud, an important regulatory requirement for financial services companies. With Cloud Composer, customers can simply move their Airflow workflows from Cloud Composer to a self-managed Airflow cluster. Because Cloud Composer is fully compatible with Apache Airflow, the workflow definitions stay exactly the same when they are moved to a different Airflow cluster.

Cloud Composer applied 

Having looked at why Deutsche Bank chose Cloud Composer, let’s dive into how the bank is actually using it today. Apache Airflow is well-suited for ETL and data engineering workflows thanks to the rich set of data Operators (connectors) it provides. So Deutsche Bank, where a large-scale data lake is already in place on-prem, leverages Cloud Composer for its modern Cloud Data Platform, whose main aim is to work as an exchange for well-governed data, and enable a “data mesh” pattern. 

At Deutsche Bank, Cloud Composer orchestrates the ingestion of data to the Cloud Data Platform, which is primarily based on BigQuery. The ingestion happens in an event-driven manner, i.e., Cloud Composer does not simply run load jobs based on a time-schedule; instead it reacts to events when new data such as Cloud Storage objects arrives from upstream sources. It does so using so-called Airflow Sensors, which continuously watch for new data. Besides loading data into BigQuery, Composer also schedules ETL workflows, which transform data to derive insights for business reporting.
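The event-driven pattern described above can be sketched roughly as follows: a sensor waits for a new object in Cloud Storage, then a transfer operator loads it into BigQuery. The bucket, object path and table names are illustrative assumptions, and the operators shown are standard Airflow Google providers rather than Deutsche Bank’s actual code.

```python
# Sketch of event-driven ingestion: wait for a GCS object, then load it
# into BigQuery using standard Airflow Google provider operators.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="ingest_upstream_feed",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    wait_for_file = GCSObjectExistenceSensor(
        task_id="wait_for_file",
        bucket="upstream-landing-bucket",          # placeholder bucket
        object="feeds/{{ ds }}/positions.csv",     # placeholder object path
    )
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_to_bq",
        bucket="upstream-landing-bucket",
        source_objects=["feeds/{{ ds }}/positions.csv"],
        destination_project_dataset_table="my-project.data_platform.positions",
        source_format="CSV",
        write_disposition="WRITE_APPEND",
    )
    wait_for_file >> load_to_bq
```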

Due to the rich set of Airflow Operators, Cloud Composer can also orchestrate workflows that are part of standard, multi-tier business applications running non-data-engineering workflows. One of the use cases includes a swap reporting platform that provides information about various asset classes, including commodities, credits, equities, rates and Forex. In this application, Cloud Composer orchestrates various services implementing the business logic of the application and deployed on Cloud Run — again, using out-of-the-box Airflow Operators.

These use cases are already running in production and delivering value to Deutsche Bank. Here is how their Cloud Data Platform team sees the adoption of Cloud Composer: 

“Using Cloud Composer allows our Data Platform team to focus on creating Data Engineering and ETL workflows instead of on managing the underlying infrastructure. Since Cloud Composer runs Apache Airflow, we can leverage out of the box connectors to systems like BigQuery, Dataflow, Dataproc and others, making it well-embedded into the entire Google Cloud ecosystem.”—Balaji Maragalla, Director Big Data Platforms, Deutsche Bank

Want to learn more about how to use Cloud Composer to orchestrate your own workloads? Check out this Quickstart guide or Cloud Composer documentation today.

Source : Data Analytics Read More