Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings. 

Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.

To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.

What is the BigQuery anti-pattern recognition tool?

The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go. 

It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns. 

The tool takes a BigQuery SQL query as an input, such as:

code_block[StructValue([(u’code’, u’SELECTrn t.dim1,rn t.metric1rnFROMrn `dataset.table` trnWHERErn NOT IN (rn SELECTrn idrn FROMrn `dataset.table2`)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f389190>)])]

And produces the output as:

code_block[StructValue([(u’code’, u’Subquery in filter without aggregation at line 8.’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6f50>)])]

It examines potential optimizations, including:

Selecting only the necessary columns

Handling multiple WITH-clause references 

Addressing subqueries in filters with aggregations

Optimizing ORDER BY queries with LIMIT

Enhancing string comparisons

Improving JOIN patterns

Avoiding subquery aggregation in the WHERE clause

The solution supports reading from various sources, such as:

Command line 

Local files

Cloud Storage files

Local folders

Cloud Storage folders

CSV (with one query per line)


Additionally, the solution provides flexibility in writing output to different destinations, including:

Printing to the terminal

Exporting as CSV

Writing to a BigQuery table

Using the BigQuery anti-pattern recognition tool

The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud. 


Linux OS

JDK 11 or above is installed



gcloud CLI

Quick start: – steps

1. Clone the repo into your local machine.

code_block[StructValue([(u’code’, u’git clone’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6cd0>)])]

2. Build the tool image inside the `bigquery-antipattern-recognition` folder.

code_block[StructValue([(u’code’, u’mvn clean package jib:dockerBuild -DskipTests’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df63d0>)])]

3. Run the tool for a simple inline query.

code_block[StructValue([(u’code’, u’docker run \rn -i bigquery-antipattern-recognition \rn –query “SELECT * FROM \`project.dataset.table1\`”‘), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f33ad50>)])]

4. Below is the output result in the command-line interface:

Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.

Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.

Getting started

Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.

Have questions or feedback?

We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.

Source : Data Analytics Read More

QR Codes Leverage the Benefits of Big Data in Education

QR Codes Leverage the Benefits of Big Data in Education

The field of academia is more dependent on big data than ever before. Educational institutions reportedly spent over $13 billion on big data in 2020. Expenditures on big data in academia are projected to be worth over $57 billion by 2030.

As colleges and other educational institutions invest more heavily in big data, they are going to take advantage of different forms of technology that help them leverage it more effectively. QR codes are among the tools that colleges will use to take advantage of big data technology in the future.

QR codes are game changer for educational institutions relying on big data

QR codes are increasingly becoming an important part of the digital world. We already mentioned that they can play an important role in data-driven marketing strategies. They are used to store and transfer data quickly and securely, which makes them ideal for big data applications. With the help of QR codes, businesses can track customer behavior, analyze trends, and make better decisions based on the data gathered. Furthermore, they can also be used to collect customer feedback in real-time. By leveraging big data through QR codes, businesses can gain valuable insights into their customers’ preferences and needs.

At first glance, QR codes do not seem to help with big data applications. They can only store around 3,000 bytes of data. However, this data can be very important it is easily transferred to other storage centers. Therefore, a growing number of industries are using QR codes to facilitate projects that rely on big data technology.

QR codes can be especially useful for educational institutions trying to rely on big data. There are many amazing applications.

In the digital age, QR codes have become a versatile tool with numerous applications, so many organizations that relied on big data are taking advantage of them. In this article, we will explore the innovative ways QR codes, including the telegram qr code generator, are utilized in the field of education and training. From interactive learning experiences to personalized tracking and statistics, QR codes offer immense potential for enhancing educational practices.

Enhancing Learning with QR Codes

QR codes provide educators with a powerful tool to engage and interact with learners by leveraging data analytics more effectively. Here are some ways QR codes can be effectively used in the sphere of education and training:

Interactive Assignments: Teachers can create QR codes that link to additional learning resources, such as videos, articles, or interactive quizzes. Students can scan the codes using their smartphones or tablets to access the supplementary materials and deepen their understanding of the subject.

Virtual Field Trips: Edutopia reports that virtual field trips are making a big difference in education. They are one of the most innovative ways that schools are leveraging big data and other forms of digital technology. QR codes can transport students to virtual field trips by linking to 360-degree images, virtual reality experiences, or guided tours. This allows learners to explore historical sites, museums, or natural wonders from the comfort of their classrooms, making education more immersive and engaging.

Access to Digital Portfolios: QR codes can be used to provide quick access to students’ 

digital portfolios. By linking the code to an online platform, teachers, parents, and potential employers can easily view students’ work, including projects, artwork, and achievements, fostering a sense of pride and motivation.

Instant Feedback and Assessments: QR codes can enable students to receive immediate feedback on their assignments or assessments. By embedding the correct answers within the QR code, students can scan it and compare their responses, promoting self-assessment and identifying areas for improvement.

Tracking Statistics and Personalization

The use of QR codes in education and training goes beyond interactive learning experiences. They also offer valuable opportunities for tracking statistics and personalization. Here’s how:

Attendance Tracking: QR codes can streamline attendance tracking by allowing students to scan a code upon entering the classroom. This eliminates the need for manual attendance records and enables accurate and efficient tracking.

Assessment and Progress Tracking: By integrating QR codes into assessment materials, teachers can easily collect data on students’ performance and progress. This data can be used to identify areas of strength and weakness, tailor instruction to individual needs, and provide targeted feedback.

Personalized Learning Paths: QR codes can be used to create personalized learning paths for students. By linking codes to specific learning resources or assignments based on individual learning styles or interests, educators can provide customized learning experiences that cater to the unique needs of each student.

In conclusion, QR codes have revolutionized education and training by providing interactive and personalized learning experiences. With the telegram qr code generator, such as ME-QR, educators can easily create effective QR codes to enhance the educational journey. From interactive assignments to virtual field trips, QR codes engage students and expand their knowledge. Additionally, QR codes enable efficient attendance tracking, assessment monitoring, and personalized learning paths. By harnessing the power of QR codes, educators can transform traditional education into a dynamic and engaging process, catering to the diverse needs of learners.

QR codes help educational facilities that are trying to leverage big data

A growing number of colleges are trying to find new ways to take advantage of big data technology. QR codes give them a number of advantages that they wouldn’t otherwise be able to take advantage of.

Source : SmartData Collective Read More

The Role of Data Analytics in Football Performance

The Role of Data Analytics in Football Performance

We have talked extensively about the many industries that have been impacted by big data. many of our articles have centered around the role that data analytics and artificial intelligence has played in the financial sector. However, many other industries have also been affected by advances in big data technology. The sports industry is among them.

The Sports Analytics Market is expected to be worth over $22 billion by 2030. It is growing at a piece of 28% a year.

Data analytics can impact the sports industry and a number of different ways. Sports leagues and teams are using analytics to estimate turn out at various sporting events, predict the performance of individual athletes, identify ways that athletes can improve their performance and improve marketing strategies.

We have mentioned that golf players have used data analytics to improve performance. However, analytics technology is probably even more important for football teams, players and coaches.

Data analytics is transforming the future of football

Football, a sport loved and cherished by millions worldwide, has evolved significantly over the years. The introduction of data collection and analysis has revolutionized the way teams and coaches approach the game. Liam Fox, a contributor for Forbes detailed some of the ways that data analytics is changing the NFL. Big data will become even more important in the near future.

In this article, we will explore the crucial role that data collection plays in analyzing football performance, unlocking valuable insights that can shape the future of the sport.

1. Introduction

Football has become more than just a game of skill and passion. It has transformed into an arena where data-driven decisions and analysis play a pivotal role. Teams and coaches now rely on data collection to gain a competitive edge, enabling them to make informed choices that can impact the outcome of matches. Platforms such as Futemax, Xoilac TV, and Multicanais have emerged as valuable sources of football content, providing access to a vast array of matches and insights.

2. The Evolution of Data Collection in Football

Traditionally, football relied on basic statistics such as goals, assists, and possession percentages to evaluate performance. However, the advent of advanced technologies and analytics has ushered in a new era of data collection. Today, teams utilize sophisticated tracking systems, video analysis tools, and wearable devices to gather a wide range of performance metrics. This wealth of information has transformed the sport, empowering teams and coaches to make data-driven decisions that can significantly impact their success.

3. Types of Data Collected in Football

Tracking data utilizes GPS technology and sophisticated sensors to capture player movements, positioning, and speed during matches. This data helps teams analyze player behavior on the field, identify patterns, and make strategic adjustments to maximize performance.

In addition to performance metrics, data collection also includes injury and fitness data. Monitoring player fitness levels, tracking recovery progress, and identifying potential injury risks are crucial for maintaining the overall well-being of players.

4. The Role of Data Collection in Game Analysis

Data collection in football serves as a powerful tool for game analysis. Coaches and analysts meticulously study match statistics, player performance metrics, and tracking data to gain valuable insights into team dynamics. By examining patterns and trends, they can devise effective game plans and tactics to exploit opponents’ weaknesses and capitalize on their strengths.

Understanding the tactical aspects of the game becomes easier with data analysis. Coaches can analyze formations, player movements, and positioning to identify areas of improvement and optimize strategies. This data-driven approach enhances decision-making on the field and increases the chances of success.

5. Enhancing Player Performance through Data Analysis

Data collection and analysis have a significant impact on individual player performance. By studying player-specific data, coaches can identify strengths and weaknesses, helping players refine their skills and reach their full potential. Understanding each player’s unique attributes allows for personalized training programs tailored to their needs.

Furthermore, data analysis aids in monitoring player workload and fatigue. By tracking exertion levels, coaches can manage training loads effectively, prevent burnout, and reduce the risk of injuries. This data-driven approach promotes player longevity and ensures consistent performance throughout the season.

6. Scouting and Recruitment

Data collection plays a vital role in scouting and recruitment processes. By leveraging data analysis, teams can identify talented players with precision. Comprehensive player profiles are created, including performance metrics, playing styles, and comparative analysis. This data-driven approach helps teams make informed decisions when it comes to transfers and player acquisitions.

7. The Influence of Data Collection on Training

Training regimes can be greatly optimized through data analysis. Coaches can identify areas for improvement and tailor training sessions based on individual and team performance data. Real-time feedback and analysis enable players to understand their strengths and weaknesses, leading to targeted training interventions that enhance performance.

By incorporating data-driven methodologies, teams can design player development programs that foster continuous improvement. Training becomes more precise, efficient, and goal-oriented, resulting in accelerated growth and skill refinement.

8. Fan Engagement and Experience

Data collection not only benefits teams and players but also enhances the fan experience. Platforms like Futemax, Xoilac TV, and Multicanais leverage data analysis to provide personalized content and insights to fans. Fans can access detailed match statistics, player profiles, and tactical analysis, allowing them to delve deeper into the intricacies of the game.

Integration of data in live broadcasts enhances the viewing experience, providing real-time statistics and analysis. This interactive approach keeps fans engaged, fostering a deeper connection between the sport and its followers.

9. Ethical Considerations and Data Privacy

While data collection offers numerous benefits, it is essential to address ethical considerations and data privacy. Teams and organizations must ensure the security and privacy of collected data. Transparent data collection practices, consent from players, and compliance with privacy regulations are crucial for maintaining trust and upholding ethical standards.

Balancing the quest for performance analysis with player well-being is essential. Striking the right balance ensures that data collection is used responsibly, respecting the privacy and welfare of players.

10. Future Trends in Football Data Analysis

As technology continues to advance, the future of football data analysis looks promising. Artificial intelligence and machine learning algorithms will play a more significant role in interpreting complex data sets, providing deeper insights and predictive analytics. Sensor technologies and wearables will evolve, offering more precise tracking and performance monitoring capabilities. The fusion of data analysis and advanced technology will unlock new dimensions in understanding and optimizing football performance.


The role of data collection in analyzing football performance cannot be overstated. From tactical analysis and game planning to player development and fan engagement, data-driven decision-making has revolutionized the sport. As Futemax, Xoilac TV, and Multicanais provide access to an extensive range of football content, the power of data analysis becomes accessible to fans, teams, and coaches alike. Moving forward, embracing ethical practices and leveraging technological advancements will unlock further insights, propelling the sport to new heights.

Source : SmartData Collective Read More

Productionizing SQL-based workflows in Google Cloud

Productionizing SQL-based workflows in Google Cloud

Despite its use and relevance in the Big Data realm, SQL can be cumbersome to manage from a DevOps perspective, as there are no standardized processes for managing, deploying, or orchestrating a repository of SQL scripts.

Dataform is a tool that enables cross-team collaboration on SQL-based pipelines. By pairing SQL data transformations with configuration-as-code, data engineers can collectively create an end-to-end workflow within a single repository. For a general introduction to Dataform, please see the GA announcement blog post.

The purpose of this article is to demonstrate how to set up a repeatable and scalable ELT pipeline in Google Cloud using Dataform and Cloud Build. The overall architecture discussed here can be scaled across environments and developed collaboratively by teams, ensuring a streamlined and scalable production-ready set up.

Sample workflow

First, let’s consider a simple ELT workflow. In this example, there are three sample subject areas that need to be created as part of a Data Warehouse: Account, Customer, and Sales. The example workflow just consists of building a view based on data in a source table, but this can be expanded as needed into a real world transformation.

To see what the Dataform code would look like for this example, please refer to the accompanying sample code for this blog.

The three subject areas are organized by tags, which separates them into distinct sections of code in the repository. Using the Dataform CLI, you can run commands on your local machine to execute this workflow and build individual subject areas:

dataform run –tags=[SUBJECT_AREA]

When this command is run, Dataform will compile your project into a DAG of SQL actions, run the actions with  the tag in question, and create BigQuery jobs to execute the workflow. With tags, a team of developers can build out each subject area or domain in a data warehouse collaboratively.

By using GitHub (or similar version controlled repository) and running local Dataform CLI executions with tags, code changes can be tested in isolation from the rest of the subject areas.

Dataform offers many tools for building out extensive SQL pipelines, which will not be covered in this article. I highly recommend reviewing the Documentation to learn more about how powerful Dataform can be!

Setting up CI/CD for the sample workflow

The previous section highlighted how a team can collaboratively build out and test a Dataform project in GitHub. The question now is, how to easily deploy these code changes and set up a CI/CD pipeline for our Dataform code? This is where Cloud Build comes into play.

You can set up CI/CD in the Dataform project using Cloud Build, a serverless platform that can scale up or down depending on the workload and has tight integration with other GCP services (this will become very useful in the following section).

Going back to the example workflow, you can set up Cloud Build triggers that fetch the latest SQLX files for any given subject area, and execute Dataform to update the data in BigQuery. Let’s explore how this would work for your first subject area (Account).

Example CI/CD workflow for Dataform

Preliminary setup: IAM

First, ensure that you have a service account that contains all the necessary roles and permissions needed to build and deploy Dataform code. It is advised to create a custom service account to accomplish this (e.g., “dataform-deployer”) and assign it the following roles:

BigQuery Admin

Pub/Sub Publisher

Logs Writer

1. Version control

To ensure strong version control of all code changes, it is important to store all changes in a git repository. For this example, please refer to the accompanying github repository

Having a proper repository set up ahead of time will streamline the DataOps process and allow for easy deployment.

2. Build and deploy

To build out this data on your local machine, you just needed to run the Dataform command and pass in the appropriate tag. To move this step into Cloud Build however, you will need to add the following files to your source repository:

A shell script containing a generalized Dataform command (see

A separate shell script for running unit tests (see

A cloudbuild configuration file for our subject area (see cloudbuild_account.yaml)

Next, you will need to create a Cloud Build trigger following the steps in the public documentation. The first step to accomplish this is to connect the GitHub repository to Cloud Build. You will be asked to authenticate your account and select the repository in question.

Note: the location of the repository is very important. You will only be able to create triggers in the same region that you have connected your repository.

Once the repository is connected, you can create a trigger that will execute the Dataform code. This example will create a manual trigger that you can schedule to run daily, but Cloud Build also supports a variety of other trigger options that might suit your specific use case better.

Follow the on-screen steps to create the trigger, and ensure that the fields accurately reflect how your repository is set up. For the example code, refer to the configurations below:

For the example project to run successfully, you must enter a Project ID and location (US) as substitution variables during trigger creation.

3. Run Dataform CLI and execute BQ jobs

Once the trigger has been set up, you can kick off a manual build and see the pipeline in action. Under the hood, Cloud Build is connecting to your GitHub repository to fetch all the relevant files

The cloud build YAML file contains the build configuration (i.e., what steps are in the build, what order they execute in, etc.)

Inside the YAML file are calls to your shell script, which executes a generalized version of the “dataform run” command. Rather than pass in tags and variables manually, we can set these in the cloud build configuration as environment variables

Everytime the shell script is called, it pulls the relevant SQLX files and executes Dataform

Dataform in turn interprets the SQLX files you passed in, and spins up BigQuery jobs in response.

Example workflow for Cloud Build set up

Thanks to Cloud Build, you now have CI/CD enabled in your workflow. Every code change that is made in GitHub can be immediately factored into our pipeline execution the next time we trigger a build.

In order to orchestrate this execution to run every day/hour/etc. You just need to create a scheduler job that invokes our manual trigger. This can also be done directly in Cloud Build:

4. Publish success message with Pub/Sub

Lastly, you will need to create a Pub/Sub topic called “dataform-deployments”. The Cloud Build deployment is configured to publish messages to this topic every time a build succeeds (see the final step in the cloudbuild_account.yaml file). This is not going to be very useful in the current section, but you will see its benefit later on.

That’s all there is to it! Now a team of developers can make as many changes to the source code as needed, without worrying about tracking those changes or deploying them.

Linking builds together with Pub/Sub

The previous section explained how to set up one trigger to build one subject area. Let’s also discuss the use case where you have multiple subject areas that need to run one after another. You can revise your Cloud Build configuration file to build each subject area in a separate step, but this comes with one key limitation — if your build fails halfway, you will have to re-run the entire build again (and rebuild subject areas that may have already been successfully updated).

In the working example, you are building Account, then Customer, then Sales in that order. If Account builds successfully but Customer fails mid-way through, you will want to be able to resolve the error and kick off the pipeline from its most recent step (i.e., the Customer build).

This is where Cloud Build’s tight integration with GCP services comes into play. Cloud Build natively can communicate with Pub/Sub, and create push subscriptions for triggers. This means that we can create separate builds for each subject area, and link them together via Pub/Sub.

To view this in action, let’s revisit the cloudbuild_account.yaml configuration file in the example project. The final step in the build publishes a message “SUCCESS” to the “dataform-deployments” topic we created. We also provide a subjectArea attribute that clarifies which subject area just completed.

Now we can take advantage of this system to link our Cloud Build triggers together. Let’s start by creating a trigger for the Customer subject area. Instead of a manual trigger, this time we will create a pub/sub invoked trigger and subscribe it to the “dataform-deployments” topic.

The rest of the configuration steps will be the same as what we set for our Account trigger, with three key differences:

The configuration file needs to point to the Customer config (“cloudbuild_customer.yaml”)

We need to add one more substitution variable that stores the metadata from our Pub/Sub message

We need to add a filter that ensures this trigger only invokes when the “account” subject area completes. We don’t want it to fire every single time a message is published to our topic.

Once the trigger has been set up, let’s go back and kick off the manual trigger for our Account subject area.

What we should now see is for the customer trigger to automatically invoke once the account trigger completes:

Putting it all together: Create a scalable ELT pipeline

Following the same process, we can set up a third build trigger for the “Sales” subject area. This will work the same way as Customer, but it will be waiting for a Pub/Sub message with the attribute “Customer”.

We have now successfully set up a chain of builds that forms a pipeline:

Example pipeline

With this set up, scaling the workflow becomes a quick and easy process.

Expanding the SQLX files for a specific subject area

SQLX can now be modified as often as needed by developers in the source repo. Additional files can be created with no impact to the deployment or orchestration process. The only caveat is that new files must be assigned the appropriate tag in their config block.

This way, no matter how much the source code is modified, all of the changes are tracked in GitHub and picked up by Cloud Build automatically.

Creating new subject areas

Creating new subject areas also becomes a straightforward process. A new tag just needs to be assigned to the subject area in question, and from that point the Dataform code can be written and tested locally before merging with the master branch. To incorporate the changes into Cloud Build, a new configuration file and trigger will need to be created. This becomes a simple process however, since most of the configuration is repeatable from previous builds.

Deployment strategies for different environments

Scale across environments

All of the work done thus far is for a single GCP project. What happens when you want to scale this process out across a dev, QA, and production environment? The same process is easily applicable in this scenario — it just requires the following steps:

Set up a GCP project for each environment

Create separate branches in your repository for each environment

Ensure the dataform.json file points to the correct project in each branch

Set up identical builds in each environment, but configure them to pull from the correct branch in your source repository

Link the specific branch in each project to Dataform in GCP, following the documentation

Once this is set up, developers can build, test, and deploy changes up through production with very little upkeep or maintenance. All that is required is proper tagging protocols in the Dataform code, and a PR/merge process to move data from one environment to another:

Example deployment strategy

VPC-SC considerations

It is important to note that the proposed setup may not work out-of-the-box in GCP environments leveraging VPC Service Controls for Pub/Sub and/or Cloud Build. Please keep in mind the following considerations.

When running Cloud Build inside of a VPC security perimeter, builds must be run inside of a private pool. Additionally, Pub/Sub push subscriptions inside a security perimeter are only supported for the Cloud Run default URLs. As a result, when attempting to create a push subscription for the Cloud Build trigger, a VPC error will likely pop up.

This can be worked around by either moving relevant projects outside of the perimeter to create the push subscription and then moving them back in, or using webhook-invoked triggers that do not rely on pub/sub. 

What next?

The methods show here enable large teams of Data Engineers and Analysts to collaboratively develop code, create deployments, and orchestrate executions all from a single source repository. Using the directions in this blog and the accompanying github sample project, the goal is to enable teams to set up their own scalable architecture for ELT in the cloud. 

For more information on adopting Dataform into your workloads, please be sure to review the Google Cloud documentation.

Source : Data Analytics Read More

Predict, personalize, and wow your customers with better analytics and AI

Predict, personalize, and wow your customers with better analytics and AI

What will financial services look like in the future? Historically, financial services have been driven by one-on-one relationship-based interactions. 

Now, customers expect these trusted relationships to translate into personalized, digital experiences that prove their providers know and value them, and have the capabilities to help them.

It’s no secret that customer engagement leads to revenue growth. That’s why customer data platforms (CDPs) have gained traction across multiple industries to help businesses analyze, predict, and personalize customer journeys.

However, a CDP by itself may not be sufficient to gain a holistic view of the customer. Financial services institutions already capture a myriad of customer and marketing data and use many tools to engage their customers. But companies still struggle to unify and analyze all that data across silos and create a holistic view of their customers that’s actionable across different channels and capabilities.

Financial services companies will need to think strategically about maximizing their use of first-party data and analytics to drive better and sustainable business outcomes.

With a true 360 view of the customer, financial services institutions can seamlessly combine all the right data and gain meaningful insights to deliver the type of personalization that really matters to customers.

Google Cloud is here to help

Customer expectations are changing rapidly, and financial services leaders need to be as innovative and efficient as possible. By combining customer servicing and transactional data with marketing, channel, CRM, and other first-party data within a secure cloud-based environment, equipped with intelligent, ML-driven customer insights, financial services companies can uncover deeper customer insights and activate them quickly.

To learn more or explore how Google Cloud fits into your customer data technology strategy, please contact us or visit our website.

aside_block[StructValue([(u’title’, u'[White paper] Lead the way in customer-focused financial services with analytics and AI’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e7bda7d71d0>), (u’btn_text’, u’Download now’), (u’href’, u’’), (u’image’, None)])]


1. “The value of getting personalization right—or wrong—is multiplying,” McKinsey & Company, November 12, 2021
2.  “Survey shows customers give banks lackluster grades on their personalization,” Insider Intelligence, April 11, 2022
3.“The Future of Insurance: Personalized, Digitized and Connected,” Capco Insurance, September 21, 2021
4.Press Release: “72% of Consumers Would Share Personal Data to Get Cheaper Insurance Premiums,” Capco, July 22, 2021 
5. “The value of getting personalization right—or wrong—is multiplying,” McKinsey & Company, November 12, 2021
6. “2030 Today, 2021,”, 2021
7. “Getting Personal: Consumer Perspectives on AI in Marketing and Customer Service,”, April 2022 
8. Gartner®, Gartner’s Customer Data Survey: The 360-Degree View of the Customer Is More Myth Than Reality, Benjamin Bloom, Lizzy Foo Kune, Mike McGuire, 30 November 2021
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved.
9. “Closing the data value gap,” Accenture, December 2021
10. “Reshaping retail banking for the next normal,” McKinsey & Company, 2020
11. “Bring Customers Into Clear Focus,” Gallup, January 6, 2021
12. “Responsible Marketing With First-Party Data,” BCG, May 18, 2020

aside_block[StructValue([(u’title’, u’How financial services can unlock customer insights to deliver personalization securely’), (u’body’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e7bc2cff610>), (u’btn_text’, u’Read now’), (u’href’, u’’), (u’image’, <GAEImage: fsi x transform.jpg>)])]

Source : Data Analytics Read More

7 Mind-Blowing Ways Smart Homes Use Data to Save Your Money

7 Mind-Blowing Ways Smart Homes Use Data to Save Your Money

Big data technology is changing our lives in tremendous ways. One of the most significant changes has been the invention of smart homes. Market analysts expect that the market for smart homes will be $worth 581 billion by 2032.

Think about this for a moment: what if your house could think for you? Better yet, what if it could think about you? Imagine a world where your home understands your habits, preferences, and even your routine, then uses this information to cut costs and save you money. Sounds incredible, right? Welcome to the future – welcome to the era of smart homes. There are already many ways smart homes use data to improve our lives.

What’s a Smart Home Anyway?

A smart home is the next level in home automation. It’s not just about having gadgets that respond to your commands. It’s about the seamless integration of devices to create an environment that intuitively understands and reacts to your lifestyle. In essence, your home becomes a personal assistant, a faithful guardian, and a financial adviser all rolled into one.

Show Me the Money: The 7 Ways Smart Homes Save Big

Let’s get to the meat of the matter: how does a smart home save you money? Here are seven astonishing ways:

1. Smart Thermostats

Imagine a thermostat that learns your routine and adjusts temperatures accordingly. No more forgetting to turn down the heat when you leave. No more chilly surprises in the morning. Sounds nice, doesn’t it? Well, that’s exactly what a smart thermostat does. It optimizes energy consumption, reducing your bills and helping the planet too.

2. Efficient Lighting

Smart lighting can detect when rooms are vacant and turn off lights automatically. It can also dim lights to your preferred settings, reducing energy consumption while enhancing the ambiance. Little changes, big savings.

3. Intelligent Home Security

Forget expensive security services. Smart homes monitor and protect your property using motion sensors, cameras, and automated locks. Besides offering peace of mind, these features can reduce home insurance premiums. Now that’s smart.

4. Proactive Maintenance

What if your home could detect issues before they turn into expensive repairs? With predictive analytics, your smart home can warn you about potential problems like leaks or electrical faults, enabling you to fix them early and avoid hefty repair costs.

5. Smart Appliances

Smart appliances like refrigerators, washing machines, and ovens adjust their performance based on usage patterns, reducing energy waste. Some can even order groceries or replacement parts for you. Talk about convenience and savings rolled into one!

6. Water Conservation

Smart irrigation systems can adjust water usage based on weather forecasts and soil conditions, avoiding water wastage. Smart showers can limit water usage based on your preferences. It’s like having a personal water conservation officer right at home.

7. Real-Time Energy Monitoring

Smart homes can provide real-time energy consumption reports, allowing you to identify energy hogs and make informed decisions. Knowledge is power, especially when it comes to reducing electricity bills.

A Smart Investment for Your Wallet and the Planet

So, are smart homes worth the investment? Consider this: they save you money, increase your home’s value, and contribute to a sustainable environment. The question isn’t whether you can afford to make your home smart. The question is, can you afford not to?

In essence, smart homes are more than a fad or a luxury. They are a testament to the potential of human ingenuity, a beacon of hope in an era of the climate crisis, and a practical solution for everyday financial management. In the grand scheme of things, we’re not just talking about cutting costs. We’re talking about a lifestyle transformation – one that places efficiency, convenience, and sustainability at its heart.

The Bottom Line

The benefits of smart homes extend far beyond the realms of convenience and efficiency. They touch upon environmental sustainability and long-term financial prudence. If you think about it, it’s a marvelous blend of technology and lifestyle aimed at making life better and easier for us.

In an era where data is gold, the ability to analyze and utilize it in our daily lives gives us an unprecedented level of control. The fact that this control can lead to significant financial savings is the icing on the cake.

So, What’s Holding You Back?

Could it be the upfront cost? True, smart homes come with an initial investment. But think about it as planting a seed. You water it, you nurture it, and then you watch it grow and bear fruit. Similarly, the initial cost of setting up a smart home should be seen as an investment. An investment that not only brings a return in the form of financial savings but also elevates your standard of living.

Or perhaps you’re wary of the technology itself? Let’s face it – new tech can be intimidating. But remember, technology is merely a tool. It’s here to serve us, to make our lives better. And with the pace at which user-friendly home automation devices are hitting the market, you don’t need to be a tech wizard to enjoy the benefits.

Plus, if your home automation system ever runs faulty, consider buying a home warranty service that will keep you covered for the damages. (If you’d like to know more about the process, visit the following link:

So, let’s welcome the future. Let’s embrace the paradigm shift in how we live and manage our homes. Let’s move towards homes that are not just spaces to live in, but companions that understand us, work with us and, above all, save us money.

In the end, isn’t that what everyone wants? A home that’s not just a shelter but a partner in our day-to-day lives, making every day a bit easier, a bit smarter, and yes, a bit cheaper. So, here’s to smart living and smarter savings!

Source : SmartData Collective Read More

Building ML workflows in BigQuery the easy way, without code

Building ML workflows in BigQuery the easy way, without code

As enterprise data volumes continue to explode, and businesses depend more heavily on predictive analytics and automated decisioning, cloud data warehouses such as Google BigQuery are often the only viable option for organizations looking to implement ever-more complex and scalable workloads using ML and AI capabilities. 

Cloud is the way for analytics at scale

BigQuery, with its compute and storage separation, enables unparalleled scalability, flexibility and cost-effectiveness. Our serverless data warehouse works across clouds and scales with your data; with BI, Machine Learning and AI built in. It is becoming the de facto choice for enterprises looking for a single source of data truth and truly scalable ML-driven analytics. 

In this post, we’ll explore how BigQueryML can be used in conjunction with an innovative visual analytics tool from our partner CARTO, to extend the reach of cloud native analysis to a broader user base.  

CARTO Workflows is a flexible platform tool that automates analytical workflows using an easy-to-use, no code interface natively in BigQuery.

Traditional visual automation tools do not fully leverage the cloud and can be expensive

In today’s data-driven world, enterprises are searching for efficient ways to conduct more advanced analytics and develop data-driven applications. BigQuery empowers analysts and data scientists to process massive volumes of data using familiar SQL commands. However, many alternative visual analytics tools fall short, as they are not entirely cloud-native. These solutions rely on inflexible and expensive desktop or server-based infrastructures, which often lack comprehensive data governance, encounter performance limitations, and prove costly when scaling on demand.

Introducing CARTO Workflows, visual analytics native to BigQuery

CARTO Workflows allows users to visually design their analysis by dragging and dropping components and data sources. The workflow is then automatically compiled into SQL and is pushed down to BigQuery. Users can design, execute, automate, and share sophisticated analytical workflows with all capabilities of BigQuery SQL and its extensions.

You can create and visualize the entire analytics pipeline, execute individual steps and debug, where necessary. Everything created in Workflows is computed natively in BigQuery. The simplicity of the tool enables wider adoption of analytics across teams and faster onboarding, unlocking advanced ML capabilities for a wider cohort of non-expert users.

Unlocking BigQuery ML with CARTO Workflows 

To demonstrate the power of CARTO Workflows and BigQuery, let’s take a look at a practical example. In this scenario we will build out a workflow to predict daily store sales for a consumer brand category, using a model based on the ARIMA family available in BigQueryML.

Step 1 – Input Data

For this example, we will be using a publicly available dataset of Liquor sales in the state of Iowa. This dataset can be accessed on the Google Cloud Marketplace here. To replicate this analysis, you can sign up for a free 14-day CARTO trial.  With our native connection to BigQuery, we can access this dataset through the CARTO Data Explorer, and simply drag the dataset to the Workflows canvas. 

Step 2 – Data Preparation  

Since we have all transactions of daily liquor sales, we have to group transactions by store, and by day. The result will give us a single daily sales value for each point of sale.

Step 3 – Model Training Filters

The deadline for our model is 2020-06-01. Previous daily data will be used to train our model. To do this we must perform a Simple_Filter. In addition, we are going to predict with a year of historic data, we therefore apply an additional Simple_Filter with the date 2019-06-01.

Then we select the desired columns from the dataset and sort to display only the latest data.

Step 4 – ARIMA Model Training

We have trained the data with an ARIMA model using variables for 1802 stores in a single query, in less than 10 minutes. We use the CREATE_MODEL statement from BigQuery. US holidays have been selected. The frequency is daily and takes into consideration different possible seasonalities. We leave the task of coefficient estimation and parameter selection to BigQueryML. The BigQuery options defined in the workflow are as follows:

code_block[StructValue([(u’code’, u’model_type= “ARIMA_PLUS”, rntime_series_timestamp_col=”date”, time_series_data_col=”bottles_sold”, rntime_series_id_col = “store_number”, rnHOLIDAY_REGION = “US”, rnDATA_FREQUENCY = “DAILY”, rnSEASONALITIES = [“YEARLY”, “WEEKLY”, “MONTHLY”], SEASONALITIES = [“YEARLY”, “WEEKLY”, “MONTHLY”].’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e8b12fcd2d0>)])]

Step 5 – Forecasting

We also use ML.FORECAST to make our daily sales predictions. We can simply pick the inputs for forecast values and also set the confidence interval. This process uses the newly trained model to make future predictions according to the set horizon. We also add a column named index(CONCAT DATE WITH STORE NUMBER). This column will be used to join the forecasted data with its actual values to compare results.

Step 6 – Save the Forecast

In this final step, we save the results of our forecast into a BigQuery table. We can further analyze these results, or visualize the output using CARTO Builder.

To see the complete steps of this workflow in action, check out our video.

BigQuery is leading the analytics space for good reason. It provides a single source of data truth, enabling better data governance, avoiding duplication, and scales automatically based on workload demands. In conjunction with CARTO Workflows and its easy-to-use visual interface, sophisticated data pipelines can be automated to truly democratize the full potential of cloud-native ML capabilities across business units, and for a limitless scope of use cases.

Click here to try out: CARTO Workflows

Source : Data Analytics Read More

Announcing Dataform in GA: Develop, version control, and deploy SQL pipelines in BigQuery

Announcing Dataform in GA: Develop, version control, and deploy SQL pipelines in BigQuery

Data teams building SQL pipelines grapple with the challenge of manually piecing together custom processes and infrastructure, creating a lag time in development, wasting time troubleshooting issues, and often preventing data analysts from contributing to the process. To help, today we are announcing the general availability of Dataform, which lets data teams develop, version control, and deploy SQL pipelines in BigQuery. Dataform helps data engineers and data analysts of all skill levels build production-grade SQL pipelines in BigQuery while following software engineering best practices such as version control with Git, CI/CD, and code lifecycle management. 

Dataform offers a single unified UI and API with which to build, version control, and operationalize scalable SQL pipelines. In this single environment, data practitioners can develop new tables faster, ensure data quality, and operationalize their pipelines with minimal effort, making data more accessible across their organization.

An end-to-end SQL pipeline experience in BigQuery

With Dataform, data and analytics teams can:

Develop complex pipelines with SQL in code with Dataform core, an open-source framework that brings automated dependency management, data-quality testing, code reuse, and table-documentation features to SQL development.

Develop pipelines on the web from the BigQuery console, where users can work from individual, isolated workspaces, visualize their pipelines dependencies, get real-time errors, and version control their code with Git. 

Deploy SQL pipelines in different execution environments, on a schedule or via API triggers, without having to manage any infrastructure.

A unified experience for all data practitioners

Dataform helps organizations standardize SQL pipeline development across their organization around a single piece of tooling and a single development process. 

Data teams can collaborate following software engineering best practices like Git, CI/CD, and code lifecycle management. 

Data engineers can manage code lifecycle and scheduling across development, staging, and production execution environments without having to manage any infrastructure. 

And lastly, data analysts can contribute to existing pipelines or manage their own by developing, testing, and version controlling SQL pipelines from a web interface. 

What customers are saying

“As a company with 1000+ collaborators, we used to struggle with a lack of governance and standards to manage our BigQuery data,” says Lucas Rolim, Director of Data & Analytics at Hurb. “Dataform provides our data team a common interface to adopt software development best practices such as versioning, code reviews and commit history.”

“Before we started using Dataform, we used an in-house system to transform our data which was struggling to scale to meet our needs,” says Neil Schwalb, Data Engineering Manager at Intuit Mailchimp. “After adopting Dataform and more recently Dataform in Google Cloud we’ve been able to speed up and scale our data transformation layer to 300+ tables across large volumes of data. The Google Cloud-Dataform integration has also sped up our development workflow by enabling faster testing, clearer logging, and broader accessibility.”

“Over the last few years the demand for Data Science, Machine Learning and AI has grown massively at OVO on our Path to Zero [carbon emissions]. We grew fast, and each team of Data Scientists were tasked with building data and machine learning pipelines. It meant we could deploy new capabilities quickly, and drive real benefits for our customers, but it wasn’t scalable,” says Dr. Katie Russell, Data Director at OVO. “Adopting Dataform has allowed us to create consistency without sacrificing flexibility or pace of development. With Dataform we have sped up deployment processes, reduced quality issues and improved documentation and discoverability.”

How can I learn more?

You can get started with Dataform by visiting the website or reading the documentation.

Related Article

Dataform is joining Google Cloud: Deploy data transformations with SQL in BigQuery

With our acquisition of Dataform, you can now leverage software development best practices to define, document, test and deploy data tran…

Read Article

Source : Data Analytics Read More

AI Can Help Accelerate Development with Low-Code Frameworks

AI Can Help Accelerate Development with Low-Code Frameworks

Artificial intelligence technology is significantly changing the software development profession. According to Github, 1.2 million developers already rely on artificial intelligence technology to do their jobs.

However, despite their knowledge about programming, many developers do not fully grasp the benefits that AI can bring to their profession. There are a number of reasons that they should consider using AI, such as taking advantage of low-code frameworks.

Bernard Marr wrote an excellent article in Forbes about the role of artificial intelligence in creating low code frameworks for developers. This is one of the many ways that AI is going to be a gamechanger in the software development profession.

Low code tools will become more popular as developers look for ways to use AI to boost efficiency

Marr writes that the emergence of no-code/low-code AI tools and platforms is exciting. Anyone can create applications that use machine learning in innovative ways. It’s easy to get started with AI, from designing web services to coordinating sales and marketing campaigns. No-code/low-code solutions work through a drag-and-drop interface or a wizard. If you know how to code, you can fine-tune the results for more specific applications. Basic knowledge of computer code structure and syntax is helpful.

‍In today’s rapidly changing business landscape, companies must adapt and innovate to stay competitive. As a result, many organizations are turning to AI-driven low-code frameworks to accelerate their development processes and bring innovative solutions to market more quickly. Low-code frameworks enable businesses to build powerful applications with minimal coding, streamlining the development process and reducing time to market. This article examines the advantages of low-code frameworks that rely on advanced AI algorithms, explores their key features, and provides guidelines for choosing the right platform for your organization.

What are Low-Code AI Frameworks?

Low-code frameworks are modular, visual development platforms that simplify the process of creating custom software solutions by reducing the amount of hand-coding required. These platforms utilize drag-and-drop interfaces, pre-built components, advanced AI algorithms and templates to facilitate rapid application development, enabling developers and even non-technical users to build and deploy applications quickly and efficiently.

Low-code platforms have emerged as a popular alternative to traditional software development methodologies, offering numerous benefits, including increased agility, reduced development costs, and the ability to address growing IT backlogs.

The Rise of Low-Code Development

According to a Gartner report, by 2025, 70% of new applications will be developed using low-code or no-code technologies, up from less than 25% in 2020. This rapid growth can be attributed to several factors:

Increasing demand for digital transformation and agile development methodologies

The need for faster, more efficient application development processes

The growing skills gap in the software development industry

The desire to democratize software development and empower citizen developers

Low-code frameworks have become increasingly popular as organizations seek to address these challenges and accelerate their digital transformation initiatives.

Top Low-Code Platforms

There are numerous low-code platforms available on the market, each offering unique features and capabilities. Some of the top low-code platforms include:

1. Mendix

Low code development using Mendix: Mendix is a low-code platform that enables users to build and launch powerful applications with minimal coding effort. It offers a range of features, including visual development tools, pre-built components, and templates, making it an ideal choice for organizations looking to accelerate their development processes.

2. OutSystems

Low code development using OutSystems: OutSystems is a low-code platform that offers a comprehensive set of features for building and deploying enterprise web and mobile applications. With its full application lifecycle management capabilities, OutSystems empowers organizations to focus on customer experience and innovation, rather than application development and management.

3. Microsoft Power Apps

Low code development using Microsoft Power Apps: Microsoft Power Apps is a cloud-based low-code platform that enables rapid application development and deployment. It is integrated with Microsoft Dataverse, linking all your data with Microsoft 365, Dynamics 365, and Azure capabilities. This platform is suitable for businesses of all sizes looking to digitize their processes and enhance their operations.

Indium specialize in providing solutions using low-code development platforms. These platforms allow businesses to quickly and easily create custom applications without the need for extensive coding knowledge.

Looking to create or improve your business applications without the need for extensive coding knowledge? Look no further than our low-code/no-code services. Low-code/no-code platforms are user-friendly and insightful platforms that empower users to easily design, develop and deploy custom applications for their business needs.

Advantages of Low-Code Frameworks

Low-code frameworks offer numerous benefits for organizations looking to accelerate their development processes and improve overall efficiency:

1. Faster Development

Low-code platforms streamline the development process by providing visual design tools, pre-built components, and templates that eliminate much of the manual coding required in traditional development. This enables developers to build applications more quickly and efficiently, reducing time to market.

2. Greater Agility

Low-code frameworks allow organizations to respond more quickly to changing market conditions and customer needs by enabling rapid application prototyping and iterative development. This increased agility enables businesses to stay ahead of the competition and adapt to evolving business requirements.

3. Reduced Development Costs

By minimizing the amount of hand-coding required, low-code platforms can significantly reduce development costs. This includes not only the cost of hiring and retaining skilled developers but also the time and resources spent on development projects.

4. Democratization of Development

Low-code frameworks empower non-technical users, or “citizen developers,” to build and deploy applications without extensive coding knowledge. This democratization of development can help organizations address the growing skills gap in the software development industry and reduce reliance on specialized developers.

5. Improved Collaboration

Low-code platforms foster better communication and collaboration between business and IT teams by simplifying the development process and making it more accessible to non-technical users. This can lead to a more aligned and efficient development process, with better outcomes for the organization.

6. Scalability and Flexibility

Low-code platforms offer a scalable and flexible solution for application development, enabling organizations to build and deploy applications that can grow and adapt with their business needs. This makes low-code an attractive option for businesses looking to future-proof their software investments.

Key Features of Low-Code Platforms

Low-code platforms offer a range of features designed to simplify and accelerate the development process. Some of the most important features include:

1. Visual Development Tools

Low-code platforms typically provide drag-and-drop interfaces and visual development tools that enable developers and non-technical users to build applications quickly and easily, without extensive coding knowledge.

2. Pre-Built Components and Templates

Low-code platforms offer a library of pre-built components and templates that can be easily customized and integrated into applications. These components can save time and reduce development effort by providing reusable building blocks for common application functionality.

3. Integration Capabilities

Low-code platforms often include built-in connectors and integration capabilities that enable seamless communication between applications and external systems, such as databases, cloud services, and third-party APIs. This can help streamline the development process and ensure that applications can easily integrate with existing IT infrastructure.

4. Automation and Workflow Management

Many low-code platforms include automation and workflow management features that allow developers to easily define and manage complex business processes within their applications. This can help improve operational efficiency and ensure that applications are aligned with business requirements.

5. Security and Compliance

Low-code platforms typically offer robust security and compliance features to help protect sensitive data and ensure that applications meet industry standards and regulations. This can be particularly important for organizations operating in highly regulated industries, such as healthcare, finance, and government.

Choosing the Right Low-Code Platform

When selecting a low-code platform for your organization, there are several factors to consider:

1. Ease of Use

A key advantage of low-code platforms is their ease of use, so it’s essential to choose a platform that offers an intuitive, user-friendly interface that can be easily adopted by both technical and non-technical users.

2. Customization and Flexibility

Choose a low-code platform that allows for a high degree of customization and flexibility to ensure that you can build applications that meet your unique business requirements.

3. Integration Capabilities

Ensure that the low-code platform you select offers robust integration capabilities, allowing your applications to seamlessly connect with existing systems and data sources.

4. Scalability

Consider the scalability of the low-code platform, ensuring that it can support your organization’s growth and adapt to changing business needs.

5. Vendor Support and Community

Select a low-code platform with strong vendor support and an active community of developers and users who can provide guidance, resources, and best practices for building and deploying applications.

AI Technology Has Made Low Code Frameworks Possible

AI technology has disrupted the software development profession in many spectacular ways. These AI development practices are going to be even more important as the outlook for the software development industry becomes more uncertain. One of these changes has been the proliferation of low-code development.

Low-code frameworks have emerged as a powerful tool for organizations looking to accelerate their development processes and bring innovative solutions to market more quickly by taking advantage of the many benefits of AI technology. By reducing the amount of hand-coding required, low-code platforms enable developers and non-technical users to build and deploy applications rapidly, streamlining the development process and reducing time to market.

To get the most out of low-code frameworks, organizations should carefully consider the platform’s ease of use, customization options, integration capabilities, scalability, and vendor support that AI technology has made possible. By selecting the right platform and adopting a collaborative, agile approach to development, businesses can leverage the power of low-code to drive digital transformation and stay ahead of the competition.

Developers are going to need to consider the many benefits of leveraging AI to improve efficiency. Taking advantage of low-code frameworks is just one of them.

Source : SmartData Collective Read More

Build an image data classification model with BigQuery ML

Build an image data classification model with BigQuery ML

There are countless use cases for capturing, storing, and classifying unstructured image data — think social media analytics to find missing people, image analytics for tracking road traffic, or media analytics for e-commerce recommendations, to name a few. Most organizations are not able to be fully data driven because the majority of the data generated these days is highly unstructured and when it comes to large-scale analytics across data types and formats, there are some limiting factors for enterprise applications: 1) data storage and management, 2) infrastructure management and 3) availability of data science resources. With BigQuery’s new unstructured data analysis feature, you can now store, process, analyze, model, predict, with unstructured data, and combine it with structured data in queries. Best of all, you can do all of this in no-code SQL-only steps.

In this blog, we will discuss the use case of storing and analyzing images of yoga poses in BigQuery, and then implement a classification model with BigQuery ML to label the poses using only SQL constructs.

BigQuery and BQML

BigQuery is a serverless, multi-cloud data warehouse that can scale from bytes to petabytes with zero operational overhead. This makes it a great choice for storing ML training data. Besides, the built-in BigQuery Machine Learning (BQML) and analytics capabilities allow you to create no-code predictions using just SQL queries. And you can access data from external sources with federated queries, eliminating the need for complicated ETL pipelines. You can read more about everything BigQuery has to offer on the BigQuery page. 

So far, we know BigQuery as this fully managed cloud data warehouse that helps users analyze structured and semi-structured data. But,

BigQuery has expanded to perform all analytics and ML on unstructured data as well

We can use SQL queries to perform insightful analysis, analytics and ML on images, videos, audio etc. at scale without having to write additional code

We have the ability to combine structured and unstructured data as if they all existed together in a table

We will discuss these in our Yoga Pose Classification use case covered in the next section.

Image Data Classification with BigQuery ML

With first of its kind access to image data related “structured” queries, we can now predict results using machine learning classification models using BigQuery ML. I have narrowed down the stages involved into 5 steps for easy understanding.

1. Create dataset and a BigLake connection

For our use case of image detection of five Yoga poses, I have used a publicly available dataset and you can access the dataset from this repo. The Yoga poses we are identifying are limited to Downdog, Goddess, Plank, Tree and Warrior2.

Before you begin with the BigQuery Dataset creation, make sure to select or create a Google Cloud Project and check if billing is enabled on the project. Enable BigQuery API and BigQuery Connection API.

a. Using below steps, create the dataset “yoga_set”

b. BigLake Connection allows us to connect the external data source while retaining fine-grained BigQuery access control and security, which in our case is the Cloud Storage for the image data. We will use this connection to read objects from Cloud Storage. Follow steps below to create the BigLake Connection

Click ADD DATA on the Explorer pane of the BigQuery page:

Click Connections to external data sources and select BigLake and Remote functions option:

Provide Connection Name and create the connection. Remember to take a note of the Service Account id that is created in this process.

2. Create a Cloud Storage bucket and grant permissions

We are going to use a Cloud Storage bucket to contain the image files of Yoga poses that we want to create the model on. 

a. Go to Cloud Storage Buckets page and click CREATE

b. On the Create a bucket page, enter your bucket information and continue, making sure it is in the same region as the dataset and the connection discussed in above steps and create

c. Once the bucket is created, store your images (through console or Cloud Shell commands or programmatically) and grant the necessary permissions for the connection’s service account (that we saved earlier) to access the images

> export sa=”yourServiceAccountId@email.address”
> gsutil iam ch serviceAccount:$sa:objectViewer “gs://<<bucket>>”

3. Create an object table

Create an external object table from BigQuery to access the unstructured data in the bucket using the connection we created. Run the below CREATE SQL from BigQuery editor:

code_block[StructValue([(u’code’, u’CREATE OR REPLACE EXTERNAL TABLE `<<dataset>>.<<table_name>>` rnWITH CONNECTION `us.<<connection-name>>` rnOPTIONS(rnobject_metadata=”SIMPLE”, uris=[“gs://<<bucket>>/<<folder>>/*.jpg”]);’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba1d7af3d0>)])]

External Table is created as shown below:

Let’s quickly query a pose from the newly created table:

code_block[StructValue([(u’code’, u”SELECT data , urirnFROM `yoga_set.yoga_poses` rnWHERE REGEXP_CONTAINS(uri, ‘gs://yoga_images/Downdog’)rnLimit 1;”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba0fdc91d0>)])]

As you can see in the screenshot below, you can create and operate on unstructured images as if they are structured data:

Now let’s export the query result from above into a small Python snippet to visualize the result:

Now that we have created the external table and accessed images from Cloud Storage only using SQL queries, let us move on to the next section that is to create the Classification Model.

4. Create the model and upload it to Cloud Storage

For this implementation, we are going to use the pre-trained ResNet 50 Model to run inference on the object table we just created. The ResNet 50 model analyzes image files and outputs a batch of vectors representing the likelihood that an image belongs to the corresponding class (logits). 

Before moving on to this step, make sure you have all the necessary permissions in place. Then follow the below steps:

a. Download the model from this location and save it in your local
b. It should unpackage into saved_model.pb and a variables folder
c. Upload these two (the file and the folder) into the bucket we created in previous section

Once this step is completed, your model related files should be present in the same bucket as your images as seen in the image above.

5. Load the model into BQML and infer!

In this step, we are going to load the model into the same BigQuery Dataset as the external table we created earlier and apply it for the images we have stored in the Cloud Storage.

a. From BigQuery Editor, run the following SQL statement

code_block[StructValue([(u’code’, u”CREATE MODEL `<<Dataset>>.<<Model_Name>>`rnOPTIONS(rnmodel_type = ‘TENSORFLOW’,rnmodel_path = ‘gs://<<Bucket>>/*’);”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba1c490510>)])]

Once the execution is completed, you would see the model listed in your Dataset section in BigQuery.

b. Inspect the model to see its input and output fields. Expand the dataset and click on the model we just created “yoga_poses_resnet”. Click the Schema tab:

In the Labels section, you see the “activation_49” field that represents the output field. In the Features section, you can see “input_1” that represents the field that is expected to be input to the model. You will reference “input_1” in your inference query (or prediction query) as the field you are passing in for your “test” data.

c. Infer your Yoga Pose!

Let’s use the model we just created to classify our test image data. Make sure you have some test images (Yoga poses) identified from your Cloud Storage bucket that made it into the External Table when we created it. We are going to selectively query for those test images in BigQuery to perform the inference using the BQML model we just created. Use the below query to trigger the test.

code_block[StructValue([(u’code’, u”SELECT * rnFROM ML.PREDICT(rnMODEL yoga_set.yoga_poses_resnet,rn(SELECT uri, ML.DECODE_IMAGE(data) AS input_1rnFROM yoga_set.yoga_poses where REGEXP_CONTAINS(uri,rn’gs://yoga_images/Downdog/00000097.jpg’)));”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba1de9c050>)])]

In the above query, we select one test image that is identified to contain a specific URI value (00000097.jpg) in the external table. Also, the SELECT part uses the ML.DECODE_IMAGE construct as field “input_1” in order for the ML.PREDICT function to work.

Once execution is completed, you will see the result as shown below:

Now for those who know the ResNet model in depth, this should help understand the classification. But for those like me, let’s code a small snippet to understand the classification visually. 

d. Flattening the result

One way of visualizing the above output is to flatten the activation_49 field values using BigQuery SQL’s UNNEST construct. Please refer to the query below for flattening the result from the earlier step. If you want to further textually label the resulting class, you can introduce the logic in place of the placeholder <<LABEL_LOGIC>> in the query (uncomment when using).

code_block[StructValue([(u’code’, u’with predictions as (rnSELECTrnUri, data, SPLIT(uri, “/”)[OFFSET(ARRAY_LENGTH(SPLIT(uri, “/”)) – 1)] as img, rni as label_i,rn<<LABEL_LOGIC>> label,rnScorernFROM ML.PREDICT(rnMODEL yoga_set.yoga_poses_resnet, rn(SELECT data, uri, ML.DECODE_IMAGE(data) AS input_1 rnFROM yoga_set.yoga_poses rnWHERE rnREGEXP_CONTAINS(uri,’gs://yoga_images/Goddess/00000007.jpg’))), rnUNNEST(activation_49) as score WITH OFFSET i)rnSELECT * FROM predictions rnORDER BY score DESCrnLIMIT 5;’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba0fbc2190>)])]

Without the class labeling logic, below is the output to the query:

You can read further about the model and apply the logic that works best with your data and the model output.

e. Visualizing the inference

Finally, a quick Python snippet to visualize the result from the classification! Export the above query result to a CSV file and reference it in the Python code.

The above image output refers to the Yoga Pose “Downward Dog” which is exactly the same test input we passed into the ML.PREDICT query for classification using BQML!

Unifying structured and unstructured with BigQuery

Lastly, my favorite part of this implementation is to unify the fields from my structured relational table with this unstructured image data. I created a structured BigQuery table in the same dataset as the external table to hold the pose and its health related data.

The image above represents the schema of the structured data table named “yoga_health” and the fields are pose, focus, health_benefit and breath. The query below joins both Structured and Unstructured data:

code_block[StructValue([(u’code’, u’SELECT SPLIT(uri, “/”)[OFFSET(ARRAY_LENGTH(SPLIT(uri, “/”)) – 2)] as pose,rna.health_benefit, breath, focus, data rnFROM `abis-345004.yoga_set.yoga_health` a, yoga_set.yoga_poses b rnWHERE a.pose = SPLIT(uri, “/”)[OFFSET(ARRAY_LENGTH(SPLIT(uri, “/”)) – 2)];’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eba0fcbebd0>)])]

Below is the result:

Note: All of the queries we have covered in this blog can be run directly from your Python Notebook using the BigQuery Magic commands.

Try it out

That’s it! We have successfully stored and queried unstructured data in BigQuery, created a Classification Model using BQML and predicted test yoga poses with the model. If you would like to implement this, get started with your Google Cloud project and follow the codelab. Also, if you would like to learn more about databases or other end to end application implementations in Google Cloud, please head to my blogs. For feedback and questions, you can stay in touch with me here.

Source : Data Analytics Read More