Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

Improve query performance and optimize costs in BigQuery using the anti-pattern recognition tool

BigQuery is a serverless and cost-effective enterprise data warehouse that works across cloud environments and scales with your data. As with any large scale data-intensive platform, following best practices and avoiding inefficient anti-patterns goes a long way in terms of performance and cost savings. 

Usually SQL optimization requires a significant time investment from engineers, who must read high-complexity queries, devise a variety of approaches to improve performance and efficiency, and test several optimization techniques. The best place to start is to fix anti-patterns, since this only requires easily applicable changes and provides significant performance improvements.

To facilitate the task of identifying and fixing said anti-pattern, Google Professional Services Organization (PSO) and Global Services Delivery (GSD) have developed a BigQuery anti-pattern recognition tool. This tool automates the process of scanning SQL queries, identifying antipatterns, and providing optimization recommendations.

What is the BigQuery anti-pattern recognition tool?

The BigQuery anti-pattern recognition tool let you easily identify performance impacting anti-patterns across a large number of SQL queries in a single go. 

It utilizes ZetaSQL to parse BigQuery SQL queries into abstract syntax trees (AST) and then traverses the tree nodes to detect the presence of anti-patterns. 

The tool takes a BigQuery SQL query as an input, such as:

code_block[StructValue([(u’code’, u’SELECTrn t.dim1,rn t.metric1rnFROMrn `dataset.table` trnWHERErn t.id NOT IN (rn SELECTrn idrn FROMrn `dataset.table2`)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f389190>)])]

And produces the output as:

code_block[StructValue([(u’code’, u’Subquery in filter without aggregation at line 8.’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6f50>)])]

It examines potential optimizations, including:

Selecting only the necessary columns

Handling multiple WITH-clause references 

Addressing subqueries in filters with aggregations

Optimizing ORDER BY queries with LIMIT

Enhancing string comparisons

Improving JOIN patterns

Avoiding subquery aggregation in the WHERE clause

The solution supports reading from various sources, such as:

Command line 

Local files

Cloud Storage files

Local folders

Cloud Storage folders

CSV (with one query per line)

INFORMATION_SCHEMA

Additionally, the solution provides flexibility in writing output to different destinations, including:

Printing to the terminal

Exporting as CSV

Writing to a BigQuery table

Using the BigQuery anti-pattern recognition tool

The BigQuery anti-pattern recognition tool is hosted on GitHub. Below are the Quick Start steps on using the tool via command line for inline queries. You can also leverage Cloud Run to deploy it as a container on cloud. 

Prerequisites

Linux OS

JDK 11 or above is installed

Maven

Docker

gcloud CLI

Quick start: – steps

1. Clone the repo into your local machine.

code_block[StructValue([(u’code’, u’git clone git@github.com:GoogleCloudPlatform/bigquery-antipattern-recognition.git’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df6cd0>)])]

2. Build the tool image inside the `bigquery-antipattern-recognition` folder.

code_block[StructValue([(u’code’, u’mvn clean package jib:dockerBuild -DskipTests’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b58df63d0>)])]

3. Run the tool for a simple inline query.

code_block[StructValue([(u’code’, u’docker run \rn -i bigquery-antipattern-recognition \rn –query “SELECT * FROM \`project.dataset.table1\`”‘), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e0b4f33ad50>)])]

4. Below is the output result in the command-line interface:

Additionally, the above tool can read queries from Information Schema and load the output recommendations to a BigQuery table.

Below is an example of the BigQuery anti-pattern recognition tool results exported to a BigQuery table.

Getting started

Ready to start optimizing your BigQuery queries and cutting costs? Check out the tool here and contribute to the tool via GitHub.

Have questions or feedback?

We’re actively working on new features to make the tool as useful to our customers. Use it and tell us what you think! For product feedback/technical questions, reach out to us at bq-antipattern-eng@google.com. If you’re already a BigQuery customer and would like a briefing on the tool, please reach out, we’d be happy to talk.

Source : Data Analytics Read More