Archives February 2024

Standardize your cloud billing data with the new FOCUS BigQuery view

Businesses today often rely on multiple cloud providers, making it crucial to have a unified view of their cloud spend. This is where the FinOps Open Cost and Usage Specification (FOCUS) comes in. Today, we’re excited to announce a new BigQuery view that leverages the recent FOCUS v1.0 Preview to help simplify cloud cost management across clouds.

What is FOCUS?

The FinOps Open Cost and Usage Specification (FOCUS) aims to deliver consistency and standardization across cloud billing data by unifying cloud cost and usage data into one common schema. Before FOCUS, there was no industry-standard way to normalize key cloud cost and usage measures across multiple cloud service providers (CSPs), making it challenging to understand how billing costs, credits, usage, and metrics map from one cloud provider to another (see the FinOps FAQs for more details).

FOCUS helps FinOps practitioners perform fundamental FinOps capabilities using a generic set of instructions and unified schema, regardless of the origin of the dataset. FOCUS is a living, breathing specification that is constantly being iterated on and improved by the Working Group, which consists of FinOps practitioners, CSP leaders, Software as a Service (SaaS) providers, and more. The FOCUS specification v1.0 Preview was launched in November 2023, paving the way for more efficient and transparent cloud cost management. If you’d like to read more or join the Working Group, here is a link to the FOCUS website.

Introducing a BigQuery view for FOCUS v1.0 Preview

Historically, we’ve offered three ways to export cost and usage-related Cloud Billing data to BigQuery: Standard Billing Export, Detailed Billing Export (resource-level data and price fields to join with Price Export table), and Price Export. Today, we are introducing a new BigQuery view that transforms this data so that it aligns with the data attributes and metrics defined in the FOCUS v1.0 Preview.

A BigQuery view is a virtual table that represents the results of a SQL query. The view can be created from a base query (see below on how to get access) that maps Google Cloud data into the display names, format, and behavior of the FOCUS Preview dimensions and metrics. BigQuery views are useful because the queryable virtual table contains only the data from the tables and fields specified in the base query that defines the view. And because views are virtual tables, they incur no additional data storage charges if you are already using Billing Export to BigQuery.

You should spend time optimizing costs, not mapping billing terminology across cloud providers. With the FOCUS BigQuery view, you can now:

- View and query Google Cloud billing data that is adapted to the FOCUS specification
- Use the BigQuery view as a data source for visualization tools like Looker Studio
- Analyze your Google Cloud costs alongside data from other providers using the common FOCUS format

How it works

The FOCUS BigQuery view acts as a virtual table that sits on top of your existing Cloud Billing data. To use this feature, you will need Detailed Billing Export and Price Export enabled; follow these instructions to set up your billing exports to BigQuery. The FOCUS BigQuery view uses a base SQL query to map your Cloud Billing data into the FOCUS schema, presenting it in the specified format. This allows you to query and analyze your data as if it were native to FOCUS, making it easier to compare costs across different cloud providers.
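
Conceptually, the base query renames and reshapes billing-export fields into FOCUS columns. Below is a minimal Python sketch of that idea: the export field names follow the detailed export schema and the FOCUS names follow the v1.0 Preview, but treat the exact correspondence as an assumption — the official base query is the source of truth.

```python
# Illustrative-only mapping from a detailed billing-export row (as a dict)
# to a FOCUS-style row. The real view performs this in SQL over BigQuery.
def to_focus_row(export_row: dict) -> dict:
    return {
        "ProviderName": "Google Cloud",
        "ServiceName": export_row["service"]["description"],
        "BilledCost": export_row["cost"],
        "BillingCurrency": export_row["currency"],
        "ChargePeriodStart": export_row["usage_start_time"],
        "ChargePeriodEnd": export_row["usage_end_time"],
    }
```

The point of the standardized column names is that a downstream report written against `BilledCost` and `ServiceName` works unchanged for any provider emitting FOCUS data.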

We’ve made it easy to leverage the power of FOCUS with a step-by-step guide. To view this sample SQL query and follow the step-by-step guide, sign up here.

Looking ahead: A commitment to open standards and collaboration

At Google Cloud, open standards are part of our DNA. We were a founding member of the FinOps Foundation, the first CSP to join the Open Billing Standards Working Group, and a core contributor to the v0.5 and v1.0 specifications. As a strong advocate for open billing standards, we believe customers deserve a glimpse of what’s possible with Google Cloud Billing data under the latest FOCUS specification.

We look forward to shaping open billing standards alongside our customers, FinOps practitioners across the industry, the FinOps Foundation, CSPs, SaaS providers, and more. Get a unified view of your cloud costs today with the FOCUS BigQuery view. Sign up here to learn more and get started.

Related Article

When they go closed, we go open – Google Cloud and open billing data

Google Cloud partnered with the FinOps Foundation on FOCUS, a Linux Foundation project, to establish an open specification for cloud bill…

Source: Data Analytics

Serverless data architecture for trade surveillance at Deutsche Bank

Ensuring compliance with regulatory requirements is crucial for every bank’s business. While financial regulation is a broad area, detecting and preventing market manipulation and abuse is absolutely mission-critical for an investment bank of Deutsche Bank’s size. This is called trade surveillance.

At Deutsche Bank, the Compliance Technology division is responsible for the technical implementation of this control function. To do this, the Compliance Technology team retrieves data from various operational systems in the front office and performs scenario calculations to monitor the trades executed by all of the bank’s business lines. If any suspicious patterns are detected, a compliance officer receives an internal alert to investigate the issue for resolution.

The input data comes from a broad range of systems, but the most relevant are market, trade, and reference data. Historically, provisioning data for compliance technology applications from front-office systems required the team to copy data between, and often even within, many different analytical systems, leading to data quality and lineage issues as well as increased architectural complexity. At the same time, executing trade surveillance scenarios includes processing large volumes of data, which requires a solution that can store and process all the data using distributed compute frameworks like Apache Spark.

A new architectural approach

Google Cloud can help solve the complex issues of processing and sharing data at scale across a large organization with its comprehensive data analytics ecosystem of products and services. BigQuery, Google Cloud’s serverless data warehouse, and Dataproc, a managed service for running Apache Spark workloads, are well positioned to support data-heavy business use cases, such as trade surveillance.

The Compliance Technology team decided to leverage these managed services from Google Cloud in their new architecture for trade surveillance. In the new architecture, the operational front-office systems act as publishers that present their data in BigQuery tables. This includes trade, market, and reference data that is now available in BigQuery to various data consumers, including the Trade Surveillance application. As the Compliance Technology team doesn’t need all the data published from the front-office systems, they can create multiple views derived only from the input data required to execute trade surveillance scenarios.

Scenario execution involves running trade surveillance business logic in the form of various different data transformations in BigQuery, Spark in Dataproc, and other applications. This business logic is where suspicious trading patterns, indicating market abuse or market manipulation, can be detected. Suspicious cases are written to output BigQuery tables and then processed through research and investigation workflows, where compliance officers perform investigations, detect potential false positives, or file a Suspicious Activity Report to the regulator if the suspicious case indicates a compliance violation.

Surveillance alerts are also retained and persistently stored to measure how effective detection is and to improve the false-positive rate. These calculations run in Dataproc using Spark and in BigQuery using SQL; they are performed periodically and fed back into trade surveillance scenario execution to further improve the surveillance mechanisms. The execution of the ETL processes that derive data for trade surveillance scenarios and effectiveness calibrations is orchestrated through Cloud Composer, a managed service for workflow orchestration using Apache Airflow.

Here is a simplified view of what the new architecture looks like:

This is how the Compliance Technology team at Deutsche Bank describes the new architecture: 

“This new architecture approach gives us agility and elasticity to roll out new changes and behaviors much faster based on market trends and new emerging risks as e.g. cross product market manipulation is a hot topic our industry is trying to address in line with regulator’s expectations.”
– Asis Mohanty, Global Head, Trade Surveillance, Unauthorized Principal Trading Activity Technology, Deutsche Bank AG

“The serverless BigQuery based architecture enabled Compliance Technology to simplify the sharing of data between the front- and back-office whilst having a zero-data copy approach and aligning with the strategic data architecture.” 
– Puspendra Kumar, Domain Architect, Compliance Technology, Deutsche Bank AG

The benefits of a serverless data architecture

As the architecture above shows, trade surveillance requires various input data sources. A major benefit of leveraging BigQuery for sourcing this data is that there is no need to copy data to make it available to data consumers across Deutsche Bank. A simpler architecture improves data quality and lowers cost by minimizing the number of hops the data needs to take.

Copying is unnecessary because BigQuery has no separate instances or clusters. Instead, every table is accessible to a data consumer as long as the consumer app has the right permissions and references the table URI in its queries (i.e., the Google Cloud project ID, the dataset name, and the table name). Thus, various consumers can access the data directly from their own Google Cloud projects without having to copy it and physically persist it there.
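
To make the table-URI idea concrete, here is a small sketch; the project, dataset, and table names are made up for illustration.

```python
# Addressing a BigQuery table needs only the project ID, dataset, and
# table name -- there is no instance or cluster endpoint to connect to,
# which is what makes the zero-copy sharing pattern possible.
def table_uri(project_id: str, dataset: str, table: str) -> str:
    """Build the fully-qualified table reference used in BigQuery SQL."""
    return f"`{project_id}.{dataset}.{table}`"

# A consumer in another project can query the producer's table directly,
# given the right permissions (names here are hypothetical):
query = f"SELECT * FROM {table_uri('fo-trades-prod', 'trade_data', 'executions')}"
```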

For the Compliance Technology team to get the required input data for trade surveillance scenarios, they simply need to query the BigQuery views containing the input data and the tables containing the derived data from the compliance-specific ETLs. This eliminates the need for copying data, making the data more reliable and the architecture more resilient thanks to fewer data hops. Above all, this zero-copy approach enables data consumers in other teams across the bank, beyond trade surveillance, to use market, trade, and reference data by following the same pattern in BigQuery.

In addition, BigQuery offers another advantage: it is closely integrated with other Google Cloud services, such as Dataproc and Cloud Composer, so orchestrating ETLs is seamless, leveraging Apache Airflow’s out-of-the-box operators for BigQuery. There is also no need to copy data in order to process it from BigQuery using Spark. Instead, an out-of-the-box connector allows data to be read via the BigQuery Storage API, which is optimized for streaming large volumes of data directly to Dataproc workers in parallel, ensuring fast processing.

Finally, storing data in BigQuery enables data producers to leverage Google Cloud’s native, out-of-the-box tooling for ensuring data quality, such as Dataplex automatic data quality. With this service, it’s possible to configure rules for data freshness, accuracy, uniqueness, completeness, timeliness, and various other dimensions, and then simply execute them against the data stored in BigQuery. This happens in a fully serverless, automated fashion, with no need to provision any infrastructure for rule execution and data quality enforcement. As a result, the Compliance Technology team can ensure that the data they receive from front-office systems complies with the required data quality standards, adding to the value of the new architecture.
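
Dataplex data quality rules are configured declaratively rather than coded by hand, but two of the rule dimensions mentioned above reduce to simple checks. The sketch below is illustrative only and is not the Dataplex API.

```python
# Illustrative reimplementation of two data-quality dimensions
# (completeness and uniqueness) over rows represented as dicts.
def completeness(rows, column):
    """Fraction of rows with a non-null value in `column`."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def uniqueness(rows, column):
    """Fraction of non-null values in `column` that are distinct."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(set(values)) / len(values) if values else 0.0
```

In Dataplex, thresholds on scores like these (e.g., "completeness of `trade_id` must be 1.0") are what the configured rules enforce against the BigQuery tables.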

Because the new architecture leverages integrated, serverless data analytics products and managed services from Google Cloud, the Compliance Technology team can now fully focus on the business logic of their Trade Surveillance application. BigQuery stands out here because it doesn’t require maintenance windows, version upgrades, upfront sizing, or hardware replacements, in contrast to running a large-scale, on-premises Hadoop cluster.

This brings us to the final advantage, namely the cost-effectiveness of the new architecture. In addition to allowing team members to now focus on business-relevant features instead of dealing with infrastructure, the architecture makes use of services which are charged based on a pay-as-you-go model. Instead of running the underlying machines in 24/7 mode, compute power is only brought up when needed to perform compliance-specific ETLs, execute the trade surveillance scenarios, or perform effectiveness calibration, which are all batch processes. This again helps further reduce the cost compared to an always-on, on-prem solution. 

Here’s the view from Deutsche Bank’s Compliance Technology team about the associated benefits: 

“Our estimations show that we can potentially save up to 30% in IT Infrastructure cost and achieve better risk coverage and Time to Market when it comes to rolling out additional risk and behaviors with this new serverless architecture using BigQuery.” 
– Sanjay-Kumar Tripathi, Managing Director, Global Head of Communication Surveillance Technology & Compliance Cloud Transformation Lead, Deutsche Bank AG

Source: Data Analytics

Manual Testing Tools Can Outperform AI with Quality Assurance

Did you know that software developers are expected to spend over $169 billion on generative AI by 2032? This underscores the benefits AI offers to the software development profession. AI is transforming software development by automating repetitive tasks, accelerating coding processes, and enhancing decision-making capabilities. With AI-powered tools for code generation and debugging, developers can […]

Source: SmartData Collective

Data Recovery Services Are Crucial in the Big Data Era

In a world increasingly reliant on data analytics for decision-making and strategic planning, the importance of data recovery cannot be overstated. Data loss or corruption can have severe consequences, disrupting business operations, compromising valuable insights, and potentially leading to financial losses or reputational damage. Given the complexity and volume of data involved in analytics processes, […]

Source: SmartData Collective

How to Optimize Facebook Ad Campaigns for Cheaper Leads

Analytics has become very valuable in the marketing sector. Mordor Intelligence reports companies will spend over $6.31 billion on marketing analytics this year. Data analytics has become very helpful for the digital marketing sector. One of the many ways that data analytics has shaped marketing is with Facebook advertising. In an era where digital footprints […]

Source: SmartData Collective

Looker Hackathon 2023 results: Best hacks and more

In December, the Looker team invited our developer and data community to collaborate, learn, and inspire each other at our annual Looker Hackathon. More than 400 participants from 93 countries joined together, hacked away for 48 hours and created 52 applications, tools, and data experiences. The hacks use Looker and Looker Studio’s developer features, data modeling, visualizations and other Google Cloud services like BigQuery and Cloud Functions.

For the first time in Looker Hackathon history, we had two hacks tie for the award of the Best Hack. See the winners below and learn about the other finalists from the event. In every possible case, we have included links to code repositories or examples to enable you to reproduce these hacks.

Best Hack winners

DashNotes: Persistent dashboard annotations

By Ryan J, Bartosz G, Tristan F

Have you ever wanted to take note of a juicy data point you found after cycling through multiple filterings of your data? You could write your notes in an external notes application, but then you might lose the dashboard and filter context important to your discovery. This Best Hack allows you to take notes right from within your Looker dashboard. Using the Looker Custom Visualization API, it creates a dashboard tile for you to create and edit text notes. Each note records the context around its creation, including the original dashboard and filter context. The hack stores the notes in BigQuery to persist the notes across sessions. Check out the GitHub repository for more details.
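
A hypothetical sketch of the record DashNotes might persist to BigQuery: the note text plus the dashboard and filter context active at creation time, so each note can be traced back to what its author was looking at. The field names are assumptions, not the hack's actual schema.

```python
from datetime import datetime, timezone

def make_note(text, dashboard_id, filters):
    """Build a note record capturing its creation context."""
    return {
        "text": text,
        "dashboard_id": dashboard_id,
        "filters": dict(filters),          # snapshot of the active filters
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```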

Document repository sync automation

By Mehul S, Moksh Akash M, Rutuja G, Akash

Does your organization struggle to maintain documentation on an increasing number of ever-changing dashboards? This Best Hack helps your organization automatically generate current, detailed documentation for all your dashboards, simplifying administration. The automation uses the Looker SDK, the Looker API, and serverless Cloud Functions to parse your LookML for useful metadata and store it in BigQuery. The hack then uses LookML to model and display the metadata inside a Looker dashboard. Check out the GitHub repository for the backend service and the GitHub repository for the LookML for more details.

Nearly Best Hack winner

Querying Python services from a Looker dashboard

By Jacob B, Illya M

If your Looker dashboard had the power to query any external service, what would you build? This Nearly Best Hack explores how your Looker Dashboard can communicate with external Python services. It sets up a Python service to mimic a SQL server and serves it as a Looker database connection for your Looker dashboard to query. Then, clever LookML hacks enable your dashboard buttons to send data to the external Python service, creating a more interactive dashboard. This sets up a wide array of possibilities to enhance your Looker data experience. For example, with this hack, you can deploy a trained ML model from Google Cloud’s Vertex AI in your external service to deliver keen insights about your data. Check out the GitHub repository for more details.


What do I watch?

By Hamsa N, Shilpa D

We’ve all had an evening when we didn’t know what movie to watch. You can now tap into a Looker dashboard that recommends ten movies you might like based on your most liked movie from IMDb’s top 1000 movies. The hack analyzes a combination of genre, director, stars, and movie descriptions, using natural language processing techniques. The resulting processed data resides in BigQuery, with LookML modeling the data. Check out the GitHub repository for more details.
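
A simplified sketch of the recommendation idea: represent each movie as a bag of tokens drawn from its genre, director, stars, and description, then rank the catalogue by similarity to the user's favourite. The real hack uses fuller NLP preprocessing; this Jaccard-overlap version is illustrative only.

```python
def tokens(movie):
    """Bag of lowercase tokens from the movie's descriptive fields."""
    text = " ".join([movie["genre"], movie["director"],
                     movie["stars"], movie["description"]])
    return set(text.lower().split())

def recommend(favourite, catalogue, k=10):
    """Return the titles of the k movies most similar to `favourite`."""
    fav = tokens(favourite)
    def score(m):
        t = tokens(m)
        return len(fav & t) / len(fav | t)  # Jaccard similarity
    ranked = sorted((m for m in catalogue if m is not favourite),
                    key=score, reverse=True)
    return [m["title"] for m in ranked[:k]]
```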

Template analytics

By Ehsan S

If you need to determine which customer segment is most effective to market to, check out this hack, which performs Recency, Frequency, Monetary (RFM) analysis on data from a Google Sheet to help you segment customers based on how recently they last transacted, how often they’ve purchased, and how much they’ve spent over time. You point the custom Looker Studio Community Connector at a Google Sheet, and the connector performs RFM analysis on the sheet’s data. The hack’s Looker Studio report visualizes the results to give an overview of your customer segments and behavior. Check out the Google Apps Script code for more details.
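
The RFM idea itself is simple to sketch. Below is a minimal, illustrative version (not the hack's Apps Script code) that scores each customer 1-3 on recency, frequency, and monetary value by ranking them against the other customers.

```python
from datetime import date

def rfm_scores(transactions, today):
    """transactions: iterable of (customer, date, amount) tuples.
    Returns {customer: (R, F, M)} with each score in 1..3."""
    per = {}
    for cust, d, amt in transactions:
        rec = per.setdefault(cust, {"last": d, "freq": 0, "total": 0.0})
        if d > rec["last"]:
            rec["last"] = d
        rec["freq"] += 1
        rec["total"] += amt

    def tercile(values, v):
        # 1 = bottom third of the ranked values, 3 = top third
        ranked = sorted(values)
        return 1 + (3 * ranked.index(v)) // len(ranked)

    # Negate days-since-last-purchase so "more recent" ranks higher.
    recency = {c: -(today - r["last"]).days for c, r in per.items()}
    freq = {c: r["freq"] for c, r in per.items()}
    money = {c: r["total"] for c, r in per.items()}
    return {
        c: (
            tercile(recency.values(), recency[c]),
            tercile(freq.values(), freq[c]),
            tercile(money.values(), money[c]),
        )
        for c in per
    }
```

A customer scoring (3, 3, 3) is a recent, frequent, high-value buyer; (1, 1, 1) is a lapsed, infrequent, low-spend one.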

LOV filter app

By Markus B

This hack implements a List of Values (LOV) filter that enables you to have the values of one dimension filter a second dimension. For example, take two related dimensions: “id” and “name”. The “name” dimension may change, while the “id” dimension always stays constant.

This hack uses Looker’s Extension Framework and Looker Components to show “name” values in the LOV filter that translate to “id” values in an embedded dashboard’s filter. This helps your stakeholders filter on values they’re familiar with and keeps your data model flexible and robust. Check out the GitLab repository for more details.
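
The name-to-id translation at the heart of the LOV filter can be sketched in a few lines; the field names here are the example dimensions from above, and the functions are hypothetical, not the hack's actual code.

```python
def build_lov(dimension_rows):
    """dimension_rows: list of {'id': ..., 'name': ...} rows from a query."""
    return {row["name"]: row["id"] for row in dimension_rows}

def translate_filter(lov, selected_names):
    """Turn user-selected 'name' values into the stable 'id' values
    that the embedded dashboard's filter actually expects."""
    return [lov[name] for name in selected_names if name in lov]
```

Because users only ever see the "name" values, the display labels can be renamed freely without breaking saved filters, which continue to resolve to the constant "id" values.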

Looker accelerator

By Dmitri S, Joy S, Oleksandr K

This collection of open-source LookML dashboard templates provides insight into Looker project performance and usage. The dashboards use Looker’s System Activity data and are a great example of using LookML to create reusable dashboards. In addition, you can conveniently install the Looker Block of seven dashboards through the Looker Marketplace (pending approval) to help your Looker developers or admins optimize your Looker usage. Check out the GitHub repository for more details.

The SuperViz Earth Explorer

By Ralph S

With this hack, you can visually explore the population and locations of cities across the world on an interactive 3D globe, and can filter the size of the cities in real time as the globe spins. This custom visualization uses the Looker Studio Community Visualization framework with a clever combination of three.js, a 3D JavaScript library, and graphics hacks to create a visual experience. Check out the GitHub repository for more details.

dbt exposure generator

By Dana H.

Are you using dbt models with Looker? This hack automatically generates dbt exposures to help you debug and identify how your dbt models are used by Looker dashboards. This hack serves as a great example of how our Looker SDK and Looker API can help solve a common pain point for developers. Check out the GitHub repository for more details.

Hacking Looker for fun and community

At Looker Hackathon 2023, our developer community once again gave us a look into how talented, creative, and collaborative they are. We saw how our developer features like Looker Studio Community Visualizations, LookML, and Looker API, in combination with Google Cloud services like Cloud Functions and BigQuery, enable our developer community to build powerful, useful — and sometimes entertaining — tools and data experiences.

We hope these hackathon projects inspire you to build something fun, innovative, or useful for you. Tap into our linked documentation and code in this post to get started, and we will see you at the next hackathon!

Source: Data Analytics