5 steps to automating and streamlining your regulatory reporting

Regulatory reporting requirements continue to grow more complex, and financial services institutions devote more resources to compliance year after year.

This added complexity, combined with technological challenges, makes it hard to produce reports that meet evolving regulatory needs. As a result, financial institutions have struggled to keep pace without adding significant cost and complexity.

Google Cloud’s Regulatory Reporting Platform can help financial institutions automate and streamline the regulatory reporting process.

Take the next step

Explore how cloud-native architecture can enable a more flexible, streamlined approach to regulatory reporting and provide the insights you need.

We’re here to help

Google Cloud can help transform the data supply chain in the regulatory reporting process to efficiently produce timely and reliable reports.

To explore how Google Cloud fits into your technology strategy, please contact us or visit our website.

White paper: Reimagining Regulatory Reporting in Financial Services. Download now: https://cloud.google.com/resources/reimagining-regulatory-reporting-in-financial-services-whitepaper

Your new regulatory reporting mantra: simple, fast, and repeatable

Following the 2008 global financial crisis, financial services institutions have invested heavily to keep pace with the changing  regulatory landscape. 

Even well-resourced teams have been stretched thin trying to keep up with the evolving regulatory reporting requirements because of their dependencies on legacy technology and highly manual processes. At the same time, the velocity of regulatory changes for traditional areas like credit, liquidity, and capital continues to increase while the scope of regulation is also expanding to include newer risk types like climate risk and operational resiliency. The challenge of navigating the additional scope and complexity is compounded by increasing regulatory expectations for more granularity and consistency across all reporting. 

Despite the substantial investments the industry has made over the years to address these challenges, current approaches to regulatory reporting are often still slow, expensive, and rife with data quality issues. This raises the question: how can chief financial officers (CFOs) and chief risk officers (CROs) ensure they have the data architecture and technology they need to flexibly meet disparate financial, risk, and regulatory reporting requirements?

Growing reporting demands

“Regulators across the world have intensified their efforts to avoid a repeat of the 2008 financial crisis by generating a dizzying set of regulatory reporting obligations, which are still growing in volume, variety and velocity,” said Artur Kaluza, Head of Transformation for Risk Measures and Metrics at ANZ.

In the first 10 years following the global financial crisis of 2008, a worldwide push ensued to better supervise financial institutions. The result was a multitude of new financial regulatory reforms, data reporting requirements, global standards, and other rules which touched virtually every financial firm, from banks and insurers to asset managers, mortgage lenders, and more. This trend continues today, with regulatory expectations and complexity continuing to increase every year.

In addition to meeting reporting requirements, today’s regulators expect improved data accuracy and governance as digitalization, cybersecurity, data sovereignty, and environment, social, and governance (ESG) megatrends continue to shape the regulatory environment.

“As the volume of data we need to process continues to increase, the regulators become more demanding and customers expect faster services, it became clear that it wasn’t enough for us to speed up processes and carry on as usual. We instead needed to transform the way we do things,” explains Chris Conway, Head of Risk and Finance Technology, NatWest Markets.

The rise in demand for financial and risk data has, in fact, been significantly influenced by increased regulation. Every major financial regulation has reporting requirements that are becoming more data-intensive, forcing financial institutions to manage, clean, and analyze large amounts of information to reduce risk, run stress tests, and perform analytics. 

Data consistency and quality issues

Despite financial institutions investing heavily in technology to improve reporting accuracy and efficiency, they still face constant data quality challenges. Data duplication and inconsistency across risk, finance, and regulatory reporting functions call for more reliability across the data supply chain to adhere to common definitions while re-using data to support each use case.

Today, and historically, financial services organizations rely on manual processes and controls to meet complex regulatory requirements. Because the underlying infrastructure is organized by product and business line, sanitizing and conforming data into horizontal views is more difficult. And, existing regulatory reporting processes and systems rely on single-purpose vendor ecosystems that increase infrastructure demands and require specialized talent. 

As a result, firms are investing significant time, resources, and money to manage the complex web of tools and legacy technology stacks amid slow, manual data supply chain processes. 

Transforming the data supply chain

Google Cloud has worked with financial services organizations to help pioneer a new Regulatory Reporting Platform to address these challenges.

ANZ began to reimagine an end-to-end financial risk and regulatory reporting process last year. Using Google Cloud, ANZ created a single unified data platform and architecture that helps deliver data quicker, cheaper, and in a more automated fashion.

“Google Cloud enables granular data processing, eliminating downstream disaggregation and adjustment processes. Ultimately, this led to a more efficient technology and operating model where employees’ focus shifted to focus on higher-value activities. By using Google Cloud’s technology stack and architecture pattern, ANZ has improved performance, elevated operational efficiency, and reduced costs. The outcome of the first phase of the project led to a 50% effort reduction in the overall reporting process, made the data readily available on business day one, and fully automated the data quality (DQ) monitoring, thereby shifting effort from DQ identification to resolution,” Kaluza said.

NatWest Markets migrated to Google Cloud to achieve flexible scalability, power predictive risk modeling with analytics capabilities, and streamline regulatory compliance. Knowing the importance of supporting its customers by collecting signals from a diverse set of data points and interpreting them to enable timely business decisions, it moved data processing workloads to BigQuery to turn data into insights quickly and cost-effectively – helping achieve a 60% faster compute time for overnight batch processing.

“Google Cloud is the ideal solution for us because it provides on-demand scalability, analytics capabilities that broaden the possibilities of what we can do for our customers, and automated services that free up our team from managing infrastructure to focus on our customers instead,” said Conway.

Granularity at massive scale

The increasing infrastructure demands of risk, financial, and regulatory reporting require massive, on-demand scalability with built-in quality controls, easy reconciliation of differences, and clear documentation and lineage.

Google Cloud’s Regulatory Reporting Platform provides financial institutions with four key pillars for delivering efficiency, automation, speed, and reusability to meet today’s reporting demands:

Identify and ingest needed data and create reporting rules as code, instead of writing separate logic and report documentation.

Transform, adjust, and configure data on-platform with automated data management controls, rather than using manual tools.

Separate storage and compute to run reporting jobs in minutes rather than days – and on demand.

Re-use data calculations and source data in shared libraries to drive consistency and support additional use cases across finance, risk, and regulatory reporting.

Illustration. Google Cloud has worked with its customers to help pioneer a new Regulatory Reporting Platform to address industry challenges.

With built-in data management tooling, open-source architecture to eliminate single-vendor risk, and a rich, seamless access space to analyze and transform data, you can leverage the power of Google Cloud to modernize and transform regulatory reporting and always have the insights you need. Learn more about Google Cloud’s Regulatory Reporting Platform.

White paper: Reimagining Regulatory Reporting in Financial Services. Download now: https://cloud.google.com/resources/reimagining-regulatory-reporting-in-financial-services-whitepaper

Jumpstart Your BigQuery Remote Function Development Today

Last year, BigQuery introduced Remote Functions, a feature that allows users to extend BigQuery SQL with their own custom code, written and hosted in Cloud Functions or Cloud Run. With Remote Functions, custom SQL functions can be written in languages like Node.js, Python, Go, Java, .NET, Ruby, or PHP, enabling a personalized BigQuery experience for each organization, while leveraging its standard management and permission models.

We’ve seen an amazing number of use cases enabled by Remote Functions. Inspired by our customers’ success, we decided to document the art of the possible on this blog, providing a few examples, sample code and video instructions to jumpstart your Remote Function development. 

Dynamic Language Translation with SQL

Imagine multinational organizations storing, for example, customer feedback in various languages in a common BigQuery table. The Cloud Translation API can be used to translate all content into a common language and make it easier to act on the data.

For this specific example, we’ve created an end-to-end tutorial for extending BigQuery with the Cloud Translation API. You can get all the instructions at https://cloud.google.com/bigquery/docs/remote-functions-translation-tutorial.
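The tutorial has the full implementation, but as a rough sketch of the pattern it follows: a remote function is just an HTTP endpoint that receives a JSON body with a "calls" array and returns a "replies" array. The function name translate_text and the English target language below are assumptions for illustration, not the tutorial’s exact code:

import json

import functions_framework
from google.cloud import translate_v2 as translate

translate_client = translate.Client()

@functions_framework.http
def translate_text(request):
    # BigQuery sends a JSON body with a "calls" array; each call is the list of
    # SQL arguments for one row. The response must contain a "replies" array.
    calls = request.get_json()["calls"]
    replies = []
    for (text,) in calls:
        result = translate_client.translate(text, target_language="en")
        replies.append(result["translatedText"])
    return json.dumps({"replies": replies})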

BigQuery Unstructured Data Analysis

Analyzing unstructured data can be a daunting task. The combination of a Remote Function and Cloud Vision API can help organizations derive insights from images and videos stored in Google Cloud via SQL, without leaving the BigQuery prompt. 

Imagine if organizations could assign labels to images and quickly classify them into millions of predefined categories or detect objects, read printed and handwritten text, and build valuable metadata into your image catalog stored in BigQuery. And all of this processing via BigQuery SQL. This is what this example is all about. 

We’ve created an end-to-end, easy-to-follow tutorial for this use case as well. You can get all instructions at https://cloud.google.com/bigquery/docs/remote-function-tutorial.
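Once a Vision-backed remote function is deployed, it can be called over image URIs straight from SQL. One way to do that is through the BigQuery Python client, as in the hypothetical sketch below; the function name annotate_image and the example URIs are placeholders, not the tutorial’s:

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT uri,
       `<project>.<dataset>.annotate_image`(uri) AS labels  -- placeholder function name
FROM UNNEST([
  'gs://<your-bucket>/images/storefront.jpg',
  'gs://<your-bucket>/images/receipt.png'
]) AS uri
"""
for row in client.query(query).result():
    print(row.uri, row.labels)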

Natural Language Insights with SQL

The  Natural Language Processing API lets you derive insights from unstructured data with machine learning. With remote functions, this text processing can be combined with BigQuery SQL.  

This example focuses on the ability to deliver insights from unstructured text stored in BigQuery tables using Google machine learning and SQL. A simple use case could be an application gathering social media comments and storing them in BigQuery while performing sentiment analysis on each comment via SQL.

The sample code (main.py and requirements.txt) can be found in this repo.
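As a rough illustration of what such a main.py might contain (not the repo’s exact code; the handler name and the reply format are assumptions), the function can read the user_defined_context passed in from the SQL definition and run sentiment analysis on each value:

import json

import functions_framework
from google.cloud import language_v1

nlp_client = language_v1.LanguageServiceClient()

@functions_framework.http
def nlp_handler(request):
    body = request.get_json()
    # user_defined_context from the CREATE FUNCTION statement arrives here.
    mode = body.get("userDefinedContext", {}).get("mode", "call_nlp")
    replies = []
    for (text,) in body["calls"]:
        if mode == "call_nlp":
            document = language_v1.Document(
                content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
            sentiment = nlp_client.analyze_sentiment(
                request={"document": document}).document_sentiment
            replies.append(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")
        else:
            replies.append(None)
    return json.dumps({"replies": replies})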

Once the Python code is deployed as a Cloud Function, you can create the BigQuery Remote Function using the syntax below:

CREATE OR REPLACE FUNCTION `<project>.<dataset>.call_nlp` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<your-connection-id>`
OPTIONS (endpoint = '<your-endpoint>', user_defined_context = [("mode","call_nlp")]);

For more information on how to create a remote function, please see this documentation.

A screenshot of the working function can be seen below.

Security and Compliance

Protection of sensitive data, like personally identifiable information (PII), is critical to every business. 

With Remote functions, a SQL call can be made to integrate functionality provided by the Cloud Data Loss Prevention API, without the need to export data out of BigQuery. Since the remote function calls are done in-line with SQL, even DML statements can be performed on the fly using the outcome of the function as an input value for the data manipulation. 

This example focuses on the ability to perform deterministic encryption and decryption of data stored in BigQuery tables using Remote Functions along with DLP. 
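As a hypothetical sketch of the encrypt path (not the linked sample; the helper name, surrogate token name, and key parameters are assumptions), deterministic de-identification of a single value with the DLP API and a KMS-wrapped key could look like this:

import base64

import google.cloud.dlp_v2 as dlp

def dlp_encrypt(project_id, text, kms_key_name, wrapped_key_b64):
    client = dlp.DlpServiceClient()
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [{
                "primitive_transformation": {
                    "crypto_deterministic_config": {
                        "crypto_key": {
                            "kms_wrapped": {
                                "wrapped_key": base64.b64decode(wrapped_key_b64),
                                "crypto_key_name": kms_key_name,
                            }
                        },
                        # The surrogate annotation keeps the token re-identifiable later.
                        "surrogate_info_type": {"name": "TOKEN"},
                    }
                }
            }]
        }
    }
    inspect_config = {
        "info_types": [
            {"name": "PHONE_NUMBER"},
            {"name": "EMAIL_ADDRESS"},
            {"name": "IP_ADDRESS"},
        ]
    }
    response = client.deidentify_content(
        request={
            "parent": f"projects/{project_id}",
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": text},
        }
    )
    return response.item.value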

The sample code (main.py and requirements.txt) can be found here. Please notice: 

References to <change-me> on main.py will need to be adjusted according to your GCP environment

The code is inspecting data for the following info_types: PHONE_NUMBER, EMAIL_ADDRESS and IP_ADDRESS. Feel free to adjust as needed

Cloud Key Management Service (KMS) and Data Loss Prevention APIs will need to be enabled on the GCP project

A DLP Keyring and Key will be required. For directions, click here

The key will need to be wrapped (instructions)

DLP User role will need to be assigned to the service account executing the Cloud Function (by default the compute engine service account) 

Once the Python code is deployed as a Cloud Function, you can create BigQuery Remote Functions using the syntax below:

CREATE OR REPLACE FUNCTION `<project>.<dataset>.cloudrun_dlp_encrypt` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<your-connection-id>`
OPTIONS (endpoint = 'https://<your-endpoint>.a.run.app', user_defined_context = [("mode","encrypt")]);

CREATE OR REPLACE FUNCTION `<project>.<dataset>.cloudrun_dlp_decrypt` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<your-connection-id>`
OPTIONS (endpoint = 'https://<your-endpoint>.a.run.app', user_defined_context = [("mode","decrypt")]);

For more information on how to create a remote function, take a look at the documentation.

We apply the deterministic encryption and decryption functions to phone numbers. The picture below demonstrates a phone number being encrypted by the function:

Since deterministic encryption and decryption techniques are being used, the picture below demonstrates the phone number can be decrypted back to its original value by calling the dlp_decrypt function with the hashed value created by the dlp_encrypt function.

Below is an example of a BigQuery table creation, selecting data from an existing table while encrypting any phone number, email address, or IP address values found inside the call_details column:

Check the full demo video here.

ELT and Data Catalog Updates 

Extract, Load, Transform (ELT) is a data integration process for transferring raw data from a source server to a target server such as BigQuery  and then preparing the information for downstream uses. With ELT, the raw data is loaded into the data warehouse or data lake and transformations occur on the stored data.

When working with BigQuery, it’s common to see transformations being done with SQL and called via stored procedures. In this scenario, the transformation logic is self-contained, running inside BigQuery. But what if you need to keep external systems like Google Data Catalog updated while running the SQL transformation jobs? 

This is what this example is all about. It demonstrates the ability to update Data Catalog, in-line with BigQuery stored Procedures using the catalog’s APIs and Remote Functions.
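As a rough sketch (not the linked sample; the function and field names are assumptions) of the Data Catalog call the Cloud Function might make, the handler can look up the table’s catalog entry and attach a tag built from the incoming parameters:

from google.cloud import datacatalog_v1

def catalog_update(project_id, dataset, table_name, tag_template, total_rows, changed_rows):
    client = datacatalog_v1.DataCatalogClient()
    # Look up the catalog entry that represents the BigQuery table.
    resource = (f"//bigquery.googleapis.com/projects/{project_id}"
                f"/datasets/{dataset}/tables/{table_name}")
    entry = client.lookup_entry(request={"linked_resource": resource})

    tag = datacatalog_v1.Tag()
    tag.template = tag_template  # e.g. projects/.../locations/.../tagTemplates/<your-tag-template-id>
    tag.fields["total_rows"] = datacatalog_v1.TagField()
    tag.fields["total_rows"].double_value = float(total_rows)
    tag.fields["changed_rows"] = datacatalog_v1.TagField()
    tag.fields["changed_rows"].double_value = float(changed_rows)

    # A true upsert would list existing tags and call update_tag when one is found;
    # create_tag is shown here for brevity.
    return client.create_tag(parent=entry.name, tag=tag)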

The sample code (main.py and requirements.txt) can be found here. Please notice: 

References to <your-tag-template-id> and <your-project-id> on main.py will need to be adjusted according to your GCP environment

Data Catalog Admin role (or similar) will need to be assigned to the service account executing the Cloud Function (by default the compute engine service account), as tag template values will be updated

A tag template with the structure below exists

Once the Python code is deployed as a Cloud Function, you can create a BigQuery Remote Function using the syntax below:

CREATE OR REPLACE FUNCTION `<project>.<dataset>.catalog_upsert` (msg STRING, dataset STRING, table_name STRING, total_rows FLOAT64, changed_rows FLOAT64) RETURNS STRING
REMOTE WITH CONNECTION `<remote-connection-id>`
OPTIONS
(endpoint = 'https://<change-me>.cloudfunctions.net/catalog_handler',
 user_defined_context = [("mode","upsert")]
);

See a remote function being used to update Data Catalog. The picture below demonstrates how to call the function, passing five parameters to it.

Below you can see how the tag template “BQ Remote Functions Demo Tag Template” gets updated after the function execution.

You can now use this function inside a BigQuery stored procedure performing a full ELT job. In the example below, remote_udf.test_tag table is being updated by the stored procedure and the number of updated rows + total number of rows in table remote_udf.test_tag are being stored in Data Catalog:

Check the full demo video here.

Event-Driven Architectures with Pub/Sub Updates from SQL 

Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It’s equally effective as a messaging-oriented middleware for service integration or as a queue to parallelize tasks. 

What if you need to trigger an event by posting a message into a Pub/Sub topic via BigQuery SQL? Here is an example:

The sample code (main.py and requirements.txt) can be found here. Please notice: 

References to <change-me> on main.py will need to be adjusted according to your GCP environment to reflect your project_id and topic_id

The service account executing the Cloud Function (by default the compute engine service account) will need to have permissions to post a message into a Pub/Sub topic
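A minimal, hypothetical version of that main.py (the project and topic IDs below are placeholders, mirroring the <change-me> values mentioned above) could simply publish each incoming value and return the resulting message IDs:

import json

import functions_framework
from google.cloud import pubsub_v1

PROJECT_ID = "<change-me>"  # placeholder project ID
TOPIC_ID = "<change-me>"    # placeholder topic ID

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

@functions_framework.http
def pubsub_message(request):
    # Each SQL call carries one STRING argument: the message payload to publish.
    replies = []
    for (message,) in request.get_json()["calls"]:
        future = publisher.publish(topic_path, data=message.encode("utf-8"))
        replies.append(future.result())  # result() returns the published message ID
    return json.dumps({"replies": replies})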

Once the Python code is deployed as a Cloud Function, you can create a BigQuery Remote Function using the syntax below:

CREATE OR REPLACE FUNCTION `<project>.<dataset>.publish_message` (x STRING) RETURNS STRING
REMOTE WITH CONNECTION `<your-connection>`
OPTIONS (endpoint = 'https://<your-endpoint>.cloudfunctions.net/pubsub_message');

A few screenshots of a remote function being used to post a message into a Pub/Sub topic can be found below:

Data Science with Vertex AI Models called from SQL

Vertex AI brings together the Google Cloud services for building ML under one unified UI and API.

What if you need to call online predictions from Vertex AI models via BigQuery SQL? Here is an example:

The sample code (main.py and requirements.txt) can be found here. Please notice: 

References to <change-me> on main.py will need to be adjusted according to your GCP environment to reflect your project_id, location and model_endpoint

The service account executing the Cloud Function (by default the compute engine service account) will need to have permissions to execute Vertex AI models. Role “AI Platform Developer” should be enough

Once the Python code is deployed as a Cloud Function, you can create a BigQuery Remote Function using the syntax below:

CREATE OR REPLACE FUNCTION `<project>.<dataset>.predict_penguin_weight` (species STRING, island STRING, sex STRING, culmen_length_mm FLOAT64, culmen_depth_mm FLOAT64, flipper_length_mm FLOAT64) RETURNS STRING
REMOTE WITH CONNECTION `<your-connection-name>`
OPTIONS
(endpoint = 'https://<your-location>.cloudfunctions.net/ai_handler',
 user_defined_context = [("mode","penguin_weight")]
);

The example function above will predict penguin weights based on inputs such as species, island, sex, length and other parameters. 
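Inside the Cloud Function, the prediction call itself might look roughly like the sketch below; the endpoint ID, location, and the string-typed feature values are assumptions about the deployed penguin model, not the sample’s exact code:

from google.cloud import aiplatform

def predict_penguin_weight(species, island, sex,
                           culmen_length_mm, culmen_depth_mm, flipper_length_mm):
    aiplatform.init(project="<change-me>", location="<change-me>")
    endpoint = aiplatform.Endpoint("<model-endpoint-id>")  # placeholder endpoint ID
    instance = {
        "species": species,
        "island": island,
        "sex": sex,
        # Feature encoding depends on how the model was trained; strings are assumed here.
        "culmen_length_mm": str(culmen_length_mm),
        "culmen_depth_mm": str(culmen_depth_mm),
        "flipper_length_mm": str(flipper_length_mm),
    }
    prediction = endpoint.predict(instances=[instance])
    return str(prediction.predictions[0])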

A few screenshots of a remote function being used can be found below:

Calling External APIs in Real-Time

Another common use-case is BigQuery data enrichment by using external APIs to obtain the latest stock price data, weather updates, or geocoding information. Depending on the external service in use, deploy the client code as a Cloud Function and integrate with Remote Functions using the same methodology as the examples covered before.
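The same remote-function contract applies to any public REST API. As a minimal, hypothetical sketch (the URL below is a placeholder, not a specific real service), a currency-lookup function might look like this:

import json

import functions_framework
import requests

CURRENCY_API_URL = "https://<your-currency-api>/latest?base=BRL"  # placeholder endpoint

@functions_framework.http
def currency_lookup(request):
    quotes = requests.get(CURRENCY_API_URL, timeout=10).json()
    # Return the same (latest) quote payload for every row in the batch.
    replies = [json.dumps(quotes) for _ in request.get_json()["calls"]]
    return json.dumps({"replies": replies})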

Here is a screenshot of a remote function example calling an external/public API to retrieve Brazil’s currency information:

In Summary

A BigQuery remote function lets you incorporate GoogleSQL functionality with software outside of BigQuery by providing a direct integration with Cloud Functions and Cloud Run.

Hopefully this blog sparked some ideas on how to leverage this powerful BigQuery feature and enrich your BigQuery data.

Try out the BigQuery remote UDFs today!

All data cloud, all the time: Recapping the Google Data Cloud & AI Summit

The Data Cloud & AI Summit is Google Cloud’s global event showcasing the latest innovations and how customers are transforming their businesses with a unified, open, and intelligent data platform. At our third annual event, we shared the latest product launches across generative AI and Data Cloud, shared learnings from customers and partners, and provided best practices to support your data-driven transformation. In case you missed it, here are three highlights to help you level up your data and AI know-how.

New product innovations: A game-changer for your AI and data strategy

We announced new product innovations that can help optimize price-performance, help you take advantage of open ecosystems, securely set data standards, and bring the magic of AI and ML to existing data, while embracing a vibrant partner ecosystem. 

Generative AI innovations: A range of foundation models were made available to developers and data scientists through Generative AI support on Vertex AI — developers benefit from easy API access and data scientists have a full suite of tuning options for customizing foundation models. Gen App Builder is a brand new offering that brings together foundation models with the power of search and conversation AI to enable enterprises to develop new generative AI apps. 

BigQuery Editions provide more choice and flexibility for you to select the right feature set for various workload requirements. Mix and match among Standard, Enterprise, and Enterprise Plus editions to achieve the preferred price-performance by workload. We also introduced innovations for autoscaling and a new compressed storage billing model.

AlloyDB Omni is a downloadable edition of AlloyDB designed to run on-premises, at the edge, across clouds, or even on developer laptops. It offers the AlloyDB benefits you’ve come to love, including high performance, PostgreSQL compatibility, and Google Cloud support, all at a fraction of the cost of legacy databases.

Looker Modeler allows you to define metrics about your business using Looker’s innovative semantic modeling layer. Looker Modeler is the single source of truth for your metrics, which you can share with the BI tools of your choice, such as PowerBI, Tableau, and ThoughtSpot, or Google solutions like Connected Sheets and Looker Studio, providing you with quality data to make informed decisions.

New partnerships: Over 900 software partners power their applications using Google’s Data Cloud. We announced partnerships to bring more choice and capabilities for customers to turn their data into insights from AI and ML, including new integrations between DataRobot and BigQuery, ThoughtSpot and multiple Google Cloud services — BigQuery, Looker, and Connected Sheets — and Google Cloud Ready for AlloyDB, a new program that recognizes partner solutions that have met stringent integration requirements with AlloyDB.

Watch the keynotes and dive into the breakout sessions in the AI and Data Essential tracks to learn more.

Top Data Cloud customers: How Google Cloud is helping businesses thrive

Customers are at the heart of everything we do at Google Cloud. Here are some stories you might have missed from the event. Dig in!

Booking.com, one of the largest online travel agencies, talks about how Google Cloud has been a true platform-as-a-service for their business. In this session, they highlight how BigQuery, Dataflow, and Cloud Spanner force-multiply each other when used together. BigQuery accelerated petabyte-scale queries from hours to seconds, Dataflow reduced development time and run time by 30x and Spanner reduced complexity with online schema evolution and federated queries.

Dun & Bradstreet is building a data cloud with Google Cloud, a centralized data lake and unified processing platform to consolidate all its data, share data with customers, achieve better performance, and reduce costs. Their session at the summit has all the details.

Orange France, a major Telco company in France, discusses how BigQuery and Vertex AI provide the foundation to increase revenue, maximize savings and improve customer experiences.

Richemont, a Switzerland-based luxury goods holding company, accelerates insights with SAP and Google Cloud Cortex Framework. In this session, Richemont talks about their innovation with advanced analytics powered by Google’s Data Cloud and how they’ve accelerated their time-to-value for their business.

ShareChat, an Indian social media platform with 340 million users, leverages Spanner and Bigtable to build differentiated features rather than worrying about managing underlying databases. Using an autoscaler with Bigtable and Spanner allowed them to reduce the cost of running these systems by 70%. Some of their data science clusters running on Bigtable scale from 30 nodes to 175 nodes and then back to 30 nodes during a single day. Learn more about their story in this session.

Tabnine joins CI&T, an end-to-end digital transformation partner, to discuss generative AI and why leveraging it for your developers is the ideal place to start.

Level up your education with these resources

Looking to hone your understanding of everything data and AI?  Consider these resources.

Product and solution demos – Check out these demos for inspiration and insights into how Google Cloud’s products and solutions can solve your most pressing data and AI challenges. And in case you didn’t see this end-to-end data cloud & AI demo in action, you’re in for  a treat, with methods and solutions developers can use today to unlock the power of data and AI with Google Cloud.

Learning and certifications – Find your learning path. Grow your cloud skills. Continue your cloud journey with the insights, data, and solutions across everything that’s cutting edge in cloud.

Hands-on Labs – Try things out, without really breaking anything. Get started with hands-on labs for products such as BigQuery, Spanner, AlloyDB, Looker, LookML, and more.

At Google Cloud, we believe that data and AI have the power to transform your business. Thank you to our customers and partners who are on this journey with us. To learn more about what you’ve read, watch the sessions on-demand and make sure to join our Data Cloud Live events series happening in a city near you. Get started at cloud.google.com/data-cloud to learn how tens of thousands of customers build their data clouds using Google Cloud.

Meet our Data Champions: Credit Karma’s Scott Wong on doing 60 billion model predictions per day

Editor’s note: This blog is part of a series called Meet the Google Cloud Data Champions, a series celebrating the people behind data- and AI-driven transformations. Each blog features a champion’s career journey, lessons learned, advice they would give other leaders, and more. This story features Scott Wong, VP of Platform Engineering at Credit Karma, a heavy user of Google Data Cloud offerings like Cloud Bigtable and BigQuery to store and analyze financial information for its 130 million members.

Tell us about yourself. Where did you grow up? What did your journey into tech look like? 

I grew up in a small beach town called Del Mar in San Diego County. For college, I attended Cal Berkeley, and around that time was when there was a significant growth in the technology sector. Out of school, I began my first internship at Sun Microsystems, and ended up working there for eight years. My role at Sun Microsystems was centered around manufacturing — we were literally fabricating motherboards and assembling servers. At this time, there was a major push to move a majority of manufacturing to Asia, so I was traveling a lot to Asia early on in my career. I then moved to Google and Twitter building out large-scale data centers. 

What really stood out to me during my time at Sun Microsystems, and furthered my interest in tech, was the incredible growth I saw within the company, but also in the wider industry. The pace and culture seen in tech companies experiencing fast growth was very appealing — there was a lot of experimentation and innovation happening around me, and there were always opportunities to grow my skill set. 

What’s the coolest thing you and/or your team have accomplished by leveraging our Data Cloud solutions?

Google’s Data Cloud has enabled us to scale at high velocity. We run nearly 60 billion model predictions daily to power financial recommendations for our nearly 130 million members, and Google services like Cloud Bigtable and BigQuery have helped us reach this scale — in terms of scaling our infrastructure and empowering our data scientists to work faster and more efficiently. 

To quantify some of these gains: 

Our data scientists deploy more than 300 models weekly compared to 2018 when they were deploying less than 10 models on a quarterly basis.

In terms of experiment velocity, our data scientists are doing 7x more experiments compared to 2018.

Using Bigtable and BigQuery, we’ve been able to deploy 10x more features daily with batch data.

Technology is one part of data-driven transformation. People and processes are others. How were you able to bring the three together? Were there adoption challenges within the organization, and if so, how did you overcome them?

While our journey to cloud was a big undertaking for the company, there was always a clear directive from our CTO that we’d make this migration, and that it was an imperative in order to truly scale our business. 

It did pose interesting challenges, primarily around people. We had a staff of engineers that were well versed in datacenter-centric skill sets, and weren’t necessarily used to using cloud technologies. We knew early on that we needed to allocate resources to train engineers while at the same time, hiring those who specialized in cloud and data engineering. 

In terms of processes, it was clear early on how much velocity we’d gain from the cloud — there were a lot of processes we just didn’t have to do any longer. This still meant a lot of change was happening so we needed to ensure there was a clear understanding among the engineering organization of how to work with these new technologies — what could and couldn’t we leverage and why? This is why building a trusted partnership with the Google team was so important. We were moving fast and we needed to ensure there was a high level of service available to us in regards to response time, quality, availability, and more. 

What advice would you give people who want to start data initiatives in their company? 

The first and most important step is really understanding how you want to use the data — what is the business need and how your data strategy will get you to that point. Once you have alignment on the business need and the data you need to reach that state, you get into the more nitty gritty — but still important — details. What do the SLAs look like? How are you addressing data quality and data reliability? What data products do you need to leverage? What are your security guardrails to ensure protection of the data? What efficiencies are you putting in place to ensure the utilization of data is appropriate?

Unlocking velocity is great but when you’re moving fast with data, there could be tradeoffs within the security and reliability realms, and data leaders need to be hyper-aware of that. Especially when it comes to security, you need to retain full responsibility for those practices even if you’re leveraging management support for some of those practices elsewhere. 

What’s an important lesson you learned along the way to becoming more data- and AI-driven? Were there challenges you had to overcome?

Credit Karma from the start has been a very data-driven company — data at scale drives our product and personalization is integral to our product vision. Scalability over the years has really demonstrated how important data quality is. The integrity of that data is imperative to running a good product, especially when you have well over 100 million people using your product. Over the years, as we continued to make a lot of investments in our machine learning  infrastructure and model building practices, you realize how important data grooming and data governance is. 

When it comes to data from a scalability perspective, there’s this notion of wanting to keep all historical data, even as your data grows and grows. In order to use data responsibly, there needs to be some efficiencies and guardrails put in place. When you function at scale, you shouldn’t be holding onto data just to have it, instead, be strategic about your data storage needs. 

Which leaders and/or companies have inspired you along your journey?

One engineering leader who comes to mind is Urs Hölzle who has, and continues to play, an integral role in building Google infrastructure. Since I left Google more than a decade ago, his uncanny ability to reach a solution in such a quick and effective manner has really stuck with me over the years. His leadership style and ability to ask the right questions has inspired my outlook on great engineering leadership. 

Thinking ahead 5-10 years, what possibilities with data/AI are you most excited about?

I might be biased, but so much opportunity exists for data and AI to disrupt personal finance. So many aspects of finance remain confusing, antiquated, inefficient and all in all, stacked against consumers. At Credit Karma, personalization is our north star, and we’ve made tremendous strides to provide each of our members a unique product experience, helpful to their individual situations, needs and goals. As we continue to prioritize investments in data and machine learning at scale, the hope is to automate financial decision making for millions of consumers. 

To learn more about Google’s Data Cloud, please visit https://cloud.google.com/data-cloud

Download the complimentary 2022 Gartner Magic Quadrant for Cloud Database Management Systems report. 

Learn why customers choose Google Cloud databases in this e-book.

A leading Open SaaS ecommerce platform’s journey toward Kafka modernization and real-time data analytics

When a company utilizes an outdated IT infrastructure, management and maintenance become more problematic as it scales. This often prevents businesses from acting quickly to embrace new opportunities for innovation. Fortunately, avoiding these challenges is now easier than ever thanks to the availability of proven cloud services.

A few years ago, BigCommerce (NASDAQ: BIGC), a leading Open SaaS ecommerce platform for fast-growing and established B2C and B2B brands, faced challenges as it looked to expand and meet changing customer demands. The company was managing its own Kafka cluster, which demanded increasing maintenance for software patches and system updates and left blind spots in its data-related infrastructure.

BigCommerce chose to migrate to a fully managed Apache Kafka® platform to modernize its approach with the help of Google Cloud and Confluent. By doing so, BigCommerce increased system performance, reduced maintenance, and was able to spend more time developing next-generation services for their customers. 

A better approach to Kafka-based data streaming

BigCommerce faces a unique set of technical demands given the broad scope of its ecommerce platform.

“We are an Open Software-as-a-Service (SaaS) ecommerce platform that equips mid-market and enterprise brands with a composable foundation to accelerate ambitious growth,” says Mahendra Kumar, vice president of data and software engineering at BigCommerce. “This means we have to deal with a lot of complex data across a range of commerce categories.”

BigCommerce recognized that Kafka was the right technology for its real-time data processing. As they continued to scale their Kafka footprint across the organization, the engineering team had to manage an exponential increase in data demands from a larger base of users.

“Three engineers spent part of their time on Kafka management making updates, patching and troubleshooting our Kafka clusters,” says Mahendra.

BigCommerce chose to migrate its data-streaming practices to Confluent Cloud because it provides everything needed to connect apps, data systems, and organizational frameworks for real-time data streaming and processing through a Kafka-based structure that minimizes maintenance and management demands.

 “Confluent was ideal for our needs because it frees us from the operational aspect of Kafka through the managed service offering. The fact that Confluent is a company founded by the original creators of Kafka gave us a lot of confidence in their ability to understand our use case and help us implement best practices,” says Mahendra. “We are very comfortable with Kafka principles, so we didn’t experience migration pains or deal with a steep learning curve. Confluent is reliable, robust, performant, and developer friendly. It also saves us a lot of money, while improving our Kafka practices. The Confluent support team is very helpful and is quick to respond to open support cases.”

At the time of migration, BigCommerce processed 1.5B events daily, including product orders, website visits, page views, and cart information. Confluent processes these events with greater efficiency, which greatly enhances merchant experiences with the platform while dropping resource demands from Kafka cluster management to the equivalent of about one half of a full-time engineer’s workload.

Unlocking data insights for merchants at scale

In addition to saving time and money, the combination of Confluent and Google Cloud proved to be the right solution for another major priority — providing merchants with the freshest, most valuable data analytics in real-time.

“Merchants need actionable real-time analytics to remain competitive in their respective marketplaces. We set out to build a data platform to meet those requirements,” says Mahendra. “Open data is key to our Open SaaS strategy. BigCommerce provides native integrations for merchants to automatically upload their commerce data into BigQuery daily to build custom reports, integrate their commerce data with other sources such as ads and CRM, and unlock the power of that information.” 

The highly-scalable, fully-managed features of Google Cloud and Confluent enable BigCommerce to more reliably support merchants regardless of spikes in demand, such as major shopping events like Cyber Monday.

“With Google Kubernetes Engine (GKE), Cloud SQL, Cloud Composer, and Confluent, we have more confidence in our Kafka streams and real-time data platform at any time during the year. We no longer have to worry about scaling because it’s all automated,” says Mahendra.

By running its storefront platform on Google Compute Engine and deploying a new data analytics platform using other Google Cloud technologies, BigCommerce has saved both time and costly investments on data transfers. Today, BigCommerce allocates fewer resources on backend infrastructure management allowing it to focus on increasing the power of its platform for merchants.

Learn more about how Google Cloud partners can help your business achieve a successful digital transformation. Get started with Confluent Cloud for free today, and get $400 in free credits to spend during your first 30 days.

Running ML models now easier with new Dataflow ML innovations on Apache Beam

According to Harvard Business Review, only 20% of companies see their AI models go into production. Google Cloud Dataflow builds on one of the most popular open source frameworks, Apache Beam, a unified programming model and SDK for developing batch and streaming pipelines. Continuing our commitment to building an open product and working closely with the Beam community, we’re excited to add three new machine learning (ML) focused features to Dataflow that tightly integrate with the vast resources from the community and help simplify running streaming ML models at scale in production:

Automatic Model Refresh: Because your data and ML model are ever changing, models need to be continuously retrained and improved to remain effective. These model updates shouldn’t require a large effort by the data engineering teams to redeploy. The new streaming Automatic Model Refresh feature allows you to update models, hot swapping them in a running streaming pipeline with no pause in processing the stream of data, avoiding downtime. 

TensorFlow Hub integration: TensorFlow Hub is a repository of pre-trained machine learning models, and it’s a valuable resource for researchers and developers who want to quickly and easily deploy machine learning models in their applications. Open source models, individually or as an ensemble, are a common tool in any data scientist’s repertoire. For example, if you’re doing sentiment analysis on reviews posted by customers, you might use an open source model for the embeddings stage before passing the data to a custom, domain-specific model. To make this step easy for Apache Beam users, the TensorFlow Hub integration allows you to use just a few lines of code to download and consume a model in your batch or streaming Dataflow pipeline.

Model Ensembles: With the proliferation of ML frameworks, you don’t want to be restricted to using just one in a processing pipeline. The Apache Beam SDK allows you to use multiple frameworks such as TensorFlow, PyTorch, and scikit-learn in a single pipeline, as an ensemble. The Apache Beam community recently started supporting the ONNX model handler, adding to the already expansive list of supported frameworks.

Streaming Automatic Model Refresh

Apache Beam has always been known for its ability to work with both streaming and batch using a single, unified API. This feature allows developers to move between batch and streaming without needing to change code or add a bunch of new import statements. Until now, however, every time a model was retrained, the operational engineers had to work through various pipeline lifecycle events. With the Automatic Model Refresh feature, the RunInference transform automatically updates the model handler without requiring an update to or redeployment of the whole pipeline.

Two modes are available with this feature: 

Watch mode, which pulls updates from Google Cloud Storage: The RunInference WatchFilePattern class watches for the latest file matching a file_pattern, based on timestamps. It emits the latest ModelMetadata, which is used in the RunInference PTransform to update the ML model without stopping the Beam pipeline (a minimal watch-mode sketch follows this list).

Event mode: By connecting the pipeline to an unbounded source, such as Pub/Sub, update events are sent directly to the transform, initiating a hot swap of the model on demand. 
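Here is a minimal watch-mode sketch, assuming a streaming pipeline fed by Pub/Sub and a TensorFlow model saved to Cloud Storage; the bucket, topic, and the JSON feature-vector message format are placeholders rather than a recommended setup:

import json

import apache_beam as beam
import tensorflow as tf
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor
from apache_beam.ml.inference.utils import WatchFilePattern
from apache_beam.options.pipeline_options import PipelineOptions

def to_tensor(message: bytes) -> tf.Tensor:
    # Assumes each Pub/Sub message is a JSON list of floats (one feature vector).
    return tf.constant(json.loads(message.decode("utf-8")), dtype=tf.float32)

model_handler = TFModelHandlerTensor(model_uri="gs://<your-bucket>/models/model-v1.h5")

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Side input that emits new ModelMetadata whenever a newer matching file appears.
    model_updates = (
        p | "WatchModelDir" >> WatchFilePattern(
            file_pattern="gs://<your-bucket>/models/*.h5", interval=600))

    _ = (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/<project>/topics/<topic>")
        | "ToTensor" >> beam.Map(to_tensor)
        | "Infer" >> RunInference(model_handler, model_metadata_pcoll=model_updates)
        | "Sink" >> beam.Map(print))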

Tensorflow Hub integration

With the new native TensorFlow model handler, you can use TensorFlow Hub models in Dataflow pipelines by passing a URL to the model as an argument. TensorFlow Hub is a repository of pre-trained machine learning models available for use within the TensorFlow framework and its high-level wrapper Keras. You can leverage powerful models for a variety of tasks without having to train them. The repository contains over a thousand models, adding more through community contribution. 

Let’s look at an example of image classification, common in a wide variety of industries, like retail store management and e-commerce sites. In the following example, we use the mobilenet_v2 model on tfhub. With just a few lines of code, our pipeline uses a model that can deal with millions of images. ( A runnable copy of this notebook is available here).

from typing import Iterable

import apache_beam as beam
import numpy as np
import tensorflow as tf
from apache_beam.ml.inference.base import PredictionResult
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor

model_handler = TFModelHandlerTensor(
    model_uri="https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4")

# img_tensor: a single image decoded and resized to the 224x224x3 float input that
# mobilenet_v2 expects (one way to build it, shown here for illustration).
img_path = tf.keras.utils.get_file(
    'grace_hopper.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg')
img = tf.keras.utils.load_img(img_path, target_size=(224, 224))
img_tensor = tf.cast(tf.convert_to_tensor(np.array(img)), tf.float32) / 255.0

class PostProcessor(beam.DoFn):
    """Process the PredictionResult to get the predicted label.
    Returns predicted label.
    """
    def setup(self):
        labels_path = tf.keras.utils.get_file(
            'ImageNetLabels.txt',
            'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
        self._imagenet_labels = np.array(open(labels_path).read().splitlines())

    def process(self, element: PredictionResult) -> Iterable[str]:
        predicted_class = np.argmax(element.inference)
        predicted_class_name = self._imagenet_labels[predicted_class]
        yield "Predicted Label: {}".format(predicted_class_name.title())

with beam.Pipeline() as p:
    _ = (p
         | "Create PCollection" >> beam.Create([img_tensor])
         | "Perform inference" >> RunInference(model_handler)
         | "Post Processing" >> beam.ParDo(PostProcessor())
         | "Print" >> beam.Map(print))

To make this code work for millions of images, change the source to a bucket on an object store and use one of the many Apache Beam sinks to output the results to your desired destination. 
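As a rough sketch of that change (with placeholder bucket paths and an assumed JPEG input format), the same model handler can be pointed at Cloud Storage and the results written back out with a standard Beam sink:

import apache_beam as beam
import numpy as np
import tensorflow as tf
from apache_beam.io import fileio
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor

def to_tensor(readable_file) -> tf.Tensor:
    # Assumes JPEG inputs; resize and scale to the shape mobilenet_v2 expects.
    image = tf.io.decode_jpeg(readable_file.read(), channels=3)
    return tf.image.resize(image, (224, 224)) / 255.0

model_handler = TFModelHandlerTensor(
    model_uri="https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4")

with beam.Pipeline() as p:
    _ = (p
         | "Match" >> fileio.MatchFiles("gs://<your-bucket>/images/*.jpg")
         | "Read" >> fileio.ReadMatches()
         | "ToTensor" >> beam.Map(to_tensor)
         | "Infer" >> RunInference(model_handler)
         | "TopLabel" >> beam.Map(lambda result: str(int(np.argmax(result.inference))))
         | "Write" >> beam.io.WriteToText("gs://<your-bucket>/output/labels"))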

Model ensembles

Increasingly, even single business use cases are built using more than one framework. Therefore, RunInference also needs to support as many frameworks as possible. It adds this support through its model handlers. 

Before this release Apache Beam supported:

TensorFlow

PyTorch

scikit-learn

TensorRT Engine

The Beam community has added the ONNX runtime with an ONNX model handler. You can see an example pipeline at onnx_sentiment_classification.py.

You can read more about the use of multiple models in one pipeline here, which also includes a notebook that shows the BLIP and CLIP models used together.

Conclusion 

Data engineers, data scientists and developers can now take advantage of the latest features from Dataflow ML, while the Apache Beam community continues to add features to the RunInference transform to make ML productionisation and development easier and more flexible:

It’s now easier to create ensembles of models from different frameworks within the same pipeline.

We’ve reduced the amount of code that needs to be written when using TensorFlow and TensorFlow Hub models.

Continuous deployment to streaming pipelines is easier than ever with automatic model refreshes.

You can find details of these features and more example notebooks on the Dataflow ML documentation.

Unleash your Google Cloud data with ThoughtSpot, Looker, and BigQuery

Today’s businesses have to deliver innovation and operational improvements at high velocity — finding new ways to delight customers while paying close attention to efficient revenue growth. This goal translates to leveraging data and analytics in every part of their value chain, be it within the product they build and sell, supply chain, customer support, and virtually every function within the enterprise. Infusing data into everyday processes proves to be a daunting task when using outdated, disconnected data visualization and analytics tools. 

Businesses that embrace the modern data stack are rewarded with more flexibility and can choose the best options when modeling, analyzing, and consuming data. And the very brightest data leaders among them have invested in self-service analytics solutions that empower every business user to ask, answer, and operationalize insights in real time.

For a number of years, ThoughtSpot and Google Cloud have been working together to help customers realize their goal of true self-service analytics. On the heels of launching ThoughtSpot Sage — a true natural-language search experience for data — our expanded partnership pushes that goal to new heights. Together, we’re helping customers reimagine how they use data, analytics, and AI to become a data-driven business. 

Four ways this partnership transforms how companies leverage data

ThoughtSpot’s SaaS offering is now built directly on Google Cloud
Mutual customers have long been running ThoughtSpot on their data in  BigQuery. But in the coming months, ThoughtSpot AI-Powered Analytics will be natively built on Google Cloud, empowering our mutual customers to choose their preferred cloud platform based on security, performance, or cost. 

Looker Modeler integrates with ThoughtSpot
Data modeling helps organizations unlock the value of their cloud investments. Looker Modeler customers have the ability to model, transform, and define metrics — providing vital speed and agility as organizations scale. With our new integration planned for availability later this year, business users can leverage ThoughtSpot’s intuitive interface to explore their trusted LookML data through natural language search, unlimited drill down, and interactive Liveboards directly on the Looker Semantic Model.

ThoughtSpot integrates with Google Connected Sheets and Google Slides
While the business intelligence industry would like to think otherwise, the reality is many business users still rely on spreadsheets for data analytics. After the successful launch of ThoughtSpot for Google Sheets, a free plugin for users who want to leverage Sheets for data analysis, we’re extending this integration. Now, users can bring trusted data from  BigQuery into Sheets for live analysis with ThoughtSpot Connected Sheets and embed live charts and interactive data visualizations in Google Slides using ThoughtSpot Connected Slides.

Google customers now have new opportunities to try and buy ThoughtSpot
Users will soon be able to access ThoughtSpot SaaS directly in the Google Cloud Marketplace, the BigQuery Partner Center, and Google Workspace Marketplace. With these integrations, Google customers have even more opportunities to experience, purchase, and deploy self-service analytics.

Why ThoughtSpot and Looker teamed up to help you get more from Google Cloud data

It’s not every day that you see this level of co-innovation in our industry. By expanding our relationship with Looker and Google Cloud, ThoughtSpot customers have the freedom to select the best tools for their specific use case — modeling data, embedding analytics, and delivering self service data exploration across their organization. 

For CNA, a leading commercial insurance company, delivering meaningful insights from data is paramount to their success. Jane Possell, EVP and Chief Information Officer at CNA, said it best: 

“With a self-service analytics tool like ThoughtSpot, our team is able to quickly leverage data that allows us to better understand our business and make decisions at speed. To truly maximize the value of our data, we’ve been an early mover to the Google Cloud Platform, pairing the scale and capabilities of Google Cloud Platform with Looker, and the intuitive AI-Powered Analytics from ThoughtSpot. I’m excited to see these companies working more closely together to deliver a truly differentiated data experience for companies like CNA.”

Since the initial announcement, it’s been incredible to see the excitement and positive responses from our customers and the industry. I look forward to the continued collaboration with our partners at Google Cloud as we work to help customers move beyond the dashboard and build their businesses on live, interactive data.

Stay tuned as these integrations are made available later this year. For more information, join ThoughtSpot and Google Cloud at Beyond 2023, or try ThoughtSpot for yourself.

BBC: Keeping up with a busy news day with an end-to-end serverless architecture

Editor’s note: Today’s post is from Neil Craig at the British Broadcasting Corporation (BBC), the national broadcaster of the United Kingdom. Neil is part of the BBC’s Digital Distribution team, which is responsible for building services such as the public-facing bbc.co.uk and bbc.com websites and ensuring they are able to scale and operate reliably.

The BBC’s public-facing websites inform, educate, and entertain over 498 million adults per week across the world. Because breaking news is so unpredictable, we need a core content delivery platform that can easily scale in response to sudden surges in traffic.

To this end, we recently rebuilt our log-processing infrastructure on a Google Cloud serverless platform. We’ve found that the new system, based on Cloud Storage, Eventarc, Cloud Run, and BigQuery, enables us to provide a reliable and stable service without us having to worry about scaling up during busy times. We’re also able to save license fee payers money by operating the service more cost effectively than our previous architecture. Not having to manually manage the scale of major components of the stack has freed up our time, allowing us to spend it on using, rather than creating, the data.

A log in time

To operate the site and ensure our services run smoothly, we continually monitor Traffic Manager and CDN access logs. Our websites generate more than 3B log lines per day and handle large data bursts during major news events; on a busy day, our system supports over 26B log lines.

As initially designed, we stored log data in a Cloud Storage bucket. But every time we needed to access that data, we had to download terabytes of logs down to a virtual machine (VM) with a large amount of attached storage, and use the ‘grep’ tool to search and analyze them. From beginning to end, this took us several hours. On heavy news days, the time lag made it difficult for the engineering team to do their jobs.

We needed a more efficient way to make this log data available, so we designed and deployed a new system that processes logs as they arrive and reacts to spikes more efficiently, significantly improving the timeliness of critical information.

In this new system, we still use Cloud Storage buckets, but on arrival, each log file generates an event using Eventarc. That event triggers Cloud Run to validate, transform, and enrich information about the log file, such as filename, prefix, and type, then process it and stream the processed data into BigQuery. This event-driven design allows us to process files quickly and frequently; processing a single log file typically takes less than a second. Most of the files we feed into the system are small, under 100 MB. Larger files are automatically split into multiple files, and Cloud Run very quickly creates additional parallel instances, helping the system scale almost instantly.
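To make the flow concrete, here is a minimal sketch (not the BBC’s actual code) of a Cloud Run service that receives an Eventarc notification for a newly arrived Cloud Storage object, parses the log file, and streams rows into BigQuery. The bucket, table, and field names are hypothetical, the parsing is deliberately trivial, and the exact event payload depends on how the trigger is configured.

import os

from flask import Flask, request
from google.cloud import bigquery, storage

app = Flask(__name__)
bq_client = bigquery.Client()
gcs_client = storage.Client()

# Hypothetical destination table for processed log rows.
TABLE_ID = os.environ.get("BQ_TABLE", "my-project.cdn_logs.access_logs")


@app.route("/", methods=["POST"])
def handle_log_file():
    # With a direct Cloud Storage trigger, Eventarc delivers the object
    # metadata as JSON; adjust the parsing if you use an audit-log trigger.
    event = request.get_json()
    bucket_name = event["bucket"]
    object_name = event["name"]

    # Read the newly arrived log file from Cloud Storage.
    blob = gcs_client.bucket(bucket_name).blob(object_name)
    raw = blob.download_as_text()

    # Validate, transform, and enrich each line (simplified tab-separated parse).
    rows = []
    for line in raw.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        rows.append({
            "timestamp": fields[0],
            "path": fields[1],
            "status": fields[2],
            "source_file": object_name,  # enrichment: record the originating file
        })

    # Stream the processed rows into BigQuery.
    if rows:
        errors = bq_client.insert_rows_json(TABLE_ID, rows)
        if errors:
            return (f"BigQuery insert errors: {errors}", 500)

    return ("", 204)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))

Because each file is handled as an independent, stateless request, Cloud Run can fan out to many parallel instances automatically when a burst of files arrives.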

Running a global website that provides news coverage means we see frequent, unpredictable, large spikes of traffic. We learn from these and optimize our systems where necessary, so we’re confident in the system’s ability to handle significant traffic. For example, around the announcement of the Queen’s passing in September, we saw some huge traffic spikes. During the largest, we went from running 150 to 200 container instances to over 1,000 within a single minute, and the infrastructure just worked. Because we engineered the log-processing system to rely on the elasticity of a serverless architecture, we knew from the get-go that it would handle this type of scaling.

Our initial concern about choosing serverless was cost. It turns out that using Cloud Run is significantly more cost-effective than running the number of VMs we would need for a system that could survive reasonable traffic spikes with a similar level of confidence.

Switching to Cloud Run also allows us to use our time more efficiently, as we no longer need to spend time managing and monitoring VM scaling or resource usage. We picked Cloud Run intentionally because we wanted a system that could scale well without manual intervention. As the digital distribution team, our job is not to do ops work on the underlying components of this system — we leave that to the specialist ops teams at Google.

Another conscious choice we made whilst rebuilding the system was to use the built-in service-to-service authentication in Google Cloud. Rather than implementing and maintaining an authentication mechanism ourselves, we add some simple configuration that instructs the client side to create and send an OIDC token for a service account we define, and the server side to authenticate and authorize the client. Another example is pushing events into Cloud Run, where we can configure Cloud Run authorization to only accept events from specific Eventarc triggers, so the service is fully private.
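As an illustration of the client side of that pattern, here is a minimal sketch, with a hypothetical service URL, of how a workload running as a service account can mint a Google-signed OIDC token for a private Cloud Run service and pass it on each request; Cloud Run then verifies the token and checks the caller’s IAM permissions. This shows the general mechanism rather than the BBC’s exact setup.

import google.auth.transport.requests
import google.oauth2.id_token
import requests

# Hypothetical URL of the private Cloud Run log-processing service.
RECEIVING_SERVICE_URL = "https://log-processor-abc123-ew.a.run.app"

# Use the environment's service account credentials (for example, the
# metadata server on Cloud Run or Compute Engine) to mint an ID token
# whose audience is the receiving service.
auth_request = google.auth.transport.requests.Request()
id_token = google.oauth2.id_token.fetch_id_token(auth_request, RECEIVING_SERVICE_URL)

# Call the private service with the token; Cloud Run rejects callers
# that lack the run.invoker role on the service.
response = requests.post(
    RECEIVING_SERVICE_URL,
    headers={"Authorization": f"Bearer {id_token}"},
    json={"bucket": "example-bucket", "name": "example-log.tsv"},
)
response.raise_for_status()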

Going forward, the new system has allowed us to make better use of our data safely. For example, BigQuery’s column-level permissions let us open up access to our logs to other engineering teams around the organization without worrying about exposing PII that is restricted to approved users.
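As a rough sketch of how this kind of column-level security can be set up (hypothetical project, dataset, table, and policy tag names, not the BBC’s configuration), the example below attaches a Data Catalog policy tag to a sensitive column when creating a BigQuery table, so only principals granted fine-grained read access on that tag can query the column.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical Data Catalog policy tag that marks PII columns.
PII_POLICY_TAG = (
    "projects/my-project/locations/eu/taxonomies/1234567890/policyTags/9876543210"
)

schema = [
    bigquery.SchemaField("timestamp", "TIMESTAMP"),
    bigquery.SchemaField("path", "STRING"),
    bigquery.SchemaField(
        "client_ip",
        "STRING",
        policy_tags=bigquery.PolicyTagList(names=[PII_POLICY_TAG]),
    ),
]

# Hypothetical table: anyone with dataset access can query timestamp and
# path, but only approved users can read client_ip.
table = bigquery.Table("my-project.cdn_logs.access_logs", schema=schema)
table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id} with a policy tag on client_ip")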

The goal of our team is to empower all teams within the BBC to get the content they want on the web when they want it, and to make it reliable, secure, and scalable. Google Cloud serverless products helped us achieve these goals with relatively little effort, and they require significantly less management than previous generations of technology.

Source : Data Analytics Read More

How Bud Financial turns transactional data into rich customer insight

Transactional data can be very messy. The information recorded from each transaction across a person’s financial life can come from many different banks and merchants, and the resulting data lacks structure. This makes it hard for financial institutions and their customers to use it to make financial decisions.  

At Bud Financial Ltd (‘Bud’), we use machine learning (ML) technology to make sense of financial data so that financial companies can focus on building better services for their customers.

Bud began life in 2015 as an education platform to help people manage their money and improve their financial wellbeing. Now, we operate as a business-to-business firm. Global financial companies use our APIs to combine transactional data from any source so they can harness the power of Bud’s data intelligence to open up more financial opportunities for their customers. Throughout this journey, we’ve always relied on Google Cloud to develop and scale our technology.

Pushing the boundaries of financial technologies, with security in mind

As a fast-moving business that works with global banks, we need a cutting-edge platform that scales and helps us to maintain a high degree of trust from our clients. 

Google Cloud makes it easy for us to demonstrate that we’re meeting industry compliance and security requirements with self-serve, painless reports such as SOC 2 and ISO 27001. Google Cloud’s operations suite (formerly Stackdriver) gives us visibility into all of our data assets, making tracking and management easier. We always know what data is where, who has access to what, what data is interacting with other elements of the platform, and how. This traceability is key.

Google Cloud has an excellent reputation in the financial sector, which has made our conversations with clients easier from the get-go.

Using ML to open up lending opportunities for underserved communities

Personalization, made possible with ML, has flourished in other industries. But in financial services, it can be difficult to innovate at speed while taking all the necessary precautions to do so in a privacy-preserving and compliant way that benefits end users. Nonetheless, many pressing challenges are pushing organizations to seek answers.

The ongoing cost of living crisis is motivating financial services companies to identify and support customers who might be experiencing financial difficulties. Our clients want to engage with their customers in relevant, timely ways. They use Bud’s data intelligence to give their customers personalized insights into their spending and influence positive behavior to build financial resilience and achieve their goals.  

Bud’s platform is also being used to improve lending processes. Current lending services may not meet the needs of every customer. If you’ve moved countries, never had a credit card, or don’t have a perfect credit score, it can be difficult to get access to loans. That isn’t always fair, because your history doesn’t necessarily reflect your current ability to pay back a loan. Bud wants to change this by using transactional data to give companies a better understanding of someone’s real-time financial situation and affordability. That way, without increasing their risk, these companies can open up their services to more people who need them. It’s not just underserved customers who see the benefit: the simple affordability checks and credit risk insights we provide mean that every customer can get faster access to the products that are right for them.

Banks and financial institutions can use this data to improve their decisions around areas like credit affordability and application processes. At the same time, Bud helps these companies to deliver that context to their customers, alerting them to ways that they can improve their financial decisions. But all this relies on transactional data used in real time, covering millions of customers, which creates huge and highly-variable volumes of data. 

Bud uses DataStax Astra DB on Google Cloud to handle this data volume and run this critical service for banking customers. With DataStax Astra DB on Google Cloud, Bud developers can take full advantage of different data models and APIs to accelerate new product and service launches. This frees up our developers to focus on banking data services rather than operational database tasks. Moreover, Bud only pays for the time and compute resources it uses.  
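For readers unfamiliar with this access pattern, here is a minimal sketch (hypothetical credentials, keyspace, and table names, not Bud’s production code) of how an application connects to Astra DB with the DataStax Python driver and looks up a customer’s recently enriched transactions.

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

# Secure connect bundle downloaded from the Astra DB console (hypothetical path).
cloud_config = {"secure_connect_bundle": "/secrets/secure-connect-transactions.zip"}

# Astra DB application token credentials (hypothetical values).
auth_provider = PlainTextAuthProvider("token", "AstraCS:example-application-token")

cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect("transactions")  # hypothetical keyspace

# Fetch a customer's most recent enriched transactions.
rows = session.execute(
    "SELECT transaction_id, merchant, amount, category "
    "FROM enriched_transactions WHERE customer_id = %s LIMIT 20",
    ["customer-123"],
)
for row in rows:
    print(row.transaction_id, row.merchant, row.amount, row.category)

cluster.shutdown()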

Leveraging a scalable, flexible platform to process billions of transactions each month 

Just as financial services companies need to be flexible and adapt to change, we’re always using new technologies at Bud to figure out how we can solve evolving challenges in this space. Google Cloud gives us the flexibility to experiment. For example, although we use Cloud SQL, as we’ve grown as a business we’ve also experimented with non-relational databases. Because it’s so flexible, Google Cloud helps us stay future-proof as an organization. We follow multicloud principles, but in practice we use Google Cloud as our solid foundation.

With Google Kubernetes Engine, we deploy and scale everything on our platform, not least because it gives us the flexibility to build and test in different kinds of environments. Our recent launch in the U.S. took only eight weeks. It was made easy by Google Kubernetes Engine, which enables us to spin up U.S. environments and scale them up to accommodate the different needs of local clients. Scaling the Bud platform has been very straightforward with Google Cloud. We now serve millions of our enterprise clients’ customers, enriching billions of transactions each month on our platform.

Meanwhile, in the backend, we’ve got a team responsible for ensuring that our platform is reliable and scalable and provides a world-class developer experience so clients can integrate our APIs seamlessly. Both are directly impacted by our platform provider. Having managed services such as Google Kubernetes Engine and Cloud SQL means we can continue to focus on our core services instead of managing our 30 clusters and instances.

Simplifying transactional data to help more people make sense of their finances

Over the years, we’ve increased our use of BigQuery for logging events for business intelligence. It integrates easily with tools such as Looker, which enables all teams across the company to better visualize how our product is performing and understand all of our key business metrics.
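As a simple illustration of this kind of event logging (hypothetical project, dataset, and table names), the snippet below runs an aggregate query over an events table with the BigQuery Python client; the same table can back a Looker dashboard for the wider company.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical events table populated by the product's event logging.
query = """
    SELECT event_name, COUNT(*) AS event_count
    FROM `my-project.product_analytics.events`
    WHERE DATE(event_timestamp) = CURRENT_DATE()
    GROUP BY event_name
    ORDER BY event_count DESC
"""

for row in client.query(query).result():
    print(row.event_name, row.event_count)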

We concluded 2022 with an exciting launch on Google Cloud Marketplace in the U.K. and the U.S. As one of the first U.K. fintechs on Google Cloud Marketplace, we’re focused in 2023 on scaling our solution to even more financial businesses and making it easier than ever for them to get started. Many companies still aren’t getting the full value of their transactional data, so we expect this launch to open more distribution channels.

Source : Data Analytics Read More