Slingshot: Solving Data Management from the Inside Out

How our tech teams solved data management challenges in-house, at scale

As of 2021, more than 70% of companies had migrated at least part of their workloads to the cloud, and there are no signs the trend will slow down.

While moving to the cloud offers many benefits, like instantly scalable computing resources and near-unlimited storage, many organizations find that costs can escalate quickly and governance becomes harder to manage.

Capital One faced many of these same obstacles managing our data, infrastructure and costs during our years-long journey to operate in the cloud at scale. We also found no existing solutions that addressed the challenges we faced.

So we set out to create our own tools for data management in the cloud. We wanted to empower our lines of business to take advantage of the speed and scalability gained in the cloud while also providing the governance and oversight needed to operate responsibly and cost effectively. Let’s take a look at how we came to build these tools. 

Challenges to Scaling Data in the Cloud

In 2016, we made the decision to go all-in on the cloud. The move to the public cloud required us to transform our data infrastructure and migrate our entire data ecosystem: 500 terabytes of data serving 6,000 analysts.

We needed a data cloud platform that could support our long-term storage and computing resourcing needs while providing reliable performance. We chose Snowflake and in 2017 became an early adopter of the cloud-native data platform. With Snowflake, we benefited from near-unlimited data storage and compute resources in the cloud. We could now run millions of queries a day without affecting performance. Lines of business had near real-time access to insights that previously took weeks to process. And queries that used to take three hours were completed in minutes. 

This new scalability and speed unlocked new opportunities for gaining valuable insights quickly from great volumes of data. But without proper controls in place, costs associated with data consumption escalated quickly across the organization and good data governance became more difficult. To operate effectively in the cloud and provide increasingly personalized experiences for our customers, we needed to solve these challenges. 

How Our Technologists Overcame Challenges in the Cloud

With a lack of tools in the market that could help us meet our goals, our team of talented engineers began creating internal tools to promote agility, ease of use and self-service for data users across the company. 

Encouraging Self-Service and Data Access 

Expanding access to critical data meant more innovation and faster insights. But with millions of queries running each day, data users needed a way to manage their own platform without the bottleneck of a central data team. To meet this need, our data engineers built a self-service portal that empowered even nontechnical users to take full advantage of the speed and scalability of data in the cloud. This intuitive portal allowed users to easily make requests, track projects and align with other business stakeholders. Users could now gain access to the data they needed with a few clicks.

Performance Dashboards for Managing Cloud Costs 

As data usage and access increased across multiple business lines, we faced challenges in optimizing usage and preventing costs from ballooning. We needed a way to manage and optimize our Snowflake spend.

We built an easy-to-use dashboard that allows key stakeholders to monitor performance and manage costs with greater visibility into how and where resources are being allocated. The dashboard also surfaces insights into resources wasted due to user behavior or misconfiguration.

Our user-friendly dashboards offer improved cost transparency by showing various cost breakdowns, such as the percentage of Snowflake credits used, average daily usage cost and costs by line of business. 

Alerts provide visibility into important changes, such as unusual Snowflake credit usage, cost spikes or Snowflake credits that are about to run out, so users can fix inefficiencies right away. Recommendations on the best scheduling and sizing for warehouses then help users determine the next steps to optimize costs. Together, these alerts and recommendations allow us to right-size data usage at critical moments rather than waiting for the monthly invoice to reveal inefficiencies.
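As an illustration, a spike alert of the kind described above can be sketched as a simple trailing-average check. This is a hypothetical sketch, not Slingshot's implementation: the function name, threshold and usage numbers are invented, and real usage data would come from Snowflake's account usage views.

```python
from statistics import mean

def detect_credit_spike(daily_credits, window=7, threshold=1.5):
    """Flag days whose credit usage exceeds `threshold` times the
    trailing `window`-day average (illustrative sketch only)."""
    alerts = []
    for i in range(window, len(daily_credits)):
        baseline = mean(daily_credits[i - window:i])
        if daily_credits[i] > threshold * baseline:
            # Record the day index, the spiking value and the baseline
            alerts.append((i, daily_credits[i], round(baseline, 1)))
    return alerts

# Steady usage around 100 credits/day, then a spike on day 9
usage = [100, 98, 102, 97, 101, 99, 103, 100, 102, 240]
print(detect_credit_spike(usage))  # → [(9, 240, 100.6)]
```

An alerting job like this could run daily, so a runaway warehouse is caught within hours instead of at month's end.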

Dynamic Warehouse Scheduling for Cost Savings

We realized that data usage never stays constant; it varies throughout a day, week or month. We wanted to scale up warehouse capacity during peak usage times and scale down during low-usage periods, such as overnight hours. So we built a way for data users to provision their own compute capacity cost effectively. With dynamic warehouse scheduling, data users can set warehouse parameters to change automatically on custom schedules, precisely provisioning compute capacity and incurring costs only for what they need.

Scheduling warehouses based on size, scaling capacity (i.e., minimum and maximum number of clusters) and scaling policies helps control Snowflake credit consumption and is easy to adjust based on factors such as warehouse usage, environment and time of day.
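As a rough sketch of what a scheduled parameter change looks like, the helper below builds Snowflake `ALTER WAREHOUSE` statements for a hypothetical day/night schedule. The warehouse name and schedule are invented for illustration; a real scheduler would execute these statements through the Snowflake connector on a cron-style trigger.

```python
def alter_warehouse_sql(name, size, min_clusters, max_clusters, policy):
    """Build a Snowflake ALTER WAREHOUSE statement for a scheduled
    resize (sketch; execution against Snowflake is not shown)."""
    return (
        f"ALTER WAREHOUSE {name} SET "
        f"WAREHOUSE_SIZE = '{size}' "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters} "
        f"SCALING_POLICY = '{policy}';"
    )

# Hypothetical schedule: scale up for business hours, down overnight
schedule = {
    "08:00": ("MEDIUM", 1, 4, "STANDARD"),
    "20:00": ("XSMALL", 1, 1, "ECONOMY"),
}
for when, params in schedule.items():
    print(when, alter_warehouse_sql("ANALYTICS_WH", *params))
```

Because the statements are plain SQL, the same mechanism can be reused per environment (dev warehouses scaled down harder than production, for example).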

Automating Governance Across Lines of Business

Well-governed data is critical to the health and vitality of our businesses. As our data usage and access scaled in the cloud, we built centrally managed policies and custom approval workflows into a governance process that enforces our standards. These automated processes removed data and analytics tasks from our IT team's backlog and saved us thousands of hours of manual work.

Centrally Managed Policies and Approval Workflows 

With the new tools we built, our users were able to create warehouses while leveraging pre-configured templates with centrally managed governance built in. Using custom workflows, a new warehouse request could then be sent to the owner with the appropriate permissions for approval. 
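The request-and-approval flow described above might be sketched roughly as follows. Everything here is hypothetical for illustration: the class, the owner mapping and the template names are invented, and a real workflow would notify the owner and await their decision rather than approve inline.

```python
from dataclasses import dataclass

@dataclass
class WarehouseRequest:
    requester: str
    line_of_business: str
    template: str            # pre-configured template with governance built in
    status: str = "PENDING"

# Hypothetical mapping of lines of business to their approving owners
OWNERS = {"card": "card-data-owner", "retail": "retail-data-owner"}

def route_for_approval(req):
    """Route the request to the owner with permissions for that line of business."""
    return OWNERS.get(req.line_of_business)

def approve(req, approver):
    """Only the owning team may approve; anyone else is rejected."""
    if approver != OWNERS.get(req.line_of_business):
        raise PermissionError("Only the owning team can approve")
    req.status = "APPROVED"
    return req

req = WarehouseRequest("alice", "card", "small-etl-template")
req = approve(req, route_for_approval(req))
print(req.status)  # APPROVED
```

The key design point is that governance lives in the template and the routing table, so requesters never need to know the policy details themselves.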

Strengthening Federated Data Ownership

One of our goals was to empower individual lines of business to take responsibility for data management. We found that data management could no longer be delegated to a central data team without the risk of creating bottlenecks. 

We strengthened data ownership by assigning warehouses in Snowflake to specific lines of business within the organization. By associating each warehouse to a line of business, our tools help assign data ownership and accountability to business teams while automating governance and optimizing costs.

The Impact of Innovation

In all, these in-house solutions for managing costs and data infrastructure saved Capital One 27% on projected Snowflake costs over three years starting in 2019. 

Built-in automated infrastructure management saved the company 50,000 hours of manual effort, freeing our data engineers to focus on driving customer value. 

And our cost per query dropped by 43%.

Slingshot: Built for Better Data Management at Scale

Recognizing a gap in the market, we decided to share the internal tools we created to help other companies facing similar challenges.

In June 2022, Capital One launched Slingshot, a SaaS product that helps businesses accelerate adoption of Snowflake, manage cloud costs and automate critical cloud governance processes. 

Leveraging the data and cloud management expertise we developed on our own journey, Slingshot helps other businesses harness the scalability and speed of the cloud in a way that is efficient for each data owner in the organization. 

The creation of internal tools and the launch of Slingshot is a testament to our tech teams' commitment to overcoming the hurdles of adopting modern data and cloud technologies and to making banking better for our customers. As migration to the public cloud and adoption of cloud technologies increase, we're ready to bring the solutions we developed for financial services to companies in other industries looking to scale their data in the cloud and deliver better experiences to their customers. That's technology at Capital One.

Interested in transforming the banking industry with us?

Salim Syed, Vice President and Head of Engineering, Capital One Software

Salim Syed is Vice President and Head of Engineering for Capital One Software. He led Capital One’s data warehouse migration to AWS and is a specialist in deploying Snowflake to a large enterprise. Salim’s expertise lies in developing Big Data (Lake) and Data Warehouse strategy on the public cloud. Salim has more than 25 years of experience in the data ecosystem. His career started in data engineering where he built data pipelines and then moved into maintenance and administration of large database servers using multi-tier replication architecture in various remote locations. He then worked at CodeRye as a database architect and at 3M Health Information Systems as an enterprise data architect. He has a bachelor’s degree in math and computer science from Lewis & Clark College and a master’s degree from George Washington University.

DISCLOSURE STATEMENT: © 2022 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.
