How to Choose a Data Warehouse

A data warehouse is the central home for your data, and the rest of your data processes are built off it. With a data warehouse, you can see and search across all your data sources in one location. Consolidating data from your siloed sources is the first step in creating a universal source of truth that all teams can draw from for reliable, accurate data.

A close up of a laptop and an open notebook and man's hands that look like he is explaining something.

But choosing a data warehouse can get complicated, especially if you aren’t familiar with technical considerations. Here’s what you need to consider to confidently choose a warehouse for your business.

Make sure you know why you need a data warehouse — and what you’ll be storing

This might sound obvious, but understanding the basics of what you’ll be storing, and how you’ll be working with that data, should affect your decision.

Storing large amounts of video or text files requires a different approach than storing everything you need to know about your customers. If you’re storing data about your customers, and have data in other sources like CRMs, payment solutions, and marketing tools, your considerations need to change too.

We’re going to cover this in greater detail in another post, but here’s what you need to know now:

  • Working with large files like video: You need a file-based data warehouse, and should likely be looking at cloud storage for this. Services like Google Cloud and S3 should be on your radar.

  • Pulling information on one customer at a time: You’re looking at a row-based warehouse here. Major players in this space are PostgreSQL and MySQL.

  • Performing complex analysis and combining multiple data sources: You need a columnar data warehouse (also known as a column-based or column-oriented warehouse), like Snowflake, Redshift, or BigQuery.

I know I need a columnar data warehouse — now what?

We’re going to focus on column-based warehouses, because they’re most important for the kinds of complex analysis your company needs to perform to drive growth. It’s also a core offering of our modern data platform, but we’ll get to that in a bit.

These are the five major considerations any business should have when exploring data warehouse options.

Administrative hurdles

Make sure you choose a data warehouse that’s going to work for administrative needs, like access controls, publishing workflow controls, and the ability to publish personalized views.

With these types of controls, you can make sure your teams have access to the data they need, without access to information they don’t, or the ability to edit data in a way that is unnecessary.

This is especially important if you’re dealing with a lot of personally identifiable information (PII) and need to keep security best practices or legal requirements in mind. You may need to store information about customers that your marketing team has no reason to access, for example. Your data warehouse should have the tools to help you keep certain information private, while relevant data is still readily available for the teams that need to use it.

Compatibility with other tools

This is very important if you’ve already made a decision on business intelligence (BI) tools, or another tool that needs to be directly connected to your data warehouse. Your data warehouse needs to be compatible with the rest of your technology stack.

Data latency or egress costs

Similar to the last point, you need to consider the cascading effects of which tools you use. An organization that has all other data tools in the AWS suite should have different considerations than a Google Cloud-based organization. A Snowflake data warehouse can live on AWS or Google, so it can be co-located with a cloud provider. This can help offset data egress costs, as Google and AWS both charge these fees.

Data latency is also something companies should consider. The relative speed at which data becomes available is important for actual usage and can also contribute to data egress costs.

Ease of use

Let’s be frank: data engineers, data scientists, data analysts, etc. are not the cheapest employees on the average company’s payroll. Any tool that doesn’t free up their time is one that’s costing you money. Make sure the tool is user-friendly before you make a commitment.

Features that help take time-intensive manual processes off your team’s plate can be especially valuable. For example, in some warehouses standard SQL can be used to transform data efficiently — and some of these steps can be automated. Because SQL is an incredibly common skill set in the data industry (and can also be quickly learned) warehouses that support SQL are a great option.

Defining indices can also be a burdensome process, but a warehouse like Snowflake can handle this automatically as it sees how your team queries data.

When tools make the warehouse easier to use, less technical users can still access data and more expensive technical users can focus on the aspects of their job that are more valuable to your organization.

Scaling to your needs — without wrecking your budget

Costs for data warehouses are tied to two things: storage and computation.

Storage costs are a straightforward concept: you pay for the amount of information you’re storing in your warehouse. In a columnar data warehouse, this is typically measured by the number of rows you’re storing in all of your tables.

Computation, or compute, costs are tied to the number of seconds of computation you use for your account. You’ll be using computation when you query one of your tables. This might be done as you transform a large data set to make it manageable for analysis. Transforming large data sets into smaller aggregates or summaries can also reduce the computation time for future queries — making those queries faster, and thus cheaper.

As your company grows and amasses more information, you’re going to see these costs increase. And depending on the plan you’re on, you could find yourself running up against limits — you don’t want to run out of space or computation, especially unexpectedly.

On one hand, if these costs are increasing, good things are likely happening at your company. You have more customers with information that requires storage. Maybe you’ve expanded to another market and have the corresponding marketing and sales data for that market now, too.

But, especially if you’re at a start-up or other smaller organization, you likely don’t have the luxury of spending without care on your warehouse service. You need a tool that works and scales with your needs.

Snowflake data warehouses are excellent in this area because storage and computation are decoupled, giving Snowflake customers greater flexibility. With many data warehouses, you’re paying for a certain amount of storage with a corresponding amount of computational resources. If your needs exactly match the level you paid for, great! But if they don’t, you’re either overpaying or you’re lacking resources.

With Snowflake, you can pay for the storage you need, without automatically signing up for too much unused data storage. But, you also aren’t at risk of running out of computational resources, because you can ramp these resources when you need them. For example, if you have significant transforms that you need to run once a day, you can run them and temporarily increase your computational resources, without paying for constant access to that much power. This can lead to substantial cost savings, without compromising on performance.

Why Snowflake is best-in-class

A Snowflake warehouse is the best in its category.

  • Easy implementation: Setting up a Snowflake warehouse is very easy, and the default settings for decisions like syncing data sources are sensible, leading to minimal work for most companies. It’s not just a robust tool, it’s a user-friendly tool.

  • Decoupling storage and computation: We covered this above, but the key takeaway is that you’re paying for what you need, without the risk of running into walls.

  • Industry-standard tool: Because Snowflake is an industry-standard tool, you can easily find the talent you need to work on the product. You don’t want a warehouse that limits the pool of qualified data analysts, data scientists, etc.

Mozart Data provides the easiest way to set up a Snowflake warehouse

Mozart Data’s modern data platform includes a Snowflake data warehouse, as well as support from our in-house data team. One of the most important considerations for companies, especially start-ups, is the ability to rapidly implement a tool and get to work. With Mozart, that’s never been easier.

No set-up or maintenance

Mozart Data takes care of setting up your warehouse and managing it, so you don’t need to devote engineering resources or hire a full-time data engineer or consultant. With Mozart, Modern Treasury was able to set up scalable data infrastructure and delay hiring their first analytics engineer.

Mozart Data has the technical expertise you lack in-house. We’re not just here to help you get set up, but also to troubleshoot issues, help you understand your data, and make sure you’re set up for long-term success with your data.

Cost savings

When you get your Snowflake warehouse through Mozart, we can pass along the bulk discounts we have, saving you about 30%.

Mozart Data is the fastest way to get the data infrastructure you need to tackle data projects with confidence. Want to learn more? Contact us here: https://www.mozartdata.com/contact-us.

Become a data maestro

Data analysis

Is Steph Curry a Good Shooter?

This post was written by Mozart Data Co-Founder and CEO, Peter Fishman.  In 2015, I became a season ticket holder

Education

Everyone Uses Data

This post was written by Shai Weener on Mozart’s data analyst team.  I was on a hike through the Marin

Business intelligence

The SQL Hurdle

This post was written by Shai Weener on Mozart’s data analyst team.  A couple of years ago, as I was