• Mozart Data

Data Dictionary

Data dictionary with magnifying glass and a book

Columnar vs. Row storage: Column stores are ideal for highly analytical query models. Row stores are better suited for transaction processing. Columnar storage is excellent at aggregating large data volumes for column subsets. Row storage is great at writing data quickly.

Foreign Keys: A field in a table that is a primary key in another table. It is essentially a reference to a record in a separate system that is either a different table or an entirely separate database/system. Primary keys never accept NULL values, but a foreign key can be NULL. There is just one primary key for each table, but you can have multiple foreign keys in a database table.

DAG: Directed Acyclic Graph or DAG is a graph with arrows pointing from one event to another. It forms a cycle that never actually closes. DAG is often used to define dependencies and relationships between different events.

ERD: An Entity Relationship Diagram or ERD describes how concepts, entities, and other things relate to each other. Developed for business analysts and business users, ERD is leveraged to connect concepts to database structures and understand the business domain better.

ELT: Extract, Load, Transform, or ELT extracts data from disparate sources and loads it into the data warehouse. This approach helps enhance the processing of massive data sets for business intelligence tools.

ETL: Extract, Transform, Load, or ETL is a process of extracting data from disparate sources and transforming it to a digestible format before loading it into a data warehouse. This approach makes it easier for businesses to analyze data and make critical business decisions quickly.

Indexes: This is a layer on top of your data that helps improve speed when querying. It is like a glossary at the back of a book that allows you to scan for a term quickly. But like a glossary, if the index contains every word in the book, it slows down the process. Indexes are typically best used on the most queried part of a table, for example, a user ID field.

Reverse ETL: This is the opposite of ETL. It is the process of copying data from data warehouses and loading it into operational systems like SaaS tools leveraged for growth, marketing, sales, and customer support.

Schema: A schema refers to how data is organized and constructed in a database. A database schema embodies the logical configuration of a whole or part of a relational database. It can exist both as a set of formulas (integrity constraints) and as a visual representation.

Database: A database is a collection of structured data that is organized and stored electronically in a computer system. Popular database types include MySQL, MariaDB, MongoDB, and PostgreSQL.

Data Warehouse: An enterprise data warehouse or data warehouse is a system that aggregates data from disparate sources to enable easy access and analysis. Data warehouses often store massive amounts of data that is ready to be queried by data professionals and business analysts.

Data Lake: A data lake is like a data warehouse but at a grander scale. It is a centralized repository that stores both structured and unstructured data at any scale.

Data Lakehouse: A data lakehouse is a data repository structure that combines the key benefits of both data lakes and data warehouses. The data lakehouse makes it easy for both data scientists and business users to work on the same sources seamlessly. This approach also makes it easier to enforce data governance policies.

OLTP: Online Transactional Processing or OLTP is a data processes protocol that executes transaction-focused tasks. Often used for financial transactions and retail sales, the process involves inserting, deleting, and updating small amounts of database data.

OLAP: Online Analytical Processing or OLAP is a tool that performs multidimensional analysis at high speeds on large data volumes from a data mart, data warehouse, or a centralized and unified data store.