Data Vault: the best fit for automation

The pattern-based design of Data Vault 2.0 greatly enhances the automation of the enterprise data warehouse, lakehouse, or mesh.

What is Data Vault 2.0?

Data Vault 2.0 is a data modeling method that offers a flexible, scalable, and agile approach to organizing and storing data in any data warehouse, lakehouse, or mesh.

Dan Linstedt introduced the original Data Vault method in the 1990s, and it has gained popularity ever since.

It is particularly well-suited for the automation of data integration while accommodating changes in source data structures over time.

What is Data Vault modeling?

Data Vault modeling breaks down all incoming data into three simple standard components, forming a model that is engineered to connect all the dots:

Hubs: Represent business entities (e.g., product or customer) and serve as the central point for connecting relationships.
Satellites: Contain descriptive attributes about the entities stored in hub tables, capturing changes over time.
Links: Capture relationships between entities and enable the modeling of complex business scenarios.

A hub comprises a unique business key for identification, a hash key to support parallel loading, a load date for technical historization, and a record source for debugging. The business key can vary; for instance, an employee can be identified by an employee number, and a car by a vehicle identification number (VIN). Multi-part business keys, which span multiple columns, are common.
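To make the hub structure concrete, here is a minimal Python sketch of a hub row with the four parts described above. The names `hash_key`, `HubRow`, and `make_hub_row`, the `||` delimiter, the trim-and-upper-case normalization, and the choice of SHA-256 are all assumptions made for illustration; they are not taken from the Data Vault standard or from VaultSpeed.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic hash key from one or more business key columns."""
    # Assumed normalization: trim and upper-case each part before hashing.
    normalized = delimiter.join(part.strip().upper() for part in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hash_key: str         # derived from the business key
    business_key: tuple   # one or more source columns
    load_date: datetime   # technical historization
    record_source: str    # origin of the key, useful for debugging


def make_hub_row(business_key: tuple, record_source: str) -> HubRow:
    return HubRow(
        hash_key=hash_key(*business_key),
        business_key=business_key,
        load_date=datetime.now(timezone.utc),
        record_source=record_source,
    )


# Single-column key (a VIN) and a hypothetical two-column employee key.
car = make_hub_row(("1HGCM82633A004352",), "dealer_system")
employee = make_hub_row(("ACME", "E-1001"), "hr_system")
```

Because the hash key depends only on the business key, any process that knows the business key can compute it without looking anything up in the hub first, which is what makes parallel loading possible.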

Links and satellites follow a similar structure, with variations: a link implements the relationships between business keys, while a satellite holds the structure of the descriptive data. Despite these small differences, the entities share a clear pattern. The loading procedures are just as regular; all hub loading procedures, for instance, closely resemble one another.
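The regularity of the loading procedures can be sketched as a single function that loads any hub, driven only by metadata. This is a simplified illustration: the in-memory dictionary stands in for a hub table, and `load_hub`, its parameters, and the sample rows are hypothetical.

```python
from datetime import datetime, timezone


def load_hub(hub: dict, staged_rows: list, key_columns: list, record_source: str) -> int:
    """Insert business keys not yet present in the hub; return the number inserted."""
    inserted = 0
    load_date = datetime.now(timezone.utc)
    for row in staged_rows:
        business_key = tuple(str(row[col]).strip().upper() for col in key_columns)
        if business_key not in hub:  # known keys are skipped, never updated
            hub[business_key] = {"load_date": load_date, "record_source": record_source}
            inserted += 1
    return inserted


# The same procedure loads two different hubs; only the key columns differ.
customer_hub, product_hub = {}, {}
orders = [
    {"customer_no": "C-1", "sku": "P-9"},
    {"customer_no": "C-1", "sku": "P-7"},
]
new_customers = load_hub(customer_hub, orders, ["customer_no"], "webshop")
new_products = load_hub(product_hub, orders, ["sku"], "webshop")
reloaded = load_hub(customer_hub, orders, ["customer_no"], "webshop")
```

Running the load a second time on the same rows inserts nothing, so the procedure can be repeated safely. A code generator only needs the list of key columns and the source name to produce a loader for each hub.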

What is Data Vault architecture?

The Data Vault standard includes architectural guidelines for the structure of a data warehouse, lakehouse, or mesh, which VaultSpeed follows in full.

The integration and storage area is crucial: it absorbs changes and additions of sources and serves as a backward-compatible layer for incoming requests from subsequent data consumption layers. It consists of three layers:

The landing zone is where all source data initially enters the data platform. The data maintains its source format and model.

The Raw Data Vault contains raw, historical, unfiltered data from the sources. The raw data describes the facts of the source system. They prove that something exists or has occurred.

The Business Data Vault harmonizes business keys/terms from the source system with the anticipated model, ensuring alignment and compliance. It is also the layer where additional business logic is implemented.

VaultSpeed ensures no disruption in ingestion, transformation, and modeling by delivering automated code that adheres to Data Vault standards more closely than any other automation tool, which earned it the first Data Vault certification.

What are the benefits of applying Data Vault?

Common Understanding: Data Vault uses standard components understood by all stakeholders. Your business model is represented in the Data Vault model.
Repeatable Patterns: Automation requires repeatable patterns, which is precisely what Data Vault delivers by transforming and grouping data at the highest level of abstraction.
Continuity: Adherence to Data Vault standards promotes team-wide use of the same model and methodology, reducing dependence on individual team members or external consultants.
Agility: New datasets can be added without reloading the entire Data Vault, facilitating the ongoing integration of new sources and accommodating changes in existing sources.
Historical Tracking: Satellite tables capture changes over time, providing a historical perspective on data.
Scalability: The architecture is designed to be scalable, making it capable of handling large volumes of data with ease.
Powering AI: LLMs and other AI models generate better results when they work with the structured data in the Raw and Business Data Vault.
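The Historical Tracking benefit in the list above can be illustrated with a short Python sketch of a satellite load that appends a new row only when the descriptive attributes change. The list standing in for a satellite table, the helper names `hash_diff` and `load_satellite`, and the use of SHA-256 are assumptions made for this example, not a prescribed implementation.

```python
import hashlib
from datetime import datetime, timezone


def hash_diff(attributes: dict) -> str:
    """Hash the descriptive attributes in a fixed (sorted) column order."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_key: str, attributes: dict) -> bool:
    """Append a new version only if the attributes changed; return True if appended."""
    new_diff = hash_diff(attributes)
    versions = [row for row in satellite if row["hub_key"] == hub_key]
    if versions and versions[-1]["hash_diff"] == new_diff:
        return False  # nothing changed, so no new row is written
    satellite.append({
        "hub_key": hub_key,
        "load_date": datetime.now(timezone.utc),
        "hash_diff": new_diff,
        **attributes,
    })
    return True


customer_sat = []
load_satellite(customer_sat, "C-1", {"city": "Leuven", "tier": "gold"})
load_satellite(customer_sat, "C-1", {"city": "Leuven", "tier": "gold"})  # unchanged
load_satellite(customer_sat, "C-1", {"city": "Ghent", "tier": "gold"})   # changed
history = [row["city"] for row in customer_sat]
```

Existing rows are never updated or deleted, so each earlier state of the customer remains available alongside its load date.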

What is the best way to use Data Vault?

If you have multiple sources to integrate and your source architecture is prone to changes, you should definitely implement a Data Vault.

But forget about trying to do this manually. Automation is what makes your Data Vault come alive.

VaultSpeed's automated data transformation represents the fourth generation of data automation: your data team doesn't need to write any code to make automation work for your specific technology or data stack.

It suggests a Data Vault model from the start, based on smart analysis of metadata harvested from sources. VaultSpeed's graphical user interface (GUI) includes a comprehensive data modeler for accepting, correcting, or enriching the proposed solution; the final decision rests with the user.