Unparalleled data warehouse automation for Databricks

Databricks Lakehouse Platform

The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance and performance of data warehouses with the openness, flexibility and machine learning support of data lakes.

With a lakehouse, you can eliminate the complexity and expense that make it hard to achieve the full potential of your analytics and AI initiatives.

Databricks architecture

How VaultSpeed can integrate with Databricks

Data Vault is well suited to the lakehouse methodology

The goal of Data Vault modeling is to adapt to fast-changing business requirements and to support faster, more agile data warehouse development by design. Data Vault is well suited to the lakehouse methodology: the model is granular and easily extensible thanks to its hub, link, and satellite design, so model and ETL changes are straightforward to implement.

Lakehouse architecture

Databricks Bronze, Silver & Gold layers

Data Vault modeling recommends using a hash of the business keys as the primary key. Databricks supports hash, md5, sha1, and sha2 functions out of the box to generate these keys.
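As a rough illustration of the idea (not VaultSpeed's actual implementation — the trimming, uppercasing, and delimiter conventions here are assumptions), a deterministic hash key derived from one or more business keys might look like this:

```python
import hashlib

def hash_key(*business_keys: str, delimiter: str = "|") -> str:
    """Derive a deterministic hash key from one or more business keys.

    The normalization rules (trim, uppercase, delimiter) are illustrative;
    a real implementation must fix these rules once and never change them,
    or keys will stop matching across loads.
    """
    normalized = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same business key always yields the same hash key, so hubs,
# links, and satellites can join without sequence-number lookups.
print(hash_key("CUST-001"))
print(hash_key("CUST-001", "2022-07-05"))  # composite key, e.g. for a link
```

Because the key is computed purely from source data, parallel loads into hubs, links, and satellites can all derive it independently.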

VaultSpeed Data Vault automation

VaultSpeed helps you to model the Data Vault and delivers the data structures and the ETL needed for data loading.

The tool blends both data-driven and model-driven approaches:

  • ingest metadata from the source to speed up the modeling process
  • incorporate the business model to build a Data Vault model that resembles your business

VaultSpeed’s data architecture maps closely onto the bronze-silver-gold setup proposed by Databricks.

Databricks reference architecture

Reference Architecture

For the Bronze and Silver layers, VaultSpeed brings you no-code Data Vault automation. Data Vault is a pattern that works; there is no need to break it.

As for the Gold layer, the structure can be anything from star schemas to flattened tables. VaultSpeed’s Template Studio lets you build templates for almost any use case.
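To give a flavor of the kind of Gold-layer transformation such templates produce (the table names, columns, and rows below are invented for illustration, not VaultSpeed output), flattening a hub and its satellite into one consumable table amounts to joining each hub row to the latest satellite version:

```python
# Toy hub and satellite rows sharing a hash key (hk) — invented example data.
hub_customer = [
    {"hk": "a1", "customer_id": "CUST-001"},
    {"hk": "b2", "customer_id": "CUST-002"},
]
sat_customer_details = [
    {"hk": "a1", "name": "Acme", "load_ts": "2022-07-01"},
    {"hk": "a1", "name": "Acme Corp", "load_ts": "2022-07-05"},  # newer version
    {"hk": "b2", "name": "Globex", "load_ts": "2022-07-02"},
]

def flatten(hub, satellite):
    """Join each hub row to the most recent satellite row for its hash key."""
    latest = {}
    for row in satellite:
        if row["hk"] not in latest or row["load_ts"] > latest[row["hk"]]["load_ts"]:
            latest[row["hk"]] = row
    return [
        {**h, "name": latest[h["hk"]]["name"]}
        for h in hub
        if h["hk"] in latest
    ]

for record in flatten(hub_customer, sat_customer_details):
    print(record)
```

The Data Vault layers keep the full history; the Gold layer only ever presents the view a consumer needs, so it can be rebuilt at will.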

Create workflow schedules

Use VaultSpeed’s flow management control (FMC) add-on module to ensure that all data pipelines are executed at the right time, and in the right order. Deploy and schedule your workflows in best-of-breed schedulers like Azure Data Factory or Apache Airflow.
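Conceptually, this kind of flow management boils down to executing loads in dependency order — staging before hubs, hubs before satellites and links. A minimal sketch of that idea in plain Python (the task names and dependencies are invented; FMC deploys real workflows to schedulers like Airflow or Azure Data Factory):

```python
from graphlib import TopologicalSorter

# Each pipeline step lists the steps it depends on (invented example).
dependencies = {
    "stage_customers": set(),
    "stage_orders": set(),
    "load_hub_customer": {"stage_customers"},
    "load_sat_customer": {"load_hub_customer"},
    "load_link_orders": {"load_hub_customer", "stage_orders"},
}

# A scheduler resolves the same kind of graph at runtime;
# here we just compute one valid execution order.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Running loads in this order guarantees that no satellite or link is loaded before the hub rows it references exist.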


ADF for Databricks

A solid foundation for analytics

First things first: to build data products, you need data. VaultSpeed helps you to get your data into the lakehouse continuously. Now you can run state-of-the-art analytics to help answer the complex end-to-end questions that drive better business decisions. When specifications change, you only need to rebuild and reload your data in the analytics or application area, while the Data Vault layer safely collects and stores all the data you need.

Streaming Data Vault

VaultSpeed supports conventional solutions for loading data from source to target out of the box, with multiple flavors of batch and CDC loading available. But if conventional data loading isn’t enough for you, VaultSpeed’s Streaming add-on module enables you to stream data into your Data Vault using Spark Structured Streaming. In that case, VaultSpeed generates two types of code:

  • Scala code, deployed into your Databricks cluster, which becomes the runtime code.
  • DDL code to create the corresponding Data Vault structures on the target platform.
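To give a flavor of what metadata-driven DDL generation means (this is a simplified illustration; the naming conventions, columns, and types are assumptions, not VaultSpeed's actual generator output), rendering a satellite's CREATE TABLE statement from metadata could be sketched as:

```python
def satellite_ddl(name: str, hub: str, attributes: dict) -> str:
    """Render a CREATE TABLE statement for a Data Vault satellite.

    The hash-key, load-timestamp, and record-source columns follow common
    Data Vault conventions; the exact names here are illustrative only.
    """
    cols = [f"{hub}_hk STRING", "load_ts TIMESTAMP", "record_source STRING"]
    cols += [f"{col} {dtype}" for col, dtype in attributes.items()]
    body = ",\n  ".join(cols)
    return f"CREATE TABLE {name} (\n  {body}\n) USING DELTA;"

print(satellite_ddl("sat_customer_details", "hub_customer",
                    {"name": "STRING", "segment": "STRING"}))
```

Driving DDL from metadata is what lets the target structures and the streaming runtime stay in sync when the model changes.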

Spark Streaming code example

See it in action

Discover how to deploy your Databricks lakehouse faster.

Without the risk. Without the stress.