Databricks automation demo

Automate your data lakehouse

See how VaultSpeed automation speeds up the design, creation and deployment of a Databricks lakehouse platform.

Accelerate your Databricks Lakehouse setup

Watch the video, which guides you through the steps VaultSpeed automation takes to get your lakehouse up and running in days or weeks, not months:

  1. Harvesting metadata
  2. Tech stack parametrization
  3. Business model mapping
  4. DDL & DBT model generation
  5. Pipeline deployment
  6. Orchestrated data loading

VaultSpeed + Databricks integration

VaultSpeed extracts metadata from any source and creates an integrated data model that allows your business users to draw valuable insights from your data. It integrates seamlessly with the Databricks Lakehouse Platform, providing loading patterns as Databricks Spark SQL notebooks for DDL and DML code and Scala notebooks for streaming processes.
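
To make this concrete, here is a minimal sketch of the kind of DDL such a generated Spark SQL notebook could contain. The hub layout and every name below are illustrative assumptions, not actual VaultSpeed output; in a Databricks notebook, spark is predefined:

  // Illustrative hub DDL on Delta Lake; schema and names are assumptions.
  spark.sql("""
    CREATE TABLE IF NOT EXISTS raw_vault.hub_customer (
      customer_hkey STRING,     -- hash of the business key
      customer_no   STRING,     -- business key from the source
      load_date     TIMESTAMP,  -- when the key was first seen
      record_source STRING      -- originating system
    ) USING DELTA
  """)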

[Image: Databricks architecture]

Data Vault is well suited to the Lakehouse methodology

The aim of Data Vault modeling is to meet evolving business needs and enable quick and flexible development of data warehouses through deliberate design. Data Vault aligns well with the lakehouse approach because of its adaptable and detailed hub, link, and satellite structure, facilitating seamless implementation of design and ETL changes.
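
Because satellites isolate descriptive attributes from hubs and links, most model changes are additive. As a hypothetical illustration on Delta Lake (the table and column names are made up), adding a new source attribute touches only the satellite:

  // Additive change: extend the satellite without touching hubs or links.
  spark.sql("ALTER TABLE raw_vault.sat_customer ADD COLUMNS (loyalty_tier STRING)")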

[Image: Databricks layers]

Data Vault modeling recommends using a hash of business keys as primary keys. Databricks supports hash, md5, and SHA functions out of the box to generate these keys.
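
For instance, a sketch of hashing a business key with Spark’s built-in functions (the table and column names are placeholders):

  import org.apache.spark.sql.functions.{col, concat_ws, sha2, trim, upper}

  // SHA-256 over the normalized business key; md5() or hash() work the same way.
  val customers = spark.table("bronze.customers")  // assumed source table
  val hashed = customers.withColumn(
    "customer_hkey",
    sha2(concat_ws("||", upper(trim(col("customer_no")))), 256)
  )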

VaultSpeed Data Vault automation

VaultSpeed helps you to model the Data Vault and delivers the data structures and the ETL needed for data loading.
The tool blends both data-driven and model-driven approaches:

  • ingest metadata from the source to speed up the modeling process
  • incorporate the business model to build a Data Vault model that resembles your business

VaultSpeed’s data architecture closely mirrors the bronze-silver-gold (medallion) setup proposed by Databricks.

[Image: Databricks reference architecture]

For Bronze & Silver, VaultSpeed brings you no-code Data Vault automation. Data Vault is a pattern that works; there is no need to break it.

As for the Gold layer, the structure can be anything from star schemas to flattened tables. VaultSpeed’s Template Studio lets you code almost any use case.
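
As an example of what a Gold-layer object could look like, reusing the illustrative hub and satellite names from above (this is not generated Template Studio code), a flattened customer view might be:

  // Flattened Gold-layer view joining a hub to its satellite; names are illustrative.
  spark.sql("""
    CREATE OR REPLACE VIEW gold.dim_customer AS
    SELECT h.customer_no,
           s.loyalty_tier,
           s.load_date
    FROM   raw_vault.hub_customer h
    JOIN   raw_vault.sat_customer s
      ON   s.customer_hkey = h.customer_hkey
  """)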

Read more about Data Vault 2.0 on the Lakehouse in the Microsoft industry blog

Create workflow schedules

Use VaultSpeed’s flow management control (FMC) add-on module to ensure that all data pipelines are executed at the right time, and in the right order. Deploy and schedule your workflows in best-of-breed schedulers like Azure Data Factory or Apache Airflow.

[Image: Azure Data Factory for Databricks]

Streaming

VaultSpeed supports conventional solutions for loading data from source to target out of the box, with multiple flavors of batch and CDC loading available. When conventional loading isn’t enough, the Streaming add-on module enables you to stream data into your Data Vault using Spark Structured Streaming. To do this, VaultSpeed generates two types of code:

  1. Scala code, deployed into your Databricks cluster, that becomes the runtime code.
  2. DDL code to create the corresponding Data Vault structures on the target platform.

[Image: Spark streaming code example]
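
A minimal sketch of what such a streaming load could look like, assuming a Kafka source and a Delta target; the broker, topic, checkpoint path, and table names are all placeholders, not generated VaultSpeed code:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, sha2}

  val spark = SparkSession.builder().appName("sat_customer_stream").getOrCreate()

  // Read change events from an assumed CDC topic.
  val source = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "customer_changes")
    .load()

  // Hash the raw payload as a stand-in for a real business-key hash.
  val sat = source
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp AS load_date")
    .withColumn("customer_hkey", sha2(col("payload"), 256))

  // Continuously append into a Data Vault satellite stored as a Delta table.
  sat.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/sat_customer")
    .toTable("raw_vault.sat_customer_stream")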

Start accelerating the deployment of your Databricks Lakehouse

Without the risk. Without the stress.