See how VaultSpeed automation speeds up the design, creation and deployment of a Databricks lakehouse platform.
Accelerate your Databricks Lakehouse setup
Watch the video below to see how VaultSpeed automation gets your lakehouse up and running in days or weeks instead of months, step by step:
Harvesting metadata
Tech stack parametrization
Business model mapping
DDL & dbt model generation
Pipeline deployment
Orchestrated data loading
VaultSpeed + Databricks integration
VaultSpeed harvests metadata from any source and builds an integrated data model from which your business users can extract valuable insights. It integrates seamlessly with the Databricks Lakehouse Platform, delivering loading patterns as Databricks Spark SQL notebooks for DDL and DML code and as Scala notebooks for streaming processes.
Data Vault is well suited to the Lakehouse methodology
The aim of Data Vault modeling is to meet evolving business needs and to enable quick, flexible data warehouse development through deliberate design. Data Vault aligns well with the lakehouse approach: its granular hub, link, and satellite structure makes design and ETL changes straightforward to implement.
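To make that structure concrete, here is a minimal sketch of what a hub and a satellite might look like as Delta tables on Databricks. The schema, table names, and column names are illustrative assumptions, not VaultSpeed's generated output, and the code assumes a Databricks notebook where spark is already in scope.

```scala
// Illustrative hub: one row per business key, identified by a hash key.
spark.sql("""
  CREATE TABLE IF NOT EXISTS silver.hub_customer (
    customer_hkey STRING NOT NULL,     -- hash of the business key
    customer_id   STRING NOT NULL,     -- the business key itself
    load_date     TIMESTAMP NOT NULL,
    record_source STRING NOT NULL
  ) USING DELTA
""")

// Illustrative satellite: descriptive attributes, versioned over time.
spark.sql("""
  CREATE TABLE IF NOT EXISTS silver.sat_customer_details (
    customer_hkey STRING NOT NULL,     -- points back to hub_customer
    load_date     TIMESTAMP NOT NULL,  -- satellite rows are versioned by time
    hash_diff     STRING NOT NULL,     -- change-detection hash of the payload
    name          STRING,
    email         STRING,
    record_source STRING NOT NULL
  ) USING DELTA
""")
```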
Data Vault modeling recommends using a hash of the business keys as primary keys. Databricks supports hash, md5, and SHA functions out of the box, so these keys can be computed natively.
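As an illustration, this sketch derives such a hash key with Spark's built-in sha2 function: normalize the business key, then hash it. The staging table and column names are assumptions for the example.

```scala
import org.apache.spark.sql.functions.{col, sha2, trim, upper}

// Hypothetical staging table holding raw source rows.
val staged = spark.table("bronze.stg_customer")

// Normalize the business key, then hash it to produce the hub key.
val withHashKey = staged.withColumn(
  "customer_hkey",
  sha2(upper(trim(col("customer_id"))), 256)
)
```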
VaultSpeed Data Vault automation
VaultSpeed helps you to model the Data Vault and delivers the data structures and the ETL needed for data loading. The tool blends both data-driven and model-driven approaches:
ingest metadata from the source to speed up the modeling process
incorporate the business model to build a Data Vault model that resembles your business
VaultSpeed’s data architecture maps closely to the bronze-silver-gold (medallion) setup proposed by Databricks.
For Bronze and Silver, VaultSpeed brings you no-code Data Vault automation. Data Vault is a pattern that works; there is no need to break it.
The Gold layer, by contrast, can take almost any structure, from star schemas to flattened tables. VaultSpeed’s Template Studio lets you template nearly any use case, as the sketch below illustrates.
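A hedged example of one common Gold-layer pattern: flattening a hub and its latest satellite rows into a simple dimension view. All object names here are hypothetical and carried over from the earlier sketches, not output of Template Studio.

```scala
// Expose the most recent satellite row per customer as a dimension view.
spark.sql("""
  CREATE OR REPLACE VIEW gold.dim_customer AS
  SELECT customer_id, name, email
  FROM (
    SELECT h.customer_id, s.name, s.email,
           ROW_NUMBER() OVER (
             PARTITION BY s.customer_hkey
             ORDER BY s.load_date DESC) AS rn
    FROM silver.hub_customer h
    JOIN silver.sat_customer_details s
      ON s.customer_hkey = h.customer_hkey
  ) AS latest
  WHERE rn = 1
""")
```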
Use VaultSpeed’s flow management control (FMC) add-on module to ensure that all data pipelines are executed at the right time and in the right order. Deploy and schedule your workflows in best-of-breed schedulers such as Azure Data Factory or Apache Airflow.
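Purely to illustrate the dependency order such a workflow enforces, here is a conceptual Scala sketch using Databricks notebook utilities. The notebook paths are hypothetical, and this is not the code FMC produces; real FMC workflows are deployed to schedulers like Azure Data Factory or Apache Airflow.

```scala
// Conceptual ordering: hubs load first, then links, then satellites.
Seq("/vault/load_hub_customer", "/vault/load_hub_product")
  .foreach(path => dbutils.notebook.run(path, 3600))            // hubs

dbutils.notebook.run("/vault/load_link_customer_product", 3600) // links
dbutils.notebook.run("/vault/load_sat_customer_details", 3600)  // satellites
```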
Streaming
Out of the box, VaultSpeed supports conventional solutions for loading data from source to target, with multiple flavors of batch and CDC loading available. When conventional loading isn’t enough, VaultSpeed’s Streaming add-on module enables you to stream data into your Data Vault using Spark Structured Streaming. For streaming, VaultSpeed generates two types of code (sketched after this list):
Scala code, deployed to your Databricks cluster, that serves as the streaming runtime.
DDL code to create the corresponding Data Vault structures on the target platform.
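To give a feel for the runtime side, here is a minimal Structured Streaming sketch that loads hub rows continuously, assuming a Kafka source and a Delta target. The broker, topic, checkpoint path, and table names are assumptions for the example, not VaultSpeed's generated code.

```scala
import org.apache.spark.sql.functions.{col, current_timestamp, lit, sha2, trim, upper}

// Read a change stream from a hypothetical Kafka topic.
val source = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "customer_changes")
  .load()

// Shape the stream into hub rows: hash key, load date, record source.
val hubRows = source
  .selectExpr("CAST(value AS STRING) AS customer_id")  // parsing simplified
  .withColumn("customer_hkey", sha2(upper(trim(col("customer_id"))), 256))
  .withColumn("load_date", current_timestamp())
  .withColumn("record_source", lit("kafka.customer_changes"))

// Continuously append into the Delta hub table.
hubRows.writeStream
  .format("delta")
  .option("checkpointLocation", "/checkpoints/hub_customer")
  .toTable("silver.hub_customer")
```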