Automation² API, Matillion for Azure Synapse and Custom ETL settings (Release 4.2.6)

VaultSpeed is making a habit of launching a major release just before the holiday season! This year is no exception, so 4.2.6 is loaded with Santa’s gifts!

So, what’s new?

We completely redesigned our API to make it publicly accessible and consumable. Matillion ETL for Azure Synapse is now available. And we added functionality to your data pipelines: you can now customize your ETL mappings by adding custom code.

Automation² API

Our API has been substantially reworked. You can now call API endpoints for all the data and actions available in our application. We’re proud of this achievement, which makes VaultSpeed the first tool to deliver a REST API for data vault automation. The API enables further integration with other tools and allows users to truly automate the automation: Automation².

REST API docs

 

With our API, you can start automating tasks such as:

  • the creation of a new source version
  • the configuration for similar sources
  • loading metadata into your preferred data lineage or data governance tool
  • the import of business view definitions
  • the migration of your existing Data Vaults into VaultSpeed
  • and much more

The screenshot below shows the setup for automatic agent download via the API.

 

Download the agent using curl to the API endpoint
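
For readers who prefer a script over curl, the same download could be sketched in Python. This is a minimal sketch under stated assumptions: the base URL, the `/api/agent/download` path, and the bearer-token header are hypothetical placeholders rather than VaultSpeed's documented API, so check the REST API docs for the real endpoint and authentication scheme.

```python
import shutil
import urllib.request

# Hypothetical values: replace with the endpoint and token from the REST API docs.
BASE_URL = "https://vaultspeed.example.com"
AGENT_PATH = "/api/agent/download"  # assumed path, not taken from the docs


def build_agent_request(base_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the agent archive."""
    url = base_url.rstrip("/") + AGENT_PATH
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def download_agent(base_url: str, token: str, target: str) -> None:
    """Stream the response body to a local file."""
    request = build_agent_request(base_url, token)
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
```

Keeping the request construction separate from the network call makes the script easy to check without contacting a server; a scheduler could then call `download_agent` with a target file name of your choice.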

 

Another example:

This screenshot illustrates how VaultSpeed metadata is extracted from the API using Matillion ETL for Snowflake. Scheduling this mapping would sync all Data Vault lineage and metadata straight into Snowflake!

 

VaultSpeed API to Snowflake mapping built in Matillion ETL

 

Data Vault metadata loaded in Snowflake
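
Outside Matillion, the core of such a sync is flattening the API's JSON response into rows that a warehouse table can ingest. Below is a minimal Python sketch, assuming a hypothetical payload shape (a list of objects, each with a `name`, a `type`, and a list of `attributes`); the actual response structure is defined in the REST API docs and will differ.

```python
import csv
import io
import json

# Hypothetical payload: the field names below are illustrative only.
SAMPLE_PAYLOAD = json.dumps([
    {"name": "HUB_CUSTOMER", "type": "HUB", "attributes": ["CUSTOMER_BK"]},
    {"name": "SAT_CUSTOMER", "type": "SAT", "attributes": ["NAME", "EMAIL"]},
])


def flatten_objects(payload: str) -> list[dict]:
    """Turn nested object metadata into one row per (object, attribute)."""
    rows = []
    for obj in json.loads(payload):
        for attribute in obj.get("attributes", []):
            rows.append({
                "object_name": obj["name"],
                "object_type": obj["type"],
                "attribute_name": attribute,
            })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV, a format most warehouses can bulk load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["object_name", "object_type", "attribute_name"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting CSV text could be staged and loaded with whatever bulk-load mechanism your platform provides.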

 

Not all endpoints will be included in our standard licenses, but some will always be available, such as downloading the Agent or the Airflow plugin.

Matillion Synapse

On the ETL side, we’ve added support for Matillion ETL for Azure Synapse. VaultSpeed now generates Matillion ETL code for Synapse Data Vaults.

Matillion users can automate the pipelines that load data in the Data Vault area and focus on tailor-made transformations in the other layers of the cloud data warehouse.

It is also good to know that Matillion has just released CDC support. This makes it possible to land data from different sources, which makes integration into the Data Vault even easier.

Our current support for Matillion includes both Snowflake and Synapse, and we are looking to extend it to other cloud data platforms in the near future.

 

Generated SAT mapping in Matillion for Synapse

Data pipelines

This release contains a significant development to make your data pipelines run smoother.

VaultSpeed has offered the possibility to add custom code snippets to generated DDL code for quite some time now. Think of examples like DDL for transient tables or partitioning definitions.

We’re now allowing users to add custom code to the generated mappings as well. Depending on your preferred ETL solution, different settings can be applied.

The example below shows SQL procedures.

 

Example of custom ETL snippets added to procedures

 

The possibilities are endless, from changing execution grants by adding “Execute as owner” to adding custom logging statements with row counts after every DML statement.

The complete documentation can be found at https://vaultspeed.atlassian.net/wiki/spaces/VPP/pages/2701370816/Generating+Code#ETL-Settings.

Other important changes:

  • We added the possibility to define DDL settings for all the standard BV objects (no VaultSpeed Studio templates at this stage, those will be added later).
  • The initial load STG mappings now use not only the extraction table but also the SATs to look up BKs. This comes in handy mainly for delta generations, when loading the initial data for a new object with references to an existing object.
  • We added two extra DV parameters: CAST_TO_NVARCHAR_IN_HASH and CAST_TO_VARCHAR_IN_HASH. These control the hashing behavior by determining which type the business keys are cast to before they are hashed. The two parameters are mutually exclusive and are mainly beneficial for SQL Server and Synapse.
  • New logic applies to the BV release creation to catch the cases where bridges become invalid. When an object gets deleted from a Data Vault while being used in a bridge, the initial BV release created when locking that DV release will be unlocked. No code can be generated for this new DV release until the issue in the bridge is resolved and the business vault is locked. While there is still an invalid bridge in a BV, hovering over the (grayed out) lock button will display the faulty bridge.
  • Hard deletes can now be generated for ODI. The deletes are implemented using a setBeginCmd containing the delete SQL statement.
  • We updated our template language to allow for repeating templates. You can now generate a query for every SAT of a HUB or all DV objects in a bridge in the VaultSpeed Studio templates.

Example of the template code:

Template $ DVO_TEMPL 
templaterepeatedbycomponent DVO
...
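
To see why the choice between CAST_TO_NVARCHAR_IN_HASH and CAST_TO_VARCHAR_IN_HASH matters, recall that on SQL Server and Synapse a hash function operates on the bytes of its input: NVARCHAR values are stored as UTF-16, while VARCHAR values use a single-byte code page under most collations. The Python sketch below mimics that difference. MD5 and code page 1252 are assumptions chosen for illustration and are not necessarily what VaultSpeed generates.

```python
import hashlib


def hash_as_varchar(business_key: str) -> str:
    """Hash the key as single-byte text (code page 1252 assumed)."""
    return hashlib.md5(business_key.encode("cp1252")).hexdigest()


def hash_as_nvarchar(business_key: str) -> str:
    """Hash the key as UTF-16 little-endian text, as NVARCHAR stores it."""
    return hashlib.md5(business_key.encode("utf-16-le")).hexdigest()


key = "CUST-001"  # illustrative business key
# Same logical key, different byte sequences, therefore different hash keys.
print(hash_as_varchar(key))
print(hash_as_nvarchar(key))
print(hash_as_varchar(key) == hash_as_nvarchar(key))  # False
```

Because the two casts yield different hash keys for the same business key, every load that joins on a hash key has to use the same setting, which fits with the two parameters being mutually exclusive.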

So, lots of new stuff to play with for next year. All we want to do now is wish you happy holidays. More exciting features are coming in 2022.
Spoiler alert: some of them involve Spark Streaming!