Spark SQL & Non-Historized Links (Release 4.2.0) | VaultSpeed

Spark SQL & non-historized links (Release 4.2.0)

November 3rd, 2020

Jonas De Keuster

A new major release of VaultSpeed is available, and it comes with four key changes.
First, we introduce Apache Spark as a new target platform. Second, our redesigned source editor is a big step forward in user experience. Third, we added support for non-historized links. And last but not least, VaultSpeed Studio is now available in open alpha.

Apache Spark & Spark SQL

From now on, VaultSpeed supports Spark SQL. We added a new target platform type, APACHE SPARK, with Spark SQL as its ETL language. The actual object storage behind Spark can be Hive, Delta Lake or others.

 

This first implementation supports batch mode only, but Spark streaming support is in the works. You can generate DDL for Spark together with the ETL. Auto-deploy is not supported yet, but we will add it later. VaultSpeed delivers the ETL code as SQL files, and our flow management solution uses a combination of JDBC and the Spark SQL CLI to execute the Spark code as efficiently as possible.
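If you want to run the delivered SQL files by hand, the idea can be sketched as follows. This is a minimal illustration, not VaultSpeed's flow management: the directory layout and the `ddl_` file name prefix are hypothetical, and the function only builds the `spark-sql -f <file>` command lines without executing them.

```python
from pathlib import Path


def build_spark_sql_commands(sql_dir):
    """Return one `spark-sql -f <file>` command per .sql file, DDL files first.

    Assumption: DDL files start with the hypothetical prefix "ddl_";
    everything else is treated as ETL code.
    """
    files = sorted(Path(sql_dir).glob("*.sql"))
    ddl = [f for f in files if f.name.startswith("ddl_")]
    etl = [f for f in files if not f.name.startswith("ddl_")]
    # Each command is an argument list, suitable for subprocess.run(cmd, check=True).
    return [["spark-sql", "-f", str(f)] for f in ddl + etl]
```

Each returned list can be passed to `subprocess.run` on a machine where the Spark SQL CLI is installed.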

Non-historized links

In our quest to fully support Data Vault 2.0, we added support for non-historized links.
Also known as transactional links, these objects are very important in Data Vault for loading large tables of transactional data such as sales, payments or other events.
VaultSpeed supports two variants: one with a unique identifier (such as a transaction ID) and one without. You can set the transaction ID in the source editor, where it is available as a new attribute type.

A non-historized link is a variant of a many-to-many link table, but it has no satellite, so it does not track changes (there is no hash difference). Records are only ever inserted into this table type. When you use a unique identifier, the load also applies a WHERE NOT EXISTS filter on it. When no CDC solution is available, firewall views filter out records based on that attribute.
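The insert-only load with a WHERE NOT EXISTS filter can be sketched as follows. This is an illustration only, not the code VaultSpeed generates: it uses Python's built-in sqlite3 module instead of Spark SQL so that it runs anywhere, and the table and column names (`stg_payments`, `nhl_payments`, `transaction_id` and so on) are hypothetical.

```python
import sqlite3

# Hypothetical staging table and non-historized link table for payments.
DDL = """
CREATE TABLE stg_payments (transaction_id TEXT, customer_hkey TEXT, invoice_hkey TEXT, amount REAL);
CREATE TABLE nhl_payments (transaction_id TEXT, customer_hkey TEXT, invoice_hkey TEXT, amount REAL);
"""

# Insert only: rows are never updated or deleted in the link table.
# The NOT EXISTS filter skips transaction ids that were loaded before.
LOAD_SQL = """
INSERT INTO nhl_payments (transaction_id, customer_hkey, invoice_hkey, amount)
SELECT s.transaction_id, s.customer_hkey, s.invoice_hkey, s.amount
FROM stg_payments s
WHERE NOT EXISTS (
    SELECT 1 FROM nhl_payments t WHERE t.transaction_id = s.transaction_id
)
"""


def make_db():
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def load_batch(conn, rows):
    """Stage one batch, load it into the link and return the number of new rows."""
    conn.execute("DELETE FROM stg_payments")
    conn.executemany("INSERT INTO stg_payments VALUES (?, ?, ?, ?)", rows)
    before = conn.total_changes
    conn.execute(LOAD_SQL)
    conn.commit()
    return conn.total_changes - before
```

Loading a batch that overlaps with an earlier one only adds the transactions that are not in the link yet. Note that the filter compares against rows already in the target, so this sketch assumes a transaction ID does not appear twice within a single batch.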

 

The payments table is modeled as a non-historized link

Source Graphical editor

Current users might have noticed it in the screenshot above: our source graphical editor has been completely redesigned! It is faster, smoother and more user-friendly, shows more information and is more pixel-perfect than ever before. It also adheres better to the standards of modern browsers. In the video below, you can see how to model your sources and prepare them for introduction into your data vault model.

 

Source Editor Demo Video

VaultSpeed Studio

VaultSpeed Studio is now in open alpha. Everyone can try it for free for 30 days. The trial period starts when you create your first template and is limited to 5 templates. Read more about VaultSpeed Studio in this previous post.

 

Other Changes

  • You can now ping the agents. This displays the host names of the machines where agents are running. On the Agent page you can also kill agents.
  • We added support for PITs on sources with different loading frequencies. VaultSpeed’s Flow Management dynamically scales the Business Vault loading window based on the loading windows of the sources that were loaded before.
  • In addition to the non-historized link, we also added support for Same-as Links and Hierarchical Links.
  • We improved the level of parallelism between tasks. The following tasks can now run at the same time: DDL and ETL generation, deploy and generation.
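To make the dynamic loading window for PITs more concrete, here is one possible reading of it as a small sketch. It assumes that the Business Vault window simply spans from the earliest start to the latest end of the source loads that ran since the previous Business Vault load. That rule, the `LoadWindow` type and the function name are assumptions for illustration, not VaultSpeed's actual implementation.

```python
from datetime import datetime
from typing import Iterable, NamedTuple


class LoadWindow(NamedTuple):
    """Hypothetical representation of one loading window."""
    start: datetime
    end: datetime


def business_vault_window(source_windows: Iterable[LoadWindow]) -> LoadWindow:
    """Return a window that covers all given source loading windows.

    Assumption: the Business Vault window runs from the earliest source
    start to the latest source end.
    """
    windows = list(source_windows)
    if not windows:
        raise ValueError("no source loads to cover")
    return LoadWindow(
        start=min(w.start for w in windows),
        end=max(w.end for w in windows),
    )
```

With one source loaded daily and another loaded hourly, the resulting window stretches to cover the daily load and every hourly load that came after it.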


