Snowflake Procedures (Release 4.1.17)


In the latest VaultSpeed release we included important improvements for Snowflake customers ❄️.
We also used valuable customer feedback to build some improvements for VaultSpeed Studio. Finally, we partly redesigned ELT generation for Talend to increase generation speed and robustness.

Snowflake

Snowflake is one of the most popular data platforms around these days, and the success of their recent IPO emphasizes that. We already support Snowflake as a target, and with this release the integration just got better.
Following up on exciting developments at Snowflake, VaultSpeed now generates ELT code for Snowflake that is wrapped in JavaScript stored procedures. From now on you can deploy and store your procedures inside Snowflake. One of the main advantages is that our workflow solution in Airflow can call these procedures instead of executing saved SQL files.
We also enabled auto deploy for Snowflake: you can now deploy DDL and ELT code to the target using VaultSpeed's Agent. Together, these changes make it possible to start loading your data warehouse without any manual interaction by your developers.
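
As a minimal sketch of the pattern (the procedure name and the wrapped statement are invented for illustration; the actual generated code differs), the ELT logic lives in a JavaScript stored procedure inside Snowflake, and the workflow then runs it with a single CALL:

-- Hypothetical sketch, not actual VaultSpeed output.
CREATE OR REPLACE PROCEDURE dv.load_hub_customer()
  RETURNS STRING
  LANGUAGE JAVASCRIPT
AS
$$
  // The generated ELT statement is embedded in the procedure body.
  var stmt = snowflake.createStatement({
    sqlText: "INSERT INTO dv.hub_customer (customer_hkey, customer_bk, load_date, record_source) " +
             "SELECT customer_hkey, customer_bk, load_date, record_source FROM stg.stg_customer"
  });
  stmt.execute();
  return "SUCCESS";
$$;

-- A scheduler such as Airflow can now run the load with one statement:
CALL dv.load_hub_customer();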

 

Snowflake stored procedures

VaultSpeed Studio

In a previous release we announced VaultSpeed Studio in closed alpha. In the past few weeks we went to work with the initial customer feedback. The first thing we improved is the integrated template code editor: it now offers smoother navigation, more reliable changing and saving of templates, and the option to compare your changes to the pre-edit version.

Template code editor

We added a "view all" option to the target definition where you can see all attributes of the template target. Previously, existing and newly created target attributes were shown in separate windows.

In a previous post we explained the need for signature fields when doing automation. We completely redesigned the signature definition screen: you create and select signature attributes at the top of the screen and assign them to attributes in a list below. This list can be filtered before assigning a certain signature to a filtered set of fields.

Signature attributes

VaultSpeed Studio will move to open alpha in one of the next releases. From then on, all clients can start a one-month free trial period with VS Studio.

Talend

Talend is one of the first ELT tools for which we supported automation. This is done by generating jobscript that can be deployed to create ELT mappings inside Talend Studio’s repository.

Due to improvements and changes in their product, the need for a major update of our template compiler for Talend became apparent. The result is that the generation of Talend jobscript is much more robust. We also took a giant leap in terms of speed: jobscript generation is up to 3 times faster compared to the previous release.

Quality of life changes

Some smaller changes will certainly improve user experience:

  • We added a download button to the automatic deployment screen. This allows users to download generated code through the browser instead of having to obtain it from the agent.
  • We improved the description of the attribute-based SAT split.
  • You can now upload CSV files directly through your browser to update short and abbreviated names in the source.
  • We moved the automatic deployment menu out of the generation settings since it didn't really belong there.
  • Users will experience improved loading performance of parameter screens.
  • We added an extra tab to the DDL settings page where you can get an overview of all applied settings per layer.
  • Business views are checked for naming conflicts with other objects. We also provided users with a button to disable/enable generation for all business views (to use in combination with filters).
  • Releases in the delta generation tab are now sorted such that the latest one is always shown first.
  • We added subscription updates as data points to the generations graph on the dashboard. It also shows the next subscription reset date.
    Based on the number of jobs granted in your subscription, a warning appears when the mapping counter is above 80% or at 100% of the subscription limit.
  • And a lot more that did not make this post... 🤷‍♂️

 

Our next release is coming up quite fast. Would you like to stay up to date on future VaultSpeed releases? Fill in this form 👇

 


Automated workflows for Apache Airflow


Data Warehouse Automation is much broader than the generation and deployment of DDL and ELT code alone. One of the areas that should also be automated is running and monitoring loads. In this area, VaultSpeed chose to integrate with Apache Airflow. Airflow is one of the most extensive and popular workflow & scheduling tools available, and VaultSpeed generates the workflows (or DAGs) to run and monitor the execution of loads using Airflow.

Intro

The ultimate goal of building a data hub or data warehouse is to store data and make it accessible to users throughout the organisation. To do that you need to start loading data into it. One of the advantages of Data Vault 2.0, and the use of hash keys in particular, is that objects have almost no loading dependencies. This means that with data vault, your loading process can reach very high degrees of parallelism.
Another advantage is that data can be loaded at multiple speeds. You might want to load some objects in (near) real time, while hourly, daily or even monthly load cycles might be satisfactory for other data sources.
In every case, you need to set up the workflow to load each object. And you need to be able to schedule, host and monitor this loading process.
In this article we will explain how to leverage automation for another important part of the data warehouse: developing, scheduling and monitoring your workflows.

 

VaultSpeed setup

VaultSpeed generates DDL and ELT code for your data warehouse, data hub, ... Once deployed on the target environment, these components form the backbone of your data ingestion process. The next step is to organize these objects into a proper workflow and build proper Flow Management Control.

Parameters

The first step in this process is to set up some parameters inside VaultSpeed. This setup will influence how the Airflow DAGs are built:

 

FMC parameters

 

  • USE_FMC: whether or not you will use Flow Management Control; enabling it causes some extra objects to be generated with your DDL.
  • FMC_DYNAMIC_LOADING_WINDOW: determines whether the beginning of the loading window is set by the end of the last successful run, or whether the load uses a static loading window but waits for the successful execution of the previous run.
  • SCHEMA_PL_PROCEDURES: the schema where your Presentation Layer procedures are stored; this is used when generating code for loading the PL.
  • FMC_GENERATE_SRC_LOADING_SCRIPTS: if enabled, extra tasks are added to the generated workflow to transfer data from your source system to the data warehouse.
  • FMC_SKIP_COMPLETED_TASKS: in case of an error you need to restart your load. This parameter indicates whether tasks that were already successful are run only once, or whether all tasks are rerun upon restarting.
  • SOURCE_PL_DEP: determines whether the load of the Business Vault & Presentation Layer should wait for the load of a source to be successful or simply completed.

Add workflows

Once you have set up the parameters it is time to generate Airflow code. In our Flow Management Control screen, you can find all workflows and settings. Adding an FMC workflow is quite easy:

 

Adding an FMC Workflow

 

You can generate flows for your Raw Data Vault or Business Vault.
You have to select a load type, a data vault and one of your sources. VaultSpeed always builds a single flow for a single source; you can add dependencies between flows in Airflow.

Choose a DAG name (DAG, or Directed Acyclic Graph, is Airflow's term for a workflow). This name should be unique to this workflow, e.g. <dv name>_<source name>_<load type>. Each DAG should also have a description.

We also offer a feature to enable task grouping. When using SQL code and mini batches, this reduces the overhead of creating connections to the DB. So if connecting takes a lot of time compared to actual mapping execution, enabling this option should save you some load time.

Choose a start date: this is the date and time at which your initial load will run, and the start of your first incremental load window.

By setting the concurrency, you can indicate how many tasks in this workflow are allowed to execute at the same time. This depends on the scalability of the target platform and the resource usage (CPU) of your Airflow scheduler and workers, and it can easily be changed later.

Finally, you enter a name for your source and target database connections; these connections will be defined in Airflow later.

When you choose the incremental flow type, you need to set a schedule interval. This is the time between incremental loads, and it can be:

  • "@hourly": run once an hour at the beginning of the hour
  • "@daily": run once a day at midnight
  • "@weekly": run once a week at midnight on Sunday morning
  • "@monthly": run once a month at midnight on the first day of the month
  • "@yearly": run once a year at midnight on January 1
  • a cron expression (e.g. "0 0 * * *")
  • a Python timedelta (e.g. timedelta(minutes=15))

Generate code

To generate code for a workflow, you need to pick a version of your generated ELTs. Select the one for which you want to build your workflow. Hitting "Start Generation" will launch generation of the code.

A few files are generated in the background: first a Python script containing the actual DAG, plus some JSON files containing the rest of the setup.

 

Generating workflows

 

You can always have a look at previous workflow generations: their settings at that time, the data vault and source release, when they were generated, and the name of the zip file containing the generated Python code.

Airflow Setup

To get started with Airflow you need to set up a metadata database and install Airflow. Once you have a running Airflow environment you need to finish some setup; our VaultSpeed FMC plugin for Airflow does most of this for you.

After setting up database connections and adding some variables you can import your newly generated DAGs. You can do this manually, but of course automated deployment from VaultSpeed straight to Airflow (or through a versioning system) is also possible.

 

DAGs in Airflow

 

To start executing loads, start the scheduler and the workers.
When you unpause the freshly imported workflow, your DAG will start running.
The current date and time need to be past the start date for the initial load to run; the first incremental load will start at start date + interval (for example, with a start date of January 1 at midnight and a @daily interval, the first incremental run triggers on January 2 at midnight).

The picture below shows the DAG we generated in this example. You can clearly see we used the grouping function as it has grouped tasks into four parallel loading sets per layer.

 

Generated Airflow DAG

 

Airflow offers an intuitive interface in which you can monitor workflows, view logs and more. You can get a good overview of execution times and search for outliers. Upon failure you can inspect logs of separate tasks and start debugging.

 

Conclusion

Building a data warehouse, data hub or any other type of integration system requires building data structures and ELT code. But those two components are useless without workflows. You can build them manually, but VaultSpeed offers a solution that is less time consuming and more cost effective.

Automation in VaultSpeed is not limited to code generation and deployment. VaultSpeed also automates your workflows. When considering a standard workflow solution, Apache Airflow is our tool of choice.

About Apache Airflow: it is well established in the market and open source. It features an intuitive interface and makes it easy to scale out workers horizontally when you need to execute lots of tasks in parallel. It is easy to run either locally or in a (managed) cloud environment.

With a few steps, you can set up workflow generation, versioning and auto-deployment into your Airflow environment. Once completed, you can start loading data.

VaultSpeed’s FMC workflow generation is sold as a separate module in addition to the standard environment. It targets clients looking for a proper solution to organize workflows in their integration platforms. More details on pricing for this automation module are available upon request.

 

 

 

 


Multi-Active Satellites and Related Links


by Dirk Vermeiren, CTO - VaultSpeed

Introduction

The power of VaultSpeed lies in the extensive pre-defined templates that generate code based on a number of configuration parameters. If you work with a template-based solution, you have to solve these complex issues inside the templates you build, and you additionally have to test and maintain them.
In a number of blogs we want to address some of these complex challenges and show how VaultSpeed provides a solution out of the box.

This blog describes the challenges and the solution for the implementation of Links between Multi-Active Satellites with subsequence source attribute(s).

The challenge described in this post is twofold: first, we need to define the level of detail for the link; second, we need to decide what happens when a delete occurs on the source. Do we choose for the link record to be “virtually” deleted or not?

What is a Multi-Active Satellite?

A Multi-Active Satellite is a Satellite that can have multiple active records for the same business key. There can be different types of Multi-Active Satellites:

A Multi-Active Satellite with subsequence source attribute(s):

In this case there is an attribute in the source object that makes the data in the Satellite unique again. The combination of the Business Key(s), subsequence attribute(s) and load date will be the unique key of the Satellite.
A language code, for example, can be a subsequence attribute in a source system.

A Multi-Active Satellite without a subsequence source attribute:

This is a less common type: there is no unique source attribute, so we introduce a generated sequence number per Business Key to ensure the combination of the Business Key(s), the generated subsequence attribute and load date is unique.
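
As a minimal sketch of the first type (table and column names are invented for illustration, not generated output), the resulting satellite key looks like this:

-- The business-key hash, the subsequence attribute and the load date
-- together form the unique key of a multi-active satellite.
CREATE TABLE dv.sat_product_features (
  product_features_hkey CHAR(32)     NOT NULL, -- hash of the business key
  language_code         VARCHAR(3)   NOT NULL, -- subsequence attribute
  load_date             TIMESTAMP    NOT NULL,
  delete_flag           CHAR(1)      DEFAULT 'N',
  feature_description   VARCHAR(200),
  PRIMARY KEY (product_features_hkey, language_code, load_date)
);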

This blog describes one of the possible scenarios when combining Multi-Active Satellites with Links. It explains how VaultSpeed's logic solves these challenges as standard functionality, just by setting a few parameters.

The case uses data models describing a fictitious motorbike company, an example commonly used in VaultSpeed demos and documentation.

The Case

Sources

The challenges are explained using two Multi-Active objects within the VaultSpeed MOTO SALES demo source: the Product_Features and Product_Features_Category tables shown in Figure 1. You can see some example records in Figure 2 for Product Features and in Figure 3 for Product Feature Categories.

 

Figure 1
The language codes in both objects are the source subsequence attributes, and there is a foreign key from the product feature object to the product feature category object based on the product_feature_cat_id attribute.

 

Figure 2
Product Features
Figure 3
Product Feature Categories

 

The source case has the following characteristics:

  • The subsequence attribute is a source attribute.
    • In both tables the Business Key attribute (indicated in red) is not the same as the Primary Key attribute (indicated in blue).
      This means the relationship is not based on the Business Key but on other attributes; a lookup is needed for building the Link hash key, as described in a previous blog on our website.
  • The Primary Key is at a lower level of detail than the Business Key in both source objects, product_feature and product_feature_cat.
    • The rows indicated in blue show the Primary Key; the rows indicated in red contain the Business Key.
      As you can see, the same Business Key value has multiple Primary Key values.
      This means that a lookup via the Business Key will return multiple Primary Keys, and the logic must be able to cope with that.
  • Both objects in the relationship are Multi-Active objects.
    • The product_feature object as well as the product_feature_cat object are multi-active objects. A lookup of a business key for the link can result in multiple records being returned.

Raw Data Vault

  • The Raw Data Vault uses the INSERT ONLY LOGIC approach.
    With INSERT ONLY LOGIC there are no load_end_dates and the only actions allowed are inserts. This means that a DELETE results in the insert of a new record with the same attribute values as the previous record and the DELETE_FLAG set to 'Y' (sketched below).
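
As an illustration of this insert-only delete pattern (a hedged sketch reusing the demo model's names; this is not the generated template itself):

-- A staging delete ('D') becomes a new satellite version with delete_flag = 'Y';
-- descriptive attributes would be copied from the previous record.
INSERT INTO moto_fl_tst_ref_scn1.sat_ms_product_features
       (product_features_hkey, pro_fea_lan_code_seq, load_date, delete_flag)
SELECT stg.product_features_hkey,
       stg.pro_fea_lan_code_seq,
       stg.load_date,   -- new load date for the delete version
       'Y'              -- flagged as deleted, never updated in place
FROM   moto_stg_ms_tst_ref_scn1.stg_product_features stg
WHERE  stg.operation = 'D';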

Setup in VaultSpeed

Setting up the source and the Raw Data Vault for the above implementation is very simple: it requires changing one parameter, setting one switch per object and indicating one subsequence attribute per source object.

  • Build the Raw Data Vault using INSERT ONLY LOGIC.
    VaultSpeed system parameter: INSERT_ONLY_LOGIC = Y
  • The Product_Features and Product_Features_Category source objects are Multi-Active.
    Set the VaultSpeed Multi-Active switch to YES
    using the Graphical Source Editor. This indicates that the source object feeds a Multi-Active Satellite (Figure 4).

    • By right-clicking the Language attributes of product feature and product feature category, you can set their attribute type to “Subsequence Attribute” to indicate that they are source-based subsequence attributes (Figure 5).

 

Figure 4
Figure 5

Links between Multi-Active Objects

We encounter a number of specific choices and challenges that need to be covered by the loading logic for the Links between Multi-Active Objects:

Subsequence in Link or Satellite on Link?

Link

No: the subsequence attribute is part of the key of the Satellite, but it is not part of the Business Key. So when building the Link you just load the link object with the unique combinations of the hash on the business keys of the relationship, as sketched below.
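
A hedged sketch of that link load (names follow the demo model but are illustrative, not the generated template):

-- Only the distinct business-key hash combinations enter the link;
-- the subsequence attribute never does.
INSERT INTO moto_fl_tst_ref_scn1.lnk_product_features_cat
       (lnk_product_features_cat_hkey, product_features_hkey,
        product_features_cat_hkey, load_date, record_source)
SELECT DISTINCT
       stg.lnk_product_features_cat_hkey,
       stg.product_features_hkey,
       stg.product_features_cat_hkey,
       stg.load_date,
       stg.record_source
FROM   moto_stg_ms_tst_ref_scn1.stg_product_features stg
WHERE  NOT EXISTS (SELECT 1
                   FROM   moto_fl_tst_ref_scn1.lnk_product_features_cat lnk
                   WHERE  lnk.lnk_product_features_cat_hkey = stg.lnk_product_features_cat_hkey);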

Satellite On Link

Here you could choose to add the subsequence number to the Satellite on Link. In our example, we created a link between two Hubs with Multi-Active Satellites.

In this case, if you were to take the subsequence attributes into account in the Satellite on Link, you would get the Cartesian product of all combinations of the subsequence attribute values for a Link key, as shown in Figure 8. The example would have been even more compelling if the Multi-Active data had different subsequence attributes.

Considering this, VaultSpeed has chosen not to include the Subsequence Attributes in the Satellite on Link.

 

Figure 8

What about the delete logic?

When no subsequence attribute is present in the Satellite on Link, some special handling is required for deletes on the Link Satellite.

The delete flag of the Satellite on Link can be set to 'Y' only if all the multi-active records have been deleted, not when only part of them are deleted.

We need to build a join with the Satellite that owns the data at the foreign key side of the relationship to ensure all the records are really deleted. Only then can the delete record with delete_flag = 'Y' be inserted in the Satellite on Link (an insert, because of INSERT ONLY logic).

The logic explained


,dist_del_stg as
(
SELECT 
distinct product_features_hkey
FROM moto_stg_ms_tst_ref_scn1.stg_product_features stg1 
WHERE operation = 'D'
)

DIST_DEL_STG selects the delete records arriving from staging. They are used to limit the staging content and the Satellite content to the new records that have deletes.



,find_del_and_related_stg_rec as 
(
SELECT 
stg1.product_features_hkey,
stg1.pro_fea_lan_code_seq,
stg1.load_date,
stg1.load_cycle_id,
stg1.trans_timestamp,
stg1.operation,
stg1.error_code_fkprfe_pfca 
FROM moto_stg_ms_tst_ref_scn1.stg_product_features stg1
     JOIN dist_del_stg dds 
          ON dds.product_features_hkey = stg1.product_features_hkey
)

FIND_DEL_AND_RELATED_STG_REC selects the Delete record data from staging.



,find_del_related_sat_rec as 
(
SELECT 
sat.product_features_hkey,
sat.pro_fea_lan_code_seq, 
sat.load_date, 
sat.load_cycle_id,
sat.trans_timestamp,
sat.delete_flag,
sat.load_end_date,
max(load_end_date) 
          over (partition by sat.product_features_hkey,
                             sat.pro_fea_lan_code_seq
                order by     sat.load_date) 
       as max_load_end_date 
FROM moto_fl_tst_ref_scn1.sat_ms_product_features sat 
     JOIN dist_del_stg dds 
          ON dds.product_features_hkey = sat.product_features_hkey 
     LEFT OUTER JOIN moto_stg_ms_tst_ref_scn1.stg_product_features stg1 
          ON stg1.product_features_hkey = sat.product_features_hkey 
             AND stg1.pro_fea_lan_code_seq = sat.pro_fea_lan_code_seq 
             AND stg1.load_date = sat.load_date 
WHERE stg1.product_features_hkey is NULL 
      AND stg1.pro_fea_lan_code_seq IS NULL 
      AND stg1.load_date IS NULL
      AND sat.delete_flag = 'N'
) 

FIND_DEL_RELATED_SAT_REC selects the records from the foreign-key-related satellite which have deletes in the incoming data. We need the MAX(load_end_date) to find the active record in an INSERT ONLY Satellite. Also notice that the PARTITION BY is on the hash key AND the subsequence attribute, as this combination together with the load_date is the unique key for a record.



,find_active_del_related_sat_rec as 
(
SELECT
sat.product_features_hkey,
sat.pro_fea_lan_code_seq,
sat.load_date,
sat.load_cycle_id,
sat.trans_timestamp,
sat.delete_flag,
sat.load_end_date
FROM find_del_related_sat_rec sat 
WHERE sat.load_end_date = sat.max_load_end_date
) 

FIND_ACTIVE_DEL_RELATED_SAT_REC selects only the active records, based on the MAX_LOAD_END_DATE calculated by the analytical function in FIND_DEL_RELATED_SAT_REC.



,not_deleted_records_lks AS 
(
SELECT 
COALESCE(fdsat.product_features_hkey, fdstg.product_features_hkey) product_features_hkey,
COUNT(1) count_existing_active 
FROM find_active_del_related_sat_rec fdsat
     FULL OUTER JOIN find_del_and_related_stg_rec fdstg 
          ON fdstg.product_features_hkey = fdsat.product_features_hkey 
             AND fdstg.pro_fea_lan_code_seq = fdsat.pro_fea_lan_code_seq 
WHERE (fdstg.product_features_hkey is null 
       OR fdsat.product_features_hkey is null)
GROUP BY COALESCE(fdsat.product_features_hkey, fdstg.product_features_hkey)
)


NOT_DELETED_RECORDS_LKS then calculates how many active records remain in the Satellite, based on the incoming data with delete records.

The FULL OUTER JOIN is very important, as there could be INSERT records after the DELETE in staging that need to be taken into account.

The delete-flag logic that is used further downstream is then as shown below.

CASE WHEN COALESCE (NOT_DELETED_RECORDS_LKS.count_existing_active,0) = 0 
AND stg.operation = 'D' THEN 'Y'::text ELSE 'N'::text END delete_flag

If all the incoming delete records match the still-active records in the Satellite, then the count = 0 and the delete flag of the Satellite on Link can be set to 'Y'; otherwise the count > 0 and the delete_flag is set to 'N'.

Conclusion

Through some simple parameterisation and a few manipulations inside the Graphical Source Editor in VaultSpeed, you get full support for this feature in the Raw Data Vault layer of your integration system.

The choices made in this setup and their logical consequences were carefully analysed, and all the logic needed to support all possible cases was subsequently implemented inside the standard templates of VaultSpeed.

The complexity is hidden from customers, delivering high value with easy, out-of-the-box configuration and implementation.

 

 


VaultSpeed Studio (Release 4.1.16)



Yesterday, VaultSpeed launched release 4.1.16, and we are proud to announce a brand new module for our automation tool: VaultSpeed Studio.

Purpose

The introduction of VaultSpeed Studio is an important milestone for VaultSpeed and our Customers. It opens up the powerful VaultSpeed template language and allows you to build, test and put to work your own custom templates to generate any logic required.

VaultSpeed Studio offers a solution for those clients looking to put automation to work in other parts of their data warehouse. The Raw Data Vault consists of highly repeatable patterns; pre-staging or presentation areas, by contrast, require more specific logic. Studio enables customers to automate these patterns as well.

This first version is released in closed alpha (i.e. only accessible on request). It allows Customers and Partners to build templates for the creation and loading of objects in the Business Vault Area. The following types of objects can, for example, be created: calculated satellites, effectivity satellites, end date satellites (for insert only), custom PITs, custom Bridges and much more.

Repeatability versus customization

VaultSpeed covers almost any object in the Raw Data Vault. We also cover standard PIT and Bridge tables in the Business Vault area. These objects are highly repeatable, as they look the same in every organisation.

Pre-staging areas, presentation areas, access layers and certain business vault objects require a different approach. While still being a good target for automation, they require more customization and creativity, and they do not necessarily look the same across organisations.

We situate VaultSpeed Studio in the area where objects become less repeatable and require more customization.

The Interface

VaultSpeed Studio comes as a separate module. In this release we added the studio module for the Business Vault. Once you select a Business Vault release it will appear in the top menu.

VS Studio template overview

After opening Studio, a main window appears where you can see a list of saved templates. When starting a new template or changing an existing one, you can use popular code editors like Visual Studio Code (for which we deliver a plugin), or you can view, edit and test the code in the editor screen.

VaultSpeed Studio template example

Template Language

Our template language was built to separate logic from implementation details. As a result, the language is quite easy to learn.

Signature attributes

When building templates based on metadata it is necessary to categorize fields. In VaultSpeed, we do this by defining signature attributes. A signature attribute from a source model can be a Business Key, Relation Key, or other. Next to the standard signatures you can now define new signature attributes to which generic logic can be applied (a repeatable pattern).

The more generic a signature attribute, the more repeatable the patterns you can create with it. VaultSpeed's standard templates use these highly generic signature attributes for the generation of Raw Data Vault objects and Business Vault objects like PITs and Bridges.

However, when building custom templates, you must be able to create custom signature attributes. So when you need to perform a certain operation on a set of custom fields, you can define your own signature attributes in VaultSpeed Studio.

Apply a custom signature to a certain set of fields

Template language

The VSS template language has a hierarchical structure. Each level in the hierarchy starts with a certain keyword and continues until the next one or the end of the file. Each of these levels can be conditioned on certain parameters and variables, which allows you to combine the logic for different types of objects into one template.

Pricing

VS Studio will be released in open alpha in one of the following releases. By then it will include a limited 30-day trial period of VSS for everyone; full usage of VSS will require an additional subscription (more details about pricing coming soon).


While VSS is probably the most important change in VS 4.1.16, there is some other new stuff included:

Foreign keys to REF tables

We added the ability to define foreign keys to reference tables. You can create relations between an object and a reference table; this results in an additional ref_fk attribute in the satellite, and these foreign key attributes are coalesced when they contain null values. Combined with the addition of ghost records to the REF objects, this allows for inner joins between Satellites and Reference tables, as sketched below.
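
A minimal sketch of the idea (table and column names are invented for illustration):

-- Null FKs were coalesced to the ghost-record key at load time, e.g.:
--   COALESCE(stg.order_status_code, '-1')  -- '-1' = ghost record key
-- so an inner join to the reference table never drops satellite rows:
SELECT sat.order_hkey,
       sat.order_status_ref_fk,
       ref.status_description
FROM   dv.sat_ms_orders    sat
JOIN   dv.ref_order_status ref
  ON   ref.order_status_code = sat.order_status_ref_fk;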

Multi Active Satellites

You can now create multi-active satellites with a sub-sequence attribute in the source. An attribute can be set as the sub-sequence attribute in the source graphical editor; this attribute in combination with the business key should be unique.

setting a multi-active SAT with a subsequence

Other features

We added a status indication for new and deleted objects in the grouping and DV overview screens. This change also introduces some performance improvements for these screens.

Selecting a reset option (i.e. not “keep everything”) when creating a new source release will reset the release to the latest one linked to a DV production release. If no production date is set yet, then it will reset fully like before.

There is an additional attribute type available in the source graphical editor: modification date. When the CDC type is modification date, setting a column to this type indicates that it is the CDC timestamp metadata attribute for that object.

It is now possible to delete entire projects. This can be useful for deleting a sandbox project that was created for testing purposes, keeping your account nice and clean.

We added the ability to define unique keys in the source graphical editor: select attributes while holding Ctrl/Command, then right-click and select unique key (optionally giving the key a name). You can also set an attribute to UK from the attribute sidebar.

Conclusion

VaultSpeed invests heavily in making the complex process of enterprise wide data integration as easy as possible. With this in mind, we will keep improving towards intuitive graphical interfaces that should assist the user in the decision-making process by implementing smart suggestions and eliminating as much of the manual work as possible.

Not all layers of a data warehouse can match the repeatability of the raw data vault layer. Pre-staging layers, business data vault objects and data marts demand a higher level of customization. VaultSpeed will extend its capabilities to add custom logic so that our customers can work towards tailor-made solutions for their data integration challenges.

“VaultSpeed aims to deliver economies of scale in all layers of the modern data warehouse. We realize that a trade-off exists between customization and repeatability. We also realize that the economies of scale decrease with increasing levels of customization.

By designing your data pipeline using repeatable patterns you can reduce the customizations and deliver high quality and high value through automation.

With VS Studio, we took another step towards these goals. Eventually, Studio will help customers to exploit the benefits of automation all the way from source to presentation layer.

Additionally, we will work together with our partners to create industry-specific data pipelines based on repeatable patterns that will create value for our customers.”

Dirk Vermeiren - CTO VaultSpeed

So stay tuned... more features are coming soon!

Do you want to stay informed about updates? Become part of our mailing group:


Referential Integrity Error Handling (Release 4.1.15)


VaultSpeed release 4.1.15 just went live! Here’s what’s new...

Referential Integrity

When integrating multiple source systems into one data vault, you might encounter issues with referential integrity. This can have numerous causes, such as issues with source batch processing, late data arrival, source mismatches or early-arriving data.

In any case, these kinds of problems will cause you some headaches. With VaultSpeed, you now have an extra set of tools to tackle them.

In the latest release we added referential integrity error handling logic. We added two new parameters: REFERENTIAL_INTEGRITY_VALIDATED and REFERENTIAL_INTEGRITY_FORCE_LINK_LOAD.

What happens when you activate the first parameter? VaultSpeed will store records in an error table if it cannot find the referenced key. These records will not be inserted in the link tables unless you activate forced link loading; in that case the tool will add a reference to the 'unknown' record.

In addition, VaultSpeed includes the error table records when processing subsequent loads. This allows it to check whether the reference can be resolved once new data is available. The sketch below illustrates the pattern.
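
An illustrative sketch (names invented; this is not the generated template):

-- With REFERENTIAL_INTEGRITY_VALIDATED enabled, staging rows whose referenced
-- hub key cannot be found are routed to an error table instead of the link:
INSERT INTO err.lnk_order_customer_err
SELECT stg.*
FROM   stg.stg_orders stg
       LEFT OUTER JOIN dv.hub_customer hub
            ON hub.customer_hkey = stg.customer_hkey
WHERE  hub.customer_hkey IS NULL;

-- With REFERENTIAL_INTEGRITY_FORCE_LINK_LOAD enabled as well, the link row is
-- loaded anyway, pointing at the 'unknown' ghost record, e.g.:
--   COALESCE(hub.customer_hkey, <hash key of the 'unknown' record>)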

Other Changes

  • You can now create a self-referencing relationship, which contains an attribute that references itself, i.e. (c,b) → (a,b). You can do this by right-clicking the attribute while in relation creation mode.
Self Reference options in VaultSpeed
  • We added the ability to update and save multiple records at the same time in the following screens: source objects & attributes, DV name management, DV data types, and business vault views.
New Save buttons
  • You can now revert SAT splits.
  • We added subscription info to the dashboard. You can also see the actual number of concurrent users.
VaultSpeed Dashboard page
  • You can now switch the ODI_FOLDER_SPLIT parameter after having already deployed the project (for customers using Oracle Data Integrator).

SAT split editor improvements

Our developers also made some improvements to the SAT-split editors based upon customer feedback (we added a new SAT-split screen in a previous release):

    • We now have an option to change the number of visible attributes in a split (just click the button with three vertical dots and go to settings).
    • We added an option to the context menu to sort the attributes in descending order.
    • We added an option to the context menu to show only unused attributes.
    • You can now remove attributes from a split, or add them to another split, by dragging and dropping.
    • Long table names are now displayed correctly.
    • We improved the handling of multiple splits.
    • The split name no longer disappears when you scroll down the attribute list of a split.

Stay tuned for more release news!


Data Lineage Export (Release 4.1.14)


In this release, we incorporated some important improvements. First, we added an initial version of our data lineage export functionality. Second, we added support for further customization of DDL options. We also added an alternative screen to handle satellite splitting. And finally, after introducing a lot of new objects in the business vault, we added business vault loading to our Flow Management solution in Apache Airflow.

At VaultSpeed, we invest a lot of time analyzing customer feedback and enhancement requests. At regular points in time, we align this feedback with our product roadmap in order to stay in line with customers' needs.

Data Lineage Export

With VaultSpeed we gather source metadata. This metadata can be enhanced, maintained and fine-tuned by users, and in the end we use it to generate a valid Data Vault 2.0 model. This means we have all the information needed to provide customers with proper data lineage, and it is now possible to export this information to CSV format.

Exported Data lineage in csv format

Starting at the Data Vault overview screen, users will see two new buttons: the first is for object-level data lineage, the second for attribute-level lineage. When you click these buttons, a CSV file containing the entire data vault's object or attribute lineage is downloaded through your browser.

This is the first iteration of data lineage functionality in VaultSpeed. We are looking at alternative export formats, integrations with popular data lineage tools, and possibly some data lineage views inside VaultSpeed itself.

DDL Options for each table type

Using VaultSpeed, you were already able to define certain DDL settings. With these settings, you can create custom DDL layouts for the tables created throughout all layers of your data warehouse. For example, you can define a certain level of parallelism or set a specific partitioning type, depending on the specifics of your target database.

We now made it possible to create these DDL settings at table type level, so you could, for example, create a different setting for a hub than for a satellite (see the sketch below).
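
As a hedged illustration of what such table-type-level settings could render to (Oracle-style syntax with invented names; the actual output depends on your target platform and settings):

-- A HUB-level setting might add a parallelism clause...
CREATE TABLE dv.hub_customer (
  customer_hkey CHAR(32) NOT NULL,
  customer_bk   VARCHAR2(50),
  load_date     TIMESTAMP,
  record_source VARCHAR2(50)
) PARALLEL 4;

-- ...while a SATELLITE-level setting might add range partitioning instead.
CREATE TABLE dv.sat_customer (
  customer_hkey CHAR(32)  NOT NULL,
  load_date     TIMESTAMP NOT NULL,
  name          VARCHAR2(100)
) PARTITION BY RANGE (load_date) (
  PARTITION p2020 VALUES LESS THAN (TIMESTAMP '2021-01-01 00:00:00')
);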

Enhanced SAT split screen

Our UX designers continuously try to improve VaultSpeed's look and feel. They came up with a different layout for our satellite splitting screen. This version should work a lot better when you are splitting tables into more than two satellite splits. It is now also possible to show multiple satellites in one canvas. Finally, it is more aligned with the other graphical screens in the tool, so users will recognise the menus from other screens.

New SAT-split screen

All previous functionality is still available in this screen. The old screen remains present for users that prefer to work in it; we will analyze customer feedback to see which screen gets the best reviews.

Business Vault loading added to FMC

The FMC module, or Flow Management Center, is our Apache Airflow-based solution for orchestrating flows. This module can be purchased in addition to a standard licence and offers a solution for customers that additionally require an orchestration tool. In recent releases, we added a lot of new functionality in the Business Vault layer. This new logic can now also be loaded using our FMC module: VaultSpeed will generate Airflow DAGs for loading PITs, Bridges and cross source links. Read more about FMC on our features page.

Our FMC module generates DAGs for Apache Airflow

Thanks for reading. That's all for this release, stay tuned!

Best regards,
The VaultSpeed Team

Subscribe below and become part of VaultSpeed's ambassador community. Receive regular updates about new releases and share them within your network.


Cross Source Links (Release 4.1.12/13)

With releases 4.1.12 and 4.1.13 we brought some interesting new features.

First of all, we included support for cross source exploration links. The main idea behind building an enterprise wide data warehouse is to be able to integrate data from multiple sources. This requirement is now delivered out of the box in VaultSpeed.

VaultSpeed already provided HUB Groups to integrate between sources at business key level. With our new Cross Source Link objects, you can create links between multiple sources. This adds an additional way to integrate and explore across sources.

In the source graphical editor an option has been added to select entities from other sources and their specific releases. By selecting and adding the object to the canvas, a relationship can be defined between objects from different sources. Notice that foreign objects will be shown in a different color and that the source name is mentioned in the object name.

Import object from another source and add a relationship

Once the object and the relationship have been added, new data vault and business vault releases can be generated. The resulting relationship will add a link object between both sources.

New link is added to the data vault model

We integrate cross source link objects at the Business Vault level. There are several reasons why this is the best place for the integration:

First of all, cross source links create dependencies between sources. We do not want such dependencies in the raw data vault, because they would complicate the loading process.

Second, we have to make sure that all the data has arrived in the raw data vault before processing our cross source links. With data coming in from different sources at different times, there might be issues with “missing links”. VaultSpeed takes care of this issue.

Finally, by placing the cross source link in the business vault layer, we make it less dependent on underlying source changes. This facilitates version management and release management of these structures.

Also included in this release is a new type of delete logic, INSERT_ON_DELETE: when this parameter is enabled, a delete inserts a new record in the satellite instead of executing an update, and these delete records are end-dated just like regular records. This parameter has no effect when using INSERT_ONLY_LOGIC.

We also improved the performance and the look and feel of the source graphical editor. Objects are now also displayed in a tree-like structure to give a better overview of the source.

Tree like structure to show source objects in the graphical editor

More new stuff coming soon!


Bridge tables (Release 4.1.11)

The start of a new year welcomes a new release: VaultSpeed kicked off 2020 with some brand new features in version 4.1.11. We added support for bridge tables in a new graphical editor and added the possibility to import and export business names in the business vault. By adding bridge tables, VaultSpeed now supports the five main objects in Data Vault 2.0: Hubs, Links and Satellites in the Raw Data Vault, plus PITs and Bridges in the Business Data Vault. The graphical bridge editor enables users to generate bridge tables based upon raw data vault objects. All available Hubs can be dragged and dropped onto the editor.

 

 

Once a certain hub is selected on the canvas, the adjacent hub tab becomes active. It contains all adjacent hubs for the selected hub, and each adjacent hub can be dragged into the editor. Hubs can have multiple links to the same adjacent hub, but it is not possible to add more than one link connection for a starting/ending hub.

 

 

The Bridges tab contains all available bridges. It is possible to remove, edit or create a bridge. To edit a bridge, first load it by dragging it into the editor; then you can start modifying it. To save the changes, click the edit button above the bridge list to complete the edit. To create a new bridge, build it in the editor and click the create new bridge button. Creating or editing a bridge is only possible when the bridge combination is valid; otherwise a warning message is displayed.

 

 

 

It is possible to let the app generate possible paths between two hubs. To do this, right-click, select draw shortest path relation, and drag a line between the two hubs you want to connect. You can change the number of possible paths it will generate by setting “max generation options”, as well as exclude hubs from the paths. If there is only one viable option, it is directly applied to the editor. If there is more than one option, an extra tab becomes active and it is possible to preview, regenerate or apply path options. To preview or regenerate a path, select a bridge and click one of the buttons above the shortest path options list. During preview the editor becomes read-only. To apply the selected shortest path option while in preview mode, click the apply button; alternatively, you can drag and drop the desired path option into the editor.