With the release of version 4.1.0, multi-master within a single source is now live. This seems the perfect occasion to write an article about how it works and how you can use it.

Connecting tables from different sources, and within sources, is one of the most powerful characteristics of Data Vault. Vaultspeed makes it possible to define whether a table is single-master or multi-master and, once grouped, whether it is a slave of another table.

Single/Multi master orchestration

Hubs, the entities that contain the keys, can be single master: there is just one source, and no extra source or source-object business key field is stored in the hub. Or they can be multi master, which means an extra field is stored to identify where the data comes from: which source or, for multi-master within one source, which source table.

Vaultspeed lets the user define at source level whether a table/hub will be single or multi-master. This makes it easier to add another source to the same hub in a later release without having to migrate and recalculate the hash keys, because a source business key field was already included.

If the end user didn’t think of it in the first place, that is no problem: Vaultspeed will recognise when an end user changes the value to multi-master in a later release. A migration ELT mapping will then be generated, which recalculates the hash keys with the extra source business key.
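To see why switching a hub to multi-master forces a recalculation, here is a minimal sketch of a hash key with and without a source business key. The hash algorithm (MD5), the `|` delimiter, the normalisation step and the function name `hub_hash_key` are all assumptions made for illustration; the post does not say how Vaultspeed actually builds its hash keys.

```python
import hashlib

def hub_hash_key(business_key, source_key=None, delimiter="|"):
    """Hash a business key, optionally prefixed with a source business key.

    MD5, the delimiter and the upper-casing are illustrative assumptions,
    not the tool's documented behaviour.
    """
    parts = [business_key.strip().upper()]
    if source_key is not None:
        # Multi-master: the source business key becomes part of the hash input.
        parts.insert(0, source_key)
    return hashlib.md5(delimiter.join(parts).encode("utf-8")).hexdigest()

single_master_key = hub_hash_key("1001")
multi_master_key = hub_hash_key("1001", source_key="S2")

# Same business key, different hash: existing rows must be rehashed
# once the hub becomes multi-master.
print(single_master_key != multi_master_key)  # True
```

Because the source business key is part of the hash input, every existing hash key changes once it is added, which is the work a migration mapping has to do.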

Defining whether a hub is single or multi master is just a preparatory step for the real deal: the group and master/slave management of the different hubs over and within sources. There are several possibilities, which we will discuss in this article.

HUB group management

When connecting multiple sources to a Data Vault, the connection goes through hubs. These building blocks can store keys of different source tables, each identified with a source business key.

Grouping of entities with the same meaning will create one and only one hub.

The group predictor integrated in the tool automatically connects objects from different sources that have the same name (they can be unlinked if this is not the desired result). Objects with different names can be grouped together by the end user. For example, SRC3 contains the table employees and SRC2 contains the table persons. The end user knows these are the same entity and groups them by dragging and dropping hub_persons from the unlinked hubs canvas to the group canvas. The result of the action is shown below.

Unlinked hubs on the left, grouped hubs on the right

Persons of SRC2 is grouped together with employees of SRC3

From now on, the tool knows that the entities employees and persons are the same. The impact of the grouping: one hub will be generated with a source business key, but of course every entity keeps its own satellite.

As written in the title, from now on it is also possible to group tables of the same source together. The big difference between grouping over sources and grouping within a source is the extra element that is needed to identify the source: in this case, the name of the source entity.

To explain it with an example:

1) Over sources: SRC2 (source business key: S2) object persons – SRC3 (source business key: S3) object employees will be grouped in ‘EMPLOYEES’

When loading from source object persons:
– Source business key Value = ‘S2’
– Satellite name = SAT_S2_EMPLOYEES

When loading from source object employees:
– Source business key Value = ‘S3’
– Satellite name = SAT_S3_EMPLOYEES

2) Within source: SRC2 (source business key: S2) object persons – SRC2 (source business key: S2) object employees will be grouped in ‘EMPLOYEES’

When loading from source object persons:
– Source business key Value = ‘S2_PERSONS’
– Satellite name = SAT_S2_PERSONS_EMPLOYEES

When loading from source object employees:
– Source business key Value = ‘S2_EMPLOYEES’
– Satellite name = SAT_S2_EMPLOYEES_EMPLOYEES
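The naming pattern in the two examples can be sketched as a small function. The function name `load_settings` and its parameters are hypothetical, and the rule is inferred only from the values listed above, not taken from the tool.

```python
def load_settings(source_key, source_object, group_name, within_source):
    """Derive the source business key value and satellite name for one load.

    Hypothetical helper: the naming rule is inferred from the examples
    in this article.
    """
    if within_source:
        # Within one source, the source key alone is not unique,
        # so the source object name is appended.
        source_bk_value = f"{source_key}_{source_object.upper()}"
    else:
        source_bk_value = source_key
    satellite_name = f"SAT_{source_bk_value}_{group_name.upper()}"
    return source_bk_value, satellite_name

# Over sources
print(load_settings("S2", "persons", "EMPLOYEES", within_source=False))
# ('S2', 'SAT_S2_EMPLOYEES')
print(load_settings("S3", "employees", "EMPLOYEES", within_source=False))
# ('S3', 'SAT_S3_EMPLOYEES')

# Within source
print(load_settings("S2", "persons", "EMPLOYEES", within_source=True))
# ('S2_PERSONS', 'SAT_S2_PERSONS_EMPLOYEES')
print(load_settings("S2", "employees", "EMPLOYEES", within_source=True))
# ('S2_EMPLOYEES', 'SAT_S2_EMPLOYEES_EMPLOYEES')
```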

HUB master slave management

For hubs that are grouped together, the end user can define a relationship between the entities. There is just one golden rule: you must be sure that the business keys in one object cover all the business keys in the other. E.g. if the employees table has all the business keys of the persons table, then employees can be master and persons can be slave. In the ETL logic this means that only one mapping is generated to load the hub, the one for the master; the slave has the same hash keys, so they do not need to be loaded again. On the other hand, each entity will have its own satellite and can have its own unique descriptive data.
In contrast, when both entities are master, both are loaded into the hub with their own source identifier.
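The golden rule and its effect on the generated mappings can be sketched as follows. The helper names `slave_is_covered` and `plan_mappings`, and the sample keys, are hypothetical and only illustrate the logic described here; this is not how the tool generates code.

```python
def slave_is_covered(master_keys, slave_keys):
    """Golden rule: every slave business key must also exist in the master."""
    return set(slave_keys) <= set(master_keys)

def plan_mappings(members):
    """Decide which group members load the hub and which load a satellite.

    members: dict mapping a source object name to 'master' or 'slave'.
    """
    # Only masters get a hub load; slaves share the master's hash keys.
    hub_loads = [name for name, role in members.items() if role == "master"]
    # Every member keeps its own satellite, whatever its role.
    satellite_loads = list(members)
    return hub_loads, satellite_loads

# Illustrative keys only.
employee_keys = ["E1", "E2", "E3"]
person_keys = ["E1", "E3"]
print(slave_is_covered(employee_keys, person_keys))  # True

print(plan_mappings({"employees": "master", "persons": "slave"}))
# (['employees'], ['employees', 'persons'])
print(plan_mappings({"employees": "master", "persons": "master"}))
# (['employees', 'persons'], ['employees', 'persons'])
```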

In the tool itself the end user can choose to make a source slave or master for all tables at once or do this action on group level.

Master/Slave management on source level

Master/Slave management on group level

Hub business key concatenation

When the hubs are grouped and the master/slave relation has been decided upon, there is still one setting on this topic that can be edited: the concatenation of the business keys. As said earlier, different entities can have the same or different business keys. In a master/slave relationship, only the master keys are loaded into the hub, so there is no issue. In a master-master relationship where the entity is the same but the keys are different, a concatenated business key can be chosen. This always stores the different business keys in one and the same column.

E.g. the persons table identifies a person using first and last name, while the employees table identifies a person using an employee_id. In this case a concatenated business key is chosen, so that a concatenation of the first and last name can be stored in that column and the employee_id can be stored there too. In the business vault a link can then be made to link the employee_id with the first and last name. When both entities use the same identifier but just have different data, concatenation can be done or not, as the end user prefers.
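A minimal sketch of what a concatenated business key could look like for this example. The `|` delimiter, the column names `src_bk` and `business_key`, and the sample rows are assumptions made for illustration; the post does not specify how the tool formats the concatenated value.

```python
def concatenated_business_key(*parts, delimiter="|"):
    """Join one or more business key parts into a single column value.

    The delimiter is an illustrative assumption.
    """
    return delimiter.join(str(part).strip() for part in parts)

# Made-up sample rows, one per master.
persons_row = {"first_name": "Ada", "last_name": "Lovelace"}
employees_row = {"employee_id": 4711}

hub_rows = [
    {
        "src_bk": "S2",
        "business_key": concatenated_business_key(
            persons_row["first_name"], persons_row["last_name"]
        ),
    },
    {
        "src_bk": "S3",
        "business_key": concatenated_business_key(employees_row["employee_id"]),
    },
]

for row in hub_rows:
    print(row)
# {'src_bk': 'S2', 'business_key': 'Ada|Lovelace'}
# {'src_bk': 'S3', 'business_key': '4711'}
```

Both masters land in the same hub column even though their keys have a different number of parts; the source business key keeps the two kinds of key apart.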

In the tool itself, this is chosen with a toggle switch. Concatenation is only possible for multi-master hubs. In the same screen the user can choose another short name and abbreviated name for the group itself. Short names are used to create the link name, e.g. LNK_COUN_EMPL; the abbreviated names are used for the hub and satellite themselves.

I hope this post gives you some ideas about the possibilities in Data Vault and in the tool for linking entities over and within sources.

See you next time in another blog post

Tim Van Brabant