What is Automated Data Transformation?

Introducing next-generation automation of data conversion, manipulation, and restructuring.

Automated data transformation

Automated Data Transformation (ADT) streamlines the data delivery process by automating data conversion, manipulation, and restructuring.

This reduces manual work in creating a data warehouse, lakehouse, or mesh, and enhances data quality, ultimately facilitating better decision-making.

Handling multiple data sources

ADT involves gathering vast amounts of source metadata and remodeling it so that the data becomes suitable for analysis, reporting, LLM ingestion, or other downstream applications that bring business value.

The outcomes of ADT include physical data structures (DDL), data transformations (DML), and orchestration workflows. The more metadata that can be processed, the more automated the process becomes.
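To make the link between metadata and generated DDL concrete, here is a minimal sketch in Python. It is not VaultSpeed's implementation: the `Column` class, the `create_table_ddl` function, and the `staging.customer` table are hypothetical names chosen for illustration, and the SQL is deliberately generic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata harvested from a source system."""
    name: str
    data_type: str
    nullable: bool = True


def create_table_ddl(schema: str, table: str, columns: list[Column]) -> str:
    """Render a CREATE TABLE statement from column metadata."""
    lines = []
    for col in columns:
        null_clause = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {col.data_type}{null_clause}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {schema}.{table} (\n{body}\n);"


# Assumed example metadata; a real repository would hold many such tables.
customer_columns = [
    Column("customer_id", "INTEGER", nullable=False),
    Column("email", "VARCHAR(255)"),
]

ddl = create_table_ddl("staging", "customer", customer_columns)
print(ddl)
```

The same metadata could feed a DML generator or a workflow definition, which is why richer metadata allows more of the process to be automated.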

Delivering cloud productivity

Cloud data platform vendors, such as Snowflake, Databricks, Microsoft, and Google, have created environments that enable the effortless creation of data infrastructure with limitless scalability.

However, productivity is unattainable without automation. Analysts estimate that the average enterprise organization has 115 different data sources. Forget about attempting to clean, format, aggregate, and integrate data from all these sources manually.

The missing link in the automation chain

Tools such as Fivetran, Airbyte, and several others cover the transfer of data from source systems to the data cloud with minimal coding and an almost fully automated setup.

Analytics platforms like Looker, Alteryx, and C3.ai make life easier for business users by automating reports and using artificial intelligence to help them find answers to their questions.

Smart automation of data transformation was long considered impossible, given the uniqueness of each company's business context and data stack. Until now.

Four generations of data transformation solutions

Automated data transformation represents the fourth generation of data transformation solutions, one that fundamentally changes what data teams can deliver.

First generation: manual coding

Not so long ago, data teams were manually coding DDL and DML SQL statements using physical data runtime components like tables, attributes, and keys. This laborious task was error-prone and cumbersome.

Second generation: traditional ETL/ELT tooling

Traditional ETL/ELT tools allowed for the automation of SQL code by providing the engineers with a drag-and-drop GUI to manipulate physical data runtime components using common SQL operators like a join, filter, aggregate or lookup. The physical data runtime components were harvested and stored in a metadata repository.

Despite this progress, every data mapping still had to be built manually, and a separate data modeling tool was often needed as well. This was acceptable for a smaller volume of transformation jobs, but proved impractical for larger tasks involving 500 to 1000 mappings a week.

Third generation: template engines

The insight surfaced that automated data transformation is only achievable at a certain level of abstraction, not at the detail level. Data automation requires repeatable patterns.

Once identified, repeatable source-to-target transformation patterns could suddenly be automated. Patterns to load a staging table, patterns to load a Data Vault hub, patterns to load a fact table, and so forth.
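A repeatable pattern of this kind can be sketched as a single parameterized template applied to many entities. The example below uses Python's standard `string.Template`; the hub-loading SQL, the column names (`load_date`, `record_source`), and the table names are illustrative assumptions, not the templates of any particular product.

```python
from string import Template

# One hypothetical pattern for loading a Data Vault hub from a staging table.
HUB_LOAD_TEMPLATE = Template(
    "INSERT INTO $hub ($hash_key, $business_key, load_date, record_source)\n"
    "SELECT DISTINCT stg.$hash_key, stg.$business_key, stg.load_date, stg.record_source\n"
    "FROM $staging AS stg\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub AS hub WHERE hub.$hash_key = stg.$hash_key\n"
    ");"
)

# Assumed entity metadata; each entry reuses the same pattern.
entities = [
    {"hub": "hub_customer", "staging": "stg_customer",
     "hash_key": "customer_hk", "business_key": "customer_id"},
    {"hub": "hub_product", "staging": "stg_product",
     "hash_key": "product_hk", "business_key": "product_id"},
]

# The pattern is written once and rendered once per entity.
statements = [HUB_LOAD_TEMPLATE.substitute(entity) for entity in entities]

for statement in statements:
    print(statement, end="\n\n")
```

Because every statement comes from the same template, a mistake in the template shows up in every generated statement, which is the risk the next paragraph describes.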

Data engineers eagerly started coding these abstract data transformation patterns to automate more data layers. But template coding is still coding: it is prone to errors, and those errors multiply fast. A template error is not made once; it is repeated 500 to 1000 times. If you're not careful, you quickly end up automating data errors.

What remains missing?

Fourth generation: Automated Data Transformation

The fourth generation of data transformation, as exemplified by VaultSpeed, reintroduces the metadata repository and GUI.

Components

The template engine comes with pretested, pre-built automation templates, so data engineers no longer need to code templates themselves.

The metadata repository stores metadata, including abstract signature components and their relationships. It features a smart rule engine that analyzes source metadata to suggest a target data model, a significant time-saving feature.
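As a rough illustration of what a rule that suggests a target model from source metadata might look like, consider the sketch below. The rule itself is an assumption made for this example (a table whose key consists only of foreign keys is treated as a relationship, any other keyed table as a business entity); it does not describe VaultSpeed's actual rule engine, and `SourceTable` and `suggest_entity_type` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Hypothetical source metadata: key columns and foreign-key references."""
    name: str
    primary_key: list[str]
    # Maps a foreign-key column to the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def suggest_entity_type(table: SourceTable) -> str:
    """Suggest a Data Vault entity type using a deliberately simple rule."""
    if not table.primary_key:
        # No key metadata available: leave the decision to a person.
        return "review"
    key_is_all_foreign = all(col in table.foreign_keys for col in table.primary_key)
    if len(table.primary_key) > 1 and key_is_all_foreign:
        return "link"
    return "hub"


tables = [
    SourceTable("customer", ["customer_id"]),
    SourceTable("product", ["product_id"]),
    SourceTable(
        "order_line",
        ["customer_id", "product_id"],
        {"customer_id": "customer", "product_id": "product"},
    ),
    SourceTable("audit_log", []),
]

suggestions = {table.name: suggest_entity_type(table) for table in tables}
print(suggestions)
```

The output is only a suggestion; in the workflow described here, engineers would still review and adjust it in the GUI.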

Data engineers have a GUI to view and customize the data model based on their business needs by tagging key entities and attributes with the right signature types.

These applied signature tags connect the entire metadata set to the automation templates, effectively bridging the gap between physical and abstract metadata.

The metadata is processed through the template engine, transforming the source data model into the target data model while simultaneously providing the transformation code that can be implemented in the physical data runtime.
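The role of signature tags can be pictured as a lookup from an abstract tag to a pre-built template. The following sketch assumes two hypothetical signature types, `hub` and `satellite`, and simplified SQL; the tag names, the `SIGNATURE_TEMPLATES` registry, and the `generate` function are illustrative only.

```python
from string import Template

# Hypothetical pre-built templates, keyed by signature type.
SIGNATURE_TEMPLATES = {
    "hub": Template(
        "INSERT INTO hub_$entity SELECT DISTINCT $key FROM stg_$entity;"
    ),
    "satellite": Template(
        "INSERT INTO sat_$entity SELECT $key, $attributes FROM stg_$entity;"
    ),
}


def generate(tagged_entities: list[dict]) -> list[str]:
    """Render transformation code by following each entity's signature tag."""
    statements = []
    for item in tagged_entities:
        # The tag, not the physical table, decides which template applies.
        template = SIGNATURE_TEMPLATES[item["signature"]]
        statements.append(
            template.substitute(
                entity=item["entity"],
                key=item["key"],
                attributes=", ".join(item.get("attributes", [])),
            )
        )
    return statements


# Assumed tagging, as an engineer might apply it to source metadata.
tagged = [
    {"entity": "customer", "signature": "hub", "key": "customer_id"},
    {"entity": "customer", "signature": "satellite", "key": "customer_id",
     "attributes": ["email", "country"]},
]

generated = generate(tagged)
for statement in generated:
    print(statement)
```

In this sketch the engineer only assigns tags; the code comes entirely from the templates, which is the sense in which tags bridge physical and abstract metadata.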

One could perceive this as ETL at the abstract level, employing abstract signatures instead of physical data components. Welcome to the world of abstract ETL.

Don't compromise

More business value

ADT involves gathering vast amounts of source metadata and enriching it to make it suitable for analysis, reporting, LLM ingestion, or other downstream applications that bring business value.

Speed & Efficiency

The results of ADT are physical data structures (DDL), data transformations, and orchestration workflows. The more metadata that can be processed, the more automated the data integration process becomes, making it faster and more efficient.

Increased data quality

By automating data transformation, businesses can ensure that their data is accurate, consistent, and easily accessible, which in turn leads to better decision-making.

Increased agility

Enterprise companies frequently change data sources, tools, models, or other architectural elements to keep up with rapidly evolving markets. ADT is engineered to adapt to these changes without rework, and it plays a crucial role in converting data into a relevant format that has business value today and in the future.

AI readiness

LLMs deliver better results when trained on the reliable data that automated data transformation provides.