Designing a Data Platform for Continuous M&A at Thomson Reuters

Industry:

Legal, Tax & Regulatory Technology

Use Case:

M&A Data Integration and EDW Migration

Platform:

Snowflake on AWS

Technologies:

VaultSpeed, Snowflake, Data Vault 2.0

100+ data sources

Core set of data products live

Thomson Reuters is a global information provider serving the legal, tax, and policy sectors, with approximately $7.5 billion in revenue and 25,000 employees worldwide. The company delivers subscription-based AI and content products to professionals across the globe. As part of a broader data and AI strategy focused on delivering trusted, fiduciary-grade AI built on governed data, the company set out to solve a foundational challenge: how to unify a constantly shifting data landscape across more than 100 business systems while absorbing continuous M&A activity and modernizing its analytics infrastructure.

The Challenge: Unifying Data Across a Dynamic Enterprise

Thomson Reuters' continuous growth through M&A naturally brought new systems, data models, and technical approaches into the organization over time. As the business evolved, so did its data landscape — with legacy systems transitioning gradually, data ownership adapting to organizational change, and documentation varying across teams.

As the business scaled, opportunities emerged to reduce duplication, align technical approaches, and improve visibility into data. At the same time, the organization's rapid growth into AI-powered products was driving exponential demand for trusted, accessible data — turning these challenges into a clear mandate for modernization.

The Approach: A Unified Data Layer Built on Data Vault and VaultSpeed

Rather than attempting a monolithic migration, Thomson Reuters designed a data product architecture centred on a unified data layer running on Snowflake. The architecture follows a clear flow: sources feed into a raw landing zone, then into curated data assets, through a unified data layer modelled using Data Vault 2.0 (with Hubs, Links, and Satellites), and finally into a data product marketplace for consumption by BI platforms, AI applications, and reverse ETL processes.
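To make the Hub, Link, and Satellite pattern concrete, here is a minimal illustrative sketch of how Data Vault 2.0 structures relate. The entity names, source name, and attribute values are hypothetical examples, not Thomson Reuters' actual model; only the hash-key, hashdiff, and load-date conventions reflect the general Data Vault 2.0 technique.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Data Vault 2.0 style hash key: MD5 over the delimited, normalized business key(s)."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

load_dts = datetime.now(timezone.utc)

# Hub: one row per unique business key (e.g. a customer number).
hub_customer = {
    "customer_hkey": hash_key("CUST-1001"),
    "customer_bk": "CUST-1001",
    "load_dts": load_dts,
    "record_source": "crm_system",          # hypothetical source name
}

# Satellite: descriptive attributes, versioned over time; the hashdiff
# lets the load detect when attributes have changed.
attrs = {"name": "Acme Corp", "segment": "Legal"}
sat_customer = {
    "customer_hkey": hub_customer["customer_hkey"],
    "hashdiff": hash_key(*attrs.values()),
    **attrs,
    "load_dts": load_dts,
}

# Link: relates two hubs (e.g. customer and product) via their hash keys.
link_customer_product = {
    "cust_prod_hkey": hash_key("CUST-1001", "PROD-7"),
    "customer_hkey": hash_key("CUST-1001"),
    "product_hkey": hash_key("PROD-7"),
    "load_dts": load_dts,
}
```

Because keys are deterministic hashes of business keys, hubs, links, and satellites can be loaded in parallel from many sources without lookups — one reason the pattern absorbs new source systems gracefully.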

VaultSpeed was selected as the automation engine to drive the Data Vault modelling and code generation. This model-driven approach replaced manual development, ensuring standardized patterns across every domain and enabling the team to keep pace with constant business change from ongoing M&A activity and internal transformations.
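The essence of model-driven automation is that structures are generated from metadata rather than hand-written. The toy generator below illustrates the idea only — it is not VaultSpeed's actual engine, templates, or output, and the entity names are hypothetical.

```python
# Illustrative metadata-driven DDL generation: every hub follows one
# standardized template, so adding a new business entity (e.g. from an
# acquisition) means adding metadata, not writing new code by hand.
HUB_TEMPLATE = """CREATE TABLE IF NOT EXISTS hub_{name} (
    {name}_hkey CHAR(32) PRIMARY KEY,
    {name}_bk VARCHAR,
    load_dts TIMESTAMP_NTZ,
    record_source VARCHAR
);"""

def generate_hub_ddl(entities: list[str]) -> list[str]:
    """Emit one standardized hub definition per business entity."""
    return [HUB_TEMPLATE.format(name=e) for e in entities]

ddl = generate_hub_ddl(["customer", "product", "order"])
print(ddl[0].splitlines()[0])  # → CREATE TABLE IF NOT EXISTS hub_customer (
```

Because every generated object follows the same pattern, review and testing effort stays flat as the number of domains grows — the property that kept manual development from becoming the bottleneck.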

The enterprise data model was designed to be continuously evolving, covering key business domains including:

  • Customer and Party

  • Product and Sales

  • Order Management and Deliveries

  • Invoicing, Billing and Accounts Receivable

  • Finance and Subscriptions

  • Marketing, Channel and Events

  • HR and IT Management

Underpinning this architecture, an enterprise data catalogue, marketplace, and data lineage and access management layer provided governance and discoverability across the organization.

The Journey: Incremental Progress Over Two Years

The transformation began in early 2024 with team formation. At the outset, the team had little Data Vault experience, but it built foundations with Snowflake-on-AWS data pipelines and began the EDW migration with the first core data products, covering the Customer and Product domains as well as Invoicing and Accounts Receivable.

By mid-2024, four data products were live. Through 2025, community building and stakeholder engagement intensified, the internal data marketplace was launched, and the model expanded to cover Subscriptions, Orders, Deliveries, Finance, and Usage data — bringing the total to 11 live data products.

In 2026, the team is reaching a major milestone: the legacy Enterprise Data Warehouse is beginning its phased decommissioning, with additional data products continuing to go live. A major SAP integration is also in progress.

Transformation at a Glance


2024 — Start → 2026 — Now

  • Fragmented data landscape → Unified data layer

  • Manual processes → Automated with VaultSpeed

  • Limited visibility → Enterprise catalogue live

  • Aging EDW → Migration in progress

 

The Impact: Flexibility, Speed, and Confidence

Looking back after two years, the results speak clearly. Data Vault delivered the flexibility the team needed to absorb constant change from M&A and business transformation. VaultSpeed automation kept the project on schedule, replacing what would have been error-prone manual development with standardized, repeatable patterns. This foundation enables faster development of AI-powered products and more reliable outputs across critical professional workflows.

Key outcomes include:

  • AI Readiness: The approach delivers the governance, traceability, and context required to support trusted, professional-grade AI systems. Data Vault preserves full history, while Data Products provide curated, consumption-ready data for AI applications.

  • Speed: Automated code generation significantly accelerated data product delivery across domains.

  • Scalability: The architecture absorbed ongoing M&A activity with minimal rework, keeping pace with continuous business change.

  • Accessibility & discoverability: Data is now discoverable and accessible through an internal marketplace, significantly accelerating use-case implementation.

The organization has successfully migrated major legacy warehouses, and core data products are now operational across the enterprise.

The Takeaway: Lessons from Scaling Data Integration

Thomson Reuters' experience offers valuable lessons for any enterprise navigating data integration at scale. A multi-year transformation requires sustained stamina and consistent stakeholder engagement. Different audiences — from technical teams to executive sponsors — need the value articulated in different ways. And investing in automation from the start is not optional: manual implementation at this scale would have been an invitation to failure.

The team also identified four key success factors for building with Data Vault: strong skills and training in business process knowledge and automation tooling; the right data platform supporting parallel processing; automation of development for faster delivery and standardized code; and an incremental build approach following agile practices rather than big-bang implementation.

Conclusion

Throughout this modernization, VaultSpeed automated Data Vault modelling and code generation, enabling the Thomson Reuters platform to scale across domains without accumulating technical debt, even as the business continued to evolve through acquisitions and transformation. With the legacy EDW now being decommissioned and new domains coming online, Thomson Reuters is well positioned to deliver trusted, fiduciary-grade AI built on a governed, auditable data foundation.

It's time to 10x your data delivery

VaultSpeed automates the transformation of data scattered across dozens of source systems into governed, production-ready pipelines, native to your cloud data platform.
