
If you’ve been running a legacy data warehouse for a while, chances are you’ve heard the siren song of the data lake. AI everywhere, dramatically faster time to value, and the cloud vendors tell you it’s so easy…just move the data, right? 

Not quite. 

We’ve seen time and again that a ‘lift and shift’ approach often recreates old pain points at cloud scale. Most data lake migration efforts falter not because the technology is too complex, but because the strategic clarity isn’t there. When everything is “critical” and every table gets migrated “just in case”, teams spend massive time and money recreating the old world (with all its inefficiencies and blind spots) on shiny new infrastructure. Instead of unlocking new value, they duplicate technical debt in a shinier (and often more expensive) environment. 

Let’s be blunt: your current warehouse probably contains a lot of junk. Old ETL jobs no one owns. Tables that haven’t been queried in years. Critical business logic buried in stored procedures that nobody understands. If you try to move everything, you risk burning out your team and dragging legacy pain into a new platform. Migration should be a chance to step back, rethink what your data ecosystem should look like, and clean house where necessary. As one of my colleagues likes to put it, “if you’re migrating data, don’t migrate the mess”. 

A successful migration requires more than just a technical blueprint. It demands strategic clarity: 

  • knowing what data matters; 
  • knowing which processes are worth preserving; and (critically) 
  • knowing what you can leave behind. 

You want to reframe your goal from ‘lift and shift’ to ‘audit, clean, and rebuild with purpose’. That starts with being clear about why you should consider the move in the first place. 

Why (and Why Not) Migrate? 

There are plenty of textbook reasons to move from on-prem data warehouses to data lakes, so let’s start with the greatest hits. Near-infinite storage and elastic compute promise freedom from on-prem server constraints. Schema-on-read and multi-format support unlock new data sources with minimal prep. Decoupled architecture means cost savings and scalability; integration with AI/ML tooling future-proofs your stack. And of course, there’s the lure of cloud-native everything: managed services, global availability, and theoretically, fewer 2 a.m. pager alerts. 

Going beyond the usual vendor buzzwords, I think there’s an angle many people miss: the chance for a clean slate. A well-timed migration gives you a powerful forcing function to modernise not only your infrastructure but your entire data practice. Done right, it’s your chance to tackle technical debt, clarify ownership, improve governance, and build a more agile data platform. 

Before rushing ahead to implementation, it’s worth asking if you’re ready for the move. Not every organisation is ready (or even needs) to make this leap. Migration is expensive in both time and focus, requires executive backing, and can derail your existing roadmaps. You need clear business drivers (like breaking through scale limits, reducing runaway costs, or enabling new use cases), and the leadership and breathing room to execute the transformation. Employing a consulting firm can help you gauge readiness, but the will to succeed must come from within. 

What Does a Tidy Strategy Look Like? 

Assuming you’re ready for the move, let’s start with how to approach your migration strategy. The first trap many teams fall into is overcommitting – after all, you want a big win. But real strategy isn’t about doing more; it’s about making deliberate trade-offs. The heart of strategy is subtraction. A tidy migration strategy is more about what you’re NOT going to do than what you actually do: 

  • Prioritise ruthlessly: Focus first on data that delivers clear, ongoing business value. Use query logs and stakeholder interviews to surface what’s truly in use (see the sketch after this list). 
  • Retire aggressively: Don’t carry over stale data, dead jobs, or undocumented pipelines. Migration is the perfect chance to clean house. 
  • Rebuild intentionally: When something must be migrated, ask: can we redesign it better? Avoid porting brittle ETL jobs as-is. This is your chance to move toward modular, observable, cloud-native pipelines. 
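
To make “prioritise ruthlessly” concrete, here’s a minimal Python sketch. It assumes you can export your warehouse’s query logs to a CSV with table_name and query_ts columns; every warehouse exposes its logs differently, so treat the column names, file name, and 180-day threshold as illustrative.

```python
# Rank tables by recent usage from an exported query log.
# Assumptions: a CSV export with "table_name" and "query_ts" columns,
# and 180 days as the staleness cutoff. Adjust both to your estate.
from datetime import datetime, timedelta

import pandas as pd

logs = pd.read_csv("query_log_export.csv", parse_dates=["query_ts"])
cutoff = datetime.now() - timedelta(days=180)
recent = logs[logs["query_ts"] >= cutoff]

usage = recent.groupby("table_name").size().sort_values(ascending=False)
print("Most-used tables (migrate first):")
print(usage.head(20))

# Anything in your catalogue that never appears in `usage` is a
# candidate for retirement rather than migration.
```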

Strategic focus prevents the classic pattern: spending huge effort migrating low-value assets, while starved of time for the work that moves the needle. 

Next, we need to have a tangible, measurable outcome to push towards. After all, success in business is “we solved a problem and here’s the proof”, not “we moved the stuff”. Here are a few specific examples you might consider: 

  • Speed to insight: Does the new architecture make it faster to get reliable answers? 
  • Cost control: Are you reducing TCO, and not just shifting spend from one bill to another? 
  • Flexibility: Can you now support new use cases (e.g. machine learning) that were blocked before? 
  • Resilience: Is your new platform more observable, more testable, and easier to operate? 

Without a clear optimisation lens, you risk optimising for the wrong thing (like raw throughput or perfect parity with your legacy system), which can derail your migration’s true value. 

Finally, as Sun Tzu advised, victory comes from knowing both your own position and the landscape you operate in. In battle, that means knowing when to act, whom to consult, and who is responsible for success and failure. In data terms: before you move a single byte, you need to deeply understand your data estate and how it’s used. Start by mapping your terrain: 

  • Lineage: Know where your data originates, how it’s transformed, and who relies on it. 
  • Ownership: Define clear data ownership and accountability up front. This cuts through ambiguity and ensures someone is on point when things (inevitably) break (a sketch of one approach follows this list). 
  • Governance: Set policies for data quality, privacy, and lifecycle management early. If you don’t solve governance before migration, you’re just moving the mess into a bigger room. 
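
As a taste of what “ownership up front” can look like in practice, here’s a hedged sketch using Spark SQL table properties. The table name, team, and property keys are assumptions for illustration; a dedicated data catalogue would serve the same purpose.

```python
# Record ownership and governance metadata directly on a lake table.
# The property keys and values are illustrative conventions, not a
# standard; agree on yours before migrating.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ownership-sketch").getOrCreate()

spark.sql("""
    ALTER TABLE silver.orders SET TBLPROPERTIES (
        'owner'          = 'sales-data-team',
        'steward'        = 'jane.doe@example.com',
        'contains_pii'   = 'false',
        'retention_days' = '730'
    )
""")

# Accountability is only useful if it is discoverable.
spark.sql("SHOW TBLPROPERTIES silver.orders").show(truncate=False)
```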

Investing in these foundations early saves major headaches down the line. It also forces you to ask tough but essential questions: What’s the minimal viable data estate we can get to first? Where are we duplicating effort? Planning from good assumptions won’t eliminate surprises, but it will reduce their frequency and impact. Remember: “a plan is nothing, but planning is everything”. 

Rethinking Your Architecture 

Once you have a good grasp on your data ecosystem and your plan forward, it’s time to consider the underlying technology and tactics you can use in the migration. A data lake isn’t just a bigger storage bucket. It’s a fundamentally different way of thinking about data: 

  • From static to dynamic: Schema-on-read allows for more flexibility but requires clear conventions and documentation to stay usable (see the sketch below). 
  • From monolithic to modular: Instead of one giant warehouse, you can break your data into domains or products, each ownable, discoverable, and evolvable. 
  • From IT-owned to cross-functional: Modern data platforms thrive when ownership is distributed. Empower data stewards and business teams alongside engineers. 

These changes aren’t just technological…they carry cultural impacts as well. Prepare your narratives accordingly. 
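
To illustrate the first shift, here’s a minimal PySpark sketch of schema-on-read. The files stay as raw JSON in storage; structure is applied only at read time via an explicit, documented schema. The path and field names are assumptions.

```python
# "Schema-on-read" in practice: raw JSON in the lake, with structure
# applied at read time. The convention lives in code and documentation,
# not in the storage layer; another team could read the same files
# with a different schema.
from pyspark.sql import SparkSession
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

order_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=True),
])

orders = spark.read.schema(order_schema).json("s3://lake/landing/orders/")
orders.printSchema()
```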

Part of ensuring good governance is agreement around data structure, known as data modelling. Modelling the data might be intimidating at first, but it boils down to aligning data streams with real-world concepts and entities. As for how to govern this process, a proven pattern here is “medallion architecture”. This layers your lake into: 

  • Raw/Bronze: Immutable data, as replicated from the source. 
  • Trusted/Silver: Cleaned, transformed data ready for most analytics and data science needs. This is where you tend to build your business entities. 
  • Curated/Gold: Business-ready data products, streamlined for business-unit analysts and dashboards. 

This layered approach helps balance flexibility with trustworthiness and provides a roadmap for evolving your data maturity over time. 
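
Here’s a minimal PySpark sketch of those three layers in motion. The paths, table names, and cleaning rules are illustrative assumptions, not a prescription for your estate.

```python
# A medallion-architecture sketch: bronze lands data as-is, silver
# cleans it into a business entity, gold serves an analyst-ready
# aggregate. Paths and rules are invented for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: immutable, append-only replica of the source.
raw = spark.read.json("s3://lake/landing/orders/")
raw.write.mode("append").saveAsTable("bronze.orders")

# Silver: cleaned and conformed into a business entity.
orders = (
    spark.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)  # example data-quality rule
)
orders.write.mode("overwrite").saveAsTable("silver.orders")

# Gold: a curated data product for dashboards and analysts.
daily_revenue = (
    spark.table("silver.orders")
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)
daily_revenue.write.mode("overwrite").saveAsTable("gold.daily_revenue")
```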

While there are commercial and HR considerations to your selection of data platform, the reality is that all the major players can likely support your technical requirements. Modern data lake platforms (e.g. Fabric, Databricks) and competing data warehouse approaches (e.g. Snowflake, BigQuery) have a broadly standard feature set and can trivially scale to most companies’ data requirements. Concentrate on the relative advantages of operating each platform, and the ease of integration with your existing ecosystem. 

Landmines to Avoid 

Even with a good strategy, migrations are tricky terrain. Here are some of the common pitfalls we’ve seen, and how to sidestep them: 

Skipping governance in the rush to deliver. 

It’s easy to push governance down the road (“we’ll clean it up later”). This is the single biggest mistake you can make. Bad lineage and loose controls in a data lake quickly turn into chaos and are 10x-100x harder to fix post-migration. We have a charming metaphor for this kind of data lake…the “data swamp”. Spend the time creating standards and patterns upfront and keep the water clean. 

Assuming ‘parity’ is the goal. 

Trying to replicate your old warehouse 1:1 locks you into legacy constraints. Focus instead on what needs to be preserved and where you can evolve. Your business isn’t the same as it was two years ago, let alone when you first got locked into your legacy DWH schema! 

Underestimating transformation complexity. 

Legacy ETL jobs (especially tools like SSIS or hand-coded SQL transformations) often hide business logic that’s poorly documented. These jobs sometimes won’t just “port”; they need re-interpretation. During this process, don’t be afraid to change your mind on how things are calculated, assuming you have that freedom. 
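
As an example of re-interpretation rather than porting, here’s a hedged sketch: invented discount rules standing in for logic you might dig out of a stored procedure, rebuilt as an explicit, unit-testable PySpark transform.

```python
# Re-interpret legacy business logic as a documented, testable function
# instead of porting the stored procedure verbatim. The discount rules
# below are invented for illustration.
from pyspark.sql import DataFrame, functions as F

def apply_discount_policy(orders: DataFrame) -> DataFrame:
    """Gold-tier customers get 10%; large orders get 5%; otherwise none."""
    return orders.withColumn(
        "discount_rate",
        F.when(F.col("customer_tier") == "gold", F.lit(0.10))
         .when(F.col("order_amount") > 1000, F.lit(0.05))
         .otherwise(F.lit(0.0)),
    )
```

Because the rules now live in one named function, they can be unit-tested against known inputs and changed deliberately when the business changes its mind.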

Overloading the first migration wave. 

Trying to migrate everything at once almost always leads to missed deadlines and budget overruns. Start with high-impact, well-understood datasets to build momentum. Build a “golden thread” end-to-end, then widen to more use cases. 

Forgetting about downstream consumers. 

If you don’t engage BI teams, analysts, and data scientists early, you risk breaking reports/dashboards or creating a shiny new data lake nobody trusts or uses. Don’t be afraid to run both systems in parallel for a while to discover the rough edges. 
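
Running in parallel is only useful if you actually compare the outputs. Here’s a minimal reconciliation sketch; the legacy extract path and gold table name are assumptions, and both sides are assumed to share a schema.

```python
# Compare a legacy extract against the lake's gold output during the
# parallel-run period. Paths, table names, and schemas are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-run-check").getOrCreate()

legacy = spark.read.csv(
    "s3://lake/checks/legacy_daily_revenue.csv",
    header=True, inferSchema=True,
)
lake = spark.table("gold.daily_revenue")

# Rows present on one side but not the other are the "rough edges"
# to investigate before cutover.
only_legacy = legacy.exceptAll(lake)
only_lake = lake.exceptAll(legacy)
print(f"legacy-only rows: {only_legacy.count()}")
print(f"lake-only rows:   {only_lake.count()}")
```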

In Closing 

Migrating from a legacy data warehouse to a modern data lake is more than a tech refresh. It’s an opportunity to rethink how your organisation handles data end-to-end. When migration projects are framed as purely technical lifts, they tend to fail. But the migrations that succeed (both in the short term and long after the cutover) are those grounded in clear strategy. 

To get the most value: 

  • Audit before you move. 
  • Clean before you lift. 
  • Build for clarity, not just capacity. 

Be intentional. Be ruthless about what you bring forward. And don’t let the pressure to “just move it” distract you from the bigger opportunity: to build a cleaner, more effective data foundation. 

And remember: the goal isn’t just to move data, but to unlock new possibilities. A well-architected data lake sparks joy…and much more powerful Spark queries.