
Enterprise retail data architecture accumulates technical debt silently.
Legacy warehouses, fragmented pipelines, and duplicate platforms compound over time.
Databricks retail solutions provide the migration path out of that complexity. Most programs stall because sequencing decisions are made too late. The migration pattern and the case studies behind it are documented here.
Why Retail Migration Programs Stall
Most retail data teams know exactly where they are going. They commit to Databricks retail solutions and define a target architecture. What stalls programs is the sequence between those two points.

The four failure patterns account for most stalled programs:
- Big bang cutover: attempting all workloads simultaneously makes rollback impossible when dependencies surface.
- Skipping Bronze: ingesting directly to Silver removes the raw layer needed for reprocessing when sources change.
- Governance deferred: migrating data without Unity Catalog forces expensive access control retrofitting later.
- Dual-platform operation: running legacy warehouse alongside Databricks doubles cost and creates two sources of truth.
Migration approach comparison:
| Approach | How it works | Where it breaks |
| Big bang cutover | All workloads migrate at once | Any failure halts the entire program |
| Workload-by-workload | One domain at a time, decommission behind | Slower but recoverable at every step |
| Lift and shift | Move existing pipelines without redesign | Imports technical debt to the new platform |
| Greenfield parallel | New Lakehouse built alongside legacy | Requires active decommission sequencing |
Signs a migration is already at risk:
- Legacy warehouse still running six months in
- No Bronze zone defined before migration starts
- Unity Catalog design deferred to a later phase
- Migrating five or more workloads simultaneously
The Three-Phase Model Behind Successful Databricks Retail Solutions
A consistent three-phase pattern emerges across successful enterprise retail migrations. Architecture dependencies impose it, not Databricks methodology.

- Foundation: Delta Lake storage design, Medallion architecture, Unity Catalog governance, and ingestion consolidation completed before any data moves.
- Domain migration: workload-by-workload, starting with lowest-risk domains; analytics before operational, historical before real-time.
- Legacy decommission: deliberate, sequenced shutdown of each system timed against upstream dependencies, not arbitrary cutover dates.
Phase 1 is where most programs fail. What must be resolved before migration begins:
- Unity Catalog structure: workspace layout, permission model, and PII masking rules documented before migration starts.
- Bronze zone definition: schema contracts and retention policy per domain agreed before any source connects.
- Ingestion pattern decision: Kafka, DLT, or Auto Loader confirmed per source type before build begins.
- Rollback criteria: explicit conditions under which each domain reverts to legacy, written before migration.
Databricks Retail Industry Case Studies: Trek, Myntra, and H&M
These three databricks retail industry case studies each start from a different legacy constraint. They share one outcome: production results from treating architecture design as the first deliverable, not a precursor to skip.

Trek Bicycle — from 48-hour ERP replication to near real-time
Trek operated 450 stores globally on regional, sequential data pipelines. ERP replication ran once per week, leaving all regions with stale data throughout the week.
- DLT Bronze to Silver to Gold: structured streaming from ERP via Qlik replaced weekly bulk copy jobs.
- Power BI on Gold tables: analysts query retail data directly without data exports or team involvement.
- Global refresh redesign: all three regions refresh three times daily, simultaneously rather than sequentially.
Outcomes:
- 80–90% faster retail analytics
- ERP replication reduced from 48 hours
- 3x daily global refreshes, all regions simultaneously
- C-level and store reports from the same Gold tables
Myntra — eliminating duplicate sources of truth at petabyte scale
Myntra serves 70 million monthly active users in fashion e-commerce. Legacy Hive architecture caused Spark job failures and duplicate data sources at scale.
- Medallion on Delta Lake: eliminated file locking conflicts that caused frequent Spark job failures.
- Unified batch and streaming: both workloads under one compute model, removing separate infrastructure costs.
- Real-time clickstream processing: click-through rates and order metrics now power continuous UX optimization.
Outcomes:
- Duplicate sources of truth eliminated
- 35% infrastructure cost reduction on Delta Lake
- 25% real-time pipeline performance improvement
- Month-over-month ML deployment growth
H&M — self-service ML deployment across 75 markets
H&M operates 4,700-plus stores across 75 markets globally. Data scientists could not deploy models independently before Databricks.
- Standardized ML deployment API: data scientists deploy via a single, consistent API without data team involvement.
- Online and batch serving built in: inference, Spark execution, and metrics tracking provided out of the box.
Outcomes:
- Independent model deployment without data team
- Online serving, batch execution, and metrics all standard
- Consistent API across all 75 markets
Source: Databricks Lakehouse for Retail launch, 2022
These databricks retail customer success stories share one pattern. Architecture design was completed in full before any workload moved.
Outcomes across all four databricks retail solutions migrations:
| Retailer | Before-State Problem | Migration Approach | Key Outcome |
| Pandora | 5-layer stack, dual compute and ingestion | Phased, architecture-first | 5 to 3 layers, single platform |
| Trek Bicycle | 48hr ERP replication, regional batch | DLT Medallion, Power BI on Gold | 80–90% faster, 3x daily global |
| Myntra | File locking, duplicate sources of truth | Unified batch and streaming | 35% cost down, duplicates eliminated |
| H&M | ML deployment bottleneck, 75 markets | Standardized API on Databricks | Scientists deploy independently |
Pandora and Zoolatech: Migrating a Global Retailer’s Five-Layer Stack
Pandora’s data stack had grown into five separate layers. Each added cost, latency, and governance complexity to every workload. Zoolatech, as a certified Databricks partner, was engaged to redesign and consolidate it.
The before-state
Pandora operated Azure Synapse, Databricks per product line, and Analysis Services simultaneously. Azure Data Factory and EDW completed the five-layer stack.
- Dual compute billing with no unified data lineage
- Dual ingestion surfaces with no consolidated monitoring
- Analysis Services adding latency to every Power BI change
- 500 global reports with unpredictable cascade risk on any change
What Zoolatech designed
Zoolatech designed the target architecture around five decisions:
- Delta Lake and Medallion: Bronze, Silver, and Gold zones replacing ad-hoc ADLS storage with auditable lifecycle governance.
- Unity Catalog: centralized governance for 5,000-plus users with row-level security and PII masking replacing manual processes.
- Kafka-only ingestion: Kafka was selected to support streaming, bulk, and master data ingestion as part of the target-state architecture, with Azure Data Factory planned for retirement during migration phases.
- Databricks SQL and Power BI direct: the target architecture connects enterprise Power BI reporting directly to Gold-layer tables, enabling phased retirement of the Analysis Services layer.
- Databricks Workflows: the architecture centralizes orchestration from Kafka ingestion through Bronze, Silver, Gold transformation to Power BI refresh within Databricks Workflows.
Migration sequencing and risk mitigation
- Architecture defined first: target state documented, approved, and validated before any workload moved.
- Synapse behind SAP S/4HANA: finance reporting continuity protected until the upstream ERP migration stabilized.
- Analysis Services phased by domain: lowest-dependency reports migrated first; each domain validated before the next moved.
- Kafka extended before ADF retired: no ingestion gap at any point during the transition.
Measuring Migration Success: A Retail KPI Framework
Migration success requires measurement at two levels. Operational metrics confirm the migration ran without disruption.
Commercial metrics confirm it delivered the outcomes that justified investment.
The retail analytics programs Zoolatech has delivered use both layers throughout.
| Category | Metric | Why It Matters |
| Migration velocity | Workloads migrated per quarter | Measures pace of legacy decommission |
| Pipeline reliability | Job success rate post-migration | Stability vs. pre-migration baseline |
| Legacy cost reduction | Legacy compute spend eliminated | Direct ROI from decommission |
| Data freshness | Source-to-Gold latency vs. baseline | Business value of the new architecture |
| Analyst self-service | Queries without data team involvement | Productivity impact across the organization |
| ML production ratio | Models in production vs. in experiment | AI operationalization improvement |
Decision Framework: What to Resolve Before Migration Begins
These six questions determine readiness before any workload moves. Answer them in writing before migration begins.
- Is the target architecture fully defined? No workload moves until Medallion zones, Unity Catalog, and ingestion patterns are documented and approved.
- Are workloads prioritized by isolation? Start with lowest-dependency domains; analytics before operational; historical before real-time.
- Are decommission triggers written down? Each legacy system needs explicit shutdown criteria, not an arbitrary cutover date.
- Are upstream dependencies mapped? Every downstream consumer of each legacy system must be identified before migration begins.
- Is governance designed first? Unity Catalog permissions, PII masking, and lineage configured before any data moves.
- Is there a rollback protocol? Each domain needs a documented reversion path and the conditions that activate it.
Conclusion
Retail migrations fail when sequencing is treated as a delivery detail. It is the architecture decision. Programs that delivered commercial outcomes treated Phase 1 as the first deliverable, not a prerequisite to skip.
Key findings:
- Big bang cutover, skipped Bronze, and deferred governance cause most program failures
- Target architecture must be fully defined before any workload moves
- All four retailers treated architecture design as a non-negotiable Phase 1 output
- Phased domain migration with explicit decommission criteria protects production continuity
- These databricks retail solutions migrations share one pattern: design before deployment
- Governance-first sequencing is the most commonly deferred and most costly Phase 1 decision
Questions You May Have
What are the biggest risks in Databricks retail migrations?
Governance deferred too late is the most common failure. Design Unity Catalog before any workload moves.
How long does a retail migration to Databricks take?
Phased migrations typically run 12 to 24 months. Architecture design must complete in full before data moves.
What is Medallion architecture in Databricks?
Medallion organizes data into Bronze, Silver, and Gold layers on Delta Lake. It preserves raw data for reprocessing without separate storage infrastructure.
How was the Pandora migration sequenced?
Synapse decommission was sequenced behind the SAP S/4HANA ERP migration. Finance reporting ran without interruption throughout.
What is the difference between Lakehouse migration and lift-and-shift?
Lift-and-shift moves existing pipelines without redesigning them. A proper migration rebuilds around Medallion, Unity Catalog, and consolidated ingestion.
How is ROI measured in Databricks migrations?
Legacy compute spend eliminated and source-to-Gold latency reduction are the primary metrics. ML production ratio improvement follows as AI operationalization matures.












