Global FMCG Leader
AI-Driven Retail Execution Platform at Scale
Key Results
12+ ML models in daily production, 40+ data sources, 10K+ outlets monitored
The Challenge
The company’s sales teams needed data-driven guidance at the outlet level: which stores to visit, what actions to take, and which KPIs to prioritize. Existing reporting was backward-looking and couldn’t keep pace with the scale — 5 retail chains, 10K+ outlets, 100K+ SKUs.
They needed a platform that could:
- Ingest and reconcile 40+ data sources daily (POS, inventory, promotions, pricing, compliance)
- Run multiple ML models to generate predictions and recommendations
- Deliver prioritized action lists to field teams every morning
- Scale without proportional increases in engineering effort
Our Solution
Medallion Data Pipeline
We built a medallion-architecture data pipeline that processes 40+ sources across three layers:
- 25+ bronze tables for raw ingestion from retailer feeds, syndicated data, and internal systems
- 30+ silver tables for validated, conformed, and enriched data
- Gold layer with business-ready datasets feeding ML models and dashboards
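To make the three layers concrete, here is a minimal sketch in plain Python rather than the PySpark and Delta Lake code a production pipeline would use. The sample rows, the field names (`outlet_id`, `sku`, `units`, `sale_date`), and the validation rules are all assumptions made for illustration, not the platform's actual schema.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical raw POS rows as they might land in a bronze table.
bronze_pos: List[Dict[str, str]] = [
    {"outlet_id": "A1", "sku": "1001", "units": "12", "sale_date": "2024-03-01"},
    {"outlet_id": "A1", "sku": "1001", "units": "n/a", "sale_date": "2024-03-01"},
    {"outlet_id": "a1 ", "sku": "1002", "units": "5", "sale_date": "2024-03-01"},
    {"outlet_id": "B7", "sku": "1001", "units": "3", "sale_date": "2024-03-01"},
]


def to_silver(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Validate and conform raw rows: drop bad unit counts, normalize outlet IDs."""
    silver = []
    for row in rows:
        try:
            units = int(row["units"])
        except ValueError:
            continue  # assumed rule: rows with non-numeric units are rejected
        silver.append(
            {
                "outlet_id": row["outlet_id"].strip().upper(),
                "sku": row["sku"],
                "units": units,
                "sale_date": row["sale_date"],
            }
        )
    return silver


def to_gold(rows: List[Dict[str, object]]) -> Dict[str, int]:
    """Aggregate conformed rows into a business-ready view: units per outlet."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["outlet_id"]] += row["units"]
    return dict(totals)


silver_pos = to_silver(bronze_pos)
gold_units_by_outlet = to_gold(silver_pos)
```

Raw data is kept untouched in bronze, so a changed validation rule can be replayed from the source records without re-ingesting the feeds.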
12+ ML Models in Daily Production
The platform orchestrates multiple ML models, each solving a specific business problem:
- Demand forecasting — Predict outlet-level demand by SKU
- Customer segmentation — Cluster outlets by behavior and potential
- Compliance detection — Identify planogram and pricing violations from store data
- Pricing optimization — Recommend optimal pricing strategies
- KPI prioritization — Determine which actions will have the highest impact per outlet
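As an illustration of how model outputs could become a per-outlet action list, the sketch below ranks candidate actions by a score and keeps the top few for each outlet. The `CandidateAction` type, the `expected_impact` field, and the ranking rule are hypothetical and are not taken from the platform's actual prioritization model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CandidateAction:
    outlet_id: str
    action: str
    expected_impact: float  # hypothetical score produced by an upstream model


def top_actions_per_outlet(
    candidates: List[CandidateAction], top_n: int = 2
) -> Dict[str, List[CandidateAction]]:
    """Group candidates by outlet and keep the top_n highest-impact actions."""
    by_outlet: Dict[str, List[CandidateAction]] = {}
    for candidate in candidates:
        by_outlet.setdefault(candidate.outlet_id, []).append(candidate)
    return {
        outlet: sorted(items, key=lambda c: c.expected_impact, reverse=True)[:top_n]
        for outlet, items in by_outlet.items()
    }


# Illustrative inputs only.
candidates = [
    CandidateAction("A1", "fix_planogram", 0.8),
    CandidateAction("A1", "restock_sku_1001", 0.9),
    CandidateAction("A1", "update_price_tag", 0.3),
    CandidateAction("B7", "restock_sku_1002", 0.5),
]
action_lists = top_actions_per_outlet(candidates)
```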
OmegaConf Configuration System
All model configurations (features, hyperparameters, training windows, scoring schedules) are managed through OmegaConf YAML files. This lets data scientists adjust model behavior without pipeline code changes and enables consistent deployment through Databricks Asset Bundles (DABs).
Daily Orchestration
The full pipeline runs daily: data ingestion, transformation, model scoring, and result delivery. A dependency graph ensures models run in the correct order (segmentation before prioritization, for example), with retry logic and alerting for failures.
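The ordering, retry, and alerting behavior described above can be sketched with Python's standard-library `graphlib`. This is an illustration, not the platform's Databricks job definition: the step names, the `run_daily_pipeline` helper, and the default retry count are assumptions, and a production version would likely add backoff between attempts.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_daily_pipeline(
    tasks: Dict[str, Callable[[], None]],
    dependencies: Dict[str, Set[str]],
    max_attempts: int = 3,
    alert: Callable[[str], None] = print,
) -> List[str]:
    """Run tasks in dependency order; retry each, then alert and stop on failure."""
    # static_order() yields every task after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    completed: List[str] = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    alert(f"{name} failed after {attempt} attempts: {exc}")
                    raise  # downstream steps must not run on missing inputs
    return completed


# Hypothetical step names; each maps to the set of steps it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "transform": {"ingest"},
    "segmentation": {"transform"},
    "demand_forecast": {"transform"},
    "kpi_prioritization": {"segmentation", "demand_forecast"},
    "deliver": {"kpi_prioritization"},
}

# Placeholder callables stand in for the real ingestion and scoring jobs.
STEP_NAMES = [
    "ingest", "transform", "segmentation",
    "demand_forecast", "kpi_prioritization", "deliver",
]
TASKS: Dict[str, Callable[[], None]] = {name: (lambda: None) for name in STEP_NAMES}

completed = run_daily_pipeline(TASKS, DEPENDENCIES)
```

Re-raising after the final attempt stops the run, so prioritization never scores against stale segmentation output.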
Results
- 12+ ML models running in daily production
- 40+ data sources ingested and reconciled daily
- 10K+ outlets with prioritized action recommendations
- 100K+ SKUs tracked across 5 retail chains
- Measurable improvement in sales execution effectiveness and outlet coverage
Technologies Used
Databricks, Delta Lake, PySpark, MLflow, OmegaConf, Databricks Asset Bundles (DABs), Python, SQL
Deep Dive
Orchestrating 12 ML Models Daily for Retail Execution at Scale → Inside the architecture of an AI-driven sales execution platform that runs 12 ML models daily across 10,000+ retail outlets and 100,000+ SKUs. We cover the medallion architecture, OmegaConf-based model configuration, and the orchestration patterns that keep it all running on Databricks.