FMCG & Retail / Healthcare (OTC)

Global FMCG Leader

AI-Driven Retail Execution Platform at Scale

Databricks · ML Pipeline · OmegaConf · Retail Analytics · MLOps

Key Results

12+ ML models in daily production, 40+ data sources, 10K+ outlets monitored

The Challenge

The company’s sales teams needed data-driven guidance at the outlet level: which stores to visit, what actions to take, and which KPIs to prioritize. Existing reporting was backward-looking and could not keep pace with the scale of the operation: 5 retail chains, 10K+ outlets, and 100K+ SKUs.

They needed a platform that could:

  • Ingest and reconcile 40+ data sources daily (POS, inventory, promotions, pricing, compliance)
  • Run multiple ML models to generate predictions and recommendations
  • Deliver prioritized action lists to field teams every morning
  • Scale without proportional increases in engineering effort

Our Solution

Medallion Data Pipeline

We built a comprehensive data pipeline processing 40+ sources:

  • 25+ bronze tables for raw ingestion from retailer feeds, syndicated data, and internal systems
  • 30+ silver tables for validated, conformed, and enriched data
  • Gold layer with business-ready datasets feeding ML models and dashboards
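To make the layering concrete, here is a minimal sketch of how a raw POS record might move from bronze to silver to gold. It is written in plain Python rather than PySpark so it runs without a cluster. The field names, validation rules, and function names (`to_silver`, `to_gold`) are illustrative assumptions, not the platform's actual schema or code.

```python
from collections import defaultdict

# Hypothetical bronze rows: raw POS feed records, kept as received (all strings).
bronze_pos = [
    {"outlet": " S001 ", "sku": "A1", "units": "3", "price": "2.50"},
    {"outlet": "S001", "sku": "A1", "units": "2", "price": "2.50"},
    {"outlet": "S002", "sku": "B7", "units": "-1", "price": "4.00"},  # invalid units
    {"outlet": "S002", "sku": "", "units": "5", "price": "4.00"},     # missing SKU
]

def to_silver(rows):
    """Validate and conform raw rows; return (clean_rows, rejected_rows)."""
    clean, rejected = [], []
    for row in rows:
        try:
            record = {
                "outlet_id": row["outlet"].strip(),
                "sku_id": row["sku"].strip(),
                "units": int(row["units"]),
                "price": float(row["price"]),
            }
        except (KeyError, ValueError):
            rejected.append(row)
            continue
        # Assumed rules: identifiers must be present, units cannot be negative.
        if not record["outlet_id"] or not record["sku_id"] or record["units"] < 0:
            rejected.append(row)
            continue
        clean.append(record)
    return clean, rejected

def to_gold(silver_rows):
    """Aggregate silver rows into revenue per (outlet, SKU), a business-ready shape."""
    revenue = defaultdict(float)
    for r in silver_rows:
        revenue[(r["outlet_id"], r["sku_id"])] += r["units"] * r["price"]
    return dict(revenue)

silver, rejected = to_silver(bronze_pos)
gold = to_gold(silver)
```

Keeping rejected rows instead of silently dropping them is a design choice in this sketch: it leaves an audit trail for reconciling feeds that disagree.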

12+ ML Models in Daily Production

The platform orchestrates multiple ML models, each solving a specific business problem:

  • Demand forecasting — Predict outlet-level demand by SKU
  • Customer segmentation — Cluster outlets by behavior and potential
  • Compliance detection — Identify planogram and pricing violations from store data
  • Pricing optimization — Recommend optimal pricing strategies
  • KPI prioritization — Determine which actions will have the highest impact per outlet

OmegaConf Configuration System

All model configurations (features, hyperparameters, training windows, scoring schedules) are managed through OmegaConf YAML files. This lets data scientists adjust model behavior without pipeline code changes and enables consistent deployment through Databricks Asset Bundles (DABs).

Daily Orchestration

The full pipeline runs daily: data ingestion, transformation, model scoring, and result delivery. A dependency graph ensures models run in the correct order (segmentation before prioritization, for example), with retry logic and alerting for failures.
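The ordering and retry behavior described above can be sketched with Python's standard-library `graphlib`. This is a simplified stand-in for a real job orchestrator, not the platform's implementation. The task names, the dependency edges other than segmentation before prioritization, and the helpers `run_with_retry` and `run_pipeline` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "demand_forecast": {"transform"},
    "segmentation": {"transform"},
    "kpi_prioritization": {"segmentation", "demand_forecast"},
    "deliver": {"kpi_prioritization"},
}

def run_with_retry(name, task, max_attempts=3, alert=print):
    """Run a task, retrying on failure; alert and re-raise once attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                alert(f"ALERT: {name} failed after {attempt} attempts: {exc}")
                raise

def run_pipeline(dependencies, tasks, max_attempts=3, alert=print):
    """Execute tasks in dependency order and return the order that was run."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        run_with_retry(name, tasks[name], max_attempts, alert)
        executed.append(name)
    return executed

# Demo: no-op tasks, except segmentation, which fails once before succeeding.
attempts = {"segmentation": 0}

def flaky_segmentation():
    attempts["segmentation"] += 1
    if attempts["segmentation"] == 1:
        raise RuntimeError("transient failure")

tasks = {name: (lambda: None) for name in dependencies}
tasks["segmentation"] = flaky_segmentation

order = run_pipeline(dependencies, tasks)
```

A task that exhausts its retries raises, which stops the run before any downstream task executes, so prioritization never scores against stale segmentation output in this sketch.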

Results

  • 12+ ML models running in daily production
  • 40+ data sources ingested and reconciled daily
  • 10K+ outlets with prioritized action recommendations
  • 100K+ SKUs tracked across 5 retail chains
  • Measurable improvement in sales execution effectiveness and outlet coverage

Technologies Used

Databricks, Delta Lake, PySpark, MLflow, OmegaConf, Databricks Asset Bundles (DABs), Python, SQL

Deep Dive

Orchestrating 12 ML Models Daily for Retail Execution at Scale

Inside the architecture of an AI-driven sales execution platform that runs 12 ML models daily across 10,000+ retail outlets and 100,000+ SKUs. We cover the medallion architecture, OmegaConf-based model configuration, and the orchestration patterns that keep it all running on Databricks.
