FMCG Retail Execution AI Platform | Sparkvern Case Study

The Challenge

The company’s sales teams needed data-driven guidance at the outlet level: which stores to visit, what actions to take, and which KPIs to prioritize. Existing reporting was backward-looking and could not keep pace with the scale of the business: 5 retail chains, 10K+ outlets, and 100K+ SKUs.

They needed a platform that could:

Ingest and reconcile 40+ data sources daily (POS, inventory, promotions, pricing, compliance)
Run multiple ML models to generate predictions and recommendations
Deliver prioritized action lists to field teams every morning
Scale without proportional increases in engineering effort

Our Solution

Medallion Data Pipeline

We built a data pipeline that processes 40+ sources through bronze, silver, and gold layers:

25+ bronze tables for raw ingestion from retailer feeds, syndicated data, and internal systems
30+ silver tables for validated, conformed, and enriched data
Gold layer with business-ready datasets feeding ML models and dashboards
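
To make the layering concrete, here is a minimal sketch of a bronze-to-silver-to-gold flow for a single POS feed. It uses plain Python rather than PySpark so that it runs without a cluster, and every field name (store, sku, date, units) and every validation rule is an assumption for illustration, not the platform's actual schema or logic.

```python
from datetime import date


def to_silver(bronze_rows):
    """Validate and conform raw (bronze) rows; return (silver, rejected)."""
    silver, rejected = [], []
    seen = set()
    for row in bronze_rows:
        try:
            # Conform names and types (hypothetical schema).
            record = {
                "outlet_id": str(row["store"]).strip().upper(),
                "sku": str(row["sku"]).strip(),
                "sale_date": date.fromisoformat(row["date"]),
                "units": int(row["units"]),
            }
        except (KeyError, ValueError, TypeError):
            rejected.append(row)  # missing field or unparseable value
            continue
        key = (record["outlet_id"], record["sku"], record["sale_date"])
        # Reject negative quantities and repeated outlet/SKU/date rows.
        if record["units"] < 0 or key in seen:
            rejected.append(row)
            continue
        seen.add(key)
        silver.append(record)
    return silver, rejected


def to_gold(silver_rows):
    """Aggregate silver rows into total units per (outlet, SKU)."""
    totals = {}
    for r in silver_rows:
        key = (r["outlet_id"], r["sku"])
        totals[key] = totals.get(key, 0) + r["units"]
    return totals


bronze = [
    {"store": " a1 ", "sku": "S1", "date": "2024-01-01", "units": "3"},
    {"store": "A1", "sku": "S1", "date": "2024-01-01", "units": "3"},
    {"store": "A1", "sku": "S1", "date": "2024-01-02", "units": 2},
    {"store": "B2", "sku": "S9", "date": "not-a-date", "units": 1},
    {"store": "B2", "sku": "S9", "date": "2024-01-01", "units": -4},
]
silver, rejected = to_silver(bronze)
gold = to_gold(silver)
```

Keeping rejected rows instead of silently dropping them is one way to support daily reconciliation: the rejects can be counted and inspected per source.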

12+ ML Models in Daily Production

The platform orchestrates 12+ ML models, each solving a specific business problem. Examples include:

Demand forecasting: Predict outlet-level demand by SKU
Customer segmentation: Cluster outlets by behavior and potential
Compliance detection: Identify planogram and pricing violations from store data
Pricing optimization: Recommend optimal pricing strategies
KPI prioritization: Determine which actions will have the highest impact per outlet
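
As an illustration of the last item, the sketch below turns scored candidate actions into a short ranked list per outlet. The input shape (outlet, action, expected uplift), the action names, and the top-N cutoff are all hypothetical; the real models and scoring logic are not described in this case study.

```python
import heapq
from collections import defaultdict


def prioritize_actions(scored_actions, top_n=3):
    """Return the top_n actions per outlet, highest expected uplift first.

    scored_actions: iterable of (outlet_id, action, expected_uplift) tuples,
    assumed to come from upstream model scoring.
    """
    by_outlet = defaultdict(list)
    for outlet_id, action, expected_uplift in scored_actions:
        by_outlet[outlet_id].append((expected_uplift, action))
    return {
        outlet_id: [action for _, action in heapq.nlargest(top_n, candidates)]
        for outlet_id, candidates in by_outlet.items()
    }


scored = [
    ("A1", "fix_planogram", 120.0),
    ("A1", "restock_sku", 300.0),
    ("A1", "adjust_price", 50.0),
    ("B2", "restock_sku", 80.0),
]
action_lists = prioritize_actions(scored, top_n=2)
```

Capping the list per outlet keeps the morning output short enough for a field rep to act on during a visit.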

OmegaConf Configuration System

All model configurations (features, hyperparameters, training windows, scoring schedules) are managed through OmegaConf YAML files. This lets data scientists adjust model behavior without changing pipeline code, and it keeps deployments consistent through Databricks Asset Bundles (DABs).

Daily Orchestration

The full pipeline runs daily: data ingestion, transformation, model scoring, and result delivery. A dependency graph ensures models run in the correct order (segmentation before prioritization, for example), with retry logic and alerting for failures.
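
The ordering and retry behavior described above can be sketched with the Python standard library. This is not how the platform implements orchestration (the case study does not say); the task names, retry count, and alert hook are assumptions chosen to mirror the text.

```python
import time
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "ingest": set(),
    "transform": {"ingest"},
    "segmentation": {"transform"},
    "demand_forecast": {"transform"},
    "kpi_prioritization": {"segmentation", "demand_forecast"},
    "deliver_results": {"kpi_prioritization"},
}


def run_with_retry(name, task, max_attempts=3, backoff_seconds=0.0, alert=print):
    """Run task, retrying on any exception; alert and re-raise on final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                alert(f"ALERT: {name} failed after {attempt} attempts: {exc}")
                raise
            time.sleep(backoff_seconds * attempt)  # simple linear backoff


def run_pipeline(dependencies, tasks, **retry_kwargs):
    """Run tasks in dependency order and return the order that was used."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        run_with_retry(name, tasks[name], **retry_kwargs)
    return order
```

With this graph, segmentation and demand forecasting both run after the transform step and before prioritization, and a task that fails on every attempt stops the run after the alert fires, so downstream tasks never score on incomplete inputs.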

Results

12+ ML models running in daily production
40+ data sources ingested and reconciled daily
10K+ outlets with prioritized action recommendations
100K+ SKUs tracked across 5 retail chains
Measurable improvement in sales execution effectiveness and outlet coverage

Technologies Used

Databricks, Delta Lake, PySpark, MLflow, OmegaConf, Databricks Asset Bundles (DABs), Python, SQL

Global FMCG Leader
