The challenge
The category leads bought stock the way you’d expect from a fast-moving retailer: by gut feel, anchored to last year’s sales adjusted for “more or less” — usually more. The result was a working-capital problem hiding in plain sight. €30M+ tied up in inventory at any point, of which roughly a third was never going to sell at full margin.
The CFO wanted forecasting. The category leads were resistant — they had used a forecasting tool from their previous ERP and it had been wrong enough, often enough, that the team had stopped trusting any model. The new initiative had to clear that bar.
What we found in the first week:
- The data existed (3 years of daily sales, marketing spend, returns, weather, calendar effects), but no one had ever joined it cleanly; a sketch of what that join involves follows this list.
- “Forecast accuracy” was not measured, so nobody could say whether the previous tool was actually as bad as people remembered, or whether the bar to clear was even achievable.
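What “joining it cleanly” meant in practice is mostly mechanical alignment. A minimal sketch: the file names, column names, and the Black Friday proxy below are hypothetical stand-ins, not the retailer’s actual schema.

```python
import pandas as pd

# Hypothetical extracts; the real schema and column names will differ.
sales = pd.read_csv("sales.csv", parse_dates=["date"])          # sku, date, units
returns = pd.read_csv("returns.csv", parse_dates=["date"])      # sku, date, units_returned
marketing = pd.read_csv("marketing.csv", parse_dates=["date"])  # date, spend (store-wide)
weather = pd.read_csv("weather.csv", parse_dates=["date"])      # date, temp_c, precip_mm

# One row per SKU per day; left joins keep days with no returns or spend.
daily = (
    sales
    .merge(returns, on=["sku", "date"], how="left")
    .merge(marketing, on="date", how="left")
    .merge(weather, on="date", how="left")
    .fillna({"units_returned": 0, "spend": 0})
)

# Calendar effects are derived, not sourced.
daily["dow"] = daily["date"].dt.dayofweek
daily["month"] = daily["date"].dt.month
# Rough proxy: Black Friday falls in ISO week 47 or 48 depending on the year.
daily["is_bf_week"] = daily["date"].dt.isocalendar().week.isin([47, 48])
```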
Our approach
We didn’t promise a model. We promised an honest comparison.
Week 1 — Define the bar. With the CFO and two category leads we picked 20 representative SKUs covering the long tail and the bestsellers. We agreed on the metric (mean absolute percentage error, weighted by margin) and the holdout window (the last 12 months). The previous tool’s predictions were retrievable — we benchmarked them. The honest answer: it was as bad as people remembered on the long tail and surprisingly decent on bestsellers.
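For concreteness, here is one way to compute a margin-weighted MAPE. The exact weighting the team agreed on isn’t spelled out above, so treat this as an illustrative sketch rather than the metric as deployed:

```python
import numpy as np

def margin_weighted_mape(actual, forecast, margin):
    """MAPE with each SKU's error weighted by its gross margin, so a
    30% miss on a high-margin bestseller costs more than the same
    miss on a low-margin long-tail item."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    margin = np.asarray(margin, dtype=float)
    ape = np.abs(actual - forecast) / np.maximum(actual, 1e-9)  # guard zero-demand days
    return float(np.sum(margin * ape) / np.sum(margin))

# The same 20% miss counts five times more on the 5.0-margin SKU:
# margin_weighted_mape([100, 100], [80, 80], margin=[5.0, 1.0])
```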
Weeks 2–3 — Baseline. Two simple models: a moving average, and exponential smoothing with calendar features. These beat the old tool by 18% on the long tail. We showed this to the category leads before mentioning machine learning. The point was that the win came from joining the data properly, not from a fancier model.
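Both baselines are a few lines each. The sketch below uses statsmodels for the smoothing; a weekly seasonal term is the simplest stand-in for “calendar features”, and the real setup likely treated holidays and promotions more explicitly:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def moving_average_forecast(series, horizon=28, window=28):
    """Naive baseline: every future day looks like the average of the last month."""
    return [float(series[-window:].mean())] * horizon

def smoothing_forecast(series, horizon=28):
    """Holt-Winters exponential smoothing with an additive weekly cycle
    (daily data, seasonal_periods=7) as a crude calendar effect."""
    fit = ExponentialSmoothing(
        series, trend="add", seasonal="add", seasonal_periods=7
    ).fit()
    return fit.forecast(horizon)
```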
Weeks 4–5 — Real model. A gradient-boosted forecaster per category, backtested against the agreed holdout. It beat the simple baselines by another ~12%. We deployed it as a recommender, not an autopilot: every order went through a category lead, who saw the model’s suggestion, its confidence, and its historical accuracy on similar SKUs.
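A sketch of the per-category backtest, reusing the joined daily table and the Week 1 metric from the sketches above. The lag features, model settings, and the assumption that a per-SKU margin column has been joined onto the table are all illustrative, not the engagement’s actual configuration:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

FEATURES = ["dow", "month", "spend", "temp_c", "lag_7", "lag_14", "lag_28"]

def backtest_category(df, holdout_days=365):
    """Train on everything before the agreed holdout, score on the
    holdout itself; one model per category. `df` is one category's
    slice of the joined daily table (with a per-SKU `margin` column),
    and `margin_weighted_mape` is the Week 1 metric defined above."""
    df = df.sort_values("date").copy()

    # Demand history as features. NaNs at the start of each series are
    # fine: HistGradientBoostingRegressor handles missing values natively.
    for lag in (7, 14, 28):
        df[f"lag_{lag}"] = df.groupby("sku")["units"].shift(lag)

    cutoff = df["date"].max() - pd.Timedelta(days=holdout_days)
    train, test = df[df["date"] <= cutoff], df[df["date"] > cutoff]

    model = HistGradientBoostingRegressor(max_iter=500)
    model.fit(train[FEATURES], train["units"])
    preds = model.predict(test[FEATURES])
    return margin_weighted_mape(test["units"], preds, test["margin"])
```

One way to produce the confidence signal the category leads saw is a pair of quantile models bracketing the point forecast, e.g. `HistGradientBoostingRegressor(loss="quantile", quantile=0.1)` and `quantile=0.9`; the case doesn’t say which mechanism was actually used.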
Week 6 — Black Friday rehearsal. We re-ran the model against a synthetic Black Friday week and showed which SKUs the model was wrong about, and by how much. The category leads adjusted those manually before the real event.
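The rehearsal itself needs little machinery. A sketch of its shape, reusing `FEATURES` and the `is_bf_week` flag from the sketches above; the synthetic construction and the 3x uplift are illustrative assumptions, since the case doesn’t describe how the synthetic week was built:

```python
def black_friday_rehearsal(df, model, uplift=3.0):
    """Replay the Black Friday week with a demand uplift applied and rank
    SKUs by how badly the model misses, producing the list the category
    leads adjusted by hand. `df` is assumed to already carry the FEATURES
    columns used at training time."""
    bf = df[df["is_bf_week"]].copy()
    bf["units_synthetic"] = bf["units"] * uplift

    bf["pred"] = model.predict(bf[FEATURES])
    bf["miss"] = bf["pred"] - bf["units_synthetic"]  # negative = would under-order

    # Per-SKU view: worst under-forecasts first.
    return (bf.groupby("sku")["miss"]
              .agg(["mean", "sum"])
              .sort_values("sum"))
```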
The outcome
That first Black Friday, the retailer was the only one in their cohort that didn’t run out of three of their top-decile SKUs. Stockouts on top SKUs dropped 22% versus the previous year. Gross margin on the top decile rose 18%, partly because they didn’t have to fire-sale residual stock in January.
The category leads — the people who were sceptical at week one — became the strongest advocates. The reason was simple: the system told them where it was uncertain. A model that admits “I have low confidence on this SKU because the data is sparse” is a model people will use. A model that always sounds confident is one they ignore.
The CFO’s working-capital target for the year was hit a quarter early.