Fashion Intelligence

What size recommendation actually does against returns

The vendor promise "25% fewer returns through personalised size recommendation", honestly measured on 2.33 million real order line items: the mechanism works, the lever is just much smaller.

Guido Winger

A widespread vendor promise is that personalised size recommendation cuts returns by 25%. We recomputed the mechanism on 2.33 million real order line items. The effect is real, but the lever is much smaller than advertised, and the reason lies not in the model but in how few returns are a sizing problem in the first place.

1 · Bracketing is large, but not everywhere a sizing problem

The mechanism is called size bracketing: customers order the same item in several sizes, keep one, and send the rest back. In the dataset, 16.6% of all line items are such brackets, with a return rate of 73%; they account for 23.5% of all returns. That is the pool a size recommendation could theoretically address.
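As a rough illustration of how such a pool can be counted, here is a minimal sketch in plain Python. The tuple layout (order, article, size, returned) and the function names are hypothetical and do not reflect the schema or code of the companion repository. The sketch assumes a bracket is any (order, article) pair with at least two distinct sizes, and that the "share of line items" counts every line item belonging to such a pair.

```python
from collections import defaultdict

def find_brackets(line_items):
    """Group line items by (order, article); keep groups with 2+ distinct sizes."""
    groups = defaultdict(list)
    for order_id, article_id, size, returned in line_items:
        groups[(order_id, article_id)].append((size, returned))
    return {key: rows for key, rows in groups.items()
            if len({size for size, _ in rows}) > 1}

def bracket_stats(line_items):
    """Share of line items in brackets, their return rate, their share of all returns."""
    bracket_rows = [row for rows in find_brackets(line_items).values() for row in rows]
    total_returns = sum(returned for *_, returned in line_items)
    bracket_returns = sum(returned for _, returned in bracket_rows)
    return {
        "share_of_line_items": len(bracket_rows) / len(line_items),
        "bracket_return_rate": bracket_returns / len(bracket_rows) if bracket_rows else 0.0,
        "share_of_returns": bracket_returns / total_returns if total_returns else 0.0,
    }

# Toy data (invented for illustration): (order, article, size, returned)
toy_items = [
    ("o1", "a1", "M", True), ("o1", "a1", "L", False),   # bracket, one size kept
    ("o1", "a2", "S", False),
    ("o2", "a1", "M", True),
    ("o2", "a3", "L", False),
    ("o3", "a3", "S", True), ("o3", "a3", "M", True),    # bracket, nothing kept
    ("o3", "a4", "M", False),
]
stats = bracket_stats(toy_items)
print(stats)
```

On the toy data, four of eight line items sit in brackets, three of those four are returned, and they make up three of the four returns overall.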

But not every bracket is a sizing problem. Looking at how many sizes are kept in the end, the pool breaks apart:

50.5% of brackets: nothing kept. That is a product or taste problem no sizing logic solves.
40.5%: exactly one size kept. Only this is a sizing problem a recommendation could solve.
7.5% keep two, 1.2% keep three sizes.

Only this split makes the calculation honest. The addressable pool consists solely of brackets where exactly one size is kept: 17,352 in the test window.
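The split by number of kept sizes can be sketched as follows. This is an illustrative reconstruction, not the repository's code: the tuple layout and function name are assumptions, and "kept" is taken to mean "not returned".

```python
from collections import Counter, defaultdict

def kept_size_counts(line_items):
    """For every bracket, count how many distinct sizes were kept (not returned)."""
    groups = defaultdict(list)
    for order_id, article_id, size, returned in line_items:
        groups[(order_id, article_id)].append((size, returned))
    counts = Counter()
    for rows in groups.values():
        if len({size for size, _ in rows}) < 2:
            continue  # a single size is not a bracket
        kept = {size for size, returned in rows if not returned}
        counts[len(kept)] += 1
    return counts

# Toy data (invented for illustration): (order, article, size, returned)
toy_items = [
    ("o1", "a1", "M", True), ("o1", "a1", "L", False),                            # 1 kept
    ("o2", "a1", "S", True), ("o2", "a1", "M", True),                             # 0 kept
    ("o3", "a2", "M", False), ("o3", "a2", "L", True), ("o3", "a2", "XL", True),  # 1 kept
    ("o4", "a2", "M", False), ("o4", "a2", "L", False),                           # 2 kept
    ("o5", "a3", "M", False),                                                     # no bracket
]
counts = kept_size_counts(toy_items)
n_brackets = sum(counts.values())
shares = {kept: n / n_brackets for kept, n in sorted(counts.items())}
addressable = counts[1]  # only brackets with exactly one kept size
print(shares, addressable)
```

Only the `counts[1]` bucket enters the later calculation; the zero-kept bucket is excluded because no size choice would have prevented those returns.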

2 · The recommender, built leakage-free

The personalised size recommender learns from the past: each customer's typical kept size per product group, only from earlier orders. Where that history is missing, it falls back to the item's most-kept size, then to a global default. All built on a temporal split, so nothing from the future leaks into the recommendation. The customer history covers 34.8% of cases.
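The fallback chain and the temporal split might look roughly like the sketch below. Several details are assumptions rather than facts from the article: "typical kept size" is taken as the most frequent kept size, the global default is the most-kept size overall with a hypothetical hard-coded last resort, and the field and function names are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import NamedTuple

class Row(NamedTuple):  # hypothetical schema
    day: date
    customer_id: str
    product_group: str
    article_id: str
    size: str
    returned: bool

LAST_RESORT_SIZE = "M"  # hypothetical value, used only if there is no history at all

def fit(history):
    """Count kept sizes per (customer, product group), per article, and overall."""
    by_customer, by_article, overall = defaultdict(Counter), defaultdict(Counter), Counter()
    for r in history:
        if r.returned:
            continue  # learn from kept items only
        by_customer[(r.customer_id, r.product_group)][r.size] += 1
        by_article[r.article_id][r.size] += 1
        overall[r.size] += 1
    return by_customer, by_article, overall

def recommend(model, customer_id, product_group, article_id):
    """Customer history first, then the item's most-kept size, then the global default."""
    by_customer, by_article, overall = model
    for counter in (by_customer.get((customer_id, product_group)),
                    by_article.get(article_id), overall):
        if counter:
            return counter.most_common(1)[0][0]
    return LAST_RESORT_SIZE

rows = [  # toy data, invented for illustration
    Row(date(2016, 1, 5), "c1", "shirts", "a1", "L", False),
    Row(date(2016, 1, 5), "c1", "shirts", "a1", "M", True),
    Row(date(2016, 2, 1), "c2", "shirts", "a1", "M", False),
    Row(date(2016, 2, 3), "c3", "shirts", "a1", "M", False),
    Row(date(2016, 4, 1), "c4", "shirts", "a1", "XL", False),  # after the split
]
split_day = date(2016, 3, 1)
model = fit([r for r in rows if r.day < split_day])  # temporal split: past only

print(recommend(model, "c1", "shirts", "a1"))  # customer history
print(recommend(model, "c9", "shirts", "a1"))  # item fallback
print(recommend(model, "c9", "shoes", "a9"))   # global fallback
```

Filtering on the date before fitting is what keeps the model leakage-free here: the April order of customer c4 never reaches `fit`, so it cannot influence any recommendation.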

Measured cleanly, the personalised recommender hits the size actually kept in 28.8% of cases, versus 19.1% for a plain item baseline (which always recommends the item's most-kept size). Personalisation thus adds 9.7 percentage points, a clear gap.
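A hit rate of this kind can be computed with a few lines. The sketch below assumes each test case is an addressable bracket reduced to a context and the one size that was kept; the two lookup tables stand in for fitted recommenders and are invented toy values, not results from the dataset.

```python
def hit_rate(cases, recommend):
    """Share of cases where the recommended size equals the size actually kept."""
    cases = list(cases)
    if not cases:
        return 0.0
    hits = sum(1 for context, kept_size in cases if recommend(context) == kept_size)
    return hits / len(cases)

# Toy test cases: ((customer, article), kept size)
toy_cases = [
    (("c1", "a1"), "L"),
    (("c2", "a1"), "M"),
    (("c3", "a2"), "S"),
    (("c4", "a2"), "M"),
]

# Stand-ins for fitted models (hypothetical lookups)
item_most_kept = {"a1": "M", "a2": "L"}
customer_typical = {"c1": "L", "c3": "S"}

def item_baseline(context):
    _, article = context
    return item_most_kept[article]

def personalised(context):
    customer, article = context
    # Fall back to the item baseline where no customer history exists
    return customer_typical.get(customer, item_most_kept[article])

print(hit_rate(toy_cases, item_baseline), hit_rate(toy_cases, personalised))
```

Both recommenders are scored on the same cases, so the difference between the two rates is attributable to the customer history alone.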

3 · What is realistically captured, and what is not

Now the honest ceiling. Even if the recommender hit every addressable bracket correctly, that would be 6.9% of all returns (the 17,352 addressable brackets correspond to 20,607 avoidable returns). That is the ceiling, not the result.

Realistically, at the measured hit rates, the personalised recommender captures around 2.0% of all returns and the item baseline 1.3%. The added value of personalisation is real (2.0% versus 1.3%), but it is an order of magnitude below the 25% promise.
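The arithmetic can be retraced from the figures quoted in this article. The sketch assumes that the captured share is simply the ceiling multiplied by the hit rate; the article does not state this formula, but it reproduces the reported values to rounding.

```python
# Figures quoted in the article
ADDRESSABLE_BRACKETS = 17_352
AVOIDABLE_RETURNS = 20_607
CEILING_SHARE = 0.069            # share of all returns if every bracket were hit
HIT_RATE_PERSONALISED = 0.288
HIT_RATE_ITEM_BASELINE = 0.191

def captured_share(ceiling_share, hit_rate):
    """Assumption: captured returns scale linearly with the hit rate."""
    return ceiling_share * hit_rate

personalised = captured_share(CEILING_SHARE, HIT_RATE_PERSONALISED)
baseline = captured_share(CEILING_SHARE, HIT_RATE_ITEM_BASELINE)
returns_per_bracket = AVOIDABLE_RETURNS / ADDRESSABLE_BRACKETS

print(f"personalised: {personalised:.1%}")                        # 2.0%
print(f"item baseline: {baseline:.1%}")                           # 1.3%
print(f"avoidable returns per bracket: {returns_per_bracket:.2f}")  # 1.19
```

The ratio of about 1.19 avoidable returns per addressable bracket reflects that some brackets contain more than two sizes, so a correct recommendation saves more than one return.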

A 25% reduction is not reachable with size recommendation alone. The reason is not a weak model: brackets as a whole account for only 23.5% of returns, only about 40% of brackets are a sizing problem at all, and about half (nothing kept) are not.

4 · The strongest counter-position

The strongest rebuttal: with real fit and body data, more would be possible. True. Our recommender uses only purchase and keep/return behaviour, no garment measurements, no body data. A fit advisor with real measurements could raise the hit rate, and in premium ranges with a high return cost rate it pays off earlier. What we show is the limit of what is reachable from order data alone, measured cleanly, without the 25% markup.

5 · What this article does not cover

No fit or body data, only purchase and return behaviour. No measured intervention effect: the assumption that the extra returns would vanish had only the kept size been ordered follows from the bracketing logic and is not causal proof. A single retailer dataset (DMC 2016), so no generalisation to the industry.

Reproducibility

All figures stem from the public DMC 2016 dataset via the companion repository fashion-size-fit-prediction (bracket decomposition on the full data, recommender on the temporal split; results/size_metrics.json holds the values cited here). The raw data is not shipped for licensing reasons; a loader rebuilds it from the Kaggle download.

→ The economic framing (what a return costs, when prevention pays) is in the sister article What fashion returns really cost.

Notice

This is not legal or business advice but a methodological research snapshot on a public dataset (research date: June 2026). Check ratios and assumptions against your own figures before any operational decision.

Sources

Data basis: Data Mining Cup 2016 (online fashion returns), via Kaggle.
Sister article with the cost calculation: What fashion returns really cost.

Independent reviewer: open invitation. Companion repository fashion-size-fit-prediction with bracket decomposition, a leakage-free personalised size recommender and committed metrics; figures reproducible from the DMC 2016 dataset.