E-Commerce Intelligence

Why near-perfect conversion models mostly retell the basket

Rebuilt on real shop data: the near-perfect conversion AUC is largely a foregone conclusion. What matters in operations is the honest early prediction, not the headline figure from the end of the session.

Guido Winger

The same number shows up regularly on sales slides: a model that predicts a session's purchase probability with a ROC-AUC close to perfection. We rebuilt such a model on real shop data and reproduced the high number. But that number is largely a foregone conclusion, and understanding why decides whether such a model is worth anything in operation.

1 · The data basis and a caveat that frames everything

The basis is the Data Mining Cup 2013 dataset: anonymised real data from an online shop with around 50,000 sessions and 429,000 click records. Each session records whether an order was ultimately placed. Two points matter up front so that nobody misreads the numbers:

First, this is not a fashion dataset. The task description speaks of a general online shop and names no industry. This article is therefore a general e-commerce study whose methodology transfers to any session-based shop.

Second, and this is the most common fallacy: the order rate in the dataset is 46 percent. That is not a real funnel conversion rate. Real online shops convert in the low single-digit percentage range. The 46 percent reflects a curated competition population of heavily pre-filtered sessions. Whoever cites this number as a conversion benchmark is comparing apples with a competition selection.

2 · The leakage trap that produces the beautiful number

A session consists of many click records that all carry the same session label. Whoever splits randomly or row-wise into training and test has the same session on both sides. The model then memorises the answer. The only clean cut is one along the sessions, not along the rows. We split by session number: the older 80 percent into training, the newer 20 percent into the test set.
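The session-wise cut described above can be sketched in a few lines. This is a minimal illustration, not the code from the companion repository: the column name `session_id`, the helper `split_by_session` and the toy rows are assumptions, and the sketch relies on session numbers increasing over time, so that lower numbers are older sessions.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def split_by_session(
    rows: List[Row], train_fraction: float = 0.8, key: str = "session_id"
) -> Tuple[List[Row], List[Row]]:
    """Split click rows so that every session lands on exactly one side.

    Assumes session IDs increase over time (lower ID = older session).
    """
    session_ids = sorted({row[key] for row in rows})
    cut = int(len(session_ids) * train_fraction)
    train_ids = set(session_ids[:cut])
    # All rows of a session follow their session ID, never split row-wise.
    train = [row for row in rows if row[key] in train_ids]
    test = [row for row in rows if row[key] not in train_ids]
    return train, test


# Toy data (hypothetical columns): 10 sessions with 3 click rows each.
rows = [
    {"session_id": s, "click": c, "order": s % 2}
    for s in range(1, 11)
    for c in range(3)
]
train_rows, test_rows = split_by_session(rows)

train_sessions = {r["session_id"] for r in train_rows}
test_sessions = {r["session_id"] for r in test_rows}
print(sorted(train_sessions), sorted(test_sessions))
```

The check that matters is that the two session sets are disjoint. A random row-wise split of the same toy data would almost certainly put rows of one session on both sides.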

Even with a clean split, the impressive number appears at first. A gradient-boosting model on the session's end state reaches an AUC of 0.961. The catch sits in the basket features: at the end of a session, the state of the basket all but gives away whether an order was placed. Remove the basket block from the features and the same prediction drops to an AUC of 0.860. Roughly ten AUC points of the headline figure (0.961 minus 0.860 is 0.101) are therefore not foresight, but a restatement of the result.
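To make the comparison concrete, the sketch below computes ROC-AUC by counting correctly ranked (buyer, non-buyer) pairs and applies it to two score lists. Everything here is illustrative: the scores are invented toy values, not output of the gradient-boosting model or of the DMC 2013 data, and the names `roc_auc`, `basket_scores` and `behaviour_scores` are assumptions. It only shows how a score that nearly encodes the outcome inflates the AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (buyer, non-buyer) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of sessions,
    which is fine for a toy example but slow for tens of thousands of rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one session of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, NOT taken from the DMC 2013 dataset: 1 = order placed.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical score driven by the final basket state (close to the label).
basket_scores = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.1, 0.0]
# Hypothetical score from behaviour features only (noisier).
behaviour_scores = [0.8, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]

auc_with_basket = roc_auc(labels, basket_scores)
auc_without_basket = roc_auc(labels, behaviour_scores)
print(f"with basket:    {auc_with_basket:.4f}")
print(f"without basket: {auc_without_basket:.4f}")
print(f"gap:            {auc_with_basket - auc_without_basket:.4f}")
```

In a real ablation the two score lists would come from the same model type trained once with and once without the basket columns, evaluated on the same held-out sessions.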

Figure 1. The near-perfect end-state AUC is largely basket tautology: without the basket block it drops from 0.961 to 0.860, right where the early prediction already sits.

3 · What is actually actionable

What matters for operations is not the prediction at session end, but the early prediction: what can already be said after the first interaction, at a point when an intervention, say a nudge or a voucher, can still change something? This is exactly where the honest number sits. A model on the first transaction of a session alone reaches an AUC of 0.848. That is solid and usable, and it is almost level with the full model without the basket block.
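Restricting the model input to the first transaction amounts to keeping only the earliest click record per session before building features. The following is a minimal sketch under stated assumptions: the helper `first_record_per_session`, the column names and the toy rows are hypothetical, and the rows are assumed to be in chronological order within each session.

```python
from typing import Dict, List

Row = Dict[str, float]


def first_record_per_session(rows: List[Row], key: str = "session_id") -> List[Row]:
    """Keep only the earliest click record of each session.

    Assumes rows are already in chronological order within a session.
    """
    first: Dict[float, Row] = {}
    for row in rows:
        first.setdefault(row[key], row)  # later rows of a session are ignored
    return list(first.values())


# Toy click log with hypothetical columns; basket_value grows during a session.
clicks = [
    {"session_id": 1, "basket_value": 0.0, "items_viewed": 1},
    {"session_id": 1, "basket_value": 19.9, "items_viewed": 4},
    {"session_id": 2, "basket_value": 0.0, "items_viewed": 2},
    {"session_id": 1, "basket_value": 59.8, "items_viewed": 7},
    {"session_id": 2, "basket_value": 0.0, "items_viewed": 3},
]
early = first_record_per_session(clicks)
for row in early:
    print(row)
```

Features built from `early` contain only what was known at the first interaction, which is the information available when an intervention can still change the outcome.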

The message to decision-makers is therefore clear: the number you plan an intervention on is the 0.85 of the early prediction, not the 0.96 from the end of the session. The latter is pretty on the slide and worthless in operation, because it is only certain once the customer's decision has already been made.

4 · The strongest counter-position

The strongest rebuttal: the late, basket-driven prediction has a use too, for instance to accompany an almost finished order with a cross-selling suggestion or to rescue a stalled checkout. That is true, and that is why we do not discard the full model. The point is a different one: you must not sell the basket-driven AUC as evidence of early warning. Two different tasks, two different numbers, two different intervention points.

5 · What this article does not cover

We deliberately quote no savings figure in euros. The dataset contains no clean anchor for margin or order value, and an invented number would be exactly the dishonest step this article criticises. We report predictive quality and the leakage gap, not a constructed business benefit. Further limits: a single, anonymised shop; a competition population instead of a real funnel; and no causal model of the intervention.

Reproducibility

All figures stem from the public DMC 2013 dataset and can be reproduced with the scripts and the notebook in the companion repository ecommerce-conversion-prediction. The raw data is not shipped for licensing reasons; a loader reads it from the Kaggle download instead.

Notice

This is not legal or business advice, but a methodological research snapshot based on a public dataset (research date: June 2026). The order rate of the dataset is a competition artefact and not a transferable conversion benchmark; metrics and assumptions should be checked against your own figures before any operational decision.

Sources

Data Mining Cup 2013, task "Prediction of orders" (prudsys): scenario, session and order definitions, tasks 1 and 2 (task PDF in the dataset).
Data basis: Data Mining Cup 2013, public Kaggle mirror (oscarm524/prediction-of-orders).

Independent reviewer: open invitation. Companion repository ecommerce-conversion-prediction with EDA, model, leakage-free session split and notebook; figures reproducible from the DMC 2013 dataset.