The story
The idea that you should bet bigger when you're more confident has a serious pedigree; it isn't bar-room folklore like the Martingale. It traces to a single paper. In 1956, a physicist at Bell Labs named John L. Kelly Jr. published "A New Interpretation of Information Rate", taking Claude Shannon's brand-new information theory and turning it into a betting rule. The result is now known as the Kelly criterion: if you genuinely hold an edge, there's a mathematically optimal fraction of your bankroll to stake, and it scales with the size of that edge. Bet more when the advantage is bigger, less when it's slimmer, nothing when it's gone.
For several years it was a curiosity. Then a young maths professor, Edward O. Thorp, picked it up. Thorp had worked out that blackjack was beatable: as cards are dealt, the deck's odds shift, and a counter can know when they've tilted in their favour. His 1962 book Beat the Dealer proved it, and to size his bets he used Kelly: tiny stakes when the count was cold, big ones when the deck ran hot. It worked. Thorp then took the same principle to Wall Street, running the hedge fund Princeton/Newport Partners for nearly two decades without a losing year.
That's the romance behind "bet more when you're confident": a Bell Labs physicist, a card-counting professor, and a betting rule built from the mathematics of information itself.
Sources: Wikipedia, "Kelly criterion"; Wikipedia, "Edward O. Thorp".
Why everyone swears by it
The appeal is the card-counter fantasy, and it's a powerful one: the disciplined player sitting quietly at the table, flat-betting through the cold spells, then pressing up hard the moment the odds swing their way. It feels like what a sharp operator should do. Why stake the same on your strongest fancy of the month as on a race you can barely split? Surely the money should follow the conviction.
And unlike most betting folklore, the maths is real. The Kelly criterion is provably optimal: stake the Kelly fraction of your bankroll on every genuine edge and, over a long enough run, your money grows faster than under any other staking rule. Bet too little and you leave growth on the table; bet the right amount, scaled to your edge, and you compound as fast as the mathematics allows. It even protects you: because the stake is always a fraction of what you hold, full Kelly never bets the whole bankroll on one outcome.
So the instinct that "confidence should size the bet" isn't naïve; it's the correct conclusion from a theorem. Every serious punter arrives at it eventually: I pick winners at a decent rate, I can tell my strong fancies from my weak ones, so I should weight my stakes accordingly and let the good ones carry the account. On paper, it's not just appealing. It's optimal.
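For a single bet at fixed odds, the Kelly fraction has a short closed form. The sketch below assumes decimal odds (stake included in the payout) and a win probability `p` that you trust; the function name `kelly_fraction` is ours, chosen for illustration.

```python
def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake on one bet under the Kelly criterion.

    p            -- assumed true probability of winning (0..1)
    decimal_odds -- payout per unit staked, stake included (2.0 = evens)
    """
    b = decimal_odds - 1.0          # net winnings per unit staked
    if b <= 0:
        raise ValueError("decimal_odds must be greater than 1")
    f = (b * p - (1.0 - p)) / b     # f* = (bp - q) / b, with q = 1 - p
    return max(f, 0.0)              # no edge (or a negative one): stake nothing


# A 55% chance at evens: Kelly stakes 10% of the bankroll.
print(round(kelly_fraction(0.55, 2.0), 4))   # 0.1
# A 50% chance at evens is no edge at all.
print(kelly_fraction(0.50, 2.0))             # 0.0
```

Everything in the formula hangs on `p`. The odds are printed on the board; the probability is your own estimate.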
The catch
Here's what the romance leaves out: the Kelly criterion runs on a single input, and everything depends on getting it right. That input is your edge: the true gap between the real probability and the probability implied by the odds on offer. Feed Kelly a real, accurately-measured edge and it compounds beautifully. Feed it a wrong one and it does the opposite of protecting you: it stakes most precisely where you think you're strongest, so it pours the biggest bets onto your biggest mistakes.
That's why Thorp's count worked and most "confidence" doesn't. The blackjack edge is mechanical and measurable: remove the low cards and the player's advantage genuinely, calculably rises. The count isn't a feeling; it's the actual changed probability. Thorp could bet more when the deck was hot because the deck was hot, by an amount he could compute.
Overbet a true edge and you suffer wild swings. Overbet a phantom edge, one that isn't really there, and Kelly mathematically accelerates your ruin, because the rule is designed to lean in hard exactly when your model is most sure. The size of the bet only ever helps if the conviction behind it is correct.
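To make the phantom-edge point concrete, here is a small sketch of the expected log-growth of a bankroll per bet. The numbers are hypothetical (a bet at evens that you believe wins 55% of the time), and `log_growth` is an illustrative helper, not part of any library.

```python
import math


def log_growth(f: float, p_true: float, decimal_odds: float) -> float:
    """Expected log-growth of the bankroll per bet when staking fraction f."""
    b = decimal_odds - 1.0
    return p_true * math.log(1.0 + f * b) + (1.0 - p_true) * math.log(1.0 - f)


# Believing p = 0.55 at evens, full Kelly says stake 10% of the bankroll.
f_kelly = 0.10

real = log_growth(f_kelly, p_true=0.55, decimal_odds=2.0)      # belief is right
phantom = log_growth(f_kelly, p_true=0.48, decimal_odds=2.0)   # belief is wrong
cautious = log_growth(0.02, p_true=0.48, decimal_odds=2.0)     # wrong belief, small stake

print(f"real edge, Kelly stake:    {real:+.4f}")
print(f"phantom edge, Kelly stake: {phantom:+.4f}")
print(f"phantom edge, 2% stake:    {cautious:+.4f}")
```

With a real edge the growth rate is positive. With the phantom edge it turns negative, and the Kelly-sized stake shrinks the bankroll several times faster than the small one does.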
So the question is never really "should I bet more when I'm confident?" The maths already answered that: yes, if your confidence is a real edge. The only question that matters is the empirical one: is it? Does your sense of a strong fancy actually pick out more profitable bets, or just bets you happen to like more? That's something you can't argue. You can only measure it.
How we tested it
So we tested the only thing worth testing: is the model's confidence a real edge? We treated it exactly like Thorp's card count, a number that's supposed to tell you when to press up, and asked whether it actually predicts profit.
First we had to define "confidence" precisely. For us it's the model edge: how much higher the model rates a horse's chance than the market does. A big positive edge means the model thinks the price is too long, the equivalent of a hot deck. So we took every one of the model's top picks across 6,531 real races, sorted them into bands by that edge, and measured the ROI of each band at industry starting price. If confidence is a genuine signal, ROI should climb as the edge grows; the hot-deck bands should be the profitable ones.
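The banding step can be sketched in a few lines. This is not the experiment script. It assumes the edge is the model probability minus the market-implied probability, that starting prices are decimal odds, and that each band includes its lower bound; the names `BANDS`, `band_for` and `roi_by_band` are hypothetical.

```python
from collections import defaultdict

# Band edges follow the results table in this article; whether each
# boundary is inclusive or exclusive is an assumption.
BANDS = [
    ("below 0", float("-inf"), 0.00),
    ("0-3%", 0.00, 0.03),
    ("3-6%", 0.03, 0.06),
    ("6-10%", 0.06, 0.10),
    ("10%+", 0.10, float("inf")),
]


def band_for(edge: float) -> str:
    """Return the label of the band whose [low, high) range contains edge."""
    for label, low, high in BANDS:
        if low <= edge < high:
            return label
    raise ValueError(f"edge {edge!r} fits no band")


def roi_by_band(picks, stake: float = 10.0) -> dict:
    """Flat-stake ROI per edge band.

    picks -- iterable of (model_prob, market_prob, decimal_sp, won) tuples
    """
    staked = defaultdict(float)
    profit = defaultdict(float)
    for model_prob, market_prob, decimal_sp, won in picks:
        label = band_for(model_prob - market_prob)   # edge as a probability gap
        staked[label] += stake
        profit[label] += stake * (decimal_sp - 1.0) if won else -stake
    return {label: profit[label] / staked[label] for label in staked}


# Made-up picks, only to show the shape of the input.
sample = [
    (0.30, 0.35, 2.5, True),    # edge -5%:  "below 0", wins 15
    (0.30, 0.35, 2.5, False),   # edge -5%:  "below 0", loses 10
    (0.25, 0.10, 9.0, False),   # edge +15%: "10%+",    loses 10
]
print(roi_by_band(sample))
```

ROI here is profit divided by total staked within the band, so bands with very different bet counts can still be compared.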
Then we ran the staking question head-to-head. Flat stakes (a level £10 on every top pick) against sized-by-edge, a Kelly-style rule that bets up to 50% more on the highest-edge picks and up to 50% less on the weakest. Same races, same picks, same prices; the only difference is whether the stake leans on the model's confidence or ignores it entirely.
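One way such a rule could look is sketched below. The exact sizing curve is not given here, so the linear tilt and the `full_scale` parameter (the edge at which the stake hits its cap) are assumptions made purely for illustration, as are the function names.

```python
def sized_stake(edge: float, base: float = 10.0,
                swing: float = 0.5, full_scale: float = 0.10) -> float:
    """Scale a base stake up or down by at most +/- swing, according to edge.

    full_scale is a hypothetical choice: the edge (as a probability gap)
    at which the stake reaches its cap. The real curve may differ.
    """
    tilt = max(-1.0, min(1.0, edge / full_scale))   # clamp to [-1, 1]
    return base * (1.0 + swing * tilt)


def roi(picks, stake_fn) -> float:
    """Profit per unit staked over (edge, decimal_sp, won) picks."""
    staked = profit = 0.0
    for edge, decimal_sp, won in picks:
        stake = stake_fn(edge)
        staked += stake
        profit += stake * (decimal_sp - 1.0) if won else -stake
    return profit / staked


# Made-up picks, only to exercise both rules on identical bets.
picks = [(-0.05, 2.5, True), (0.02, 6.0, False), (0.12, 9.0, False)]
print("flat :", round(roi(picks, lambda edge: 10.0), 4))
print("sized:", round(roi(picks, sized_stake), 4))
```

Both rules see the same picks and prices. Only `stake_fn` changes, which is what makes the comparison a test of the confidence signal rather than of the selections.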
One honest note, the same as everywhere else in the Lab: the model loses to the market overall once you settle at SP (see the track record). So this isn't a search for a way to amplify a winner. It's the prior question: does the confidence signal carry any profit information at all that a staking rule could exploit?
What the data showed
The chart and tables below are pulled live from the experiment script, so they update as the ledger grows.
ROI by how much the model "likes" the horse
Flat-stake ROI for top picks in each model-edge band. If confidence predicted profit, the bars would climb as edge grows. Instead ROI worsens through the 6–10% band, and no band turns a profit.
| Model edge | Bets | Strike | Profit / loss | ROI |
|---|---|---|---|---|
| below 0 | 3,632 | 36.4% | -£3,212 | -8.8% |
| 0–3% | 709 | 17.1% | -£993 | -14.0% |
| 3–6% | 664 | 13.0% | -£979 | -14.7% |
| 6–10% | 773 | 10.5% | -£1,734 | -22.4% |
| 10%+ | 753 | 12.7% | -£377 | -5.0% |
Flat stakes vs sizing by confidence (±50%)
The fair comparison is ROI (return per £ staked); absolute profit/loss just tracks how much was staked in total.
| Strategy | Staked | Profit / loss | ROI | Worst drawdown |
|---|---|---|---|---|
| Flat £10 | £65,310 | -£7,294 | -11.2% | -£7,864 |
| Sized by edge (±50%) | £60,725 | -£7,001 | -11.5% | -£7,781 |
Sizing by confidence staked £4,585 less overall and lost £293 less in cash, but per-£ ROI landed about a third of a percentage point apart (-11.2% vs -11.5%). The robust signal is the bucket pattern above, not the staking rule. The 6–10% band, the second-highest edge band, returned the worst ROI (-22.4%); the 10%+ band returned the least-bad (-5.0%). See the full track record for the underlying ledger.
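The cash-versus-ROI distinction is plain arithmetic on the staking table. A quick check, using only the figures printed there (the variable names are ours):

```python
# Figures copied from the staking table, in pounds.
flat_staked, flat_pnl = 65_310, -7_294
sized_staked, sized_pnl = 60_725, -7_001

flat_roi = 100 * flat_pnl / flat_staked      # percent
sized_roi = 100 * sized_pnl / sized_staked   # percent

print(f"flat  ROI: {flat_roi:.1f}%")                    # -11.2%
print(f"sized ROI: {sized_roi:.1f}%")                   # -11.5%
print(f"less staked: {flat_staked - sized_staked}")     # 4585
print(f"less lost:   {sized_pnl - flat_pnl}")           # 293
```

The sized rule loses fewer pounds only because it puts fewer pounds at risk. Per pound staked, it does marginally worse.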
The headline pattern is clear, and it's the wrong way round. The bars don't climb with edge. ROI worsens band by band up to 6–10%, where the model disagrees strongly with the market and returns the worst result of the lot (-22.4%). The band where the model rates a horse below the market (mostly short favourites it thinks are too short) beats every positive-edge band except 10%+, and even that top band loses money. The hot deck, it turns out, is the cold one.
That's a robust signal, not a blip. The picks the model loves most are, on average, the ones it's most wrong about, consistent with a model that's genuinely about twice as accurate as random at finding the winner, but that sees no extra value in the long-shot bands beyond what the market has already priced in. It can pick horses; it can't out-price the bookmaker.
The flat-versus-sized head-to-head confirms it, and this is where wording matters, so we'll be precise. Sized-by-edge staked less overall (it backs off favourites and presses up on long-shot edge picks) and therefore lost less in actual cash. But that's a consequence of betting less money, not of betting better. On the measure that controls for stake size, per-£ ROI, the two rules landed about a third of a percentage point apart (roughly -11.2% flat versus -11.5% sized). That gap is noise at this sample size, and it flips with the exact sizing curve. There is no meaningful difference in profitability between the two. The signal you'd be amplifying with bigger stakes simply isn't there to amplify.
The verdict
Did the data back it up? Not for us, and the reason is exactly the one the history warned about. The card-counter logic is sound; Kelly's theorem is correct. Thorp beat blackjack because his confidence was a real, measurable edge: when his count said press up, the deck genuinely had turned in his favour. The maths was never the weak link. The edge was the thing that had to be real.
Ours isn't. The model's confidence (how far it strays from the market price) does not predict profit; if anything it leans the wrong way. So sizing your stakes by it doesn't help, doesn't hurt, and mostly just changes how much you happen to wager. The theorem still holds perfectly; we simply don't have the input it needs. Pointing a Kelly rule at a signal this weak is pressing up on a deck that was never hot.
So the honest verdict is no: not because betting to your edge is wrong, but because you have to own a measured edge first, and on this evidence the model's confidence isn't one. That's the order that matters, and almost everyone gets it backwards: find a signal that demonstrably predicts profit, then size to it. Varying your stakes only pays when the conviction behind it is correct.
If and when we can show a real edge, the honest tool to size it is a fraction of the Kelly criterion, never full Kelly, because overbetting a shaky estimate is its own way to go broke (the Martingale is the cautionary extreme). Until then, the grown-up rule is the dull one: a flat stake you can afford. And whether we hold any edge worth staking on at all, we publish either way, losses included, in the track record.
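The case for fractional Kelly shows up in the growth rate itself. The sketch below uses a hypothetical bet (55% chance at evens, where full Kelly is 10% of the bankroll) and compares half, full and double Kelly; `log_growth` is an illustrative helper.

```python
import math


def log_growth(f: float, p: float, decimal_odds: float) -> float:
    """Expected log-growth per bet when staking fraction f of the bankroll."""
    b = decimal_odds - 1.0
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical bet: 55% chance at evens, so full Kelly is 10% of bankroll.
p, odds, full_kelly = 0.55, 2.0, 0.10

for multiple in (0.5, 1.0, 2.0):
    g = log_growth(multiple * full_kelly, p, odds)
    print(f"{multiple:.1f} x Kelly: growth per bet {g:+.5f}")
```

With these inputs half Kelly keeps roughly three quarters of the full-Kelly growth rate, while double Kelly pushes growth slightly below zero even though the edge is real. An overestimated edge has the same effect as overbetting, which is why the margin of safety matters.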
