Should you bet more when you're confident?

Betting more when you're confident has a serious pedigree: the Kelly criterion, Ed Thorp, card counting. But it only works if your confidence is a real edge. We tested whether the Stablebet model's confidence predicts profit across thousands of the model's real races. The full story, then the honest data.

Doesn't work · Tested on thousands of races · Best vs worst edge band: no band turns a profit

18+ only · Research output, not advice · Methodology open · losses visible

Our in-house model lost 16.8% ROI on the pre-registered Oct-Nov 2024 backtest window.

This page publishes what it predicts and tracks every result. We do this because nobody else does. The methodology is open, the losses are visible, the analysis is honest. The model output is presented as a comparison to the market, not as a recommendation to back, lay, or stake on any runner.

Read the full methodology in our in-house AI horse-racing model write-up. Track the running ledger on the Stablebet track record page.

Gambling can be addictive. Please bet responsibly. Free, confidential support from GamCare, GamStop and BeGambleAware. See our responsible-gambling page for more.

The verdict

No. The model's 'confidence' doesn't predict profit, so sizing your stakes by it doesn't help.

Updated 22 July 2026 · 27,956 races settled

See where this ranks against every system →

What this experiment settles

Does the model's edge, how much higher it rates a horse than the market, predict which bets are profitable?
Does sizing your stakes by the model's confidence beat flat staking?
Which edge band performs best: the model's strongest 'value' picks, or the ones it rates below the market?

Methodology

Tested against the all-time Stablebet model ledger. Unless stated, each bet is a flat £10 win stake on the model's top-rated pick in each race, settled at industry SP (starting price). The underlying ledger and per-race results are public at /our-track-record/. For the detail, see how the AI model prices a race and how we settle every bet.

The story

The idea that you should bet bigger when you're more confident has a serious pedigree: it isn't bar-room folklore like the Martingale. It traces to a single paper. In 1956, a physicist at Bell Labs named John L. Kelly Jr. published "A New Interpretation of Information Rate", taking Claude Shannon's brand-new information theory and turning it into a betting rule. The result is now known as the Kelly criterion: if you genuinely hold an edge, there's a mathematically optimal fraction of your bankroll to stake, and it scales with the size of that edge. Bet more when the advantage is bigger, less when it's slimmer, nothing when it's gone.

For a few years it was a curiosity. Then a young maths professor, Edward O. Thorp, picked it up. Thorp had worked out that blackjack was beatable: as cards are dealt, the deck's odds shift, and a counter can know when they've tilted in the player's favour. His 1962 book Beat the Dealer proved it, and to size his bets he used Kelly: tiny stakes when the count was cold, big ones when the deck ran hot. It worked. Thorp then took the same principle to Wall Street, running the hedge fund Princeton/Newport Partners for nearly two decades without a losing year.

That's the romance behind "bet more when you're confident": a Bell Labs physicist, a card-counting professor, and a betting rule built from the mathematics of information itself.

Sources: Wikipedia, "Kelly criterion"; Wikipedia, "Edward O. Thorp".

Why everyone swears by it

The appeal is the card-counter fantasy, and it's a powerful one: the disciplined player sitting quietly at the table, flat-betting through the cold spells, then pressing up hard the moment the odds swing their way. It feels like what a sharp operator should do. Why stake the same on your strongest fancy of the month as on a race you can barely split? Surely the money should follow the conviction.

And unlike most betting folklore, the maths is real. The Kelly criterion is provably optimal: stake the Kelly fraction of your bankroll on every genuine edge and, over a long enough run, your money grows faster than under any other staking rule. Bet too little and you leave growth on the table; bet the right amount, scaled to your edge, and you compound as fast as the mathematics allows. It even protects you: because the stake is always a fraction of what you hold, full Kelly never bets the whole bankroll on one outcome.

So the instinct that "confidence should size the bet" isn't naïve, it's the correct conclusion from a theorem. Every serious punter arrives at it eventually: I pick winners at a decent rate, I can tell my strong fancies from my weak ones, so I should weight my stakes accordingly and let the good ones carry the account. On paper, it's not just appealing. It's optimal.

The catch

Here's what the romance leaves out: the Kelly criterion runs on a single input, and everything depends on getting it right. That input is your edge: the true gap between the real probability and the probability implied by the odds on offer. Feed Kelly a real, accurately measured edge and it compounds beautifully. Feed it a wrong one and it does the opposite of protecting you: it stakes most precisely where you think you're strongest, so it pours the biggest bets onto your biggest mistakes.

That's why Thorp's count worked and most "confidence" doesn't. The blackjack edge is mechanical and measurable: remove the low cards and the player's advantage genuinely, calculably rises. The count isn't a feeling; it's the actual changed probability. Thorp could bet more when the deck was hot because the deck was hot, by an amount he could compute.

Overbet a true edge and you suffer wild swings. Overbet a phantom edge, one that isn't really there, and Kelly mathematically accelerates your ruin, because the rule is designed to lean in hard exactly when your model is most sure. The size of the bet only ever helps if the conviction behind it is correct.
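To put rough numbers on the phantom-edge problem, here is a minimal Python sketch of the standard Kelly formula for a single win bet, f = (bp − q) / b, where b is the net odds, p the win probability and q = 1 − p. The 7% and 2.5% probabilities are illustrative assumptions taken from the middle of the long-shot ranges quoted in the FAQ, not ledger data, and the function names are ours.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Kelly stake as a fraction of bankroll, for win probability p at net odds b-to-1.

    Returns 0.0 when the edge is zero or negative, because Kelly then says don't bet.
    """
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


def log_growth(f: float, true_p: float, b: float) -> float:
    """Expected log growth per bet when staking fraction f, given the TRUE win probability."""
    return true_p * math.log(1.0 + b * f) + (1.0 - true_p) * math.log(1.0 - f)


b = 25.0           # a 25/1 shot
believed_p = 0.07  # assumption: what the model thinks
true_p = 0.025     # assumption: what the market price really reflects

stake = kelly_fraction(believed_p, b)            # sized to the believed edge
growth_if_right = log_growth(stake, believed_p, b)
growth_if_wrong = log_growth(stake, true_p, b)   # same stake, phantom edge
stake_on_truth = kelly_fraction(true_p, b)       # what Kelly would say with the true number

print(f"Kelly stake on believed edge: {stake:.2%} of bankroll")
print(f"Log growth if the belief is right: {growth_if_right:+.4f} per bet")
print(f"Log growth if the market is right: {growth_if_wrong:+.4f} per bet")
print(f"Kelly stake on the true probability: {stake_on_truth:.2%}")
```

Under these assumed inputs the same stake that would grow the bankroll if the model were right shrinks it if the market is right, and Kelly fed the true probability would not bet at all.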

So the question is never really "should I bet more when I'm confident?" The maths already answered that: yes, if your confidence is a real edge. The only question that matters is the empirical one: is it? Does your sense of a strong fancy actually pick out more profitable bets, or just bets you happen to like more? That's something you can't argue. You can only measure it.

Image: Professor Furlong with a losing betting slip at the Stablebet AI Lab. The Professor has run this one through the numbers before. It still loses.

How we tested it

So we tested the only thing worth testing: is the model's confidence a real edge? We treated it exactly like Thorp's card count, a number that's supposed to tell you when to press up, and asked whether it actually predicts profit.

First we had to define "confidence" precisely. For us it's the model edge: how much higher the model rates a horse's chance than the market does. A big positive edge means the model thinks the price is too long: the equivalent of a hot deck. So we took every one of the model's top picks across thousands of the model's real races, sorted them into bands by that edge, and measured the ROI of each band at industry starting price. If confidence is a genuine signal, ROI should climb as the edge grows: the hot-deck bands should be the profitable ones.
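The banding step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lab's experiment script: the band labels follow the results table, but the handling of band boundaries, the tuple layout, the function names and the five toy picks are all made up for the example. The market probability is taken as an input because the page doesn't say how it is derived from the price.

```python
from collections import defaultdict

# (label, lower bound inclusive, upper bound exclusive); boundary handling is an assumption
BANDS = [
    ("below 0", float("-inf"), 0.0),
    ("0–3%", 0.0, 0.03),
    ("3–6%", 0.03, 0.06),
    ("6–10%", 0.06, 0.10),
    ("10%+", 0.10, float("inf")),
]


def band_for(edge: float) -> str:
    """Return the label of the band that contains this model edge."""
    for label, lo, hi in BANDS:
        if lo <= edge < hi:
            return label
    raise ValueError(f"edge {edge!r} fits no band")


def roi_by_band(picks, stake=10.0):
    """picks: iterable of (model_prob, market_prob, decimal_sp, won).

    Returns {band label: ROI} for a flat win stake settled at decimal_sp.
    """
    staked = defaultdict(float)
    profit = defaultdict(float)
    for model_prob, market_prob, decimal_sp, won in picks:
        label = band_for(model_prob - market_prob)  # model edge
        staked[label] += stake
        profit[label] += (stake * (decimal_sp - 1.0)) if won else -stake
    return {label: profit[label] / staked[label] for label in staked}


# Toy picks, invented purely to exercise the code
toy_picks = [
    (0.30, 0.35, 2.75, True),
    (0.28, 0.33, 3.00, False),
    (0.12, 0.10, 9.00, False),
    (0.15, 0.07, 13.0, False),
    (0.20, 0.05, 17.0, True),
]
toy_roi = roi_by_band(toy_picks)
for label, _, _ in BANDS:
    if label in toy_roi:
        print(f"{label:>8}: {toy_roi[label]:+.1%}")
```

If confidence were a real signal, running this over the full ledger would show ROI rising from the first band to the last.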

Then we ran the staking question head-to-head. Flat stakes, a level £10 on every top pick, against sized-by-edge, a Kelly-style rule that bets up to 50% more on the highest-edge picks and up to 50% less on the weakest. Same races, same picks, same prices; the only difference is whether the stake leans on the model's confidence or ignores it entirely.
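Here is a hedged Python sketch of that head-to-head. The page gives the ±50% limits but not the exact sizing curve, so the linear ramp and the 10-point edge cap below are assumptions, as are the function names and the toy bets. It tracks the same four columns as the comparison table: total staked, profit or loss, ROI and worst drawdown.

```python
def flat_stake(edge: float, base: float = 10.0) -> float:
    """Level stake: ignores the model edge entirely."""
    return base


def sized_stake(edge: float, base: float = 10.0, swing: float = 0.5,
                edge_cap: float = 0.10) -> float:
    """Assumed sizing curve: linear in edge, clamped to base * (1 ± swing)."""
    scaled = max(-1.0, min(1.0, edge / edge_cap))
    return base * (1.0 + swing * scaled)


def run(bets, stake_fn):
    """bets: list of (edge, decimal_sp, won) in settlement order."""
    staked = pnl = peak = worst_drawdown = 0.0
    for edge, decimal_sp, won in bets:
        stake = stake_fn(edge)
        staked += stake
        pnl += (stake * (decimal_sp - 1.0)) if won else -stake
        peak = max(peak, pnl)                             # best running P/L so far
        worst_drawdown = min(worst_drawdown, pnl - peak)  # deepest fall from that peak
    return {"staked": staked, "pnl": pnl, "roi": pnl / staked,
            "worst_drawdown": worst_drawdown}


# Toy bets, invented purely to exercise the code
toy_bets = [
    (-0.05, 2.5, True),
    (0.12, 15.0, False),
    (0.02, 6.0, False),
    (-0.10, 2.0, True),
    (0.08, 11.0, False),
]
flat_result = run(toy_bets, flat_stake)
sized_result = run(toy_bets, sized_stake)
for name, result in (("Flat £10", flat_result), ("Sized by edge", sized_result)):
    print(f"{name}: staked £{result['staked']:.2f}, P/L £{result['pnl']:+.2f}, "
          f"ROI {result['roi']:+.1%}, worst drawdown £{result['worst_drawdown']:.2f}")
```

Because both rules see the same picks and prices, any difference in ROI comes purely from where the sizing curve moves the money.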

One honest note, the same as everywhere else in the Lab: the model loses to the market overall once you settle at SP (see the track record). So this isn't a search for a way to amplify a winner. It's the prior question: does the confidence signal carry any profit information at all that a staking rule could exploit?

What the data showed

The chart and tables below are pulled live from the experiment script, so they update as the ledger grows.

ROI by how much the model “likes” the horse

Flat-stake ROI for top picks in each model-edge band. If confidence predicted profit, the bars would climb as edge grows. They don't, and every band stays negative.

Model edge	Bets	Strike	Profit / loss	ROI
below 0	4,212	35.5%	-£4,089	-9.7%
0–3%	827	16.9%	-£1,072	-13.0%
3–6%	787	13.1%	-£1,037	-13.2%
6–10%	925	10.5%	-£2,042	-22.1%
10%+	925	12.1%	-£1,201	-13.0%

Flat stakes vs sizing by confidence (±50%)

The fair comparison is ROI (return per £ staked); absolute profit/loss just tracks how much was staked in total.

Strategy	Staked	Profit / loss	ROI	Worst drawdown
Flat £10	£76,760	-£9,441	-12.3%	-£9,868
Sized by edge (±50%)	£71,885	-£9,333	-13.0%	-£9,831

Sizing by confidence staked £4,875 less overall and lost about £108 less in cash (£9,333 against £9,441), but per-£ ROI was no better: −12.3% flat against −13.0% sized, a gap of about 0.7 of a percentage point. The strong signal is the bucket pattern above, not the staking rule. The 6–10% band returned the lowest ROI and the below 0 band the least-bad. Every band finished negative, and ROI does not climb as edge grows. See the full track record for the underlying ledger.
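The ROI column is simply profit or loss divided by the amount staked, so the figures can be rechecked from the two tables above. A short Python sketch, using only the published table values and assuming the flat £10 stake stated in the methodology (variable names are ours):

```python
STAKE = 10  # flat £10 per bet, per the methodology

# band label: (bets, profit/loss in £), copied from the edge-band table
band_table = {
    "below 0": (4212, -4089),
    "0-3%": (827, -1072),
    "3-6%": (787, -1037),
    "6-10%": (925, -2042),
    "10%+": (925, -1201),
}

band_roi = {label: pnl / (bets * STAKE) for label, (bets, pnl) in band_table.items()}
total_bets = sum(bets for bets, _ in band_table.values())
total_pnl = sum(pnl for _, pnl in band_table.values())

# Head-to-head table: (staked in £, profit/loss in £)
flat_roi = -9441 / 76760
sized_roi = -9333 / 71885
gap_points = (flat_roi - sized_roi) * 100  # difference in percentage points

for label, roi in band_roi.items():
    print(f"{label:>8}: {roi:+.1%}")
print(f"All bands: {total_bets} bets, £{total_pnl} P/L")
print(f"Flat {flat_roi:+.1%} vs sized {sized_roi:+.1%} ({gap_points:.2f} points apart)")
```

The band totals add back up to the flat-stake row, which is a useful sanity check that both tables describe the same set of bets.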

The headline pattern is clear, and it's a genuinely surprising one. The bars don't climb with edge. If the model's confidence were a real profit signal, ROI would rise steadily from the low-edge bands to the high-edge ones. Instead every band finishes in the red and the ordering scrambles: on the current data the worst band is a mid-range edge band (6–10%), while the least-bad is the below 0 band, where the model rates its pick lower than the market does. The widest-edge band (10%+) sits mid-pack at −13.0%: still a loser, with no sign of the climb a real signal would show. Confidence, it turns out, doesn't sort the good bets from the bad.

That's the robust signal: the missing slope, not which band happens to sit top or bottom on any given run. It's consistent with a model that's genuinely about twice as accurate as random at finding the winner, but that sees no extra value in its "edges" beyond what the market has already priced in. It can pick horses; it can't out-price the bookmaker.

The flat-versus-sized head-to-head confirms it, and this is where wording matters, so we'll be precise. Sized-by-edge staked less overall (it backs off favourites and presses up on long-shot edge picks) and therefore lost less in actual cash. But that's a consequence of betting less money, not of betting better. On per-£ ROI, the measure that controls for stake size, the two rules landed within a percentage point of each other (−12.3% flat against −13.0% sized, per the head-to-head table above). That gap is noise at this sample size, and it flips with the exact sizing curve. There is no meaningful difference in profitability between the two. The signal you'd be amplifying with bigger stakes simply isn't there to amplify.

The verdict

Did the data back it up? Not for us, and the reason is exactly the one the history warned about. The card-counter logic is sound; Kelly's theorem is correct. Thorp beat blackjack because his confidence was a real, measurable edge: when his count said press up, the deck genuinely had turned in his favour. The maths was never the weak link. The edge was the thing that had to be real.

Ours isn't. The model's confidence, how far it strays from the market price, does not predict profit; if anything it leans the wrong way. So sizing your stakes by it doesn't help, doesn't hurt, and mostly just changes how much you happen to wager. The theorem still holds perfectly; we simply don't have the input it needs. Pointing a Kelly rule at a signal this weak is pressing up on a deck that was never hot.

So the honest verdict is no, not because betting to your edge is wrong, but because you have to own a measured edge first, and on this evidence the model's confidence isn't one. That's the order that matters, and almost everyone gets it backwards: find a signal that demonstrably predicts profit, then size to it. Varying your stakes only pays when the conviction behind it is correct.

If and when we can show a real edge, the honest tool to size it is a fraction of the Kelly stake, never the full amount, because overbetting a shaky estimate is its own way to go broke (the Martingale is the cautionary extreme). Until then, the sound rule is the simple one: a flat stake you can afford. And whether we hold any edge worth staking on at all, we publish either way, losses included, in the track record.

Frequently asked questions

Is sizing your stakes by confidence ever a good idea?

Only when your confidence signal genuinely predicts results. The principle behind it, the Kelly criterion, is mathematically sound. But it amplifies whatever edge you have, and if the edge points the wrong way (as the model's does here), bigger stakes just enlarge the loss. The rule is: verify the signal predicts profit before you size by it; don't assume it does.

What's the difference between this and the Kelly criterion?

Kelly tells you the optimal fraction of your bankroll to stake given a real, measured edge. This experiment tests the prior question Kelly assumes is already answered: does the edge exist at all? For the Stablebet model on this data, it doesn't, so there's nothing for Kelly to size. We test Kelly itself in a separate experiment.

Does this mean the model is useless?

No. It means the model's confidence isn't a profit signal. The model still picks the winner about twice as often as random, and its calibration is honest. What it can't do is tell you which of its own picks are the profitable ones: no edge band turns a profit, and returns don't rise as its confidence grows. That's a limitation of the edge, not the accuracy.

Why don't the model's most-confident picks turn a profit?

Its biggest 'edges' are mostly long-shots it rates well above the market. The market prices long-shots efficiently: a horse at 25/1 is usually a genuine 2-3% chance, not the 6-8% the model thinks. So the bands where the model disagrees most with the market are the bands where the market is usually the one that's right. Every edge band still loses, and which band loses most shuffles from run to run rather than tracking confidence.

Did flat staking actually beat confidence-staking in the test?

Per pound staked, they landed within a percentage point of each other (−12.3% flat against −13.0% sized), which is noise at this sample size. Confidence-staking happened to stake less overall (it backs off favourites), so it lost less in absolute cash, but the per-£ return was no better. There was no edge to amplify, so the staking rule made no meaningful difference.

Come on, you settled at SP. Take Betfair SP or best odds across the high-edge bands and the confidence-sizing would clear, surely?

It wouldn't, and the reason is built into where the model's confidence points. We settle every bet at industry SP with no commission deducted: the kind version, already better than most punters get. The high-edge bands the sizing rule presses up on are mostly long-shots, and the market prices long-shots tightly: a 25/1 horse is usually a genuine 2-3% chance, not the 6-8% the model thinks. Better odds shift the whole ledger up by a few points, but they shift every band by much the same amount, so no band climbs into profit and confidence still fails to predict it. Across the thousands of races tested, the per-£ return at SP landed at −12.3% staked flat and −13.0% sized by edge; a kinder price moves both numbers together but does not make the signal appear. The shortfall is the bookmaker's margin sitting on bets with no real edge behind them, and that is structural, not a quirk of which price feed you settle at.

What this experiment doesn't cover, and what we're testing next

Does the Kelly criterion beat flat staking when the edge genuinely is real?
Does the model's confidence predict place results even where it misses on wins?
Would confidence-staking work on a flat-racing model rather than the National Hunt one tested here?
