StableBet

Should you bet more when you're confident?

Betting more when you're confident has a serious pedigree: the Kelly criterion, Ed Thorp, card counting. But it only works if your confidence is a real edge. We tested whether the Stablebet model's confidence predicts profit across 6,531 real races. The full story, then the honest data.

Doesn't work · Tested on 6,531 races · Best vs worst edge band: the gap goes the wrong way
18+ only · Research output, not advice · Methodology open · Losses visible

Our in-house model lost 16.8% ROI on the pre-registered Oct-Nov 2024 backtest window.

This page publishes what it predicts and tracks every result. We do this because nobody else does: the methodology is open, the losses are visible, the analysis is honest. The model output is presented as a comparison to the market, not as a recommendation to back, lay, or stake on any runner.

Read the full methodology in our in-house AI horse-racing model write-up. Track the running ledger on the Stablebet track record page.

Gambling can be addictive. Please bet responsibly. Free, confidential support from GamCare, GamStop and BeGambleAware. See our responsible-gambling page for more.

The verdict

No: the model's 'confidence' doesn't predict profit, so sizing your stakes by it doesn't help.

What this experiment settles

  • Does the model's edge (how much higher it rates a horse than the market) predict which bets are profitable?
  • Does sizing your stakes by the model's confidence beat flat staking?
  • Which edge band performs best: the model's strongest 'value' picks, or the ones it rates below the market?

Methodology

Tested against the Stablebet model ledger, all-time, reconciled at industry SP. Returns measured to industry SP, flat £10 win on the model's top-rated pick per race unless stated. The underlying ledger and per-race results are public at /our-track-record/; the model itself is described in the methodology write-up.

The story

The idea that you should bet bigger when you're more confident has a serious pedigree; it isn't bar-room folklore like the Martingale. It traces to a single paper. In 1956, a physicist at Bell Labs named John L. Kelly Jr. published "A New Interpretation of Information Rate", taking Claude Shannon's brand-new information theory and turning it into a betting rule. The result is now known as the Kelly criterion: if you genuinely hold an edge, there's a mathematically optimal fraction of your bankroll to stake, and it scales with the size of that edge. Bet more when the advantage is bigger, less when it's slimmer, nothing when it's gone.

For several years it was a curiosity. Then a young maths professor, Edward O. Thorp, picked it up. Thorp had worked out that blackjack was beatable: as cards are dealt, the deck's odds shift, and a counter can know when they've tilted in their favour. His 1962 book Beat the Dealer proved it, and to size his bets he used Kelly: tiny stakes when the count was cold, big ones when the deck ran hot. It worked. Thorp then took the same principle to Wall Street, running the hedge fund Princeton/Newport Partners for nearly two decades without a losing year.

That's the romance behind "bet more when you're confident": a Bell Labs physicist, a card-counting professor, and a betting rule built from the mathematics of information itself.

Sources: Wikipedia, "Kelly criterion"; Wikipedia, "Edward O. Thorp".

Why everyone swears by it

The appeal is the card-counter fantasy, and it's a powerful one: the disciplined player sitting quietly at the table, flat-betting through the cold spells, then pressing up hard the moment the odds swing their way. It feels like what a sharp operator should do. Why stake the same on your strongest fancy of the month as on a race you can barely split? Surely the money should follow the conviction.

And unlike most betting folklore, the maths is real. The Kelly criterion is provably optimal: stake the Kelly fraction of your bankroll on every genuine edge and, over a long enough run, your money grows faster than under any other staking rule. Bet too little and you leave growth on the table; bet the right amount, scaled to your edge, and you compound as fast as the mathematics allows. It even protects you: because the stake is always a fraction of what you hold, full Kelly never bets the whole bankroll on one outcome.
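For anyone who wants the rule itself rather than the romance, here is a minimal Python sketch of the Kelly fraction for a single win bet. It assumes decimal odds and a win probability you genuinely trust. The function name `kelly_fraction` and the example figures are ours, for illustration only; this is not the Stablebet model's code.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake, as a fraction of bankroll, for one win bet.

    p_win        -- the bettor's true probability of winning (0..1)
    decimal_odds -- total return per unit staked, e.g. 3.0 for 2/1
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")

    b = decimal_odds - 1.0      # net winnings per unit staked
    q = 1.0 - p_win             # probability of losing
    fraction = (b * p_win - q) / b
    return max(fraction, 0.0)   # no edge (or a negative one): stake nothing


# Illustrative inputs only.
even_money_with_edge = kelly_fraction(0.60, 2.0)
even_money_no_edge = kelly_fraction(0.50, 2.0)
```

At even money with a true 60% chance, the rule stakes 20% of the bankroll; at a true 50% chance there is no edge and it stakes nothing. Everything hinges on `p_win` being right.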

So the instinct that "confidence should size the bet" isn't naïve; it's the correct conclusion from a theorem. Every serious punter arrives at it eventually: I pick winners at a decent rate, I can tell my strong fancies from my weak ones, so I should weight my stakes accordingly and let the good ones carry the account. On paper, it's not just appealing. It's optimal.

The catch

Here's what the romance leaves out: the Kelly criterion runs on a single input, and everything depends on getting it right. That input is your edge: the true gap between the real probability and the probability implied by the odds on offer. Feed Kelly a real, accurately-measured edge and it compounds beautifully. Feed it a wrong one and it does the opposite of protect you: it stakes most precisely where you think you're strongest, so it pours the biggest bets onto your biggest mistakes.

That's why Thorp's count worked and most "confidence" doesn't. The blackjack edge is mechanical and measurable: remove the low cards and the player's advantage genuinely, calculably rises. The count isn't a feeling; it's the actual changed probability. Thorp could bet more when the deck was hot because the deck was hot, by an amount he could compute.

Overbet a true edge and you suffer wild swings. Overbet a phantom edge (one that isn't really there) and Kelly mathematically accelerates your ruin, because the rule is designed to lean in hard exactly when your model is most sure. The size of the bet only ever helps if the conviction behind it is correct.
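The phantom-edge problem can be made concrete with the expected log-growth of a bankroll per bet. The sketch below stakes the Kelly fraction computed from a believed probability, then scores that stake against the true probability. All numbers are hypothetical (a 4/1 shot the bettor rates at 30% when it is really a fair 20%), and the function names are ours.

```python
import math


def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly stake as a fraction of bankroll (zero when there is no edge)."""
    b = decimal_odds - 1.0
    return max((b * p_win - (1.0 - p_win)) / b, 0.0)


def expected_log_growth(fraction: float, true_p: float, decimal_odds: float) -> float:
    """Expected log-growth of the bankroll per bet, for stakes below the whole bankroll."""
    b = decimal_odds - 1.0
    return true_p * math.log(1.0 + fraction * b) + (1.0 - true_p) * math.log(1.0 - fraction)


ODDS = 5.0          # decimal odds for 4/1 (hypothetical)
BELIEVED_P = 0.30   # what the bettor thinks
TRUE_P = 0.20       # what is actually true: no edge at these odds

stake = kelly_fraction(BELIEVED_P, ODDS)                     # sized to the belief
growth_if_right = expected_log_growth(stake, BELIEVED_P, ODDS)
growth_if_phantom = expected_log_growth(stake, TRUE_P, ODDS)
growth_half_stake = expected_log_growth(stake / 2, TRUE_P, ODDS)
```

If the 30% belief is correct, `growth_if_right` is positive and the bankroll compounds. If the true chance is 20%, the same stake gives a negative `growth_if_phantom`, and halving the stake (`growth_half_stake`) shrinks the damage without removing it. Bigger stakes on a wrong belief lose faster.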

So the question is never really "should I bet more when I'm confident?" The maths already answered that: yes, if your confidence is a real edge. The only question that matters is the empirical one: is it? Does your sense of a strong fancy actually pick out more profitable bets, or just bets you happen to like more? That's something you can't argue. You can only measure it.

How we tested it

So we tested the only thing worth testing: is the model's confidence a real edge? We treated it exactly like Thorp's card count (a number that's supposed to tell you when to press up) and asked whether it actually predicts profit.

First we had to define "confidence" precisely. For us it's the model edge: how much higher the model rates a horse's chance than the market does. A big positive edge means the model thinks the price is too long, the equivalent of a hot deck. So we took every one of the model's top picks across 6,531 real races, sorted them into bands by that edge, and measured the ROI of each band at industry starting price. If confidence is a genuine signal, ROI should climb as the edge grows: the hot-deck bands should be the profitable ones.
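The banding step can be sketched in a few lines of Python. This is an illustration of the idea, not the experiment script: the record fields (`model_prob`, `market_prob`, `sp_decimal`, `won`) are hypothetical names, and treating each band as lower-bound inclusive is our assumption. The band edges themselves are the ones used in the results table.

```python
from collections import defaultdict

# (lower bound inclusive, upper bound exclusive, label); inclusivity is an assumption.
BANDS = [
    (float("-inf"), 0.00, "below 0"),
    (0.00, 0.03, "0-3%"),
    (0.03, 0.06, "3-6%"),
    (0.06, 0.10, "6-10%"),
    (0.10, float("inf"), "10%+"),
]


def band_for(edge: float) -> str:
    """Return the label of the band that contains this model edge."""
    for lower, upper, label in BANDS:
        if lower <= edge < upper:
            return label
    raise ValueError(f"edge {edge!r} fits no band")


def roi_by_band(bets, stake: float = 10.0) -> dict:
    """Flat-stake ROI per edge band, settling winners at starting price."""
    staked = defaultdict(float)
    profit = defaultdict(float)
    for bet in bets:
        edge = bet["model_prob"] - bet["market_prob"]   # model% minus market%
        label = band_for(edge)
        staked[label] += stake
        if bet["won"]:
            profit[label] += stake * (bet["sp_decimal"] - 1.0)
        else:
            profit[label] -= stake
    return {label: profit[label] / staked[label] for label in staked}
```

If confidence carried profit information, the ROI returned for the higher-edge labels would be the larger ones.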

Then we ran the staking question head-to-head. Flat stakes (a level £10 on every top pick) against sized-by-edge, a Kelly-style rule that bets up to 50% more on the highest-edge picks and up to 50% less on the weakest. Same races, same picks, same prices; the only difference is whether the stake leans on the model's confidence or ignores it entirely.
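One way a ±50% sized-by-edge rule could look is sketched below. The exact sizing curve used in the experiment isn't given on this page, so the linear tilt and the `full_tilt_edge` parameter (the edge at which the stake reaches its +50% cap) are assumptions made purely for illustration.

```python
def sized_stake(
    edge: float,
    base_stake: float = 10.0,
    full_tilt_edge: float = 0.10,   # hypothetical: edge at which the cap is reached
    max_tilt: float = 0.5,          # stake moves at most 50% either way
) -> float:
    """Scale a flat stake up or down with the model edge, capped at +/- max_tilt."""
    raw_tilt = max_tilt * edge / full_tilt_edge
    tilt = max(-max_tilt, min(max_tilt, raw_tilt))
    return base_stake * (1.0 + tilt)
```

With these assumed parameters, a zero-edge pick gets the flat £10, an edge of 10 points or more gets £15, and a strongly negative edge bottoms out at £5. A rule like this stakes less in total whenever most picks sit in the negative-edge band, which is why ROI is the fair basis for comparing it with flat stakes.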

One honest note, the same as everywhere else in the Lab: the model loses to the market overall once you settle at SP (see the track record). So this isn't a search for a way to amplify a winner. It's the prior question: does the confidence signal carry any profit information at all that a staking rule could exploit?

What the data showed

The chart and tables below are pulled live from the experiment script, so they update as the ledger grows.

ROI by how much the model "likes" the horse

Flat-stake ROI for top picks in each model-edge band. If confidence predicted profit the bars would climb as edge grows. Instead every band loses, and ROI worsens steadily up to the 6–10% band.

[Bar chart: flat-stake ROI by model-edge band. Horizontal axis: model edge (model% − market%), more confident to the right. Vertical axis: ROI. Bar values are listed in the table below.]
Model edge | Bets | Strike | Profit / loss | ROI
below 0 | 3,632 | 36.4% | -£3,212 | -8.8%
0–3% | 709 | 17.1% | -£993 | -14.0%
3–6% | 664 | 13.0% | -£979 | -14.7%
6–10% | 773 | 10.5% | -£1,734 | -22.4%
10%+ | 753 | 12.7% | -£377 | -5.0%

Flat stakes vs sizing by confidence (±50%)

The fair comparison is ROI (return per £ staked); absolute profit/loss just tracks how much was staked in total.

Strategy | Staked | Profit / loss | ROI | Worst drawdown
Flat £10 | £65,310 | -£7,294 | -11.2% | -£7,864
Sized by edge (±50%) | £60,725 | -£7,001 | -11.5% | -£7,781

Sizing by confidence staked £4,585 less overall and lost £293 less in cash, but per-£ ROI landed within about a third of a percentage point (-11.2% vs -11.5%). The robust signal is the bucket pattern above, not the staking rule. The 6–10% band, among the model's most-confident picks, returned the worst ROI; the 10%+ band (mostly favourites) returned the least-bad. See the full track record for the underlying ledger.
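The arithmetic behind both tables is easy to re-check. The short Python sketch below recomputes each ROI as profit divided by amount staked, using only the figures printed above; the variable names are ours.

```python
# Figures copied from the tables above: label -> (bets, profit or loss in pounds).
BANDS = {
    "below 0": (3632, -3212),
    "0-3%": (709, -993),
    "3-6%": (664, -979),
    "6-10%": (773, -1734),
    "10%+": (753, -377),
}
FLAT_STAKE = 10  # pounds per bet


def roi_pct(profit: float, staked: float) -> float:
    """Return on investment as a percentage of the amount staked."""
    return 100.0 * profit / staked


band_roi = {
    label: round(roi_pct(pnl, bets * FLAT_STAKE), 1)
    for label, (bets, pnl) in BANDS.items()
}
total_bets = sum(bets for bets, _ in BANDS.values())

flat_roi = roi_pct(-7294, 65310)    # flat 10-pound stakes
sized_roi = roi_pct(-7001, 60725)   # sized by edge
gap_pp = flat_roi - sized_roi       # difference in percentage points
```

The band counts sum to 6,531 bets, which at £10 each is the £65,310 staked under the flat rule, and the recomputed band ROIs match the table to one decimal place. The flat-versus-sized gap comes out between 0.3 and 0.4 of a percentage point.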

The headline pattern is unambiguous, and it's the wrong way round. The bars don't climb with edge. The 6–10% band, where the model disagrees strongly with the market, is its worst performer, and the band where the model rates a horse below the market (mostly short favourites it thinks are too short) loses less than every positive-edge band except 10%+. The hot deck, it turns out, is the cold one.

That's a robust signal, not a blip. The picks the model loves most are, on average, the ones it's most wrong about. That is consistent with a model that's genuinely about twice as accurate as random at finding the winner, but that sees no extra value in the long-shot bands beyond what the market has already priced in. It can pick horses; it can't out-price the bookmaker.

The flat-versus-sized head-to-head confirms it, and this is where wording matters, so we'll be precise. Sized-by-edge staked less overall (it backs off favourites and presses up on long-shot edge picks) and therefore lost less in actual cash. But that's a consequence of betting less money, not of betting better. On the measure that controls for stake size, per-£ ROI, the two rules landed within about a third of a percentage point of each other (roughly −11.2% flat versus −11.5% sized). That gap is noise at this sample size, and it flips with the exact sizing curve. There is no meaningful difference in profitability between the two. The signal you'd be amplifying with bigger stakes simply isn't there to amplify.

The verdict

Did the data back it up? Not for us, and the reason is exactly the one the history warned about. The card-counter logic is sound; Kelly's theorem is correct. Thorp beat blackjack because his confidence was a real, measurable edge: when his count said press up, the deck genuinely had shifted in his favour. The maths was never the weak link. The edge was the thing that had to be real.

Ours isn't. The model's confidence (how far it strays from the market price) does not predict profit; if anything it leans the wrong way. So sizing your stakes by it doesn't help, doesn't hurt, and mostly just changes how much you happen to wager. The theorem still holds perfectly; we simply don't have the input it needs. Pointing a Kelly rule at a signal this weak is pressing up on a deck that was never hot.

So the honest verdict is no. Not because betting to your edge is wrong, but because you have to own a measured edge first, and on this evidence the model's confidence isn't one. That's the order that matters, and almost everyone gets it backwards: find a signal that demonstrably predicts profit, then size to it. Variation only pays when the conviction behind it is correct.

If and when we can show a real edge, the honest tool to size it is a fraction of the Kelly criterion, never full, because overbetting a shaky estimate is its own way to go broke (the Martingale is the cautionary extreme). Until then, the grown-up rule is the dull one: a flat stake you can afford. And whether we hold any edge worth staking on at all, we publish either way, losses included, in the track record.

Frequently asked questions

Is sizing your stakes by confidence ever a good idea?
Only when your confidence signal genuinely predicts results. The principle behind it, the Kelly criterion, is mathematically sound. But it amplifies whatever edge you have, and if the edge points the wrong way (as the model's does here), bigger stakes just enlarge the loss. The rule is: verify the signal predicts profit before you size by it; don't assume it does.
What's the difference between this and the Kelly criterion?
Kelly tells you the optimal fraction of your bankroll to stake given a real, measured edge. This experiment tests the prior question Kelly assumes is already answered: does the edge exist at all? For the Stablebet model on this data, it doesn't, so there's nothing for Kelly to size. We test Kelly itself in a separate experiment.
Does this mean the model is useless?
No. It means the model's confidence isn't a profit signal. The model still picks the winner about twice as often as random, and its calibration is honest. What it can't do is tell you which of its own picks are the profitable ones; its most-confident picks are, if anything, its worst. That's a limitation of the edge, not the accuracy.
Why are the model's most-confident picks its worst bets?
Its biggest 'edges' are mostly long-shots it rates well above the market. The market prices long-shots efficiently: a horse at 25/1 is usually a genuine 2-3% chance, not the 6-8% the model thinks. So the band where the model disagrees most with the market is the band where the market is most often right, and the model most often wrong.
Did flat staking actually beat confidence-staking in the test?
Per pound staked, they landed within about a third of a percentage point of each other, which is noise at this sample size. Confidence-staking happened to stake less overall (it backs off favourites), so it lost less in absolute cash, but the per-£ return was no better. There was no edge to amplify, so the staking rule made no meaningful difference.
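The long-shot answer above turns on a simple piece of arithmetic: what a 25/1 bet is worth at different true win chances. The Python sketch below works it through. The two probabilities are illustrative midpoints of the ranges quoted in that answer, not measured values, and the function names are ours.

```python
def break_even_probability(fractional_odds: float) -> float:
    """Win probability at which a bet at these odds neither makes nor loses money."""
    return 1.0 / (fractional_odds + 1.0)


def expected_profit_per_pound(p_win: float, fractional_odds: float) -> float:
    """Average profit per 1 pound staked, given the true win probability."""
    return p_win * fractional_odds - (1.0 - p_win)


ODDS = 25.0                 # 25/1
MODEL_VIEW = 0.07           # illustrative: midpoint of the model's 6-8%
MARKET_VIEW = 0.025         # illustrative: midpoint of the "genuine" 2-3%

threshold = break_even_probability(ODDS)
value_if_model_right = expected_profit_per_pound(MODEL_VIEW, ODDS)
value_if_market_right = expected_profit_per_pound(MARKET_VIEW, ODDS)
```

A 25/1 shot needs to win a little under 4% of the time to break even. At 7% the bet looks like strong value; at 2.5% it loses heavily per pound staked. The same horse is either the best bet on the card or one of the worst, depending on whose probability is right.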

What this experiment doesn't cover, and what we're testing next

  • Does the Kelly criterion beat flat staking when the edge genuinely is real?
  • Does the model's confidence predict place results even where it misses on wins?
  • Would confidence-staking work on a flat-racing model rather than the National Hunt one tested here?
