AlphaSense Kelly criterion · position sizing

It’s not how much you make, it’s how big you bet.

Win rate doesn’t equal making money; getting direction right doesn’t equal getting position size right. A trader with a 60% win rate and 1:1 payoff can still be wiped out by one losing streak if they oversize, and can grind for ten years with little to show for it if they bet too small. “How much should I bet” is a more fundamental question than “which direction.” The Kelly criterion is the mathematical answer. John Kelly derived it at Bell Labs in 1956, building on Claude Shannon’s information theory. Ed Thorp later applied it to blackjack and warrant arbitrage, and it became the bedrock of professional position management. This guide assembles Kelly together with expected value, risk of ruin, the Sharpe ratio, vol targeting, mean-variance, risk parity, CPPI, and the rest of the position-sizing toolkit into a primer you can reread.

§01 · Framework — why sizing dominates

Casino dice. The Kelly criterion has its roots in John Kelly’s work at Bell Labs on a gambler who bets using tips received over a noisy communication channel. This article assembles 12 position-sizing formulas (Kelly / Half / Quarter Kelly · Sharpe · Sortino · vol targeting · risk parity · CPPI / Optimal F) into an actionable primer.
Image: Wikimedia Commons / CC BY-SA 3.0.

Imagine a game: flip a coin, heads you win 2× your stake, tails you lose your stake. Expected value +50% per round — bet however you want and you’ll get rich? Wrong. Bet 100% every time and a single tails takes you to zero — even with a positive EV, the long-run probability of ruin is 1.

This is the problem John Kelly addressed in his 1956 paper A New Interpretation of Information Rate: the bet size that maximizes long-run exponential capital growth. The answer is the Kelly criterion.
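For the coin game above, the long-run growth per round at bet fraction f is the expected log return, g(f) = 0.5·ln(1 + 2f) + 0.5·ln(1 − f). The following minimal sketch finds the maximizer numerically. The function name and the 1% grid are illustrative choices, not part of the original text.

```python
import math

def log_growth(f, p=0.5, b=2.0):
    """Expected log growth per round when betting fraction f of capital.

    Win (prob p): capital is multiplied by (1 + b*f).
    Lose (prob 1-p): capital is multiplied by (1 - f).
    """
    if f >= 1.0:
        return float("-inf")  # a single loss wipes out the bankroll
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

# Grid-search bet fractions from 0% to 99% in 1% steps.
grid = [i / 100 for i in range(100)]
best_f = max(grid, key=log_growth)
print(f"growth-optimal fraction: {best_f:.2f}")
print(f"log growth per round at that fraction: {log_growth(best_f):.4f}")
print(f"log growth per round all-in: {log_growth(1.0)}")
```

The grid lands on f = 0.25, which matches the closed form (bp − q)/b = (2·0.5 − 0.5)/2. Betting everything gives −∞, because a single tails means permanent ruin.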

  1. Win rate × payoff ≠ profit. A 60% win rate with 1:1 payoff gives positive EV. At a 100% bet size, though, the chance of hitting zero within 6 trades is about 95% (1 − 0.6⁶). Bet size turns “arithmetic expectation” into “geometric expectation”: the heart of compounding.
  2. Arithmetic mean ≠ geometric mean. Two consecutive years of +50%, -50%: arithmetic mean 0%, but geometric mean = √(1.5×0.5) - 1 = -13.4%. In a compounding world only the geometric mean matters. That’s why big drawdowns matter more than big gains.
  3. Sizing is the primary risk-management tool. The same Sharpe 0.8 strategy becomes a disaster when oversized and a plodder when undersized. Leverage leaves the Sharpe ratio unchanged, but long-run compound returns differ completely.
  4. Four answers to “how much should I bet”. ① Kelly (geometric optimum, but assumes the true probabilities are known); ② Vol targeting (lock the portfolio’s volatility); ③ Mean-variance (minimize variance for a given target return); ④ Risk parity (each asset contributes equal risk). Each method answers a different question.
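Point 2’s arithmetic can be checked directly. The sketch below compounds the growth factors and takes the n-th root to get the geometric mean. The function names are illustrative.

```python
import math

def arithmetic_mean(returns):
    return sum(returns) / len(returns)

def geometric_mean(returns):
    # Compound the growth factors, then take the n-th root.
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

two_years = [0.50, -0.50]
print(f"arithmetic: {arithmetic_mean(two_years):+.1%}")  # +0.0%
print(f"geometric:  {geometric_mean(two_years):+.1%}")   # -13.4%
```
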

Bottom Line · Picking direction only gives you expected value; picking size gives you compounding.

Professionals spend 70% of their effort on the risk budget and 30% on direction; retail does the opposite. Ed Thorp, Jim Simons, and Paul Tudor Jones have all made the same point in public: the key to wealth is not avoiding being wrong, it’s avoiding being wrong big.

§02 · Expected value

Before Kelly, you need expected value (EV). Without positive EV, no sizing formula in the world will save a losing strategy.

  1. EV · Expected Value. EV = p × W − (1−p) × L. p = win rate; W = win amount; L = loss amount. EV > 0 is the minimum bar for “bet it”, but doesn’t mean you can bet any size you want. Example: p=60%, W=$100, L=$100 → EV = $60 − $40 = +$20/round.
  2. b · Odds / payoff ratio. b = W / L. For every dollar lost, you win b dollars. Classic Kelly parameter. In stock trading it’s “take-profit room / stop-loss room.”
  3. Edge. Edge = EV / bet = p × b − (1−p). “Cents earned per dollar bet, on average.” Bet only when Edge > 0. Long-run edge for professional gamblers is typically 1–2%.
  4. Win Rate. Profitable trades / total trades. Not the same as profitability — trend following often wins 35% but pays 3:1 and still makes money.
  5. Payoff Ratio. PR = avg win / avg loss. Average size of winners / average size of losers. PR × win rate > 1 − win rate is the positive-EV condition.
  6. Breakeven Win Rate. BEWR = 1 / (1 + PR). Given a payoff ratio, how high must the win rate be to break even? PR=1 → 50%; PR=2 → 33%; PR=3 → 25%.
  7. Profit Factor. PF = total wins / total losses. One of the most-used backtest metrics. PF > 1.5 acceptable, PF > 2 strong, PF > 3 watch out for overfitting.
  8. Expectancy. E = (Win% × AvgWin) − (Loss% × AvgLoss). Average profit per trade. Expectancy × annual trade count ≈ annualized return (ignoring compounding).
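The definitions above translate almost line for line into code. Here is a minimal sketch, assuming losses are passed as positive amounts and a backtest is summarized as a list of per-trade P&L values. The function names and the sample trades are hypothetical.

```python
def expected_value(p, win, loss):
    """EV per bet: p*W - (1-p)*L, with the loss L given as a positive amount."""
    return p * win - (1 - p) * loss

def edge(p, b):
    """EV per $1 risked when a win pays b per $1 lost: p*b - (1-p)."""
    return p * b - (1 - p)

def breakeven_win_rate(payoff_ratio):
    return 1 / (1 + payoff_ratio)

def trade_stats(pnls):
    """Win rate, payoff ratio, profit factor, expectancy from per-trade P&L."""
    wins = [x for x in pnls if x > 0]
    losses = [-x for x in pnls if x < 0]  # stored as positive amounts
    n = len(pnls)
    win_rate = len(wins) / n
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return {
        "win_rate": win_rate,
        "payoff_ratio": avg_win / avg_loss if avg_loss else float("inf"),
        "profit_factor": sum(wins) / sum(losses) if losses else float("inf"),
        # Expectancy = Win% * AvgWin - Loss% * AvgLoss
        "expectancy": win_rate * avg_win - (len(losses) / n) * avg_loss,
    }

print(expected_value(0.60, 100, 100))   # the p=60%, $100/$100 example
print(breakeven_win_rate(2))            # PR=2 -> about 33%
print(trade_stats([100, -50, 200, -100, 50]))
```
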

Recommendation · Confirm positive EV first, then talk size.

Beginners often apply Kelly to clearly negative-EV strategies (e.g. “buy the open, sell the close every day”), and the result is “blow up faster.” Kelly is optimal sizing for positive-EV strategies, and “fastest path to zero” sizing for negative-EV ones — it’s an amplifier, not a magic wand.

§03 · Kelly criterion

Kelly comes in three common forms: binary bets (gambling), continuous distributions (a single financial asset), and correlated multi-asset portfolios. Start with the simplest.

  1. Kelly (discrete) · classic binary bet. f* = (bp − q) / b = p − q/b, where p = win rate, q = 1 − p, b = odds ($b won per $1). f* = the fraction of capital to bet. Example: p = 60%, b = 1 (1:1 payoff), f* = 0.6 − 0.4/1 = 20%. Bet 20% of capital each round and long-run geometric growth is maximized.
  2. Kelly (continuous) · financial-asset version. f* = (μ − r) / σ², where μ = asset expected return, r = risk-free rate, σ² = asset variance (Merton 1969 derivation). Equivalently f* = S / σ with Sharpe S = (μ − r) / σ, so optimal sizing scales directly with the Sharpe ratio. At f*, excess geometric growth is S²/2, so Sharpe² = 2 × Kelly’s geometric growth rate: the deep link between Kelly and Sharpe.
  3. Kelly (multi-asset) · generalized Kelly. f* = Σ⁻¹ × (μ − r·1), where Σ = covariance matrix, μ = return vector. For multiple correlated assets, you cannot apply Kelly independently to each — covariance must be considered. Mathematically this is the Markowitz tangency portfolio scaled by a risk-preference parameter.
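The three forms can be sketched in a few lines of dependency-free Python. The multi-asset case solves Σ·f = (μ − r·1) with a small hand-written Gaussian elimination in place of a matrix inverse. In practice numpy.linalg.solve would do this step. The two-asset inputs (vols 20% and 15%, correlation 0.6) are hypothetical.

```python
def kelly_discrete(p, b):
    """Binary bet: f* = p - q/b (win b per $1 staked with probability p)."""
    return p - (1 - p) / b

def kelly_continuous(mu, r, sigma):
    """Single continuous asset (Merton): f* = (mu - r) / sigma^2."""
    return (mu - r) / sigma ** 2

def solve(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def kelly_multi(mu, r, cov):
    """Correlated assets: f* = cov^{-1} (mu - r*1), computed via a linear solve."""
    return solve(cov, [m - r for m in mu])

print(f"binary, p=60%, b=1: {kelly_discrete(0.60, 1.0):.1%}")
print(f"S&P 500, excess 6%, vol 16%: {kelly_continuous(0.06, 0.0, 0.16):.1%}")
cov = [[0.04, 0.018], [0.018, 0.0225]]  # vols 20% / 15%, correlation 0.6
print("multi-asset:", [round(f, 3) for f in kelly_multi([0.08, 0.06], 0.02, cov)])
```

With correlation 0.6, the joint solution comes out at about 1.09× and 0.90×. Each asset sized alone would get 1.5× and 1.78×. This gap is why applying Kelly asset by asset overstates the total bet.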

Three mathematical properties of Kelly: ① Maximizes long-run geometric growth (log-optimal); ② Asymptotically minimizes the expected time to reach a large wealth target; ③ Never goes bankrupt (assuming continuously adjustable, infinitely divisible bet size).

But subject to three preconditions: ① Probability and odds are “accurately known”; ② Bets can be infinitely subdivided; ③ Only long-run log-wealth matters.

Kelly in different scenarios

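| Scenario | Win rate p | Payoff b | Kelly f* | Meaning |
|---|---|---|---|---|
| Thorp blackjack | 51% | 1:1 | 2% | 2% of bankroll per hand |
| Trend-following CTA | 40% | 3:1 | 20% | 20% per single instrument |
| Intraday momentum | 55% | 1:1 | 10% | Slightly higher win rate, even payoff |
| Merger arb | 90% | 1:1 | 80% | High win rate at even payoff; real deal-break losses far exceed the spread, so the true b and f* are much lower |
| Event-driven | 50% | 2:1 | 25% | Good payoff offsets moderate win rate |
| S&P 500 (continuous) | excess μ = 6%, σ = 16% | n/a | 234% | Theoretically full position + 134% borrowed (impractical) |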

⚠ Important · Kelly outputs the theoretical optimum “given the true probabilities are known.”

In reality the probabilities and odds are estimates — your “60%” might really be 52%. This “parameter uncertainty” means in the real world you should use Fractional Kelly (see §04), not full Kelly. Professionals typically size at Half Kelly or Quarter Kelly.

§04 · Fractional Kelly — why practitioners under-bet

Academic Kelly is optimal under the assumption that “parameters are perfectly known.” In reality parameters always have errors, so professionals discount — that’s Fractional Kelly.

  1. Half Kelly. f = 0.5 × f*. The most common “robust” version. Geometric growth loses only 25% (0.75 × full Kelly), but drawdowns shrink dramatically. Ed Thorp himself ran his own fund at Half Kelly.
  2. Quarter Kelly. f = 0.25 × f*. More conservative. Survival first, growth second. Many CTA funds run at roughly 0.2–0.3× Kelly in practice.
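  3. Full Kelly drawdown · the cost of full Kelly. Under full Kelly (continuous model), the probability that wealth ever falls to half its current level is 50%. More generally, the chance of ever dropping to a fraction x of current wealth is x. Expected max drawdown is on the order of 50%. Psychological tolerance is far below the mathematical optimum: even if you understand the math, your clients / shareholders won’t.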
  4. Kelly Leverage. The Kelly formula often outputs f* > 100% (requiring leverage). SPX theoretical Kelly ~234% = 2.34× leverage — almost no one actually does this.
  5. Overbet. f > f*. Geometric growth falls off quickly. 1.5× Kelly earns the same growth as Half Kelly (75%) with far deeper drawdowns, 2× Kelly has zero growth, and beyond 2× long-run growth is negative.
  6. Underbet. f < f*. Safe but slow. Any fraction below Kelly still has positive long-run growth, and the growth given up is modest (Half Kelly keeps 75%). Being conservative therefore costs mainly opportunity, not mathematical safety.

Geometric growth vs Kelly multiple

| Bet multiple (vs Kelly) | Geometric growth (relative) | Max drawdown (sample) | Verdict |
|---|---|---|---|
| 0.25× (Quarter Kelly) | 44% | ~15% | Extremely robust |
| 0.5× (Half Kelly) | 75% | ~25% | Sweet spot |
| 0.75× | 94% | ~40% | Aggressive |
| 1.0× (Full Kelly) | 100% | ~50%+ | Mathematical optimum |
| 1.5× | 75% | ~75%+ | Overbetting: Half Kelly growth with much deeper drawdowns |
| 2.0× | 0% | Extreme | Zero growth |
| > 2.0× | Negative | Near 100% | Long-run guaranteed ruin |
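The relative-growth column follows from a continuous-time approximation. Betting k × Kelly earns (k − k²/2)·S² of excess log growth, while full Kelly earns S²/2, so the ratio is 2k − k² whatever the Sharpe ratio. The sketch below assumes normally distributed returns, continuous rebalancing, a zero risk-free rate, and an infinite horizon. It also computes the probability that wealth ever falls to a fraction x of today’s level, x^(2/k − 1). The drawdown column in the table is the author’s sample figure and is not reproduced here. Function names are illustrative.

```python
def relative_growth(k):
    """Growth at k x Kelly as a share of full-Kelly growth (continuous model).

    Excess log growth at leverage k*f* is (k - k**2 / 2) * S**2, and full
    Kelly earns S**2 / 2, so the ratio is 2k - k**2 for any Sharpe ratio S.
    """
    return 2 * k - k ** 2

def prob_ever_falling_to(x, k):
    """P(wealth ever falls to fraction x of today's level) at k x Kelly.

    Continuous GBM, zero risk-free rate, infinite horizon: x ** (2/k - 1).
    At k >= 2 growth is no longer positive, so the level is hit for sure.
    """
    if k >= 2:
        return 1.0
    return x ** (2 / k - 1)

for k in (0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5):
    print(f"{k:4.2f}x Kelly: growth {relative_growth(k):7.1%}, "
          f"P(ever down 50%) {prob_ever_falling_to(0.5, k):6.1%}")
```

Half Kelly, for example, keeps 75% of the growth while cutting the chance of ever halving from 50% to 12.5%.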

§05 · Risk of ruin

Risk of ruin is the central concept in gambling theory. For a player with finite samples + concave utility, Sharpe, EV, and Kelly can all mislead — but “the probability of zeroing out” never lies.

  1. Risk of Ruin. ROR = ((1−A)/(1+A))^C, where A = edge per bet and C = number of capital units. This is the probability of eventually losing all C units when betting one fixed unit per even-money bet. Smaller edge and fewer capital units → higher ROR. Kelly’s “never goes bankrupt” only holds because it bets a fraction of current capital, which requires infinitely divisible bets.
  2. Gambler’s Ruin. Classic probability problem: each round is a 50/50 win or loss of $1. You start with $N and quit once you are $M ahead. Probability of ruin = M/(M+N). So small wins are likely, while doubling (M = N) succeeds only half the time.
  3. Max Drawdown. Peak-to-trough decline of net asset value. After 50% MDD you need to double to recover — psychologically harder than the math. Professional money managers cap MDD red lines at 15–20%.
  4. Calmar Ratio. Calmar = annualized return / |MDD|. “How much annualized return per unit of max drawdown.” Calmar > 0.5 acceptable, > 1 strong, > 2 outstanding. Reflects the “experience” better than Sharpe.
  5. Ulcer Index. UI = √(Σ DD_t²/T). Root-mean-square of drawdowns. Proposed by Peter Martin, smoother than MDD. Used to penalize the combined psychological burden of “long shallow drawdowns” and “short deep drawdowns.”
  6. Time to Recovery. How long it takes to reach a prior high after an MDD trough. High-volatility strategies — even with strong Sharpe — can have very long recoveries. After 2008, the S&P took 5 years to recover.
  7. Stop-Loss. Pre-set “stop trading when account drawdown reaches X%.” Not to prevent losses — to prevent emotional collapse. Institutions commonly use 15% monthly / 20–25% annual stops.
  8. VaR · CVaR. VaR_95 = 5%-quantile loss; CVaR = mean loss beyond VaR. VaR reads as “with 95% probability, one-day loss won’t exceed X.” CVaR (ES / Expected Shortfall) reflects tail risk better than VaR. After 2008, the Basel Committee’s Fundamental Review of the Trading Book replaced VaR with Expected Shortfall for bank market-risk capital.
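The ruin and drawdown measures above can be sketched from an equity curve and a list of returns. Several conventions here are assumptions. Drawdowns are fractions from the running peak. The Ulcer Index is computed in fractions rather than percent. Historical VaR uses one simple empirical-quantile rule, and other conventions shift the index by one. The sample equity curve is made up.

```python
import math

def risk_of_ruin(edge, units):
    """Even-money bets of one fixed unit, edge A = p - q: ((1 - A)/(1 + A)) ** C."""
    if edge <= 0:
        return 1.0  # no edge: ruin is certain given enough time
    return ((1 - edge) / (1 + edge)) ** units

def gamblers_ruin(n_start, m_gain):
    """Fair $1 flips, start with N, quit once M ahead: P(ruin) = M / (M + N)."""
    return m_gain / (m_gain + n_start)

def drawdowns(equity):
    """Fractional drawdown from the running peak at each point (<= 0)."""
    peak, out = equity[0], []
    for v in equity:
        peak = max(peak, v)
        out.append(v / peak - 1)
    return out

def max_drawdown(equity):
    return min(drawdowns(equity))

def calmar(annual_return, mdd):
    return annual_return / abs(mdd)

def ulcer_index(equity):
    """Root-mean-square of drawdowns (in fractions, not percent)."""
    dd = drawdowns(equity)
    return math.sqrt(sum(d * d for d in dd) / len(dd))

def historical_var_cvar(returns, level=0.95):
    """Historical VaR / CVaR as positive loss fractions (one simple quantile rule)."""
    losses = sorted(-r for r in returns)  # ascending, worst loss last
    idx = math.ceil(level * len(losses)) - 1
    tail = losses[idx:]
    return losses[idx], sum(tail) / len(tail)

equity = [100, 120, 90, 60, 80, 130]
print(f"MDD {max_drawdown(equity):.0%}, Ulcer {ulcer_index(equity):.3f}")
print(f"ROR, 2% edge, 50 units: {risk_of_ruin(0.02, 50):.1%}")
print(f"Gambler's ruin, N = M = 100: {gamblers_ruin(100, 100):.0%}")
```
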

§06 · Sharpe · Sortino — the performance ratios

Sharpe / Sortino / Calmar / Information Ratio are different versions of “return / risk.” If unsure, default to Sharpe — but each answers a slightly different question.

  1. Sharpe · William Sharpe 1966. Sharpe = (μ − r) / σ. Most general. Uses total volatility. Sharpe 1 = useful, > 1 = good, > 2 = outstanding, > 3 likely overfit. CTA long-run 0.5–0.7, StatArb 2–3, HFT 5–10+.
  2. Sortino. Sortino = (μ − r) / σ_down. Uses only downside volatility in the denominator (upside isn’t “risk”). Suitable for asymmetric distributions like put selling. Sortino is typically 30–50% higher than Sharpe.
  3. Calmar. Calmar = μ / |MDD|. Return / max drawdown. Most intuitive for clients — “how much did I lose at my worst?”
  4. Information Ratio. IR = α / TE, where α = excess vs benchmark, TE = tracking error. Active-management standard. Long-run IR > 0.5 = good active manager; > 1 top-tier.
  5. Treynor. Treynor = (μ − r) / β. Uses β as the denominator, measuring excess per unit of “systematic risk.” Only meaningful in fully diversified portfolios.
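  6. Omega Ratio. Ω(r) = E[max(R − r, 0)] / E[max(r − R, 0)]: probability-weighted gains above a threshold r divided by probability-weighted losses below it. It considers the asymmetry of the entire return distribution, so it is more complete than Sharpe. Popular academically, limited industrial use.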
  7. MAR Ratio. MAR = CAGR / |MDD|. Coined by Managed Account Reports. A Calmar variant using CAGR rather than mean annualized return. Industry-standard CTA evaluation metric.
  8. Sharpe → Kelly · Sharpe vs Kelly. At Kelly-optimal sizing, long-run excess geometric growth ≈ Sharpe² / 2. Sharpe 0.5 → about 12.5%/year. Sharpe 1 → about 50%/year, but only at the 100% portfolio volatility that full Kelly would then require. Reveals the essence of “high-Sharpe strategy = steeper compounding.”
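The following is a sketch of the main ratios computed from a list of periodic returns. Assumptions: daily data with 252 periods per year, the annual risk-free rate spread evenly across periods, and a target of 0 excess return for Sortino’s downside deviation. The sample returns are made up.

```python
import math
import statistics

def sharpe(returns, rf=0.0, periods=252):
    """Annualized Sharpe from per-period returns; rf is an annual rate."""
    excess = [r - rf / periods for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods)

def sortino(returns, rf=0.0, periods=252):
    """Like Sharpe, but the denominator is downside deviation below 0 excess."""
    excess = [r - rf / periods for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside * math.sqrt(periods)

def omega(returns, threshold=0.0):
    """Sum of gains above the threshold over sum of shortfalls below it."""
    gains = sum(max(r - threshold, 0.0) for r in returns)
    shortfalls = sum(max(threshold - r, 0.0) for r in returns)
    return gains / shortfalls

def kelly_excess_growth(sharpe_ratio):
    """Long-run excess log growth at full Kelly, continuous model: S**2 / 2."""
    return sharpe_ratio ** 2 / 2

daily = [0.01, -0.01, 0.02, 0.0, -0.005, 0.015]
print(f"Sharpe {sharpe(daily):.2f}, Sortino {sortino(daily):.2f}, Omega {omega(daily):.2f}")
print(f"full-Kelly growth at Sharpe 1: {kelly_excess_growth(1.0):.0%}/yr")
```
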

§07 · Vol targeting — the simplest sizing rule

Vol targeting is the most widely used sizing method by institutional investors. Simpler than Kelly — no need to estimate expected returns, just lock the portfolio’s volatility at a target level.

  1. Vol Target. Position = (Target σ / Asset σ) × NAV. e.g.: Target σ = 10%, SPX σ = 16%, Position = 62.5% NAV. Scale capital by “target vol / current asset vol.” When asset volatility rises, position shrinks automatically; vice versa. Mechanism underlying most risk-parity / CTA strategies.
  2. Realized Vol. σ_real = std(daily returns) × √252. Annualized standard deviation of daily returns over the past N days (commonly 20–60). EWMA (exponentially weighted) and GARCH are more refined estimators.
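Vol targeting needs only a volatility estimate and a ratio. The sketch below estimates vol two ways, the simple realized standard deviation and a zero-mean EWMA with λ = 0.94, RiskMetrics’ classic daily decay factor. The max_leverage cap is an added assumption, not part of the formula. The daily returns are made up.

```python
import math
import statistics

def realized_vol(daily_returns, periods=252):
    """Annualized sample standard deviation of daily returns."""
    return statistics.stdev(daily_returns) * math.sqrt(periods)

def ewma_vol(daily_returns, lam=0.94, periods=252):
    """Annualized EWMA vol (zero-mean, RiskMetrics-style recursion)."""
    var = daily_returns[0] ** 2  # seed with the first squared return
    for r in daily_returns[1:]:
        var = lam * var + (1 - lam) * r ** 2
    return math.sqrt(var * periods)

def vol_target_position(nav, target_vol, asset_vol, max_leverage=1.0):
    """Position = (target vol / asset vol) x NAV, capped at max_leverage x NAV."""
    return min(target_vol / asset_vol, max_leverage) * nav

print(vol_target_position(1_000_000, 0.10, 0.16))  # the SPX example: ~62.5% of NAV
daily = [0.012, -0.008, 0.015, -0.020, 0.005, 0.010, -0.012, 0.007]
for name, vol in [("realized", realized_vol(daily)), ("EWMA", ewma_vol(daily))]:
    print(f"{name:8s} vol {vol:.1%} -> position ${vol_target_position(1_000_000, 0.10, vol):,.0f}")
```

Because the position is inversely proportional to the vol estimate, a spike in recent returns shrinks exposure automatically. The EWMA reacts faster than an equal-weighted window.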

Why institutions use it

  1. Risk budget — know “max volatility” in advance, easy to explain to clients.
  2. Volatility clustering — realized vol predicts future vol, especially short-term.
  3. Crisis avoidance — auto-deleverage when vol spikes (2020-03 / 2008).
  4. Long-run return preservation — empirically vol-targeted portfolios run 10–20% higher Sharpe than unmanaged ones.
  5. Composable: stacking multiple vol-targeted assets gives a simple (inverse-volatility) form of risk parity.

Typical parameters

| Strategy | Target σ (annualized) |
|---|---|
| Conservative multi-asset (60/40 type) | 6–8% |
| Steady absolute return | 8–10% |
| Equity fund | 12–16% |
| Hedge fund (typical) | 10–15% |
| CTA trend | 15–20% |
| Aggressive long-short | 20–30% |

§08 · Mean-variance optimization

In 1952, four years before Kelly’s paper, Harry Markowitz published Portfolio Selection, laying the foundation of Modern Portfolio Theory (MPT); he later shared the 1990 Nobel Prize in Economics. Both ask “how to size positions,” but the objective functions differ: Markowitz minimizes variance for a given return; Kelly maximizes geometric growth.

  1. MPT · Modern Portfolio Theory. min w'Σw s.t. w'μ = μ_p, Σwᵢ = 1. Given target return μ_p, find weights w that minimize variance. Output is the Efficient Frontier — the envelope of all “optimal” portfolios.
  2. Efficient Frontier. Upper boundary in the return-risk plane. Any portfolio off the frontier is suboptimal — you can raise return without raising risk.
  3. Tangency Portfolio. w = Σ⁻¹(μ − r·1) / (1'Σ⁻¹(μ − r·1)), i.e. the multi-asset Kelly direction normalized so the weights sum to 1. It is the point where the line from the risk-free asset touches the efficient frontier: the maximum-Sharpe portfolio, where the Capital Allocation Line (CAL) meets the frontier.
  4. CAPM · Capital Asset Pricing Model. E(Rᵢ) = Rf + βᵢ(Rm − Rf). Pricing theory derived from MPT. Only non-diversifiable systematic risk (β) earns a premium; idiosyncratic risk should be diversified away.
  5. Black-Litterman. Goldman Sachs 1990 enhancement. Use market-equilibrium weights as a prior + investor “views” as a likelihood → posterior portfolio. Solves pure MPT’s extreme sensitivity to inputs.
  6. Shrinkage. The sample covariance matrix is unstable in high dimensions. Ledoit-Wolf shrinkage “shrinks” the sample Σ toward a structured target (identity, diagonal), markedly improving out-of-sample performance.

⚠ MPT’s biggest practical issue · extreme input sensitivity — “the more you optimize, the worse it gets.”

MPT uses historical estimates of μ and Σ, but μ’s estimation error is far larger than Σ’s. The optimizer amplifies small errors into extreme weights (-300% in one asset, +400% in another). In practice, raw MPT is almost never used directly. Practitioners use robust variants instead, such as Black-Litterman, shrinkage, risk parity, or equal weight. A simple 60/40 often achieves a Sharpe comparable to “scientifically optimized” MPT over the long run.
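The input sensitivity is easy to reproduce. The following is a dependency-free sketch of the tangency formula, using hypothetical two-asset inputs: 20% vol each, correlation 0.9, risk-free rate 2%. The hand-written solver stands in for numpy.linalg.solve.

```python
def solve(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

def tangency_weights(mu, r, cov):
    """Max-Sharpe weights: cov^{-1}(mu - r*1), normalized to sum to 1."""
    raw = solve(cov, [m - r for m in mu])
    total = sum(raw)
    return [w / total for w in raw]

cov = [[0.04, 0.036], [0.036, 0.04]]  # 20% vol each, correlation 0.9
base = tangency_weights([0.08, 0.08], 0.02, cov)
bumped = tangency_weights([0.08, 0.09], 0.02, cov)  # +1 point on asset 2
print("equal mu: ", [f"{w:+.0%}" for w in base])
print("bumped mu:", [f"{w:+.0%}" for w in bumped])
```

A one-point change in one expected return moves the weights from 50/50 to roughly −23% / +123%. This is the “the more you optimize, the worse it gets” effect in miniature.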

§09 · Risk parity

Risk parity traces back to Ray Dalio’s All Weather portfolio, launched at Bridgewater in 1996. Core idea: allocate not by capital weight but by “risk contribution”, so each asset contributes equal risk to the portfolio.

  1. Risk Parity · Equal Risk Contribution. RCᵢ = wᵢ × (Σw)ᵢ / σ_p (the RCᵢ sum to σ_p); require RCᵢ = σ_p / N for all i. Each asset’s total risk contribution (weight × marginal risk) is equal. “In a traditional 60/40, equities drive 90% of the volatility”; risk parity uses more bonds + moderate leverage to balance equity and bond risk.
  2. Naive RP · simple / inverse vol. wᵢ = (1/σᵢ) / Σ(1/σⱼ). Weight by inverse volatility, ignoring correlations. It is not true RP in general (the two coincide exactly when all pairwise correlations are equal), but it is often close enough in practice, simple and reliable.
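Formal equal risk contribution (ERC) weights have no closed form once correlations differ, but a short iterative solve works. This sketch swaps in one standard approach: cyclical coordinate descent on the convex function 0.5·y'Σy − (1/N)·Σ ln yᵢ. At its minimum yᵢ(Σy)ᵢ is the same for every asset, so normalizing y gives ERC weights. The 3-asset covariance matrix is hypothetical.

```python
import math

def portfolio_vol(w, cov):
    """Portfolio volatility sqrt(w' cov w)."""
    n = len(w)
    return math.sqrt(sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n)))

def risk_contributions(w, cov):
    """RC_i = w_i * (cov w)_i / sigma_p; the contributions sum to sigma_p."""
    n = len(w)
    sigma_p = portfolio_vol(w, cov)
    marginal = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [w[i] * marginal[i] / sigma_p for i in range(n)]

def inverse_vol_weights(vols):
    """Naive risk parity: w_i proportional to 1 / sigma_i."""
    inv = [1.0 / v for v in vols]
    return [x / sum(inv) for x in inv]

def erc_weights(cov, sweeps=500, tol=1e-12):
    """Equal risk contribution via cyclical coordinate descent."""
    n = len(cov)
    b = 1.0 / n  # equal risk budget per asset
    y = [1.0 / math.sqrt(cov[i][i]) for i in range(n)]
    for _ in range(sweeps):
        biggest_step = 0.0
        for i in range(n):
            a = sum(cov[i][j] * y[j] for j in range(n) if j != i)
            # Positive root of cov_ii * y_i^2 + a * y_i - b = 0.
            new = (-a + math.sqrt(a * a + 4 * cov[i][i] * b)) / (2 * cov[i][i])
            biggest_step = max(biggest_step, abs(new - y[i]))
            y[i] = new
        if biggest_step < tol:
            break
    return [v / sum(y) for v in y]

cov3 = [[0.0225, 0.0015, 0.015],   # vols 15%, 5%, 20%; mixed correlations
        [0.0015, 0.0025, -0.001],
        [0.015, -0.001, 0.04]]
w = erc_weights(cov3)
print("ERC weights:", [round(x, 3) for x in w])
print("risk contributions:", [round(rc, 4) for rc in risk_contributions(w, cov3)])
print("inverse-vol weights:", [round(x, 3) for x in inverse_vol_weights([0.15, 0.05, 0.20])])
```

With only two assets the correlation condition holds trivially, so ERC and inverse-vol weights agree exactly. With three assets and uneven correlations they diverge.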

Classic All Weather allocation

| Asset class | Weight (typical) |
|---|---|
| US equities | 30% |
| Long-term Treasuries | 40% |
| Intermediate Treasuries | 15% |
| Gold | 7.5% |
| Commodities | 7.5% |

Pros and cons

Pros:

Cons:

Other common sizing rules

  1. CPPI · Constant Proportion Portfolio Insurance. Risky asset = m × (NAV − Floor), with m = multiplier (3–5), Floor = floor amount. Add when NAV rises, cut when it falls. Underpins “principal protection + upside participation” products. Weakness is repeated whipsaw stops in choppy markets — many CPPI products got trapped at the floor in 2008.
  2. Volatility Scaling. Scale each strategy to the same vol, then weight. The mechanism underneath every multi-strategy platform (Millennium / Citadel / Balyasny).
  3. Optimal F · Ralph Vince Optimal F. A futures extension of Kelly normalized by historical max loss. More aggressive than Kelly, from Ralph Vince’s Mathematics of Money Management. Practically controversial.
  4. Fixed Fractional. Simplest approach: bet a fixed % of the account each time (e.g. 2%). Recommended starting point for new traders. Kelly is essentially “dynamic fixed fractional.”
  5. Fixed Dollar. Constant bet amount (e.g. always $1,000). No size cuts on losses, no size adds on wins. Avoids emotional sizing but lacks compounding.
  6. Anti-Martingale. “Add on wins, cut on losses.” Opposite of Martingale’s “double after a loss.” A natural trend-following sizing mechanism — Kelly is anti-martingale when win rates are stable.
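CPPI and the fixed-fractional / fixed-dollar contrast can be simulated in a few lines. This sketch makes simplifying assumptions: a fixed (non-ratcheting) floor, one rebalance per period, risky exposure capped at 100% of NAV, and outcomes expressed as P&L per $1 risked. All parameter values are illustrative.

```python
def cppi(returns, nav=100.0, floor=80.0, multiplier=4.0, rf=0.0):
    """Simulate CPPI with a fixed floor, rebalanced once per period.

    Risky exposure = multiplier * max(NAV - floor, 0), capped at 100% of NAV;
    the remainder sits in cash earning rf per period.
    """
    path = [nav]
    for r in returns:
        cushion = max(nav - floor, 0.0)
        risky = min(multiplier * cushion, nav)
        nav = risky * (1 + r) + (nav - risky) * (1 + rf)
        path.append(nav)
    return path

def fixed_fractional(outcomes, nav=100.0, fraction=0.02):
    """Risk a fixed fraction of current NAV on each trade."""
    for o in outcomes:
        nav += nav * fraction * o
    return nav

def fixed_dollar(outcomes, nav=100.0, stake=2.0):
    """Risk the same dollar amount on each trade."""
    for o in outcomes:
        nav += stake * o
    return nav

print([round(v, 2) for v in cppi([0.05, -0.30, 0.20])])
print(fixed_fractional([1, -1]), fixed_dollar([1, -1]))
```

The −30% period is larger than 1/m = 25%, so the account gaps through the floor. It then sits in cash and misses the +20% rebound, which is the “trapped at the floor” failure mode. In the second line, a win followed by a loss of the same size leaves the fixed-fractional account slightly down, while fixed dollar is back to even.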

§10 · Workflow — end-to-end sizing workflow

Given a positive-EV strategy (trading, gambling, business), the workflow below takes you from “should I bet” to “how much” as a complete decision.

Step 1 · Verify positive EV

Step 2 · Estimate parameters

Step 3 · Compute Kelly

Step 4 · Run + review

A concrete example · $100K account + a Sharpe 1.0 strategy — how to size?

  • Strategy backtest: μ = 15% (annual excess), σ = 15%, Sharpe = 1.0
  • Continuous Kelly f* = μ/σ² = 0.15 / 0.0225 = 6.67× (theoretical 667% gross exposure)
  • Half Kelly → 3.33× leverage (still impractical)
  • Quarter Kelly → 1.67× leverage (workable)
  • Add a 20% MDD red line: this caps the position at roughly 1.0× NAV
  • Conclusion: use 100% NAV (no leverage) and expect ~15% annualized excess return, max drawdown around 20–25%, with Sharpe holding at 1.0
  • The leverage temptation is large, but parameter uncertainty makes the unlevered position the more robust choice
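The numbers in this example can be reproduced with the continuous-Kelly formulas. This sketch assumes μ is the annual excess return and uses the same normal approximation as §03 for growth, L·μ − (L·σ)²/2. Function names are illustrative.

```python
def kelly_leverage(mu, sigma, fraction=1.0):
    """Continuous Kelly leverage, optionally scaled: fraction * mu / sigma^2."""
    return fraction * mu / sigma ** 2

def excess_log_growth(leverage, mu, sigma):
    """Long-run excess geometric growth at a leverage: L*mu - (L*sigma)^2 / 2."""
    return leverage * mu - (leverage * sigma) ** 2 / 2

mu, sigma, nav = 0.15, 0.15, 100_000  # annual excess return, vol, account size
for label, frac in [("Full", 1.0), ("Half", 0.5), ("Quarter", 0.25)]:
    lev = kelly_leverage(mu, sigma, frac)
    print(f"{label:>7} Kelly: {lev:.2f}x = ${lev * nav:,.0f} exposure, "
          f"vol {lev * sigma:.0%}, growth {excess_log_growth(lev, mu, sigma):.1%}")
print(f"Unlevered 1.0x: growth {excess_log_growth(1.0, mu, sigma):.2%}")
```

At 1.0× the expected excess geometric growth is about 13.9%. That is a little below the 15% arithmetic figure because of volatility drag (σ²/2 ≈ 1.1%). Full Kelly’s 50% growth comes with 100% portfolio volatility.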

§11 · Common traps — the usual suspects

  1. Computing Kelly from backtest parameters. A backtest Sharpe of 2.0 may be a true 0.5. Parameter uncertainty means always discount — full Kelly should almost never be used.
  2. Sizing without correlations. “10% per strategy” × 5 strategies = 50%. If those strategies are 0.8 correlated (with similar volatility), the real risk is equivalent to roughly a 46% single-strategy bet, versus about 22% if they were uncorrelated. Use the covariance matrix.
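  3. Martingale chasing losses. “Double after a loss, you’ll always recover.” Doubling does not change EV: a fair or house-edge game stays zero or negative. It trades many small wins for a rare catastrophic loss, and it only “works” with an infinite bankroll and no table limit. Over a long enough run, ruin becomes likely. Every “Martingale machine” in history has eventually blown up.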
  4. Applying Kelly to negative-EV strategies. Negative EV produces negative Kelly (you should reverse). If you stay long, Kelly becomes the “fastest path to zero”. Confirm EV > 0 first.
  5. Confusing fixed dollar with fixed fractional. “Always $1,000” vs “always 1% of NAV” compound very differently. Long-run, always use a percentage. With a fixed dollar stake, gains don’t compound, and each loss takes a bigger bite out of a shrinking account.
  6. Volatility window too long. Estimating current vol with three-year data underestimates sudden vol spikes. EWMA or 1–3 month windows reflect the present better.
  7. Ignoring “leverage costs”. Kelly often outputs > 100% — requires borrowing. Borrowing costs of 3–6%/year eat much of the excess. Real Kelly should subtract the borrowing rate.
  8. One Sharpe to rule them all. Strategy A with Sharpe 1.0 and fat tails vs strategy B with Sharpe 0.8 and a normal distribution → different Kelly conclusions. Both f* = μ/σ² and growth ≈ Sharpe²/2 hold only under a normal (continuous) approximation; non-normal cases must be re-derived.
  9. MDD stops = sell at the bottom. A strict “20% stop-loss” forces full liquidation in 2020-03 or 2008-10 → miss the bounce. MDD rules must integrate with vol regime, or be replaced by vol targeting instead of hard stops.
  10. Over-concentrated high-conviction bets. “This time I’m very sure” → 40% position. Kelly is highly sensitive to errors in its inputs, and that matters more than conviction: even a high-conviction bet shouldn’t exceed 25% in a single asset.
  11. Sharpe ≠ economic optimum. High Sharpe may come from tail-selling option strategies — 99% of the time +1%, 1% of the time -30%. Sortino / Calmar / MDD must be reviewed alongside — Sharpe alone doesn’t capture it.
  12. Math doesn’t match psychology. Mathematically, full Kelly maximizes growth, yet even Half Kelly can produce a 30% MDD that is unbearable to clients / yourself. Psychological tolerance, not math, determines real sizing. Top managers treat it as a hard constraint, not a tunable.
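Traps 2 and 3 can be quantified directly. The correlation sketch assumes n strategies with equal weight, equal volatility, and the same pairwise correlation ρ. Portfolio risk then equals a single bet of size w·√(n + n(n−1)ρ). The Martingale helper counts the cumulative stake lost over a losing streak when the bet doubles each time. All inputs are hypothetical.

```python
import math

def equivalent_single_exposure(weight, n, rho):
    """Single-strategy weight with the same risk as n equal, equally
    correlated strategies: weight * sqrt(n + n*(n-1)*rho)."""
    return weight * math.sqrt(n + n * (n - 1) * rho)

def martingale_exposure(base_bet, losing_streak):
    """Cumulative loss after a streak of doubling bets: base * (2^k - 1)."""
    return base_bet * (2 ** losing_streak - 1)

for rho in (0.0, 0.5, 0.8, 1.0):
    print(f"5 x 10%, rho={rho}: like one {equivalent_single_exposure(0.10, 5, rho):.0%} bet")
print(f"$10 martingale after 10 straight losses: ${martingale_exposure(10, 10):,} lost")
print(f"chance of 10 straight losses at 50/50: {0.5 ** 10:.2%}")
```

The Martingale line shows the asymmetry: roughly a 0.1% chance per sequence of losing $10,230 in pursuit of a $10 gain.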