How Many Coin Tosses to Catch a Rigged Coin?

Suppose a coin might be biased toward heads, and the bias — if there is one — is slight. The question this note works through sounds simple but isn't: how many times do you have to flip a coin before you can confidently say it's rigged rather than merely unlucky?

The answer is not a single number but a formula with two inputs: how rigged the coin is and how sure you want to be. A wildly biased 60/40 coin gives itself away in a couple hundred flips. A barely rigged 51/49 coin can hide for roughly twenty thousand. The cost of catching it explodes as the bias shrinks, and the required number falls out of a single approximate equation, which a simulation then confirms.

🪙 Flip it and watch the evidence build

Set the coin's true bias with the slider (the universe knows it; you don't), then start flipping. The chart tracks the running fraction of heads, p̂, after every flip. The shaded funnel is the 95% range for a genuinely fair coin at each n, 0.5 ± 1.96·√( 0.25 / n ), so it narrows like 1/√n. Once your trace pokes out of the funnel and stays out, you've got visual evidence the coin isn't fair.

[Interactive chart: running p̂ against flips n, with the fair line at 0.5 and the true bias marked. Control: true p(heads), default 0.55. Readouts: flips n, heads.]
Try this: set the bias to 0.55 and flip 100 at a time. Early on the trace lurches all over the place — small samples are noisy — and it often sits comfortably inside the fair funnel for a while. Push past a few hundred flips and it finally separates. Now set the bias to 0.70 and watch it escape almost immediately. That gap in difficulty is the whole story.
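
For readers without the interactive chart, the experiment can be approximated offline. The following is a minimal sketch, not the page's own code: the helper names `running_fraction` and `fair_funnel` are made up for illustration, and the funnel is assumed to be the pointwise band 0.5 ± 1.96·√(0.25/n).

```python
import math
import random

def running_fraction(p_true, n_flips, seed=0):
    """Return the running fraction of heads after each flip (hypothetical helper)."""
    rng = random.Random(seed)
    heads = 0
    trace = []
    for n in range(1, n_flips + 1):
        heads += rng.random() < p_true  # a True result counts as one head
        trace.append(heads / n)
    return trace

def fair_funnel(n, z=1.96):
    """Pointwise 95% band for p-hat under a fair coin: 0.5 +/- z * sqrt(0.25 / n)."""
    half_width = z * math.sqrt(0.25 / n)
    return 0.5 - half_width, 0.5 + half_width

# Flip a 55/45 coin 2000 times and check the trace against the funnel at a few points.
trace = running_fraction(p_true=0.55, n_flips=2000, seed=1)
for n in (100, 500, 1000, 2000):
    lo, hi = fair_funnel(n)
    p_hat = trace[n - 1]
    status = "inside" if (lo <= p_hat <= hi) else "outside"
    print(f"n={n:5d}  p_hat={p_hat:.3f}  funnel=[{lo:.3f}, {hi:.3f}]  {status}")
```

Changing `seed` gives a different run of luck; changing `p_true` to 0.70 should show the trace leaving the funnel much earlier.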

⚖️ The skeptic's question: the null hypothesis & the p-value

To prove rigging we start by assuming the opposite; that's how statistics stays honest. The null hypothesis is "the coin is fair," H0: p = 0.5. Under H0 the number of heads in n flips follows a Binomial(n, 0.5) distribution, and once n reaches a few dozen that is well approximated by a bell curve, so p̂ is roughly normal with mean 0.5 and standard deviation √( 0.25 / n ). We summarise how far our result lands from the fair center with the z-score:

z = ( p̂ − 0.5 ) / √( 0.25 / n )

The p-value is then the probability a perfectly fair coin would look this extreme or more, in either direction, just by luck. Small p-value = "a fair coin almost never does this" = evidence of rigging. The convention is to raise the alarm when p < 0.05. Drag the sample size and the observed fraction of heads and watch the tail area (the p-value) light up:

[Interactive chart: fair-coin distribution of p̂ with the p-value shaded in the luck tails. Controls: flips n (default 100), observed p̂ (default 0.60). Readouts: heads, z-score, p-value.]
60 heads in 100: looks lopsided, but z = 2.0 and the two-sided p-value is about 0.046, so a fair coin manages it about 5% of the time. Right on the borderline, not yet damning.
600 heads in 1000: the same 60% fraction, but now z ≈ 6.3 and p is on the order of 10⁻¹⁰. Same observed fraction, ten times the flips, overwhelming evidence. The strength of evidence depends on n, not just the ratio.
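
The two readings above can be checked by hand or with a few lines of code. This is a sketch under the same normal approximation as the z-score formula; the function name `z_test_fair_coin` is illustrative, and the two-sided p-value is computed with `math.erfc`, which equals 2·(1 − Φ(|z|)) when evaluated at |z|/√2.

```python
import math

def z_test_fair_coin(heads, n):
    """Two-sided z-test of H0: p = 0.5, using the normal approximation."""
    p_hat = heads / n
    z = (p_hat - 0.5) / math.sqrt(0.25 / n)
    # erfc(|z| / sqrt(2)) is the area in both tails beyond |z|
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

for heads, n in ((60, 100), (600, 1000)):
    z, p = z_test_fair_coin(heads, n)
    print(f"{heads} heads in {n}: z = {z:.2f}, p-value = {p:.3g}")
```

An exact binomial test would give slightly different p-values at small n; the approximation is used here only because it matches the formula in the text.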

🎯 Two ways to be wrong: false alarms (α) and missed detections (power)

A test can fail in two directions. A false alarm (Type I error) is calling a fair coin rigged — its rate is the threshold α we picked above (e.g. 0.05). A missed detection (Type II error, rate β) is letting a truly rigged coin walk free. What we really care about is power = 1 − β: the chance we catch a rigged coin when it really is rigged.

Below are two bell curves over p̂: the null (fair, centered at 0.5) and the alternative (the coin's true bias). We reject "fair" whenever p̂ lands past the critical lines. The red tails under the null are α; the green area under the alternative past the line is the power. Crank up n or the bias and watch the curves pull apart — that separation is power.

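[Interactive chart: the fair (null) curve and the true-bias (alternative) curve over p̂, with α (false alarm) and power shaded. Controls: true p(heads) (default 0.55), flips n (default 200), α (default 0.050). Readouts: power, α.]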
The tug-of-war: tightening α (fewer false alarms) pushes the critical lines outward and lowers power. The only way to win on both fronts at once is to collect more data. That trade-off is exactly what the sample-size formula below resolves.
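
The two-curve picture translates directly into a power calculation. The sketch below assumes the normal approximation for both curves and a two-sided test; `approx_power` is a hypothetical helper name, not something from the interactive page.

```python
import math
from statistics import NormalDist

def approx_power(p_true, n, alpha=0.05):
    """Chance that p-hat lands past a critical line when the coin's bias is p_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se_null = math.sqrt(0.25 / n)                  # spread of the fair (null) curve
    se_alt = math.sqrt(p_true * (1 - p_true) / n)  # spread of the true-bias curve
    upper = 0.5 + z_crit * se_null                 # critical lines sit on the null curve
    lower = 0.5 - z_crit * se_null
    beyond_upper = 1 - std_normal.cdf((upper - p_true) / se_alt)
    beyond_lower = std_normal.cdf((lower - p_true) / se_alt)
    return beyond_upper + beyond_lower

# Tightening alpha lowers power at fixed n; adding flips raises it again.
for alpha in (0.05, 0.01):
    for n in (200, 800):
        print(f"alpha={alpha:.2f}  n={n:4d}  power={approx_power(0.55, n, alpha):.3f}")
```

Setting `p_true=0.5` returns α itself, which is a quick sanity check that the critical lines are in the right place.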

🧮 The pure-math answer: a formula for n

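Now we just turn the picture into algebra. We want the critical line to sit far enough from the fair center to keep false alarms at α, and far enough below the true bias to catch it with the power we want. Write z_{α/2} for the standard-normal cutoff that leaves α/2 in the upper tail (1.96 for α = 0.05) and z_β for the one that leaves β in the upper tail (0.84 for 80% power). The critical line is 0.5 + z_{α/2}·√( 0.25 / n ), and it must equal p − z_β·√( p(1−p) / n ). Solving that for n, which is "push the two curves apart until both conditions hold," gives: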

n ≈ ( z_{α/2} · √¼  +  z_β · √( p(1−p) ) )²  /  ( p − 0.5 )²

Everything in that fraction is mild except the denominator. The effect size (p − 0.5) is squared and sits on the bottom, so halving the bias you want to detect roughly quadruples the flips you need. Set your target bias, false-alarm rate, and power, and read off the budget:

[Interactive calculator: required n vs bias, plotted on a log scale. Controls: detect p(heads) (default 0.55), α (default 0.050), power (default 80%). Readout: flips required.]
Worked examples (α = 0.05, power = 80%): a 60/40 coin (p = 0.60) needs only about 194 flips. A 55/45 coin needs roughly 783. A 51/49 coin? About 19,600 flips — the squared effect size in the denominator is merciless.
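
The worked examples can be reproduced with a direct transcription of the formula. This is a minimal sketch: `flips_required` is an illustrative name, and rounding up to a whole flip is an assumption about how the budget is reported.

```python
import math
from statistics import NormalDist

def flips_required(p_true, alpha=0.05, power=0.80):
    """Approximate flips needed to detect bias p_true with a two-sided test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(0.25)
                 + z_beta * math.sqrt(p_true * (1 - p_true))) ** 2
    return math.ceil(numerator / (p_true - 0.5) ** 2)  # round up to a whole flip

for p in (0.60, 0.55, 0.51):
    print(f"p = {p:.2f}: about {flips_required(p):,} flips")
```

With the defaults this gives 194 flips for p = 0.60 and 783 for p = 0.55, matching the figures above, and a value close to 19,600 for p = 0.51.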

🔬 Simulation meets math: does the formula actually hold?

A formula is only as good as its predictions. So let's brute-force it. Pick a true bias, a number of flips, and α. We run a thousand independent experiments, each of which flips the coin n times and applies the p < α test, and tally how often we correctly catch the rigged coin. That empirical power should land within a percentage point or two of the predicted power; with 1,000 runs, the sampling error at 80% power is about 1.3 points. Hit run and watch the two bars meet.

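[Interactive simulation: predicted vs simulated power shown as two bars, with a progress indicator. Controls: true p(heads) (default 0.55), flips n (default 783), α (default 0.050). Readouts: experiments, caught.]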
Try the boundary case: with p = 0.55, n = 783, α = 0.05 the formula was tuned for 80% power, so about 800 of the 1000 experiments should catch the coin, and one in five should miss it even though it really is rigged. Now slide n down to 200 and watch the simulated power collapse to roughly 30%: too few flips, and the rigged coin usually escapes.
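
The brute-force check can also be run outside the page. The sketch below is one way to set it up, assuming the same two-sided z-test used throughout; `simulated_power` and its `seed` parameter are illustrative choices, not the page's implementation.

```python
import math
import random
from statistics import NormalDist

def simulated_power(p_true, n, alpha=0.05, experiments=1000, seed=0):
    """Fraction of simulated experiments in which the z-test rejects 'fair'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    caught = 0
    for _ in range(experiments):
        heads = sum(rng.random() < p_true for _ in range(n))
        z = (heads / n - 0.5) / math.sqrt(0.25 / n)
        if abs(z) > z_crit:  # same decision rule as p < alpha
            caught += 1
    return caught / experiments

# The boundary case, then the under-powered case.
for n in (783, 200):
    print(f"p = 0.55, n = {n}: simulated power = {simulated_power(0.55, n):.3f}")
```

Because each experiment counts whole heads, the simulated value follows the exact binomial rather than the bell curve, so a gap of a point or two from the predicted power is expected even before sampling noise.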

🎁 The whole story in one breath

  • Assume innocence. Start from H0: "the coin is fair," and ask how surprising your data would be if that were true — that's the p-value.
  • Two errors, one tension. α controls false alarms; power (1 − β) controls catches. You can't shrink both without more flips.
  • One formula. n ≈ ( z_{α/2}·√¼ + z_β·√( p(1−p) ) )² / ( p − 0.5 )² turns "how sure" and "how rigged" into a concrete flip budget.
  • Cost scales like 1/(p − ½)². Obvious cheats are caught in a few hundred flips; subtle ones can demand tens of thousands. Detecting tiny bias is genuinely, quadratically expensive, and simulation confirms the math to within a percentage point or two.

So next time someone insists their coin is fair, you don't have to argue — you can tell them exactly how many flips it'll take to find out, before you throw the first one.

Written on June 19, 2026