Data Science : Understanding the 3B’s (Bernoulli, Binomial, Beta) distributions for modelling Boolean events — An example with a Biased Coin toss

Nov 5, 2022

An overview of the Bernoulli, Binomial and Beta distribution families for Boolean sequences

# img ref — https://www.shutterstock.com/image-vector/mathematics-galton-board-normal-distribution-gaussian-651531505

Q1) Is the coin biased towards heads?

Case (1) = ['H', 'H', 'T', 'H', 'T', 'T', 'H', 'H', 'T', 'H']

We have obtained 6 heads in 10 coin tosses. Can we say that this coin is biased towards heads?

To check for a bias, we can run a hypothesis test and compute its p-value.

Null hypothesis: the coin is unbiased, i.e. P(heads) = p = 0.5
Alternative hypothesis: the coin is biased towards heads, i.e. p > 0.5

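The p-value is the probability of observing a result at least as extreme as ours, provided the coin is unbiased. For a one-sided test of a bias towards heads, it is the probability of observing 6 or more heads in 10 tosses of a fair coin.

Each term comes from the Binomial probability mass function (PMF) with n = 10 and p = 0.5: the PMF at k = 6 gives the probability of exactly 6 heads, and the p-value sums the PMF over k = 6, 7, ..., 10.

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad \text{p-value} = P(X \ge 6) = \sum_{k=6}^{10} \binom{10}{k} \, 0.5^{10}$$

In code this only needs binomial coefficients, and the factorials behind them can be computed with dynamic programming (an iterative table) rather than recursion, for faster computation.

For this example, the probability of exactly 6 heads is 210/1024 ≈ 0.205, while the one-sided p-value is P(X ≥ 6) = 386/1024 ≈ 0.377. In other words, a fair coin produces 6 or more heads in 10 tosses about 38% of the time.

Therefore, we do not have sufficient evidence to conclude that this coin is biased towards heads at a 5% significance level.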
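A minimal Python sketch of this test for case 1, using only the standard library. The helper names (`factorial_table`, `binomial_pmf`, `upper_tail_p_value`) are hypothetical; the factorial table is built bottom-up, following the dynamic-programming tip in the text. It prints both the probability of exactly k heads and the one-sided tail probability P(X ≥ k), which is what the p-value is defined by.

```python
# Sketch: exact one-sided binomial test for case 1 (6 heads in 10 tosses).
# Helper names are illustrative, not from the original article.

def factorial_table(n):
    """Return [0!, 1!, ..., n!], computed iteratively (no recursion)."""
    table = [1] * (n + 1)
    for i in range(1, n + 1):
        table[i] = table[i - 1] * i
    return table


def binomial_pmf(k, n, p, fact):
    """P(X = k) for X ~ Binomial(n, p), using a precomputed factorial table."""
    n_choose_k = fact[n] // (fact[k] * fact[n - k])
    return n_choose_k * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(k, n, p=0.5):
    """One-sided p-value P(X >= k) under the null hypothesis X ~ Binomial(n, p)."""
    fact = factorial_table(n)
    return sum(binomial_pmf(i, n, p, fact) for i in range(k, n + 1))


tosses = ['H', 'H', 'T', 'H', 'T', 'T', 'H', 'H', 'T', 'H']
n, k = len(tosses), tosses.count('H')

point = binomial_pmf(k, n, 0.5, factorial_table(n))  # probability of exactly k heads
p_value = upper_tail_p_value(k, n)                    # probability of k or more heads
print(f"P(X = {k}) = {point:.3f}")     # prints P(X = 6) = 0.205
print(f"P(X >= {k}) = {p_value:.3f}")  # prints P(X >= 6) = 0.377
```

SciPy users can cross-check the result with `scipy.stats.binomtest(6, 10, 0.5, alternative="greater")`, assuming SciPy 1.7 or later is installed.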

Let us take another case, the same coin but with 20 tosses.

Case (2) = ['H', 'H', 'T', 'T', 'H', 'T', 'T', 'T', 'H', 'H', 'H', 'H', 'H', 'H', 'T', 'H', 'H', 'H', 'H', 'H']

Here 20 tosses are performed with the same coin, giving 14 heads. Can we now say that this coin is biased towards heads?

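This time the probability of exactly 14 heads in 20 tosses of a fair coin is only about 0.037 (3.7%). The one-sided p-value, however, counts 14 or more heads: P(X ≥ 14) = 60460/1048576 ≈ 0.058. That is just above the accepted 5% significance level, so strictly we cannot reject the null hypothesis that the coin is unbiased at 5%, although we could at 10%.

Therefore, 14 heads in 20 tosses is suggestive of a bias towards heads but not conclusive at the 5% level. For the rest of this article, we treat the coin as biased and estimate how biased it is.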
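The same calculation for case 2 can be sketched with `math.comb` from the standard library (Python 3.8+). It reports the probability of exactly 14 heads alongside the one-sided and two-sided exact p-values. Doubling the upper tail for the two-sided value is only valid here because the null distribution of a fair coin is symmetric; the names `pmf`, `exact`, `one_sided` and `two_sided` are illustrative.

```python
from math import comb

# Sketch: exact binomial p-values for case 2 (14 heads in 20 tosses).
tosses = ['H', 'H', 'T', 'T', 'H', 'T', 'T', 'T', 'H', 'H',
          'H', 'H', 'H', 'H', 'T', 'H', 'H', 'H', 'H', 'H']
n, k = len(tosses), tosses.count('H')


def pmf(i, n, p=0.5):
    """P(X = i) for X ~ Binomial(n, p); p defaults to a fair coin (the null)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


exact = pmf(k, n)                                    # P(X = 14)
one_sided = sum(pmf(i, n) for i in range(k, n + 1))  # P(X >= 14)
# Fair coin => symmetric null distribution, so P(X <= 6) equals P(X >= 14).
two_sided = min(1.0, 2 * one_sided)

print(f"P(X = {k})  = {exact:.3f}")      # prints 0.037
print(f"P(X >= {k}) = {one_sided:.3f}")  # prints 0.058
print(f"two-sided   = {two_sided:.3f}")  # prints 0.115
```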

Q2) Assuming the coin is biased, how do we learn how much it is biased?

We need to estimate p in Bernoulli(p), since a single coin toss is a Boolean outcome with P(heads) = p and P(tails) = 1 - p.

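We can model our belief about p with a Beta distribution and update it with the feedback from each toss. This is a Bayesian update that stays within the Beta family, because the Beta distribution is the conjugate prior of the Bernoulli. Repeatedly drawing a value of p from the updated Beta distribution is the core step of the Thompson Sampling algorithm.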

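The Beta distribution over p ∈ [0, 1], with shape parameters α, β > 0, is given by:

$$f(p; \alpha, \beta) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$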

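The Beta(alpha, beta) distribution learns as follows: after every toss, alpha is incremented if the toss is heads and beta if it is tails, so in our case alpha counts the heads and beta counts the tails (each starting from 1 under a uniform Beta(1, 1) prior). As we can see, the variance shrinks as the number of trials increases, because the learner becomes more certain of its estimate of p.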
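A small sketch of this update rule applied to the 20 tosses from case 2. It assumes a uniform Beta(1, 1) starting prior (the text does not state which prior was used), and the helper names `beta_mean` and `beta_variance` are illustrative.

```python
# Sketch: Beta-Bernoulli updating on the case 2 tosses.
# Assumption: uniform Beta(1, 1) prior, so alpha = 1 + heads and beta = 1 + tails.

def beta_mean(a, b):
    """Expectation of Beta(a, b)."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of Beta(a, b); it shrinks as a + b grows."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


tosses = ['H', 'H', 'T', 'T', 'H', 'T', 'T', 'T', 'H', 'H',
          'H', 'H', 'H', 'H', 'T', 'H', 'H', 'H', 'H', 'H']

alpha, beta = 1, 1  # uniform prior over p
for t, toss in enumerate(tosses, start=1):
    # Each heads adds 1 to alpha, each tails adds 1 to beta.
    if toss == 'H':
        alpha += 1
    else:
        beta += 1
    if t % 5 == 0:  # report every 5 tosses
        print(f"after {t:2d} tosses: Beta({alpha}, {beta})  "
              f"mean={beta_mean(alpha, beta):.3f}  var={beta_variance(alpha, beta):.4f}")
```

After all 20 tosses this gives Beta(15, 7), with mean 15/22 ≈ 0.682. The variance falls at every checkpoint (about 0.0306, 0.0192, 0.0135 and 0.0094) as alpha + beta grows, which is the narrowing described above.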

The curve converges to an estimate of p of about 0.65, which tells us that the coin is biased, with P(heads) ≈ 65% and P(tails) ≈ 35%.

The learned probability (the estimate) is the expectation of the Beta distribution, computed from the learned alpha and beta parameters. The variance shrinks as (alpha + beta) increases, which reduces the uncertainty in the estimate:

$$E[p] = \frac{\alpha}{\alpha + \beta}, \qquad \mathrm{Var}[p] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

The graph below shows the probability sampled from the Beta distribution at each step. At the initial steps there is a lot of stochasticity, since the learner is still learning, but over time the samples converge to a value around 0.65.
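This sampling behaviour can be reproduced with a simulated coin. The sketch below assumes a hypothetical true bias `TRUE_P = 0.65` (in a real experiment the bias is unknown) and uses `random.Random.betavariate` from the standard library to draw p from the current posterior before each toss, which is the Thompson Sampling step. With a single coin there is no choice between arms, so this illustrates only the posterior sampling, not the full bandit algorithm.

```python
import random

# Sketch: posterior sampling on a simulated biased coin.
# Assumption (hypothetical): the true bias is 0.65; the seed is arbitrary.
TRUE_P = 0.65
N_TOSSES = 5000
rng = random.Random(42)  # fixed seed so the run is reproducible

alpha, beta = 1, 1       # uniform Beta(1, 1) prior
samples = []             # one posterior draw of p per step
for _ in range(N_TOSSES):
    # Draw a plausible p from the current posterior (the Thompson Sampling step).
    samples.append(rng.betavariate(alpha, beta))
    # Toss the simulated coin once and update the posterior counts.
    if rng.random() < TRUE_P:
        alpha += 1
    else:
        beta += 1

posterior_mean = alpha / (alpha + beta)
print("first 5 sampled p:", [round(s, 2) for s in samples[:5]])
print("last 5 sampled p: ", [round(s, 2) for s in samples[-5:]])
print(f"posterior mean after {N_TOSSES} tosses: {posterior_mean:.3f}")
```

The first draws come from a nearly flat posterior and can land anywhere in [0, 1]; the last ones cluster tightly around the simulated bias.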

Q3) How many heads can we expect in a given number of coin tosses now that we know the bias of the coin?

The number of heads follows a Binomial distribution.

We have estimated p = 0.65; let n = 100 (each of our simulations will consist of 100 coin tosses).

The number of heads x obtained in n coin tosses, with coin bias p = P(heads), follows a Binomial distribution:

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n$$

# ref — https://www.cuemath.com/binomial-distribution-formula/
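To make the formula concrete, the sketch below evaluates the Binomial PMF for n = 100 and p = 0.65 with `math.comb` and reads off the expected count np, the standard deviation sqrt(np(1 - p)), the most likely number of heads, and the probability of landing between 60 and 70 heads. The helper and variable names are illustrative.

```python
from math import comb, sqrt

# Sketch: Binomial(n, p) PMF for the number of heads x in n tosses.
# Assumes p = 0.65 (the estimate from Q2) and n = 100, as in the text.
n, p = 100, 0.65


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = n * p                     # expected number of heads, np = 65
sd = sqrt(n * p * (1 - p))       # standard deviation, sqrt(22.75)
most_likely = max(range(n + 1), key=lambda x: pmf[x])
prob_60_to_70 = sum(pmf[60:71])  # P(60 <= X <= 70)

print(f"mean={mean:.1f}  sd={sd:.2f}  mode={most_likely}  P(60..70)={prob_60_to_70:.3f}")
```

For these parameters the most likely count is 65 heads, the same as the expected value np.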

Running 10,000 simulations with (n, p) as the fixed parameters of the Binomial distribution, we obtain the distribution of x shown below.
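A sketch of this simulation, assuming the text's n = 100 and p = 0.65 and using only the standard library (the seed and variable names are arbitrary choices). Each run counts heads across 100 Bernoulli(p) tosses, so the sample mean and variance should land close to the Binomial values np = 65 and np(1 - p) = 22.75.

```python
import random
from statistics import mean, pvariance

# Sketch: 10,000 simulations of n = 100 tosses of a coin with P(heads) = 0.65.
n, p, n_sims = 100, 0.65, 10_000
rng = random.Random(0)  # fixed seed for reproducibility

# Each simulation sums 100 Bernoulli(p) outcomes (True counts as 1).
heads_counts = [sum(rng.random() < p for _ in range(n)) for _ in range(n_sims)]

print(f"sample mean     {mean(heads_counts):.2f}  (theory n*p = {n * p:.2f})")
print(f"sample variance {pvariance(heads_counts):.2f}  "
      f"(theory n*p*(1-p) = {n * p * (1 - p):.2f})")
```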

But is this a normal distribution? We can check by drawing a QQ plot (quantile-quantile plot).

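The QQ plot shows that the sample quantiles lie close to those of a normal distribution, except near the edges. This is expected: for large n, Binomial(n, p) is well approximated by a normal distribution with mean np = 65 and variance np(1 - p) = 22.75, while the tails deviate slightly because the binomial is discrete and, with p ≠ 0.5, slightly skewed.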
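The comparison behind a QQ plot can be sketched without plotting: sort the simulated head counts, read off empirical quantiles, and compare them with the quantiles of Normal(np, np(1 - p)) from `statistics.NormalDist`. This assumes the same n = 100, p = 0.65 and 10,000 runs as in the text, with an arbitrary seed and a simple nearest-rank quantile rule.

```python
import random
from statistics import NormalDist

# Sketch: sample quantiles of simulated head counts vs. normal quantiles.
n, p, n_sims = 100, 0.65, 10_000
rng = random.Random(0)  # arbitrary fixed seed
heads_counts = sorted(sum(rng.random() < p for _ in range(n)) for _ in range(n_sims))

# Normal approximation to Binomial(n, p): mean np, variance np(1 - p).
normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)

rows = []
for prob in (0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 0.999):
    sample_q = heads_counts[int(prob * (n_sims - 1))]  # nearest-rank empirical quantile
    theory_q = normal.inv_cdf(prob)
    rows.append((prob, sample_q, theory_q))
    print(f"{prob:>6}: sample {sample_q:3d}   normal {theory_q:6.2f}")
```

Plotting `sample_q` against `theory_q` gives the QQ plot. To draw it directly, `scipy.stats.probplot(heads_counts, dist="norm", plot=plt)` with `matplotlib.pyplot` imported as `plt` produces a standard normal probability plot, assuming SciPy and Matplotlib are installed.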


Written by Debayan Mitra
