
Content On This Page
Conditional Probability
Multiplication Theorem on Probability
The Law of Total Probability
Bayes’ Theorem
Random Variables
Mean and Variance of Probability Distribution
Binomial Experiment
Mean and Variance of Binomial Distribution


Chapter 13 Probability (Concepts)

Welcome to this culminating chapter on Probability, which significantly builds upon and formalizes the concepts introduced in Class 11. We transition from the foundational classical and experimental definitions towards a more sophisticated analysis involving dependent events, probabilistic inference, and the study of numerical outcomes of random phenomena through random variables and their distributions. This chapter equips you with advanced tools essential for understanding uncertainty and modeling stochastic processes across diverse fields like statistics, finance, science, and engineering.

We begin by exploring the crucial concept of Conditional Probability. This addresses how the probability of an event occurring changes when we know that another related event has already happened. The conditional probability of event $A$ occurring given that event $B$ has already occurred is denoted by $P(A|B)$ (read as "the probability of A given B") and is formally calculated as: $$ \mathbf{P(A|B) = \frac{P(A \cap B)}{P(B)}} $$ provided that the probability of the given event $B$ is not zero ($P(B) \neq 0$). Understanding conditional probability allows us to refine our probability assessments based on new information. Directly derived from this is the Multiplication Rule of Probability, which helps calculate the probability of the intersection of events: $P(A \cap B) = P(B) \times P(A|B) = P(A) \times P(B|A)$. This rule naturally extends to finding the probability of the intersection of multiple events.

Contrasting with dependence, we define Independent Events. Two events $A$ and $B$ are considered independent if the occurrence (or non-occurrence) of one event has absolutely no effect on the probability of the occurrence of the other. Mathematically, this translates to $P(A|B) = P(A)$ and $P(B|A) = P(B)$. The defining condition for independence simplifies the multiplication rule to $\mathbf{P(A \cap B) = P(A) \times P(B)}$. This concept extends to the mutual independence of three or more events.

For scenarios involving multiple possibilities leading to an event, the Theorem of Total Probability provides a powerful calculation tool. If events $E_1, E_2, \dots, E_n$ form a partition of the sample space $S$ (meaning they are mutually exclusive and their union is $S$), then for any event $A$ associated with $S$, its probability can be calculated as the sum of probabilities of $A$ occurring through each partition: $$ \mathbf{P(A) = \sum_{i=1}^{n} P(E_i)P(A|E_i)} $$ This theorem is the foundation for the celebrated Bayes' Theorem. Bayes' theorem is fundamental for statistical inference, allowing us to 'reverse' conditional probabilities. It enables us to calculate the probability of an initial event (cause) $E_i$ given that a subsequent event (effect) $A$ has occurred, using the prior probabilities $P(E_i)$ and the conditional probabilities $P(A|E_i)$. The formula is: $$ \mathbf{P(E_i|A) = \frac{P(E_i)P(A|E_i)}{\sum_{j=1}^{n} P(E_j)P(A|E_j)}} $$ This theorem is widely used in areas like medical diagnosis, spam filtering, and machine learning to update beliefs based on observed evidence.

We then shift focus to quantifying the outcomes of random experiments numerically by introducing Random Variables. A random variable $X$ is a variable whose value is a numerical outcome resulting from a random phenomenon. Associated with each discrete random variable is its Probability Distribution, which lists all possible values the variable can take and their corresponding probabilities, $P(X=x_i)$. A valid probability distribution must satisfy two conditions: $P(X=x_i) \ge 0$ for all $i$, and $\mathbf{\sum\limits_{i} P(X=x_i) = 1}$. We learn to calculate key summary statistics for discrete random variables: the mean (or expectation), $\mu = E(X) = \sum\limits_{i} x_i P(X=x_i)$, which measures the central value of the distribution, and the variance, $\sigma^2 = \sum\limits_{i} (x_i - \mu)^2 P(X=x_i)$, which measures its spread.

Finally, we study a particularly important type of discrete probability distribution arising from specific kinds of experiments called Bernoulli Trials. A Bernoulli trial is characterized by having only two possible outcomes (often labeled 'success' and 'failure'), being independent of other trials, and having a constant probability of success ($p$) across all trials. The Binomial Distribution models the probability of obtaining exactly $k$ successes in a fixed number ($n$) of independent Bernoulli trials. The probability mass function is given by: $$ \mathbf{P(X=k) = {^nC_k} p^k q^{n-k}} \quad (\text{for } k = 0, 1, 2, \dots, n) $$ where ${^nC_k} = \binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient, $p$ is the probability of success on a single trial, and $q = 1-p$ is the probability of failure. We also learn the formulas for the mean ($\mu = np$) and variance ($\sigma^2 = npq$) of a binomial distribution. This chapter provides a robust framework for analyzing probabilistic situations involving conditioning, inference, and repeated independent trials.
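For readers comfortable with a little programming, the binomial formulas above can be checked numerically. The following Python sketch (with illustrative parameters $n = 10$, $p = 0.3$, not taken from the text) computes the probability mass function with `math.comb` and confirms that the probabilities sum to 1 and that the mean and variance match $np$ and $npq$:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = nCk * p^k * q^(n-k) for X ~ Binomial(n, p)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Illustrative parameters: n = 10 trials, probability of success p = 0.3
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

print(abs(sum(pmf) - 1) < 1e-12)           # True: probabilities sum to 1
print(abs(mean - n * p) < 1e-12)           # True: mean equals np = 3
print(abs(var - n * p * (1 - p)) < 1e-12)  # True: variance equals npq = 2.1
```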



Conditional Probability

In probability theory, often we are interested in the probability of an event occurring given that some other event has already happened. This is where the concept of conditional probability comes into play. It measures the probability of an event under the assumption that another event has occurred.

For instance, consider drawing two cards from a deck without replacement. The probability of drawing a second Ace depends on whether the first card drawn was an Ace or not. This dependency is captured by conditional probability.


Definition of Conditional Probability

Let S be the sample space of a random experiment, and let E and F be two events associated with S. The conditional probability of event E occurring given that event F has already occurred is denoted by $P(E|F)$.

It is defined as the ratio of the probability of the intersection of events E and F ($E \cap F$) to the probability of event F, provided that the probability of event F is not zero.

P(E|F) = $\frac{P(E \cap F)}{P(F)}$, provided $P(F) \neq 0$

... (i)

Similarly, the conditional probability of event F occurring given that event E has already occurred is denoted by $P(F|E)$, and is defined as:

P(F|E) = $\frac{P(F \cap E)}{P(E)}$, provided $P(E) \neq 0$

... (ii)

In these definitions, $P(E \cap F)$ denotes the probability that both E and F occur together, and $P(F)$ (respectively $P(E)$) is the probability of the conditioning event, which must be non-zero for the ratio to be defined.

The idea behind the formula is that when we are given that F has occurred, the sample space is effectively reduced to the outcomes in F. Out of these outcomes in F, we are interested in those where E also occurs, which are the outcomes in $E \cap F$. So, we are calculating the proportion of outcomes in F that are also in $E \cap F$.

Alternative Definition using Number of Outcomes (for Equally Likely Outcomes)

If the sample space S is finite and all outcomes are equally likely, then the probability of any event A is given by $P(A) = \frac{\text{Number of outcomes favourable to A}}{\text{Total number of outcomes in S}} = \frac{n(A)}{n(S)}$.

In this case, the formula for conditional probability can be expressed in terms of the number of outcomes:

P(E|F) = $\frac{P(E \cap F)}{P(F)} = \frac{n(E \cap F)/n(S)}{n(F)/n(S)}$

Assuming $n(S) \neq 0$ and $n(F) \neq 0$, we can cancel $n(S)$ from the numerator and denominator:

P(E|F) = $\frac{n(E \cap F)}{n(F)}$, provided $n(F) \neq 0$

... (iii)

This form directly illustrates the reduced sample space concept: we are counting the outcomes common to E and F relative to the total number of outcomes in F.
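As a small illustration of the counting form (iii), consider two throws of a die (an example chosen here for illustration): what is the probability that the sum is 8, given that the first throw shows 3? A Python sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Two throws of a die: 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))
F = [o for o in S if o[0] == 3]           # first throw shows 3
E = [o for o in S if o[0] + o[1] == 8]    # sum of the two throws is 8
EF = [o for o in E if o in F]             # outcomes in both E and F

# Reduced-sample-space form (iii): P(E|F) = n(E ∩ F) / n(F)
print(Fraction(len(EF), len(F)))          # 1/6

# Agrees with the ratio form (i): P(E ∩ F) / P(F)
print(Fraction(len(EF), len(S)) / Fraction(len(F), len(S)))  # 1/6
```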


Properties of Conditional Probability

Conditional probability satisfies the basic axioms of probability and has several important properties:

Let S be the sample space and E, F, G be events of S. Assume $P(F) \neq 0$.

Property 1

$P(S|F) = 1$ and $P(F|F) = 1$.

Proof:

P(S|F) = $\frac{P(S \cap F)}{P(F)}$

[By definition of conditional probability]

Since S is the sample space, $S \cap F = F$.

P(S|F) = $\frac{P(F)}{P(F)} = 1$

Similarly, for $P(F|F)$:

P(F|F) = $\frac{P(F \cap F)}{P(F)}$

[By definition]

Since $F \cap F = F$:

P(F|F) = $\frac{P(F)}{P(F)} = 1$

This property means that if we know for sure that event F has occurred, the probability of the entire sample space S occurring given F is 1, and the probability of F itself occurring given F is also 1.

Property 2 (Addition Theorem for Conditional Probability)

If A and B are any two events of a sample space S and F is an event of S such that $P(F) \neq 0$, then the conditional probability of the union of A and B given F is:

P((A $\cup$ B)|F) = P(A|F) + P(B|F) - P((A $\cap$ B)|F)

... (iv)

Proof:

P((A $\cup$ B)|F) = $\frac{P((A \cup B) \cap F)}{P(F)}$

[By definition]

Using the distributive law of set intersection over union, $(A \cup B) \cap F = (A \cap F) \cup (B \cap F)$.

P((A $\cup$ B)|F) = $\frac{P((A \cap F) \cup (B \cap F))}{P(F)}$

Using the Addition Theorem of probability for events $X = (A \cap F)$ and $Y = (B \cap F)$, $P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)$:

$= \frac{P(A \cap F) + P(B \cap F) - P((A \cap F) \cap (B \cap F))}{P(F)}$

Note that $(A \cap F) \cap (B \cap F) = A \cap F \cap B \cap F = A \cap B \cap F = (A \cap B) \cap F$.

$= \frac{P(A \cap F) + P(B \cap F) - P((A \cap B) \cap F)}{P(F)}$

Now, we can separate the terms by dividing by $P(F)$:

$= \frac{P(A \cap F)}{P(F)} + \frac{P(B \cap F)}{P(F)} - \frac{P((A \cap B) \cap F)}{P(F)}$

By the definition of conditional probability, these terms are $P(A|F)$, $P(B|F)$, and $P((A \cap B)|F)$.

P((A $\cup$ B)|F) = P(A|F) + P(B|F) - P((A $\cap$ B)|F)

Special Case: If A and B are disjoint events (mutually exclusive), then $A \cap B = \emptyset$. This implies $(A \cap B) \cap F = \emptyset \cap F = \emptyset$, and $P((A \cap B) \cap F) = P(\emptyset) = 0$.

In this case, the formula simplifies to:

P((A $\cup$ B)|F) = P(A|F) + P(B|F)

Property 3 (Complement Rule for Conditional Probability)

Let E be an event and E' be its complement. If $P(F) \neq 0$, then the conditional probability of the complement of E given F is:

P(E'|F) = 1 - P(E|F)

... (v)

Proof:

We know that the sample space S can be written as the union of an event E and its complement E', i.e., $S = E \cup E'$. Events E and E' are mutually exclusive (disjoint).

Using Property 1, $P(S|F) = 1$.

Using Property 2 for disjoint events $(E \cup E')$, we have:

P(S|F) = P((E $\cup$ E')|F) = P(E|F) + P(E'|F)

Substituting $P(S|F) = 1$:

1 = P(E|F) + P(E'|F)

Rearranging the terms to solve for $P(E'|F)$:

P(E'|F) = 1 - P(E|F)

This property is analogous to the complement rule in basic probability.
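All three properties can be verified concretely on a small sample space. The sketch below (events chosen arbitrarily for illustration) checks Properties 1, 2, and 3 on a single throw of a die, using exact fractions so the equalities hold without rounding error:

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # one throw of a die
F = frozenset({2, 4, 6})     # conditioning event: the number is even

def P(A):                    # classical probability on equally likely outcomes
    return Fraction(len(A), len(S))

def P_given(A, B):           # P(A|B) = P(A ∩ B) / P(B)
    return P(A & B) / P(B)

A, B = frozenset({1, 2}), frozenset({2, 4})   # arbitrary events
E = frozenset({1, 2, 3})

print(P_given(S, F) == 1 and P_given(F, F) == 1)    # Property 1: True
print(P_given(A | B, F) ==
      P_given(A, F) + P_given(B, F) - P_given(A & B, F))  # Property 2: True
print(P_given(S - E, F) == 1 - P_given(E, F))       # Property 3: True
```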


Example 1. A die is thrown. Let E be the event 'the number appearing is greater than 4' and F be the event 'the number appearing is even'. Find $P(E|F)$.

Answer:

The given problem involves a single throw of a standard six-sided die.

The sample space S for this experiment is the set of all possible outcomes:

S = $\{1, 2, 3, 4, 5, 6\}$

The total number of outcomes in the sample space is $n(S) = 6$.

The event E is 'the number appearing is greater than 4'. The outcomes favorable to E are the numbers 5 and 6.

E = $\{5, 6\}$

The number of outcomes favorable to E is $n(E) = 2$.

The event F is 'the number appearing is even'. The outcomes favorable to F are the even numbers in the sample space:

F = $\{2, 4, 6\}$

The number of outcomes favorable to F is $n(F) = 3$.

We need to find $P(E|F)$, the conditional probability of event E occurring given that event F has already occurred. This means we want the probability of getting a number greater than 4, given that the number obtained is even.

First, let's find the intersection of events E and F, which represents the outcomes common to both events.

E $\cap$ F = $\{5, 6\} \cap \{2, 4, 6\}$

E $\cap$ F = $\{6\}$

The number of outcomes favorable to $E \cap F$ is $n(E \cap F) = 1$.

Since all outcomes in the sample space S are equally likely, we can calculate the probabilities of $E \cap F$ and F:

$P(E \cap F) = \frac{n(E \cap F)}{n(S)} = \frac{1}{6}$

$P(F) = \frac{n(F)}{n(S)} = \frac{3}{6} = \frac{1}{2}$

Now, we use the definition of conditional probability $P(E|F) = \frac{P(E \cap F)}{P(F)}$, since $P(F) = \frac{1}{2} \neq 0$.

$P(E|F) = \frac{1/6}{1/2}$

$P(E|F) = \frac{1}{6} \times \frac{2}{1} = \frac{2}{6} = \frac{1}{3}$

... (1)

Alternate Approach using Reduced Sample Space:

Given that event F has occurred, the possible outcomes are restricted to the elements of F. The reduced sample space is effectively F = $\{2, 4, 6\}$. The size of this reduced sample space is $n(F) = 3$.

Within this reduced sample space F, we are interested in the outcomes that are also in event E (number greater than 4). The outcomes in F that are also in E are the elements of $E \cap F$, which is $\{6\}$. The number of favorable outcomes in this reduced sample space is $n(E \cap F) = 1$.

Using the concept of the reduced sample space (which is valid for equally likely outcomes):

$P(E|F) = \frac{\text{Number of outcomes in } E \cap F}{\text{Number of outcomes in } F}$

$= \frac{n(E \cap F)}{n(F)} = \frac{1}{3}$

... (2)

Both methods yield the same result. The probability of the number appearing being greater than 4, given that the number appearing is even, is $\frac{1}{3}$.
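The two methods above can be reproduced in a few lines of Python using exact fractions:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of one die throw
E = {5, 6}               # number greater than 4
F = {2, 4, 6}            # number is even

# Method 1: ratio of probabilities, P(E ∩ F) / P(F)
p1 = Fraction(len(E & F), len(S)) / Fraction(len(F), len(S))
# Method 2: reduced sample space, n(E ∩ F) / n(F)
p2 = Fraction(len(E & F), len(F))

print(p1, p2)   # 1/3 1/3
```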



Multiplication Theorem on Probability

The Multiplication Theorem on Probability is a fundamental rule that allows us to calculate the probability of the joint occurrence of two or more events. It is a direct consequence of the definition of conditional probability. This theorem is particularly useful when dealing with sequences of events where the outcome of one event affects the probability of the subsequent events, such as drawing cards without replacement or drawing balls from an urn.


Statement of the Multiplication Theorem for Two Events

For any two events E and F associated with a sample space S, the probability of the simultaneous occurrence of both events E and F (denoted by $E \cap F$) is given by:

P(E $\cap$ F) = P(F) $\cdot$ P(E|F), provided $P(F) \neq 0$

... (i)

Alternatively, we can write:

P(E $\cap$ F) = P(E) $\cdot$ P(F|E), provided $P(E) \neq 0$

... (ii)

These two statements are equivalent because the event $E \cap F$ is the same as $F \cap E$, so $P(E \cap F) = P(F \cap E)$. The theorem states that the probability that both E and F occur is the probability that F occurs, multiplied by the conditional probability that E occurs given that F has already occurred (or vice versa).

Derivation from Conditional Probability

The Multiplication Theorem is derived directly from the definition of conditional probability.

From the definition of conditional probability of E given F (Equation (i) from the previous section), we have:

P(E|F) = $\frac{P(E \cap F)}{P(F)}$, where $P(F) \neq 0$

Multiplying both sides by $P(F)$, we get:

P(E $\cap$ F) = P(F) $\cdot$ P(E|F)

Similarly, from the definition of conditional probability of F given E (Equation (ii) from the previous section), we have:

P(F|E) = $\frac{P(F \cap E)}{P(E)}$, where $P(E) \neq 0$

Multiplying both sides by $P(E)$, we get:

P(F $\cap$ E) = P(E) $\cdot$ P(F|E)

Since $E \cap F$ and $F \cap E$ represent the same event, $P(E \cap F) = P(F \cap E)$. Thus, both forms of the Multiplication Theorem are valid.

Multiplication Theorem for Three or More Events

The Multiplication Theorem can be extended to find the probability of the joint occurrence of three or more events.

For three events E, F, and G associated with a sample space S, the probability of the event $E \cap F \cap G$ (E, F, and G occur) is given by:

P(E $\cap$ F $\cap$ G) = P(E) $\cdot$ P(F|E) $\cdot$ P(G|E $\cap$ F)

... (iii)

provided $P(E) \neq 0$ and $P(E \cap F) \neq 0$. This formula means the probability of E, F, and G occurring is the probability of E, times the probability of F given E, times the probability of G given that both E and F have occurred.

In general, for $n$ events $E_1, E_2, \dots, E_n$ associated with a sample space S, the probability of their simultaneous occurrence is:

P(E$_1 \cap$ E$_2 \cap \dots \cap$ E$_n$) = P(E$_1$) $\cdot$ P(E$_2$|E$_1$) $\cdot$ P(E$_3$|E$_1 \cap$ E$_2$) $\dots$ P(E$_n$|E$_1 \cap$ E$_2 \cap \dots \cap$ E$_{n-1}$)

... (iv)

provided that $P(E_1 \cap E_2 \cap \dots \cap E_{k-1}) \neq 0$ for all $k = 2, 3, \dots, n$. Each conditional probability in the sequence is conditioned on the occurrence of all preceding events in the intersection.
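The chain of conditional probabilities in (iv) is exactly what sequential drawing without replacement produces: each factor is conditioned on all preceding draws. As a sketch (the helper name `all_from_group` and the spades example are chosen here for illustration), the probability that the first three cards drawn from a deck are all spades is:

```python
from fractions import Fraction

def all_from_group(group_size, total, draws):
    """Chain rule (iv) for drawing without replacement: the probability
    that every one of `draws` successive draws comes from a group of
    `group_size` items inside `total` items."""
    p = Fraction(1)
    for i in range(draws):
        # P(E_k | E_1 ∩ ... ∩ E_{k-1}): after i successes, i items
        # of the group and i items overall have been removed
        p *= Fraction(group_size - i, total - i)
    return p

# First three cards all spades: (13/52)(12/51)(11/50)
print(all_from_group(13, 52, 3))   # 11/850
```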


Independent Events

The Multiplication Theorem takes a particularly simple form when the events involved are independent.

Two events E and F are said to be independent if the occurrence or non-occurrence of event F does not affect the probability of event E, and vice versa.

Mathematically, two events E and F are independent if and only if:

P(E|F) = P(E), provided $P(F) \neq 0$

and

P(F|E) = P(F), provided $P(E) \neq 0$

If E and F are independent events, substituting $P(E|F) = P(E)$ into the Multiplication Theorem $P(E \cap F) = P(F) \cdot P(E|F)$, we get:

P(E $\cap$ F) = P(F) $\cdot$ P(E)

Similarly, substituting $P(F|E) = P(F)$ into $P(E \cap F) = P(E) \cdot P(F|E)$, we get the same result:

P(E $\cap$ F) = P(E) $\cdot$ P(F)

... (v) [Condition for independence of E and F]

This is the fundamental condition for the independence of two events. Events E and F are independent if and only if the probability of their intersection is equal to the product of their individual probabilities. If $P(E \cap F) \neq P(E)P(F)$, then the events are dependent.

Note that independent events are different from mutually exclusive events. Mutually exclusive events cannot occur at the same time ($E \cap F = \emptyset$, so $P(E \cap F) = 0$). If $P(E) \neq 0$ and $P(F) \neq 0$, then for mutually exclusive events, $P(E \cap F) = 0$, while $P(E)P(F) \neq 0$. Thus, non-null mutually exclusive events are always dependent.
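This distinction between independence and mutual exclusivity can be checked by direct counting. In the sketch below (events chosen for illustration), two events concerning different dice satisfy $P(E \cap F) = P(E)P(F)$, while two non-null mutually exclusive events do not:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two dice, 36 outcomes

def P(A):
    return Fraction(len(A), len(S))

E = {o for o in S if o[0] % 2 == 0}   # first die even
F = {o for o in S if o[1] > 4}        # second die greater than 4
H = {o for o in S if o[0] % 2 == 1}   # first die odd: disjoint from E

print(P(E & F) == P(E) * P(F))   # True: E and F are independent
print(P(E & H) == 0)             # True: E and H are mutually exclusive
print(P(E & H) == P(E) * P(H))   # False: exclusive non-null events are dependent
```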

Independence of Three or More Events

Three events E, F, and G are said to be mutually independent if they are pairwise independent and the probability of their intersection is the product of their individual probabilities. That is, they must satisfy all the following conditions:

P(E $\cap$ F) = P(E)P(F)

P(F $\cap$ G) = P(F)P(G)

P(E $\cap$ G) = P(E)P(G)

P(E $\cap$ F $\cap$ G) = P(E)P(F)P(G)

Pairwise independence does not necessarily imply mutual independence for three or more events. All four conditions must hold for E, F, and G to be mutually independent.


Example 1. An urn contains 10 black and 5 white balls. Two balls are drawn from the urn one after the other without replacement. What is the probability that both drawn balls are black?

Answer:

The given problem involves drawing balls without replacement, which means the two draws are dependent events.

Let B$_1$ be the event that the first ball drawn is black.

Let B$_2$ be the event that the second ball drawn is black.

We want to find the probability that both balls drawn are black, which is the probability of the intersection of B$_1$ and B$_2$, denoted by $P(\text{B}_1 \cap \text{B}_2)$.

Total number of balls initially in the urn = Number of black balls + Number of white balls = 10 + 5 = 15.

The probability of drawing a black ball on the first draw (Event B$_1$) is:

P(B$_1$) = $\frac{\text{Number of black balls}}{\text{Total number of balls}} = \frac{10}{15} = \frac{2}{3}$

... (1)

Since the drawing is without replacement and the first ball drawn was black, there are now 9 black balls remaining in the urn, and the total number of balls remaining is $15 - 1 = 14$.

The probability that the second ball drawn is black, given that the first ball drawn was black (Event B$_2$ given B$_1$), is:

P(B$_2$|B$_1$) = $\frac{\text{Number of black balls remaining}}{\text{Total number of balls remaining}} = \frac{9}{14}$

... (2)

Using the Multiplication Theorem on probability for two events (Equation i):

P(B$_1 \cap$ B$_2$) = P(B$_1$) $\cdot$ P(B$_2$|B$_1$)

Substitute the probabilities from (1) and (2):

P(B$_1 \cap$ B$_2$) = $\frac{2}{3} \times \frac{9}{14}$

Simplify the expression:

$= \frac{\cancel{2}^{1}}{\cancel{3}_{1}} \times \frac{\cancel{9}^{3}}{\cancel{14}_{7}} = \frac{1 \times 3}{1 \times 7} = \frac{3}{7}$

The probability that both drawn balls are black is $\frac{3}{7}$.
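The answer $\frac{3}{7}$ can be double-checked in Python, both by exhaustively enumerating all ordered pairs of draws and by the multiplication theorem:

```python
from fractions import Fraction
from itertools import permutations

# 10 black ('B') and 5 white ('W') balls; two drawn without replacement
balls = ['B'] * 10 + ['W'] * 5
pairs = list(permutations(range(15), 2))   # all ordered draws of two distinct balls
both_black = [p for p in pairs
              if balls[p[0]] == 'B' and balls[p[1]] == 'B']

print(Fraction(len(both_black), len(pairs)))   # 3/7 by enumeration
print(Fraction(10, 15) * Fraction(9, 14))      # 3/7 by the multiplication theorem
```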

Example 2. Three cards are drawn successively, without replacement from a pack of 52 well-shuffled cards. What is the probability that the first card is a king, the second card is a queen, and the third card is a jack?

Answer:

This problem involves a sequence of three events without replacement, indicating dependence between the events.

Let K be the event that the first card drawn is a king.

Let Q be the event that the second card drawn is a queen.

Let J be the event that the third card drawn is a jack.

We want to find the probability of the intersection of these three events, $P(K \cap Q \cap J)$.

A standard deck of 52 cards has 4 kings, 4 queens, and 4 jacks.

Probability of the first card being a king (Event K):

Total cards = 52. Number of kings = 4.

P(K) = $\frac{\text{Number of kings}}{\text{Total number of cards}} = \frac{4}{52} = \frac{1}{13}$

... (1)

Probability of the second card being a queen, given the first was a king (Event Q given K):

After drawing one king without replacement, there are $52 - 1 = 51$ cards remaining. The number of queens is still 4.

P(Q|K) = $\frac{\text{Number of queens}}{\text{Total number of cards remaining}} = \frac{4}{51}$

... (2)

Probability of the third card being a jack, given the first was a king and the second was a queen (Event J given K $\cap$ Q):

After drawing one king and one queen without replacement, there are $51 - 1 = 50$ cards remaining. The number of jacks is still 4.

P(J|K $\cap$ Q) = $\frac{\text{Number of jacks}}{\text{Total number of cards remaining}} = \frac{4}{50} = \frac{2}{25}$

... (3)

Using the Multiplication Theorem for three events (Equation iii):

P(K $\cap$ Q $\cap$ J) = P(K) $\cdot$ P(Q|K) $\cdot$ P(J|K $\cap$ Q)

Substitute the probabilities from (1), (2), and (3):

P(K $\cap$ Q $\cap$ J) = $\frac{1}{13} \times \frac{4}{51} \times \frac{2}{25}$

Calculate the product:

$= \frac{1 \times 4 \times 2}{13 \times 51 \times 25} = \frac{8}{16575}$

... (4)

The probability that the first card is a king, the second is a queen, and the third is a jack, when drawn successively without replacement, is $\frac{8}{16575}$.
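The same product of three conditional probabilities can be evaluated exactly in Python:

```python
from fractions import Fraction

# P(K) · P(Q|K) · P(J|K ∩ Q) for successive draws without replacement
p = Fraction(4, 52) * Fraction(4, 51) * Fraction(4, 50)
print(p)   # 8/16575
```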



The Law of Total Probability

The Law of Total Probability is a fundamental theorem in probability that expresses the total probability of an event as the sum of probabilities of that event occurring under several mutually exclusive and exhaustive conditions. It's a powerful tool for breaking down complex probability problems into simpler, conditional probabilities. This law is the foundation for Bayes' Theorem.

Before stating the law, we need to understand the concept of a partition of the sample space.


Partition of a Sample Space

A set of events $E_1, E_2, \dots, E_n$ associated with a sample space S is said to constitute a partition of the sample space S if they satisfy the following three conditions:

1. Mutually Exclusive Events

The events are pairwise disjoint. This means that no two events can occur at the same time.

E$_i \cap$ E$_j = \emptyset$ for all $i \neq j$, where $i, j \in \{1, 2, \dots, n\}$.

Consequently, $P(E_i \cap E_j) = 0$ for $i \neq j$.

2. Exhaustive Events

The union of all the events covers the entire sample space. This means that at least one of the events must occur in any trial of the experiment.

E$_1 \cup$ E$_2 \cup \dots \cup$ E$_n = \text{S}$

Consequently, $P(E_1 \cup E_2 \cup \dots \cup E_n) = P(S) = 1$.

3. Non-zero Probabilities

Each event in the partition must have a non-zero probability of occurring.

P(E$_i$) > 0 for all $i = 1, 2, \dots, n$.

In simpler terms, a partition divides the sample space into distinct, non-overlapping regions that collectively cover the entire sample space. Any outcome of the experiment must fall into exactly one of these events $E_i$.

Diagram showing sample space S divided into disjoint events E1, E2, ..., En that cover S.

Statement of the Law of Total Probability

Let $E_1, E_2, \dots, E_n$ be a partition of the sample space S. Let A be any event associated with S. Then the probability of event A occurring, $P(A)$, is given by the sum of the probabilities of the intersection of A with each event in the partition:

P(A) = $\sum_{j=1}^{n} P(E_j \cap A)$

Using the Multiplication Theorem on probability, we know that $P(E_j \cap A) = P(E_j) \cdot P(A|E_j)$, assuming $P(E_j) > 0$ (which is guaranteed by the definition of a partition).

Substituting this into the sum, we get the most common form of the Law of Total Probability:

P(A) = $\sum_{j=1}^{n} P(E_j) P(A|E_j)$

... (i)

This means that the probability of event A is the weighted average of the conditional probabilities of A given each event $E_j$, where the weights are the probabilities of the events $E_j$ themselves.

In expanded form:

P(A) = $P(E_1) P(A|E_1) + P(E_2) P(A|E_2) + \dots + P(E_n) P(A|E_n)$
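The expanded form translates directly into code. The sketch below (the supplier scenario and its numbers are hypothetical, chosen only to exercise the formula) computes $P(A)$ as the weighted sum of conditional probabilities:

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Law of Total Probability: P(A) = Σ_j P(E_j) · P(A|E_j).

    priors      -- P(E_j) for a partition E_1, ..., E_n (must sum to 1)
    likelihoods -- P(A|E_j) for each j, in the same order
    """
    assert sum(priors) == 1, "the E_j must be exhaustive"
    return sum(pr * lk for pr, lk in zip(priors, likelihoods))

# Hypothetical example: three suppliers provide 50%, 30%, 20% of stock,
# with defect rates 2%, 3%, 5% respectively.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 50), Fraction(3, 100), Fraction(1, 20)]
print(total_probability(priors, likelihoods))   # 29/1000
```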

Derivation of the Law of Total Probability

The derivation relies on the properties of partitions and the Addition Theorem for mutually exclusive events.

Since $E_1, E_2, \dots, E_n$ form a partition of the sample space S, their union is S: $S = E_1 \cup E_2 \cup \dots \cup E_n$.

Any event A can be expressed as the intersection of A with the sample space S. We can substitute the union of the partition events for S:

A = A $\cap$ S = A $\cap$ (E$_1 \cup$ E$_2 \cup \dots \cup$ E$_n$)

Using the distributive law of set intersection over union, we can distribute A across the union:

A = (A $\cap$ E$_1$) $\cup$ (A $\cap$ E$_2$) $\cup$ $\dots$ $\cup$ (A $\cap$ E$_n$)

Now, consider the events $(A \cap E_1), (A \cap E_2), \dots, (A \cap E_n)$. Since the events $E_1, E_2, \dots, E_n$ are mutually exclusive ($E_i \cap E_j = \emptyset$ for $i \neq j$), the events $(A \cap E_i)$ are also mutually exclusive. If there's no overlap between $E_i$ and $E_j$, there can be no overlap between the part of A in $E_i$ and the part of A in $E_j$. $(A \cap E_i) \cap (A \cap E_j) = A \cap (E_i \cap E_j) = A \cap \emptyset = \emptyset$.

Since the events $(A \cap E_1), (A \cap E_2), \dots, (A \cap E_n)$ are mutually exclusive, the probability of their union is the sum of their individual probabilities (Addition Theorem for mutually exclusive events):

P(A) = P((A $\cap$ E$_1$) $\cup$ (A $\cap$ E$_2$) $\cup$ $\dots$ $\cup$ (A $\cap$ E$_n$))

P(A) = P(A $\cap$ E$_1$) + P(A $\cap$ E$_2$) + $\dots$ + P(A $\cap$ E$_n$)

P(A) = $\sum_{j=1}^{n} P(A \cap E_j)$

... (ii)

Finally, using the Multiplication Theorem for two events, $P(A \cap E_j) = P(E_j \cap A) = P(E_j) \cdot P(A|E_j)$ (since $P(E_j) > 0$).

Substituting this into equation (ii):

P(A) = $\sum_{j=1}^{n} P(E_j) P(A|E_j)$

This completes the derivation of the Law of Total Probability.


Example 1. A bag contains 3 red and 7 black balls. Another bag contains 8 red and 2 black balls. A bag is selected at random, and then a ball is drawn from the selected bag. What is the probability that the ball drawn is red?

Answer:

Let's define the events based on the problem description.

There are two possible bags that can be selected.

Let B$_1$ be the event that the first bag is selected.

Let B$_2$ be the event that the second bag is selected.

Since a bag is selected at random, and there are two bags, the probabilities of selecting each bag are equal:

P(B$_1$) = $\frac{1}{2}$

[Two bags, selected at random]

P(B$_2$) = $\frac{1}{2}$

[Two bags, selected at random]

These two events, B$_1$ and B$_2$, form a partition of the sample space of selecting a bag. They are mutually exclusive (you select either Bag 1 OR Bag 2, not both at the same time), exhaustive (you must select one of the two bags), and have non-zero probabilities.

Let R be the event that the ball drawn from the selected bag is red.

We are interested in finding the total probability of drawing a red ball, $P(R)$. A red ball can be drawn either from Bag 1 or from Bag 2. This suggests using the Law of Total Probability.

We need the conditional probabilities of drawing a red ball given which bag was selected:

Probability of drawing a red ball given that Bag 1 was selected (Event R given B$_1$):

Bag 1 contains 3 red balls and 7 black balls, making a total of $3 + 7 = 10$ balls.

P(R|B$_1$) = $\frac{\text{Number of red balls in Bag 1}}{\text{Total number of balls in Bag 1}} = \frac{3}{10}$

... (1)

Probability of drawing a red ball given that Bag 2 was selected (Event R given B$_2$):

Bag 2 contains 8 red balls and 2 black balls, making a total of $8 + 2 = 10$ balls.

P(R|B$_2$) = $\frac{\text{Number of red balls in Bag 2}}{\text{Total number of balls in Bag 2}} = \frac{8}{10} = \frac{4}{5}$

... (2)

Using the Law of Total Probability (Equation i) with $n=2$ events (B$_1$, B$_2$) forming the partition:

P(R) = P(B$_1$) $\cdot$ P(R|B$_1$) + P(B$_2$) $\cdot$ P(R|B$_2$)

Substitute the probabilities calculated:

P(R) = $\left(\frac{1}{2}\right) \left(\frac{3}{10}\right) + \left(\frac{1}{2}\right) \left(\frac{4}{5}\right)$

P(R) = $\frac{1 \times 3}{2 \times 10} + \frac{1 \times 4}{2 \times 5}$

P(R) = $\frac{3}{20} + \frac{4}{10}$

To add the fractions, find a common denominator, which is 20.

P(R) = $\frac{3}{20} + \frac{4 \times 2}{10 \times 2} = \frac{3}{20} + \frac{8}{20}$

P(R) = $\frac{3 + 8}{20} = \frac{11}{20}$

... (3)

The total probability that the ball drawn from the randomly selected bag is red is $\frac{11}{20}$.
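The arithmetic of this example is a one-liner with exact fractions:

```python
from fractions import Fraction

# P(R) = P(B1)·P(R|B1) + P(B2)·P(R|B2)
p_R = Fraction(1, 2) * Fraction(3, 10) + Fraction(1, 2) * Fraction(8, 10)
print(p_R)   # 11/20
```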



Bayes’ Theorem

Bayes' Theorem, named after Thomas Bayes, is a cornerstone of statistical inference. It provides a mathematical formula to update the probability of a hypothesis based on new evidence. In the context of probability, it allows us to calculate a "reversed" conditional probability, i.e., finding $P(E_i|A)$ when we know $P(A|E_i)$. This is particularly useful in scenarios where we observe an outcome (event A) and want to determine the probability that this outcome was a result of a specific cause or condition (event $E_i$).

Bayes' Theorem is derived directly from the definition of conditional probability and the Law of Total Probability.


Statement of Bayes’ Theorem

Let $E_1, E_2, \dots, E_n$ be a set of $n$ events that constitute a partition of the sample space S. Recall that this means the events are mutually exclusive ($E_i \cap E_j = \emptyset$ for $i \neq j$), exhaustive ($E_1 \cup E_2 \cup \dots \cup E_n = \text{S}$), and have non-zero probabilities ($P(E_i) > 0$ for all $i$).

Let A be any event associated with S such that $P(A) \neq 0$.

For any specific event $E_i$ from the partition (where $i$ is any index from 1 to $n$), the conditional probability of $E_i$ occurring given that event A has occurred is given by Bayes' Theorem:

P(E$_i$|A) = $\frac{P(E_i \cap A)}{P(A)}$

[By definition of conditional probability]

Now, we use the Multiplication Theorem on probability to express the term in the numerator, $P(E_i \cap A)$. According to the Multiplication Theorem, $P(E_i \cap A) = P(E_i) \cdot P(A|E_i)$, provided $P(E_i) \neq 0$ (which is true since $E_i$ is part of the partition).

So, the numerator becomes $P(E_i) \cdot P(A|E_i)$.

Next, we use the Law of Total Probability to express the term in the denominator, $P(A)$. According to the Law of Total Probability, since $E_1, E_2, \dots, E_n$ form a partition, the probability of event A is the sum of the probabilities of A occurring with each event $E_j$:

P(A) = $\sum_{j=1}^{n} P(E_j \cap A)$

Using the Multiplication Theorem again, $P(E_j \cap A) = P(E_j) \cdot P(A|E_j)$. So, the denominator becomes:

P(A) = $\sum_{j=1}^{n} P(E_j) P(A|E_j)$

Substituting the expressions for the numerator and the denominator back into the definition of conditional probability $P(E_i|A)$, we obtain Bayes' Theorem:

P(E$_i$|A) = $\frac{P(E_i) P(A|E_i)}{\sum_{j=1}^{n} P(E_j) P(A|E_j)}$

... (i)

This is the general form of Bayes' Theorem. For the case of just two events $E$ and $E'$ forming a partition (i.e., $E'$ is the complement of $E$, $P(E) + P(E') = 1$), the formula simplifies. Let $E_1 = E$ and $E_2 = E'$. Then the denominator is $P(E_1)P(A|E_1) + P(E_2)P(A|E_2) = P(E)P(A|E) + P(E')P(A|E')$.

For $i=1$, $P(E|A) = \frac{P(E)P(A|E)}{P(E)P(A|E) + P(E')P(A|E')}$.

For $i=2$, $P(E'|A) = \frac{P(E')P(A|E')}{P(E)P(A|E) + P(E')P(A|E')}$.
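As a sketch, the two-event form can be wrapped in a small Python function. The function name and the numbers passed to it below are illustrative, not taken from the text:

```python
def bayes_two_event(p_e, p_a_given_e, p_a_given_not_e):
    """Posterior P(E|A) for a two-event partition {E, E'} via Bayes' Theorem."""
    p_not_e = 1 - p_e                                     # P(E') = 1 - P(E)
    numerator = p_e * p_a_given_e                         # P(E) P(A|E)
    denominator = numerator + p_not_e * p_a_given_not_e   # total probability P(A)
    return numerator / denominator

# Example with P(E) = 0.5: P(E|A) = (0.5)(0.8) / [(0.5)(0.8) + (0.5)(0.2)] = 0.8
print(bayes_two_event(0.5, 0.8, 0.2))
```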

Interpretation of Terms in Bayes' Theorem

In Bayes' Theorem, each factor has a standard name, and these names are used in the worked examples below:

  • $P(E_i)$ is the prior probability of the "cause" $E_i$, assessed before the evidence is observed.
  • $P(A|E_i)$ is the likelihood of observing the evidence A if the cause is $E_i$.
  • $P(E_i|A)$ is the posterior probability of the cause $E_i$, updated after observing A.

In essence, Bayes' Theorem allows us to move from $P(\text{Evidence}|\text{Cause})$ to $P(\text{Cause}|\text{Evidence})$.


Example 1. In a factory, Machine X produces 30% of the total output, Machine Y produces 20%, and Machine Z produces 50%. The percentage of defective items produced by X, Y, and Z are 1%, 2%, and 1% respectively. An item is selected at random and is found to be defective. What is the probability that it was produced by Machine X?

Answer:

Let's define the events:

Let E$_1$ be the event that the selected item was produced by Machine X.

Let E$_2$ be the event that the selected item was produced by Machine Y.

Let E$_3$ be the event that the selected item was produced by Machine Z.

These events E$_1$, E$_2$, and E$_3$ form a partition of the sample space: every item is produced by exactly one of the three machines (so the events are mutually exclusive and exhaustive), and the given production percentages sum to 100%.

We are given the prior probabilities of these events (based on the percentage of total output):

P(E$_1$) = 30% = 0.30

[Prior probability of item being from Machine X]

P(E$_2$) = 20% = 0.20

[Prior probability of item being from Machine Y]

P(E$_3$) = 50% = 0.50

[Prior probability of item being from Machine Z]

Let A be the event that the selected item is defective. We are given the conditional probabilities of an item being defective, given the machine that produced it (these are the likelihoods):

P(A|E$_1$) = 1% = 0.01

[Probability of defective given produced by X]

P(A|E$_2$) = 2% = 0.02

[Probability of defective given produced by Y]

P(A|E$_3$) = 1% = 0.01

[Probability of defective given produced by Z]

We are given that a randomly selected item is found to be defective (event A has occurred). We want to find the probability that this defective item was produced by Machine X, i.e., we want to find the posterior probability $P(E_1|A)$.

We use Bayes' Theorem (Equation i) for $i=1$:

P(E$_1$|A) = $\frac{P(E_1) P(A|E_1)}{P(E_1) P(A|E_1) + P(E_2) P(A|E_2) + P(E_3) P(A|E_3)}$

First, let's calculate the denominator, which is the total probability of event A (the probability that a randomly selected item is defective) using the Law of Total Probability:

P(A) = P(E$_1$) P(A|E$_1$) + P(E$_2$) P(A|E$_2$) + P(E$_3$) P(A|E$_3$)

P(A) = $(0.30)(0.01) + (0.20)(0.02) + (0.50)(0.01)$

P(A) = $0.0030 + 0.0040 + 0.0050$

P(A) = $0.0120$

Now, substitute the values of $P(E_1)$, $P(A|E_1)$, and $P(A)$ into Bayes' Theorem formula for $P(E_1|A)$:

P(E$_1$|A) = $\frac{(0.30)(0.01)}{0.0120} = \frac{0.0030}{0.0120}$

To simplify the fraction, we can multiply the numerator and denominator by 10000 to remove the decimals:

P(E$_1$|A) = $\frac{0.0030 \times 10000}{0.0120 \times 10000} = \frac{30}{120}$

P(E$_1$|A) = $\frac{\cancel{30}^{1}}{\cancel{120}_{4}} = \frac{1}{4}$

... (1)

The probability that the defective item was produced by Machine X is $\frac{1}{4}$ or 0.25.
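The worked example above can be sketched in Python, with the Law of Total Probability supplying the denominator:

```python
# Posterior probability that a defective item came from Machine X
priors = {"X": 0.30, "Y": 0.20, "Z": 0.50}        # P(E_i): share of total output
defect_rates = {"X": 0.01, "Y": 0.02, "Z": 0.01}  # P(A|E_i): defect rate per machine

# Law of Total Probability: P(A) = sum of P(E_i) P(A|E_i) over the partition
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' Theorem: P(E_1|A) = P(E_1) P(A|E_1) / P(A)
posterior_x = priors["X"] * defect_rates["X"] / p_defective

print(p_defective)   # 0.012 (up to floating-point rounding)
print(posterior_x)   # 0.25
```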



Random Variables

In probability, we often deal with numerical outcomes of random experiments. A random variable is a concept that allows us to associate a numerical value with each outcome in the sample space of a random experiment. It provides a way to quantify the results of chance events, making them amenable to mathematical analysis.


Definition of a Random Variable

Formally, a random variable, usually denoted by capital letters like X, Y, Z, etc., is a real-valued function defined on the sample space of a random experiment. Its domain is the sample space S, and its range is a subset of the real numbers ($\mathbb{R}$).

X : S $\to \mathbb{R}$

For every outcome $\omega$ in the sample space S, the random variable X assigns a unique real number $X(\omega)$. This number is a numerical description of the outcome.

Example: Consider the random experiment of tossing two fair coins simultaneously.

The sample space is $S = \{HH, HT, TH, TT\}$.

Let X be a random variable representing the "number of heads obtained". We can define the value of X for each outcome in S:

  • $X(HH) = 2$
  • $X(HT) = 1$
  • $X(TH) = 1$
  • $X(TT) = 0$

The set of all possible values that the random variable X can take is $\{0, 1, 2\}$.

Notice that different outcomes in the sample space can be mapped to the same value of the random variable. For instance, both HT and TH result in X=1.

The use of the term "random" in "random variable" signifies that the value of the variable is uncertain and depends on the outcome of the random experiment.
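This view of a random variable as a function on the sample space translates directly into code. A minimal Python sketch (the function name `X` simply mirrors the random variable):

```python
# The sample space of tossing two coins
sample_space = ["HH", "HT", "TH", "TT"]

def X(outcome):
    """The random variable X: number of heads in the outcome."""
    return outcome.count("H")

# X assigns a real number to every outcome; different outcomes may share a value
values = {omega: X(omega) for omega in sample_space}
print(values)                          # HH and TT map to 2 and 0; HT, TH both map to 1
print(sorted(set(values.values())))    # the set of possible values of X
```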


Types of Random Variables

Random variables are broadly classified into two types based on the set of values they can take:

1. Discrete Random Variable

A random variable is called a discrete random variable if its possible values are finite or countably infinite. Countably infinite means the values can be listed in an unending sequence, i.e., they can be put in one-to-one correspondence with the non-negative integers $\{0, 1, 2, 3, \dots\}$.

The values of a discrete random variable can often be thought of as counts.

Examples of discrete random variables:

  • The number of heads obtained in 10 tosses of a coin (possible values 0, 1, ..., 10).
  • The number of defective items in a sampled batch.
  • The number of tosses required to get the first head (countably infinite: 1, 2, 3, $\dots$).

2. Continuous Random Variable

A random variable is called a continuous random variable if it can take any value within a given range or interval of real numbers. Its possible values are uncountable.

Continuous random variables often represent measurements.

Examples of continuous random variables:

  • The height of a randomly selected student.
  • The time taken by a machine to complete a task.
  • The amount of rainfall recorded at a location in a day.

In Class 12 Mathematics, we primarily focus on discrete random variables and their distributions.


Probability Distribution of a Discrete Random Variable

The probability distribution of a discrete random variable is a description that gives the probability of each possible value of the random variable. It completely characterises the behaviour of the discrete random variable.

If X is a discrete random variable that can take distinct values $x_1, x_2, \dots, x_n$ (finite case) or $x_1, x_2, x_3, \dots$ (countably infinite case), then its probability distribution specifies the probability $P(X=x_i)$ for each possible value $x_i$.

Let $p_i$ denote the probability that the random variable X takes the value $x_i$.

P(X = $x_i$) = $p_i$

The probability distribution is often represented in a table or as a function (Probability Mass Function, PMF).

Representation as a Table:

Value of X ($x_i$) $x_1$ $x_2$ ... $x_n$
Probability P(X = $x_i$) ($p_i$) $p_1$ $p_2$ ... $p_n$

For the table to represent a valid probability distribution, the probabilities $p_i$ must satisfy two fundamental conditions:

1. Non-negativity of Probabilities

The probability of any specific value occurring must be between 0 and 1, inclusive.

0 $\leq p_i \leq 1$ for all $i = 1, 2, \dots, n$.

2. Sum of Probabilities

The sum of the probabilities of all possible values of the random variable must be equal to 1. This is because the set of all possible values of X covers all outcomes in the sample space (partitioning the sample space based on the value of X).

$\sum_{i=1}^{n} p_i = p_1 + p_2 + \dots + p_n = 1$

... (i)

(For a countably infinite set of values, the sum becomes an infinite series $\sum_{i=1}^{\infty} p_i = 1$).
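The two validity conditions can be checked mechanically. A minimal Python sketch (the helper name `is_valid_distribution` and the tolerance are assumptions for illustration):

```python
def is_valid_distribution(probs, tol=1e-9):
    """Check the two conditions for a discrete probability distribution."""
    non_negative = all(0 <= p <= 1 for p in probs)   # condition 1: 0 <= p_i <= 1
    sums_to_one = abs(sum(probs) - 1) <= tol         # condition 2: sum of p_i = 1
    return non_negative and sums_to_one

print(is_valid_distribution([0.25, 0.5, 0.25]))  # True
print(is_valid_distribution([0.3, 0.3, 0.3]))    # False: probabilities sum to 0.9
```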


Example 1. Two coins are tossed simultaneously. Let X be the random variable representing the number of tails. Find the probability distribution of X.

Answer:

The random experiment is tossing two coins simultaneously.

The sample space S consists of all possible outcomes:

S = $\{HH, HT, TH, TT\}$

Assuming the coins are fair, each outcome is equally likely, with probability $\frac{1}{4}$.

The random variable X represents the number of tails obtained in each outcome. Let's find the value of X for each outcome in S:

  • For HH: Number of tails = 0. So, $X(HH) = 0$.
  • For HT: Number of tails = 1. So, $X(HT) = 1$.
  • For TH: Number of tails = 1. So, $X(TH) = 1$.
  • For TT: Number of tails = 2. So, $X(TT) = 2$.

The possible values that the random variable X can take are 0, 1, and 2. X is a discrete random variable.

Now, we find the probability of each possible value of X:

  • $P(X=0)$: This occurs when the outcome is HH.

    $P(X=0) = P(\{HH\}) = \frac{1}{4}$

  • $P(X=1)$: This occurs when the outcome is HT or TH.

    $P(X=1) = P(\{HT, TH\}) = P(\{HT\}) + P(\{TH\})$

    $= \frac{1}{4} + \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$

  • $P(X=2)$: This occurs when the outcome is TT.

    $P(X=2) = P(\{TT\}) = \frac{1}{4}$

The probability distribution of the random variable X is the collection of these possible values and their corresponding probabilities. We can represent this in a table:

Value of X ($x_i$) 0 1 2
Probability P(X = $x_i$) ($p_i$) $\frac{1}{4}$ $\frac{1}{2}$ $\frac{1}{4}$

To verify this is a valid probability distribution, we check the two conditions:

  • All probabilities $p_i$ ($\frac{1}{4}, \frac{1}{2}, \frac{1}{4}$) are between 0 and 1. This condition is satisfied.
  • The sum of the probabilities is:

    $\sum p_i = P(X=0) + P(X=1) + P(X=2) = \frac{1}{4} + \frac{1}{2} + \frac{1}{4}$

    $= \frac{1}{4} + \frac{2}{4} + \frac{1}{4} = \frac{1+2+1}{4} = \frac{4}{4} = 1$

    The sum of probabilities is 1. This condition is also satisfied.

Thus, the table above represents the complete probability distribution of the number of tails when tossing two coins.
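The same distribution can be obtained programmatically by enumerating the sample space and counting outcomes, using exact fractions:

```python
from collections import Counter
from fractions import Fraction

sample_space = ["HH", "HT", "TH", "TT"]   # four equally likely outcomes

# Count how many outcomes give each number of tails
counts = Counter(outcome.count("T") for outcome in sample_space)

# P(X = x) = (outcomes with x tails) / (total outcomes)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
print(distribution)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# The probabilities sum to 1, as every valid distribution must
assert sum(distribution.values()) == 1
```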



Mean and Variance of Probability Distribution

Once we have the probability distribution of a discrete random variable, we can calculate certain numerical characteristics that summarise the distribution. The most important of these are the mean and the variance. These measures provide information about the central location and the spread or variability of the random variable's values.


Mean (Expected Value) of a Discrete Random Variable

The mean of a discrete random variable X is also known as its expected value or expectation. It is denoted by $E(X)$ or $\mu$ (the Greek letter mu). The expected value is a measure of the central tendency of the random variable. It represents the average value we would expect to obtain if we performed the random experiment a very large number of times.

If a discrete random variable X can take distinct values $x_1, x_2, \dots, x_n$ with corresponding probabilities $p_1, p_2, \dots, p_n$ such that $P(X=x_i) = p_i$, then the mean or expected value of X is defined as the sum of the products of each possible value and its probability:

E(X) = $\mu = \sum_{i=1}^{n} x_i p_i$

... (i)

In expanded form:

E(X) = $x_1 p_1 + x_2 p_2 + \dots + x_n p_n$

The concept of expected value is similar to the concept of the mean of a frequency distribution. If we consider the probabilities $p_i$ as relative frequencies, then the expected value is the weighted average of the values $x_i$.
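As a quick illustration of formula (i), using exact fractions on an illustrative distribution (the values and probabilities below are not from the text):

```python
from fractions import Fraction

xs = [1, 2, 3]                                           # possible values x_i
ps = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]   # probabilities p_i, summing to 1

# E(X) = x_1 p_1 + x_2 p_2 + x_3 p_3
mean = sum(x * p for x, p in zip(xs, ps))
print(mean)   # 21/10, i.e. 2.1
```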


Variance of a Discrete Random Variable

The variance of a discrete random variable X, denoted by $\text{Var}(X)$ or $\sigma^2$ (sigma squared), is a measure of the dispersion or spread of the possible values of X around its mean $\mu$. A high variance indicates that the values of X are widely spread from the mean, while a low variance indicates that the values are clustered closely around the mean.

The variance is defined as the expected value of the squared deviation of the random variable from its mean:

$\text{Var}(X) = E[(X - \mu)^2]$

Using the definition of expected value from equation (i), this can be written as:

$\text{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 p_i$

... (ii)

This formula calculates the weighted average of the squared differences between each possible value ($x_i$) and the mean ($\mu$), with probabilities ($p_i$) as weights. We square the deviations $(x_i - \mu)$ so that positive and negative deviations do not cancel each other out.

Alternative Formula for Variance:

An equivalent and often computationally easier formula for variance is:

$\text{Var}(X) = E(X^2) - [E(X)]^2$

... (iii)

where $E(X^2)$ is the expected value of the square of the random variable X, calculated as:

E(X$^2$) = $\sum_{i=1}^{n} x_i^2 p_i$

... (iv)

Derivation of the Alternative Formula:

$\text{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2X\mu + \mu^2]$

Using the properties of expectation ($E(aX + bY) = aE(X) + bE(Y)$ and $E(c) = c$ for a constant c):

$\text{Var}(X) = E(X^2) - E(2X\mu) + E(\mu^2)$

$\text{Var}(X) = E(X^2) - 2\mu E(X) + \mu^2$

(Since $\mu$ is a constant)

Since $E(X) = \mu$:

$\text{Var}(X) = E(X^2) - 2\mu(\mu) + \mu^2$

$\text{Var}(X) = E(X^2) - 2\mu^2 + \mu^2$

$\text{Var}(X) = E(X^2) - \mu^2$

This derivation shows that $E(X^2) - [E(X)]^2$ is equivalent to the definition $\sum (x_i - \mu)^2 p_i$.
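The equivalence can also be checked numerically. A short sketch on an arbitrary illustrative distribution (not taken from the text):

```python
# Compare the definitional formula (ii) with the alternative formula (iii)
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]

mu = sum(x * p for x, p in zip(xs, ps))                   # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # sum (x_i - mu)^2 p_i
e_x2 = sum(x * x * p for x, p in zip(xs, ps))             # E(X^2)
var_alt = e_x2 - mu ** 2                                  # E(X^2) - [E(X)]^2

print(mu, var_def, var_alt)   # the two variance values agree
```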

Standard Deviation

The standard deviation of a random variable X, denoted by $\sigma$ (sigma), is the positive square root of the variance:

$\sigma = \sqrt{\text{Var}(X)} = \sqrt{E(X^2) - [E(X)]^2}$

... (v)

Standard deviation is also a measure of spread, and it has the advantage of being in the same units as the random variable X and the mean $\mu$, making it easier to interpret than the variance (which is in square units).


Example 1. Find the mean and variance of the number of tails when two coins are tossed simultaneously. (The probability distribution was found in the previous example).

Answer:

From the previous example, the probability distribution of the random variable X (the number of tails in two coin tosses) is given by the following table:

Value of X ($x_i$) 0 1 2
Probability P(X = $x_i$) ($p_i$) $\frac{1}{4}$ $\frac{1}{2}$ $\frac{1}{4}$

Calculating the Mean (Expected Value), E(X):

Using the formula $E(X) = \sum_{i=1}^{n} x_i p_i$ (Equation i):

E(X) = $(0) \cdot P(X=0) + (1) \cdot P(X=1) + (2) \cdot P(X=2)$

E(X) = $(0) \left(\frac{1}{4}\right) + (1) \left(\frac{1}{2}\right) + (2) \left(\frac{1}{4}\right)$

E(X) = $0 + \frac{1}{2} + \frac{2}{4}$

E(X) = $0 + \frac{1}{2} + \frac{1}{2}$

E(X) = $1$

... (1) [Mean number of tails]

The mean or expected number of tails when tossing two coins is 1.

Calculating the Variance (Var(X)):

We will use the alternative formula $\text{Var}(X) = E(X^2) - [E(X)]^2$ (Equation iii).

First, we need to calculate $E(X^2)$ using the formula $E(X^2) = \sum_{i=1}^{n} x_i^2 p_i$ (Equation iv):

E(X$^2$) = $(0)^2 \cdot P(X=0) + (1)^2 \cdot P(X=1) + (2)^2 \cdot P(X=2)$

E(X$^2$) = $(0)^2 \left(\frac{1}{4}\right) + (1)^2 \left(\frac{1}{2}\right) + (2)^2 \left(\frac{1}{4}\right)$

E(X$^2$) = $(0) \left(\frac{1}{4}\right) + (1) \left(\frac{1}{2}\right) + (4) \left(\frac{1}{4}\right)$

E(X$^2$) = $0 + \frac{1}{2} + 1$

E(X$^2$) = $\frac{1}{2} + \frac{2}{2} = \frac{3}{2}$

Now, substitute $E(X^2) = \frac{3}{2}$ and $\mu = E(X) = 1$ into the variance formula:

$\text{Var}(X) = E(X^2) - [E(X)]^2 = \frac{3}{2} - (1)^2$

$\text{Var}(X) = \frac{3}{2} - 1$

$\text{Var}(X) = \frac{3 - 2}{2} = \frac{1}{2}$

... (2) [Variance of the number of tails]

The variance of the number of tails is $\frac{1}{2}$. The standard deviation is $\sigma = \sqrt{\text{Var}(X)} = \sqrt{\frac{1}{2}} = \frac{1}{\sqrt{2}}$.
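The calculations above can be reproduced exactly with Python's `fractions` module:

```python
from fractions import Fraction

# Distribution of the number of tails in two coin tosses (table above)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())      # E(X) = sum x_i p_i
e_x2 = sum(x * x * p for x, p in pmf.items())  # E(X^2) = sum x_i^2 p_i
variance = e_x2 - mean ** 2                    # Var(X) = E(X^2) - [E(X)]^2

print(mean)       # 1
print(variance)   # 1/2
```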



Binomial Experiment

In probability and statistics, a Binomial Experiment is a specific type of random experiment that consists of a fixed number of independent trials, where each trial has only two possible outcomes and the probability of success is constant for every trial. Such individual trials are known as Bernoulli trials. A binomial experiment is essentially a sequence of identical Bernoulli trials.


Conditions for a Binomial Experiment

A random experiment qualifies as a binomial experiment if and only if it satisfies the following four conditions:

1. Fixed Number of Trials

The experiment must consist of a predetermined and fixed number of identical trials. This number is usually denoted by $n$. The number of trials cannot be random; it must be set in advance.

2. Two Possible Outcomes

Each individual trial must have only two possible, mutually exclusive outcomes. These outcomes are conventionally labelled as "Success" (S) and "Failure" (F). For example, in a coin toss, Heads can be considered a success and Tails a failure (or vice versa).

3. Independent Trials

The outcome of any single trial must not influence or affect the outcome of any other trial. The trials are statistically independent. This is a crucial requirement. For example, drawing cards with replacement makes the draws independent, whereas drawing without replacement results in dependent trials.

4. Constant Probability of Success

The probability of "success", denoted by $p$, must remain the same for every trial in the experiment. Consequently, the probability of "failure", denoted by $q$, is also constant for every trial, and $q = 1 - p$.

If an experiment satisfies all these four conditions, it is a binomial experiment. The results of such an experiment can be analysed using the binomial probability distribution.


Binomial Distribution

When we perform a binomial experiment with $n$ trials and the probability of success in each trial is $p$, we are often interested in the total number of successes obtained in these $n$ trials. A discrete random variable X that represents the number of successes in such a binomial experiment is said to follow a Binomial Distribution.

A random variable X that follows a binomial distribution is denoted as $X \sim B(n, p)$. Here, $n$ is the parameter representing the number of trials, and $p$ is the parameter representing the probability of success in a single trial.

The possible values that the random variable X can take are integers from 0 (no successes) to $n$ (success in all trials), i.e., $k \in \{0, 1, 2, \dots, n\}$.

The probability of obtaining exactly $k$ successes in $n$ trials of a binomial experiment is given by the Probability Mass Function (PMF) of the binomial distribution:

P(X = k) = $\binom{n}{k} p^k (1-p)^{n-k}$

... (i)

for $k = 0, 1, 2, \dots, n$.

In this formula:

  • $n$ is the fixed number of trials.
  • $k$ is the number of successes, where $0 \leq k \leq n$.
  • $p$ is the probability of success on a single trial, so $1-p$ is the probability of failure.
  • $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ counts the ways to choose which $k$ of the $n$ trials are the successes.

The term $p^k (1-p)^{n-k}$ represents the probability of one specific sequence with $k$ successes and $n-k$ failures (e.g., SSS...S FFF...F). The binomial coefficient $\binom{n}{k}$ accounts for all possible arrangements of $k$ successes and $n-k$ failures in $n$ trials.

The sum of probabilities for all possible values of X in a binomial distribution is always 1. This can be shown using the binomial theorem:

$\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k}$

By the binomial theorem, $\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = (a+b)^n$. Let $a=p$ and $b=(1-p)$.

$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = (1)^n = 1$

This confirms that the probabilities sum to 1, as required for any probability distribution.
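The sum-to-one property can be verified numerically with `math.comb`; the parameters $n=5$, $p=0.3$ below are illustrative:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Summing P(X = k) over k = 0..n reproduces (p + (1-p))^n = 1
n, p = 5, 0.3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)   # 1.0 (up to floating-point rounding)
```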


Example 1. A fair coin is tossed 5 times. Let X be the number of heads obtained. Is this a binomial experiment? If so, find the probability of getting exactly 3 heads.

Answer:

We need to check if the given experiment of tossing a fair coin 5 times and counting the number of heads satisfies the conditions for a binomial experiment:

  1. Fixed Number of Trials: The coin is tossed 5 times. So, $n=5$. This is a fixed number. (Condition 1 met)
  2. Two Possible Outcomes: In each toss, the outcome is either a Head (H) or a Tail (T). Let's define getting a Head as "Success" (S) and getting a Tail as "Failure" (F). (Condition 2 met)
  3. Independent Trials: The outcome of one coin toss does not affect the outcome of any other toss. The trials are independent. (Condition 3 met)
  4. Constant Probability of Success: The coin is fair. The probability of getting a Head in a single toss is always $p = P(\text{Head}) = \frac{1}{2}$. This probability is constant for all 5 trials. Consequently, the probability of failure is $q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$. (Condition 4 met)

Since all four conditions are satisfied, the experiment is a binomial experiment.

The random variable X, representing the number of heads, follows a binomial distribution with parameters $n=5$ and $p=\frac{1}{2}$. We write this as $X \sim B(5, \frac{1}{2})$.

We are asked to find the probability of getting exactly 3 heads. This means we need to find $P(X=3)$.

We use the binomial probability formula $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ (Equation i).

Here, $n=5$ (number of trials), $k=3$ (number of successes, which is heads), $p=\frac{1}{2}$ (probability of success), and $1-p = \frac{1}{2}$ (probability of failure).

Substitute these values into the formula:

P(X=3) = $\binom{5}{3} \left(\frac{1}{2}\right)^3 \left(1-\frac{1}{2}\right)^{5-3}$

P(X=3) = $\binom{5}{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^2$

P(X=3) = $\binom{5}{3} \left(\frac{1}{2}\right)^{3+2}$

P(X=3) = $\binom{5}{3} \left(\frac{1}{2}\right)^5$

Now, calculate the binomial coefficient $\binom{5}{3}$:

$\binom{5}{3} = \frac{5!}{3!(5-3)!} = \frac{5!}{3!2!}$

$= \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{5 \times 4}{2 \times 1} = \frac{20}{2} = 10$

Calculate $\left(\frac{1}{2}\right)^5$:

$\left(\frac{1}{2}\right)^5 = \frac{1}{2 \times 2 \times 2 \times 2 \times 2} = \frac{1}{32}$

Substitute these values back into the expression for $P(X=3)$:

P(X=3) = $10 \times \frac{1}{32} = \frac{10}{32}$

P(X=3) = $\frac{5}{16}$

... (1)

The probability of getting exactly 3 heads when a fair coin is tossed 5 times is $\frac{5}{16}$.
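The answer $\frac{5}{16}$ can be confirmed both by the formula and by brute-force enumeration of all $2^5$ equally likely toss sequences:

```python
from fractions import Fraction
from itertools import product
from math import comb

n, k = 5, 3
p = Fraction(1, 2)

# P(X = 3) = C(5,3) (1/2)^3 (1/2)^2 = 10/32 = 5/16
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)   # 5/16

# Cross-check: count the sequences of 5 tosses containing exactly 3 heads
favourable = sum(1 for seq in product("HT", repeat=n) if seq.count("H") == k)
assert Fraction(favourable, 2**n) == prob
```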



Mean and Variance of Binomial Distribution

For a random variable X that follows a binomial distribution with parameters $n$ (number of trials) and $p$ (probability of success in a single trial), the probability mass function (PMF) is given by $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k=0, 1, \dots, n$. While we can calculate the mean and variance of this distribution using the general formulas for discrete random variables ($E(X) = \sum x_i p_i$ and $\text{Var}(X) = \sum (x_i - \mu)^2 p_i$), there are simpler, direct formulas specifically for the binomial distribution.


Mean of a Binomial Distribution

The mean or expected value of a random variable X following a binomial distribution $B(n, p)$ is given by the product of the number of trials and the probability of success in a single trial.

E(X) = $\mu = np$

... (i)

This formula is quite intuitive. If you perform $n$ trials, and the probability of success in each is $p$, you would "expect" to get, on average, $n \times p$ successes. For example, if you toss a fair coin ($p=0.5$) 10 times ($n=10$), you would expect $10 \times 0.5 = 5$ heads.

Derivation of the Mean

Using the definition of expected value for a discrete random variable, $E(X) = \sum_{k=0}^{n} k P(X=k)$. Substitute the PMF of the binomial distribution:

E(X) = $\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$

The term for $k=0$ is $0 \cdot P(X=0) = 0$, so we can start the summation from $k=1$:

E(X) = $\sum_{k=1}^{n} k \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$

We can write $k! = k \cdot (k-1)!$:

E(X) = $\sum_{k=1}^{n} k \frac{n!}{k \cdot (k-1)!(n-k)!} p^k (1-p)^{n-k}$

Cancel $k$ in the numerator and denominator:

E(X) = $\sum_{k=1}^{n} \frac{n!}{(k-1)!(n-k)!} p^k (1-p)^{n-k}$

Factor out $n$ and $p$ from the terms (note $p^k = p \cdot p^{k-1}$ and $n! = n \cdot (n-1)!$):

E(X) = $np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k}$

Let $m = n-1$ and $j = k-1$. As $k$ goes from 1 to $n$, $j$ goes from 0 to $n-1=m$. Also $n-k = (m+1)-(j+1) = m-j$.

E(X) = $np \sum_{j=0}^{m} \frac{m!}{j!(m-j)!} p^{j} (1-p)^{m-j}$

The sum is the binomial expansion of $(p + (1-p))^m$ by the binomial theorem, which equals $1^m = 1$.

E(X) = $np \cdot 1 = np$

This completes the derivation.


Variance of a Binomial Distribution

The variance of a random variable X following a binomial distribution $B(n, p)$ is given by the product of the number of trials, the probability of success, and the probability of failure.

$\text{Var}(X) = np(1-p) = npq$

... (ii)

where $n$ is the number of trials, $p$ is the probability of success, and $q = 1-p$ is the probability of failure.

The standard deviation is $\sigma = \sqrt{\text{Var}(X)} = \sqrt{np(1-p)} = \sqrt{npq}$.

Derivation of the Variance (Outline)

A common method to derive the variance involves calculating $E(X^2)$ using the formula $E(X^2) = \sum_{k=0}^{n} k^2 P(X=k)$ and then using the property $\text{Var}(X) = E(X^2) - [E(X)]^2$. The calculation of $E(X^2)$ directly is algebraically intensive.

An alternative approach, similar to the mean derivation, is to first calculate $E[X(X-1)] = \sum_{k=0}^{n} k(k-1) P(X=k)$.

E[X(X-1)] = $\sum_{k=0}^{n} k(k-1) \binom{n}{k} p^k (1-p)^{n-k}$

The terms for $k=0$ and $k=1$ are zero because of the $k(k-1)$ factor. So, we start the sum from $k=2$:

E[X(X-1)] = $\sum_{k=2}^{n} k(k-1) \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$

We can write $k! = k(k-1)(k-2)!$:

E[X(X-1)] = $\sum_{k=2}^{n} k(k-1) \frac{n!}{k(k-1)(k-2)!(n-k)!} p^k (1-p)^{n-k}$

Cancel $k(k-1)$:

E[X(X-1)] = $\sum_{k=2}^{n} \frac{n!}{(k-2)!(n-k)!} p^k (1-p)^{n-k}$

Factor out $n(n-1)$ and $p^2$ from the terms ($n! = n(n-1)(n-2)!$, $p^k = p^2 \cdot p^{k-2}$):

E[X(X-1)] = $n(n-1)p^2 \sum_{k=2}^{n} \frac{(n-2)!}{(k-2)!(n-k)!} p^{k-2} (1-p)^{n-k}$

Let $m = n-2$ and $j = k-2$. As $k$ goes from 2 to $n$, $j$ goes from 0 to $n-2=m$. Also $n-k = (m+2)-(j+2) = m-j$.

E[X(X-1)] = $n(n-1)p^2 \sum_{j=0}^{m} \binom{m}{j} p^{j} (1-p)^{m-j}$

Again, the sum is the binomial expansion of $(p + (1-p))^m$, which is $1^m = 1$.

E[X(X-1)] = $n(n-1)p^2 \cdot 1 = n(n-1)p^2$

Now we relate $E[X(X-1)]$ to $E(X^2)$: $E[X(X-1)] = E[X^2 - X] = E(X^2) - E(X)$.

E(X$^2$) - E(X) = $n(n-1)p^2$

We know $E(X) = np$.

E(X$^2$) = $n(n-1)p^2 + E(X) = n(n-1)p^2 + np$

Finally, use the variance formula $\text{Var}(X) = E(X^2) - [E(X)]^2$:

$\text{Var}(X) = (n(n-1)p^2 + np) - (np)^2$

$\text{Var}(X) = n^2p^2 - np^2 + np - n^2p^2$

$\text{Var}(X) = np - np^2$

Factor out $np$:

$\text{Var}(X) = np(1-p)$

Let $q = 1-p$:

$\text{Var}(X) = npq$

This completes the derivation.
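Both closed-form results, $E(X) = np$ and $\text{Var}(X) = npq$, can be checked against the defining sums; the parameters $n=8$, $p=0.25$ below are illustrative:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.25
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))      # E(X), should equal np = 2.0
e_x2 = sum(k * k * binomial_pmf(k, n, p) for k in range(n + 1))  # E(X^2)
variance = e_x2 - mean**2                                        # should equal npq = 1.5

print(mean, variance)
```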


Example 1. If a binomial distribution has 10 trials and the probability of success in each trial is 0.4, find the mean and variance of the distribution.

Answer:

The random variable X follows a binomial distribution with parameters $n=10$ and $p=0.4$. So, $X \sim B(10, 0.4)$.

We need to find the mean and variance of this distribution.

Given:

Number of trials, $n = 10$

Probability of success, $p = 0.4$

First, calculate the probability of failure, $q$:

q = $1 - p = 1 - 0.4 = 0.6$

Calculating the Mean:

Using the formula for the mean of a binomial distribution, $E(X) = np$ (Equation i):

Mean = $10 \times 0.4 = 4$

... (1)

The mean of the distribution is 4.

Calculating the Variance:

Using the formula for the variance of a binomial distribution, $\text{Var}(X) = npq$ (Equation ii):

Variance = $10 \times 0.4 \times 0.6$

Variance = $4 \times 0.6 = 2.4$

... (2)

The variance of the distribution is 2.4.

The standard deviation would be $\sigma = \sqrt{2.4} \approx 1.549$.
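The arithmetic in this example follows directly from formulas (i) and (ii):

```python
from math import sqrt

n, p = 10, 0.4
q = 1 - p                 # probability of failure

mean = n * p              # E(X) = np
variance = n * p * q      # Var(X) = npq
sd = sqrt(variance)       # standard deviation

print(mean, variance, round(sd, 3))   # mean 4.0, variance 2.4, sd ~ 1.549
```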