Content on this page: Probability Distribution, Mathematical Expectation, Variance, Binomial Distribution, Poisson Distribution, Normal Distribution.
Chapter 6 Probability Distribution (Concepts)
Welcome to this insightful chapter exploring Probability Distributions, a crucial extension of basic probability theory within Applied Mathematics. While earlier concepts focused on calculating the probability of single events, probability distributions provide a comprehensive picture by describing the likelihood associated with all possible numerical outcomes of a random experiment. This allows us to model and analyze random phenomena more completely, understanding not just individual probabilities but also the overall pattern of outcomes, their average value, and their variability. These concepts are fundamental in statistics, risk management, quality control, operations research, and any field dealing with uncertainty and data modeling.
We begin by formally defining a Random Variable ($X$) as a variable whose value is a numerical outcome of a random experiment. A crucial distinction is made between:
- Discrete Random Variables: Variables that can take on only a finite or countably infinite number of distinct values (e.g., the number of heads in three coin tosses, the number of defective items in a sample).
- Continuous Random Variables: Variables that can take on any value within a given range or interval (e.g., the height of a student, the time until a device fails).
Once we have a probability distribution, we can calculate important summary measures. The Mean or Expected Value ($E[X]$ or $\mu$) represents the long-run average value of the random variable if the experiment were repeated many times. For a discrete variable, it's calculated as a weighted average: $\mu = E[X] = \sum x_i p_i$. To measure the spread or variability of the distribution around the mean, we use the Variance ($Var(X)$ or $\sigma^2$) and its square root, the Standard Deviation ($\sigma$). Variance is the expected value of the squared deviations from the mean: $\sigma^2 = Var(X) = E[(X-\mu)^2] = \sum (x_i - \mu)^2 p_i$. A computationally simpler formula is often used: $\sigma^2 = E[X^2] - (E[X])^2 = (\sum x_i^2 p_i) - \mu^2$. Calculating and interpreting the mean and standard deviation are key skills.
The chapter then typically delves into specific, widely applicable probability distributions:
- Binomial Distribution: This models the number of 'successes' ($X$) in a fixed number ($n$) of independent Bernoulli trials (trials with only two outcomes, success or failure, and a constant probability of success $p$). We discuss the conditions required for a situation to be modeled binomially. The probability of obtaining exactly $r$ successes is given by the probability mass function: $P(X=r) = \binom{n}{r} p^r q^{n-r}$, where $q = 1-p$ is the probability of failure. We also learn the formulas for the mean ($np$) and variance ($npq$) of a binomial distribution, which provide quick ways to find its expected value and spread.
- Poisson Distribution: (Often included in Applied Mathematics syllabi) This distribution is used to model the number of times an event occurs within a fixed interval of time or space, particularly useful for relatively rare events occurring independently at a constant average rate ($\lambda$). The probability mass function is $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$ for $k=0, 1, 2, \dots$. A notable property is that both the mean and the variance of a Poisson distribution are equal to the parameter $\lambda$.
- Normal Distribution: (May be introduced conceptually) This is the most important continuous probability distribution, characterized by its symmetric, bell-shaped curve. It's defined by its mean ($\mu$) and standard deviation ($\sigma$). Key properties like the empirical rule (approximately 68% of data within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, 99.7% within $\mu \pm 3\sigma$) might be mentioned. The concept of the standard normal variable $Z = \frac{X-\mu}{\sigma}$ and the use of standard normal tables (Z-tables) to find probabilities (areas under the curve) are often introduced as essential tools for working with normal distributions.
This chapter provides crucial tools for modeling random phenomena, moving from single event probabilities to understanding the entire distribution of possibilities and their key characteristics, essential for statistical analysis and decision-making under uncertainty.
Probability Distribution
In probability theory, we often deal with numerical outcomes of random experiments. A random variable is a variable whose value is a numerical outcome of a random phenomenon. It is a function that maps the outcomes of a random experiment to real numbers. Random variables allow us to apply mathematical analysis to the results of random events.
Random variables are primarily classified into two types:
- Discrete Random Variable: A variable that can take only a finite or countably infinite number of distinct values. These values are typically integers or other discrete units. Examples include the number of heads in coin tosses, the number of defective items in a sample, or the number of calls received by a call centre in an hour.
- Continuous Random Variable: A variable that can take any value within a given range or interval. These values are typically measurements. Examples include the height or weight of a person, the temperature of a place, or the time taken to complete a task.
A probability distribution describes how the total probability of 1 is distributed among the possible values of a random variable. It is a function that links each possible value (or range of values) of the random variable to its probability of occurrence. Understanding the probability distribution is crucial for characterizing the behavior of a random variable and for making predictions or inferences about the random phenomenon it represents.
Discrete Probability Distribution
A discrete probability distribution describes the probabilities for a discrete random variable. It is defined by a list of the possible values the variable can take and the probability associated with each value.
Probability Mass Function (PMF)
For a discrete random variable $X$, the Probability Mass Function (PMF), often denoted by $P(X=x)$ or $f(x)$, gives the probability that the random variable $X$ takes on a specific value $x$. The PMF essentially lists out all possible outcomes and their corresponding probabilities.
The properties of a valid PMF $P(X=x)$ for a discrete random variable $X$ are:
- The probability of each specific value must be non-negative and not greater than 1:
$0 \le P(X=x) \le 1$
(For all possible values of $x$)
- The sum of the probabilities for all possible values must be exactly 1:
$\sum_x P(X=x) = 1$
(Sum over all possible values of $x$)
... (1)
A discrete probability distribution can be represented in various ways:
- Table: Listing the values $x$ and their probabilities $P(X=x)$ in columns.
- Graph: Using a bar chart or histogram where the height of the bar represents the probability for each value of $x$.
- Formula: A mathematical expression that gives $P(X=x)$ for any valid value of $x$.
Example: Consider the random experiment of tossing two fair coins. Let $X$ be the discrete random variable representing the number of heads obtained. The possible outcomes of the experiment are {HH, HT, TH, TT}. The corresponding values of the random variable $X$ are:
- For HH, $X=2$
- For HT, $X=1$
- For TH, $X=1$
- For TT, $X=0$
Now, we calculate the probability for each possible value of $X$:
- $P(X=0) = P(\text{TT}) = \frac{1}{4}$ (Probability of getting 0 heads)
- $P(X=1) = P(\text{HT or TH}) = P(\text{HT}) + P(\text{TH}) = \frac{1}{4} + \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$ (Probability of getting 1 head)
- $P(X=2) = P(\text{HH}) = \frac{1}{4}$ (Probability of getting 2 heads)
The probability distribution of $X$ can be shown in a table:
$x$ (Number of Heads) | $P(X=x)$ |
---|---|
0 | 1/4 |
1 | 1/2 |
2 | 1/4 |
Let's check the properties of PMF:
- All probabilities ($1/4, 1/2, 1/4$) are between 0 and 1. Property 1 is satisfied.
- The sum of probabilities is:
$\sum P(X=x) = P(X=0) + P(X=1) + P(X=2)$
$= \frac{1}{4} + \frac{1}{2} + \frac{1}{4} = \frac{1}{4} + \frac{2}{4} + \frac{1}{4} = \frac{1+2+1}{4} = \frac{4}{4} = 1$
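As a quick numerical check, the short Python sketch below (an illustration only; the dictionary name `pmf` is our own) stores this distribution and verifies both PMF properties.

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin tosses
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Property 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in pmf.values())

# Property 2: the probabilities sum to exactly 1
assert sum(pmf.values()) == 1
print(sum(pmf.values()))  # 1
```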
Cumulative Distribution Function (CDF)
For a discrete random variable $X$, the Cumulative Distribution Function (CDF), denoted by $F(x)$, gives the probability that the random variable $X$ takes on a value less than or equal to a specific value $x$. It accumulates the probabilities up to that value.
$F(x) = P(X \le x) = \sum_{t \le x} P(X=t)$
... (2)
where the sum is over all possible values $t$ of $X$ such that $t \le x$.
For the coin toss example ($X$=number of heads):
- $F(0) = P(X \le 0) = P(X=0) = \frac{1}{4}$
- $F(1) = P(X \le 1) = P(X=0) + P(X=1) = \frac{1}{4} + \frac{1}{2} = \frac{1}{4} + \frac{2}{4} = \frac{3}{4}$
- $F(2) = P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \frac{1}{4} + \frac{1}{2} + \frac{1}{4} = 1$
The CDF is defined for all real numbers $x$. For values of $x$ not in the set of possible outcomes, $F(x)$ remains constant between consecutive possible values and jumps up at each possible value. For $x < 0$, $F(x) = 0$. For $x \ge 2$, $F(x) = 1$. For instance:
- $F(0.5) = P(X \le 0.5) = P(X=0) = 1/4$ (since the only possible value $\le 0.5$ is 0)
- $F(1.5) = P(X \le 1.5) = P(X=0) + P(X=1) = 1/4 + 1/2 = 3/4$
Properties of a discrete CDF $F(x)$:
- $F(x)$ is non-decreasing: If $a < b$, then $F(a) \le F(b)$.
- $F(x)$ is a step function, jumping at the possible values of $X$.
- $\lim\limits_{x \to -\infty} F(x) = 0$
- $\lim\limits_{x \to \infty} F(x) = 1$
- $P(a < X \le b) = F(b) - F(a)$
- $P(X = x) = F(x) - \lim\limits_{t \to x^-} F(t)$ (the jump size at $x$)
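The step-by-step accumulation of the CDF from the PMF can also be sketched in a few lines of Python (the helper name `cdf` is our own choice):

```python
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x, pmf):
    """F(x) = P(X <= x): sum the PMF over all values t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)

# Matches the values above: F(0)=1/4, F(0.5)=1/4, F(1)=3/4, F(1.5)=3/4, F(2)=1
for x in [-1, 0, 0.5, 1, 1.5, 2, 3]:
    print(x, cdf(x, pmf))
```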
Continuous Probability Distribution
A continuous probability distribution describes the probabilities for a continuous random variable. Since a continuous random variable can take an infinite number of values in an interval, the probability of it taking any single specific value is zero ($P(X=x) = 0$ for any $x$). Instead, probabilities are defined for intervals of values.
Probability Density Function (PDF)
For a continuous random variable $X$, the Probability Density Function (PDF), denoted by $f(x)$, is a function such that the area under the curve of $f(x)$ between two points $a$ and $b$ gives the probability that $X$ falls within the interval $[a, b]$. The height of the PDF curve at a point $x$ indicates the relative likelihood of the variable being close to $x$.
The properties of a valid PDF $f(x)$ for a continuous random variable $X$ (defined over its domain) are:
- The function must be non-negative for all values in its domain:
$f(x) \ge 0$
(For all $x$ in the domain of $X$)
- The total area under the curve $y = f(x)$ over the entire domain of $X$ is equal to 1. If the variable is defined over $(-\infty, \infty)$, this means:
$\int\limits_{-\infty}^{\infty} f(x) dx = 1$
... (3)
The probability that the random variable $X$ takes a value within a specific interval $[a, b]$ is given by the definite integral of the PDF over that interval:
$\mathbf{P(a \le X \le b) = \int\limits_a^b f(x) dx}$
... (4)
For a continuous random variable, $P(X=a) = \int_a^a f(x) dx = 0$. Therefore, for continuous distributions, $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$. The inclusion or exclusion of endpoints does not change the probability.
Cumulative Distribution Function (CDF)
For a continuous random variable $X$, the Cumulative Distribution Function (CDF), $F(x)$, gives the probability that $X$ takes on a value less than or equal to $x$. It is defined as the integral of the PDF from the lower limit of the domain (often $-\infty$) up to $x$.
$F(x) = P(X \le x) = \int\limits_{-\infty}^x f(t) dt$
... (5)
From the Fundamental Theorem of Calculus, if $F(x)$ is differentiable, its derivative is the PDF:
$f(x) = \frac{d}{dx} F(x)$
... (6)
Properties of a continuous CDF $F(x)$:
- $F(x)$ is non-decreasing: If $a < b$, then $F(a) \le F(b)$.
- $F(x)$ is continuous for all $x$.
- $\lim\limits_{x \to -\infty} F(x) = 0$
- $\lim\limits_{x \to \infty} F(x) = 1$
- $P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x) dx$
Examples
Example 1. A discrete random variable $X$ has the following probability distribution:
$x$ | $P(X=x)$ |
---|---|
1 | 0.2 |
2 | $k$ |
3 | 0.4 |
4 | 0.1 |
Find the value of $k$.
Answer:
For any discrete probability distribution, the sum of the probabilities of all possible outcomes must be equal to 1.
$\sum_x P(X=x) = 1$
[Property of PMF]
Summing the given probabilities:
$P(X=1) + P(X=2) + P(X=3) + P(X=4) = 1$
$0.2 + k + 0.4 + 0.1 = 1$
... (1)
Combine the known values on the left side:
$0.7 + k = 1$
Solve for $k$:
$k = 1 - 0.7$
$k = 0.3$
... (2)
Thus, the value of $k$ is 0.3.
Example 2. A continuous random variable $X$ has the probability density function (PDF) given by $f(x) = cx$ for $0 \le x \le 2$, and $f(x) = 0$ otherwise. Find the value of the constant $c$. Also, find $P(1 \le X \le 1.5)$.
Answer:
Given the PDF $f(x) = cx$ for $0 \le x \le 2$, and $f(x) = 0$ elsewhere.
To find the value of the constant $c$, we use the property that the total area under the PDF curve over its entire domain is 1.
$\int\limits_{-\infty}^{\infty} f(x) dx = 1$
[Property of PDF]
Since $f(x)$ is non-zero only for $0 \le x \le 2$, the integral becomes:
$\int\limits_{0}^{2} cx \, dx = 1$
... (1)
Evaluate the integral:
$c \int\limits_{0}^{2} x \, dx = 1$
$c \left[\frac{x^2}{2}\right]_0^2 = 1$
[Using $\int x^n dx = \frac{x^{n+1}}{n+1}$]
$c \left(\frac{2^2}{2} - \frac{0^2}{2}\right) = 1$
$c \left(\frac{4}{2} - 0\right) = 1$
$c (2) = 1$
$c = \frac{1}{2}$
... (2)
The value of the constant $c$ is $\frac{1}{2}$. So the PDF is $f(x) = \frac{1}{2}x$ for $0 \le x \le 2$, and $f(x)=0$ otherwise.
Now, we need to find the probability $P(1 \le X \le 1.5)$. This is given by the integral of the PDF over the interval $[1, 1.5]$.
$P(1 \le X \le 1.5) = \int\limits_{1}^{1.5} f(x) dx$
[Formula for probability in continuous distribution]
Substitute the PDF $f(x) = \frac{1}{2}x$:
$P(1 \le X \le 1.5) = \int\limits_{1}^{1.5} \frac{1}{2}x \, dx$
... (3)
Evaluate the integral:
$= \frac{1}{2} \int\limits_{1}^{1.5} x \, dx = \frac{1}{2} \left[\frac{x^2}{2}\right]_1^{1.5}$
$= \frac{1}{2} \left(\frac{(1.5)^2}{2} - \frac{(1)^2}{2}\right) = \frac{1}{2} \left(\frac{2.25}{2} - \frac{1}{2}\right)$
$= \frac{1}{2} \left(\frac{2.25 - 1}{2}\right) = \frac{1}{2} \left(\frac{1.25}{2}\right)$
$= \frac{1.25}{4}$
To express this as a fraction: $1.25 = \frac{125}{100} = \frac{5}{4}$.
$P(1 \le X \le 1.5) = \frac{5/4}{4} = \frac{5}{4} \times \frac{1}{4} = \frac{5}{16}$
... (4)
The probability $P(1 \le X \le 1.5)$ is $\frac{5}{16}$.
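If a numerical check is desired, the same two integrals can be evaluated with SciPy's `quad` routine; the following sketch assumes SciPy is installed and simply re-derives the results above.

```python
from scipy.integrate import quad

c = 0.5  # value found analytically above

def f(x):
    """PDF f(x) = c*x on [0, 2], 0 elsewhere."""
    return c * x

total, _ = quad(f, 0, 2)    # total probability, should be ~1
prob, _ = quad(f, 1, 1.5)   # P(1 <= X <= 1.5)

print(total)  # ~1.0
print(prob)   # ~0.3125 = 5/16
```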
Mathematical Expectation
The mathematical expectation or expected value of a random variable is a crucial concept in probability and statistics. It represents the long-run average value of the random variable if the random experiment is repeated many times. It is a measure of the central location or centre of gravity of the probability distribution. The expected value of a random variable $X$ is denoted by $E(X)$ or $\mu$ (mu).
Think of it this way: If you performed a random experiment a very large number of times and calculated the average of the outcomes, that average would be very close to the expected value.
Expected Value for Discrete Random Variables
For a discrete random variable $X$ that can take on values $x_1, x_2, x_3, \dots, x_n$ (or infinitely many values in a countable set), with corresponding probabilities $P(X=x_1), P(X=x_2), P(X=x_3), \dots, P(X=x_n)$, the expected value $E(X)$ is calculated as the weighted average of the possible values, where the weights are the corresponding probabilities.
The formula for the expected value of a discrete random variable $X$ is:
$\mathbf{E(X) = \sum_i x_i P(X=x_i)}$
... (1)
where the sum is taken over all possible values $x_i$ that the random variable $X$ can take, and $P(X=x_i)$ is the probability mass function (PMF) at $x_i$.
Expected Value for Continuous Random Variables
For a continuous random variable $X$ with a probability density function (PDF) $f(x)$, defined over the range $(-\infty, \infty)$, the expected value is calculated using integration instead of summation, because the variable can take on any value within a range.
The formula for the expected value of a continuous random variable $X$ is:
$\mathbf{E(X) = \int\limits_{-\infty}^{\infty} x f(x) dx}$
... (2)
If the PDF $f(x)$ is zero outside a specific interval $[a, b]$, then the limits of integration can be restricted to that interval:
$E(X) = \int\limits_{a}^{b} x f(x) dx$
... (3)
The integral sums up the product of each possible value $x$ and its "probability weight" $f(x) dx$ over the entire range.
Properties of Expectation
Mathematical expectation has several useful properties that simplify calculations. Let $X$ and $Y$ be random variables, and let $a$, $b$, and $c$ be constants.
- Expected value of a constant: The expected value of a constant is the constant itself.
$E(c) = c$
... (4)
- Expected value of a constant multiplied by a random variable: A constant factor can be pulled out of the expectation.
$E(aX) = a E(X)$
... (5)
- Expected value of a sum of random variables: The expected value of the sum of random variables is the sum of their expected values. This property holds regardless of whether the variables are independent or not.
$E(X + Y) = E(X) + E(Y)$
... (6)
- Expected value of a linear transformation: Combining properties 1, 2, and 3.
$E(aX + b) = a E(X) + E(b) = a E(X) + b$
... (7)
- Expected value of a product of independent random variables: If $X$ and $Y$ are independent random variables, then the expected value of their product is the product of their expected values.
$E(XY) = E(X) E(Y)$
... (8)
- Expected value of a function of a random variable: If $g(X)$ is a function of a random variable $X$, the expected value of $g(X)$ is calculated as follows:
- For discrete $X$ with PMF $P(X=x_i)$:
$E(g(X)) = \sum_i g(x_i) P(X=x_i)$
... (9)
- For continuous $X$ with PDF $f(x)$:
$E(g(X)) = \int\limits_{-\infty}^{\infty} g(x) f(x) dx$
... (10)
An important special case is $g(X) = X^2$, which is needed later for computing variance:
- For discrete $X$: $E(X^2) = \sum_i x_i^2 P(X=x_i)$
- For continuous $X$: $E(X^2) = \int\limits_{-\infty}^{\infty} x^2 f(x) dx$
Examples
Example 1. Find the expected value of the random variable $X$ representing the number of heads in two fair coin tosses. The probability distribution is given by:
$x$ | $P(X=x)$ |
---|---|
0 | 1/4 |
1 | 1/2 |
2 | 1/4 |
Answer:
The random variable $X$ is discrete with possible values $x_1=0$, $x_2=1$, $x_3=2$ and corresponding probabilities $P(X=0)=\frac{1}{4}$, $P(X=1)=\frac{1}{2}$, $P(X=2)=\frac{1}{4}$.
Using the formula for the expected value of a discrete random variable:
$E(X) = \sum_{i=1}^3 x_i P(X=x_i)$
[Using formula (1)]
$E(X) = (0) \times P(X=0) + (1) \times P(X=1) + (2) \times P(X=2)$
... (1)
Substitute the given probabilities:
$E(X) = (0) \times \frac{1}{4} + (1) \times \frac{1}{2} + (2) \times \frac{1}{4}$
$E(X) = 0 + \frac{1}{2} + \frac{2}{4}$
Simplify the expression:
$E(X) = \frac{1}{2} + \frac{1}{2} = 1$
... (2)
The expected number of heads when tossing two fair coins is 1. This intuitively makes sense, as on average, you would expect half the tosses to be heads.
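The "long-run average" interpretation of $E(X)$ can be illustrated by simulation. The sketch below (the sample size and random seed are arbitrary choices of ours) computes $E(X)$ from the PMF and compares it with the average number of heads over many simulated two-coin experiments.

```python
import random

random.seed(0)

pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# Expected value from the definition E(X) = sum of x * P(X = x)
expected = sum(x * p for x, p in pmf.items())

# Simulate many repetitions: count heads in two fair coin tosses each time
trials = 100_000
average = sum(random.randint(0, 1) + random.randint(0, 1)
              for _ in range(trials)) / trials

print(expected)  # 1.0
print(average)   # close to 1.0
```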
Example 2. A continuous random variable $X$ has the probability density function (PDF) given by $f(x) = 2x$ for $0 \le x \le 1$, and $f(x) = 0$ otherwise. Find the expected value of $X$.
Answer:
Given the PDF $f(x) = 2x$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere.
Using the formula for the expected value of a continuous random variable (Equation 3, as the PDF is non-zero only in $[0, 1]$):
$E(X) = \int\limits_{0}^{1} x f(x) dx$
[Using formula (3)]
Substitute the expression for $f(x)$:
$E(X) = \int\limits_{0}^{1} x (2x) dx$
... (1)
Simplify the integrand:
$E(X) = \int\limits_{0}^{1} 2x^2 dx$
Integrate with respect to $x$:
$E(X) = 2 \int\limits_{0}^{1} x^2 dx = 2 \left[\frac{x^{2+1}}{2+1}\right]_0^1 = 2 \left[\frac{x^3}{3}\right]_0^1$
[Using $\int x^n dx = \frac{x^{n+1}}{n+1}$]
Evaluate the definite integral:
$E(X) = 2 \left(\frac{1^3}{3} - \frac{0^3}{3}\right)$
$E(X) = 2 \left(\frac{1}{3} - 0\right) = 2 \times \frac{1}{3} = \frac{2}{3}$
... (2)
The expected value of the continuous random variable $X$ with the given PDF is $\frac{2}{3}$.
Variance
While the expected value gives us a measure of the central tendency of a random variable, it doesn't tell us anything about the dispersion or spread of the possible values around this centre. The variance and standard deviation are measures that quantify this spread.
The variance of a random variable $X$ is a measure of how far the values of the random variable are spread out from its expected value (mean). A high variance indicates that the possible values are widely spread, while a low variance indicates that they are clustered closely around the mean. Variance is denoted by $Var(X)$ or $\sigma^2$ (sigma squared).
The variance is formally defined as the expected value of the squared difference between the random variable $X$ and its mean $E(X)$ (denoted by $\mu$):
$\mathbf{Var(X) = E[(X - E(X))^2]}$
... (1)
Or, using the notation $\mu = E(X)$:
$\mathbf{Var(X) = E[(X - \mu)^2]}$
... (2)
We square the difference $(X - \mu)$ because squaring makes all differences positive (so that deviations in either direction from the mean contribute to the spread) and also emphasizes larger deviations. Taking the expected value of this squared difference gives the average squared deviation from the mean.
Alternative Formula for Variance
The definition formula $Var(X) = E[(X - \mu)^2]$ is useful for understanding what variance measures, but it can be tedious to compute directly, especially when dealing with the squared terms. An alternative formula is often easier for calculations:
$\mathbf{Var(X) = E(X^2) - [E(X)]^2}$
... (3)
This formula states that the variance is the expected value of the square of the random variable minus the square of the expected value of the random variable.
Derivation of the Alternative Formula:
Starting with the definition $Var(X) = E[(X - \mu)^2]$ where $\mu = E(X)$:
$Var(X) = E[(X - \mu)^2]$
Expand the squared term $(X - \mu)^2$:
$Var(X) = E[X^2 - 2X\mu + \mu^2]$
[Using $(a-b)^2 = a^2 - 2ab + b^2$]
Using the linearity property of expectation $E(aY + bZ) = aE(Y) + bE(Z)$:
$Var(X) = E(X^2) + E(-2X\mu) + E(\mu^2)$
Using the properties $E(cY) = cE(Y)$ and $E(c) = c$, where $c$ is a constant. Here, $-2\mu$ and $\mu^2$ are constants (since $\mu = E(X)$ is a single value):
$Var(X) = E(X^2) - 2\mu E(X) + \mu^2$
... (A)
Now, substitute $\mu = E(X)$ back into equation (A):
$Var(X) = E(X^2) - 2(E(X)) E(X) + (E(X))^2$
$Var(X) = E(X^2) - 2[E(X)]^2 + [E(X)]^2$
Combine the terms with $[E(X)]^2$:
$Var(X) = E(X^2) - [E(X)]^2$
... (3)
This confirms the alternative formula for variance. To use this formula, you first need to calculate $E(X)$ and $E(X^2)$.
Expected Value of $X^2$, $E(X^2)$
The term $E(X^2)$ is the expected value of the random variable $X^2$. Using the property $E(g(X)) = \sum g(x_i) P(X=x_i)$ for discrete variables and $E(g(X)) = \int g(x) f(x) dx$ for continuous variables (as discussed in the previous section on Properties of Expectation, point 6), we set $g(X) = X^2$.
- For a discrete random variable $X$ with possible values $x_i$ and probabilities $P(X=x_i)$:
$\mathbf{E(X^2) = \sum_i x_i^2 P(X=x_i)}$
... (4)
- For a continuous random variable $X$ with PDF $f(x)$:
$\mathbf{E(X^2) = \int\limits_{-\infty}^{\infty} x^2 f(x) dx}$
... (5)
Variance for Discrete Random Variables
For a discrete random variable $X$ with possible values $x_1, x_2, \dots, x_n$ and corresponding probabilities $P(X=x_1), P(X=x_2), \dots, P(X=x_n)$, and mean $\mu = E(X)$:
Using the definition:
$\mathbf{Var(X) = \sum_{i=1}^n (x_i - \mu)^2 P(X=x_i)}$
... (6)
Using the alternative formula:
$\mathbf{Var(X) = E(X^2) - [E(X)]^2}$
... (7)
where $E(X) = \sum x_i P(X=x_i)$ and $E(X^2) = \sum x_i^2 P(X=x_i)$.
Variance for Continuous Random Variables
For a continuous random variable $X$ with probability density function $f(x)$ and mean $\mu = E(X)$:
Using the definition:
$\mathbf{Var(X) = \int\limits_{-\infty}^{\infty} (x - \mu)^2 f(x) dx}$
... (8)
Using the alternative formula:
$\mathbf{Var(X) = E(X^2) - [E(X)]^2}$
... (9)
where $E(X) = \int x f(x) dx$ and $E(X^2) = \int x^2 f(x) dx$, with integrals taken over the appropriate range of $x$.
Standard Deviation
The standard deviation of a random variable $X$, denoted by $\sigma$, is defined as the positive square root of the variance:
$\mathbf{\sigma = \sqrt{Var(X)}}$
... (10)
The standard deviation is often preferred over variance as a measure of spread for a couple of reasons:
- It has the same units as the random variable $X$ itself, making it easier to interpret in the context of the problem.
- It provides a more direct measure of the typical distance of a value from the mean.
Properties of Variance
Like expectation, variance also has several useful properties. Let $X$ and $Y$ be random variables, and let $a$, $b$, and $c$ be constants.
- Variance of a constant: A constant has no variability.
$Var(c) = 0$
... (11)
- Variance of a constant times a random variable: The constant comes out squared.
$Var(aX) = a^2 Var(X)$
... (12)
- Variance of a linear transformation: Adding a constant shifts the distribution but does not change its spread.
$Var(aX + b) = Var(aX) + Var(b) = a^2 Var(X) + 0 = a^2 Var(X)$
... (13)
- Variance of a sum of independent random variables: If $X$ and $Y$ are independent random variables, the variance of their sum (or difference) is the sum of their variances.
If $X, Y$ are independent, then $Var(X + Y) = Var(X) + Var(Y)$
... (14)
If $X, Y$ are independent, then $Var(X - Y) = Var(X) + Var(Y)$
... (15)
Examples
Example 1. Find the variance of the random variable $X$ from the example of tossing two fair coins (Number of heads). The probability distribution is $P(X=0)=\frac{1}{4}$, $P(X=1)=\frac{1}{2}$, $P(X=2)=\frac{1}{4}$. We found that $E(X) = 1$.
Answer:
We can use either the definition formula $Var(X) = \sum (x_i - \mu)^2 P(X=x_i)$ or the alternative formula $Var(X) = E(X^2) - [E(X)]^2$. Let's use both methods to verify. The mean is $\mu = E(X) = 1$.
Method 1: Using the Definition Formula
$Var(X) = (0 - \mu)^2 P(X=0) + (1 - \mu)^2 P(X=1) + (2 - \mu)^2 P(X=2)$
$Var(X) = (0 - 1)^2 P(X=0) + (1 - 1)^2 P(X=1) + (2 - 1)^2 P(X=2)$
[Substitute $\mu=1$]
$Var(X) = (-1)^2 \times \frac{1}{4} + (0)^2 \times \frac{1}{2} + (1)^2 \times \frac{1}{4}$
[Substitute probabilities]
$Var(X) = 1 \times \frac{1}{4} + 0 \times \frac{1}{2} + 1 \times \frac{1}{4}$
$Var(X) = \frac{1}{4} + 0 + \frac{1}{4}$
$Var(X) = \frac{2}{4} = \frac{1}{2}$
... (1)
Method 2: Using the Alternative Formula
$Var(X) = E(X^2) - [E(X)]^2$. We already know $E(X) = 1$. First, calculate $E(X^2) = \sum x_i^2 P(X=x_i)$:
$E(X^2) = (0)^2 P(X=0) + (1)^2 P(X=1) + (2)^2 P(X=2)$
[Using formula (4)]
$E(X^2) = 0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4}$
[Substitute values and probabilities]
$E(X^2) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 4 \times \frac{1}{4}$
$E(X^2) = 0 + \frac{1}{2} + 1 = \frac{3}{2}$
... (2)
Now, calculate the variance using formula (7):
$Var(X) = E(X^2) - [E(X)]^2$
$Var(X) = \frac{3}{2} - (1)^2$
[Substitute $E(X^2)$ from (2) and $E(X)=1$]
$Var(X) = \frac{3}{2} - 1 = \frac{3 - 2}{2} = \frac{1}{2}$
... (3)
Both methods yield the same result. The variance of the number of heads in two coin tosses is $\frac{1}{2}$.
The standard deviation is $\sigma = \sqrt{Var(X)} = \sqrt{\frac{1}{2}} = \frac{1}{\sqrt{2}}$. We can rationalize the denominator: $\frac{1}{\sqrt{2}} \times \frac{\sqrt{2}}{\sqrt{2}} = \frac{\sqrt{2}}{2} \approx 0.707$.
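Both formulas are easy to verify numerically. The Python sketch below (names are our own) repeats this calculation with exact fractions.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())                    # E(X) = 1

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())   # definition formula
e_x2 = sum(x ** 2 * p for x, p in pmf.items())             # E(X^2) = 3/2
var_alt = e_x2 - mu ** 2                                    # alternative formula

print(var_def, var_alt)  # both 1/2
```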
Example 2. For the continuous random variable $X$ with PDF $f(x) = 2x$ for $0 \le x \le 1$, and $f(x) = 0$ otherwise, find the variance. We previously calculated that $E(X) = \frac{2}{3}$.
Answer:
Given PDF $f(x) = 2x$ for $0 \le x \le 1$, and $f(x)=0$ elsewhere. Mean $E(X) = \frac{2}{3}$.
We will use the alternative formula $Var(X) = E(X^2) - [E(X)]^2$. First, calculate $E(X^2) = \int x^2 f(x) dx$. Since the PDF is non-zero only from 0 to 1, the integral limits are 0 and 1.
$E(X^2) = \int\limits_{0}^{1} x^2 f(x) dx = \int\limits_{0}^{1} x^2 (2x) dx$
[Using formula (5)]
$E(X^2) = \int\limits_{0}^{1} 2x^3 dx$
... (1)
Evaluate the integral:
$E(X^2) = 2 \int\limits_{0}^{1} x^3 dx = 2 \left[\frac{x^{3+1}}{3+1}\right]_0^1 = 2 \left[\frac{x^4}{4}\right]_0^1$
[Using $\int x^n dx = \frac{x^{n+1}}{n+1}$]
$E(X^2) = 2 \left(\frac{1^4}{4} - \frac{0^4}{4}\right)$
$E(X^2) = 2 \left(\frac{1}{4} - 0\right) = 2 \times \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$
... (2)
Now, calculate the variance using formula (9):
$Var(X) = E(X^2) - [E(X)]^2$
$Var(X) = \frac{1}{2} - \left(\frac{2}{3}\right)^2$
[Substitute $E(X^2)$ from (2) and $E(X)=\frac{2}{3}$]
$Var(X) = \frac{1}{2} - \frac{4}{9}$
To subtract the fractions, find a common denominator, which is the LCM of 2 and 9, which is 18.
$Var(X) = \frac{1 \times 9}{2 \times 9} - \frac{4 \times 2}{9 \times 2} = \frac{9}{18} - \frac{8}{18}$
[Common denominator is 18]
$Var(X) = \frac{9 - 8}{18} = \frac{1}{18}$
... (3)
The variance of $X$ is $\frac{1}{18}$.
The standard deviation is $\sigma = \sqrt{Var(X)} = \sqrt{\frac{1}{18}} = \frac{1}{\sqrt{18}} = \frac{1}{3\sqrt{2}}$. Rationalizing the denominator: $\frac{1}{3\sqrt{2}} \times \frac{\sqrt{2}}{\sqrt{2}} = \frac{\sqrt{2}}{3 \times 2} = \frac{\sqrt{2}}{6} \approx \frac{1.414}{6} \approx 0.236$.
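For an exact symbolic check of these integrals, a computer algebra system can be used; the following minimal sketch assumes SymPy is installed.

```python
import sympy as sp

x = sp.symbols('x')
f = 2 * x                                   # PDF on [0, 1]

e_x = sp.integrate(x * f, (x, 0, 1))        # E(X)   = 2/3
e_x2 = sp.integrate(x**2 * f, (x, 0, 1))    # E(X^2) = 1/2
var = e_x2 - e_x**2                         # Var(X) = 1/18

print(e_x, e_x2, var)  # 2/3 1/2 1/18
```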
Binomial Distribution
Among the various discrete probability distributions, the Binomial Distribution is one of the most important and frequently used. It is applied to situations where a fixed number of independent trials are conducted, and each trial has only two possible outcomes. The binomial distribution helps us calculate the probability of getting a specific number of 'successes' in these trials.
Bernoulli Trial
The foundation of the Binomial Distribution is the Bernoulli Trial. A Bernoulli trial is a single random experiment that has only two possible outcomes. These outcomes are typically labelled as:
- Success: The desired outcome. The probability of success is denoted by $p$.
- Failure: The other outcome. The probability of failure is denoted by $q$.
Examples of Bernoulli trials include:
- Tossing a coin (Head or Tail)
- Checking if a manufactured item is defective (Defective or Not Defective)
- Answering a multiple-choice question with one correct option (Correct or Incorrect)
- Drawing a card from a deck and noting its colour (Red or Black)
Binomial Experiment (or Sequence of Bernoulli Trials)
A Binomial Experiment is a sequence of independent Bernoulli trials that satisfy four specific conditions:
- There must be a fixed number of trials, denoted by $n$. The experiment consists of repeating the same Bernoulli trial $n$ times.
- Each trial must have only two possible outcomes, designated as "success" (S) and "failure" (F).
- The probability of success, $p$, is the same for each trial. Consequently, the probability of failure, $q = 1 - p$, is also the same for each trial.
- The trials must be independent. The outcome of one trial does not influence the outcome of any other trial.
In a binomial experiment, the random variable of interest is the number of successes obtained in the $n$ trials. Let this random variable be $X$. Since there are $n$ trials, the number of successes $X$ can take any integer value from 0 (no successes) up to $n$ (all trials are successes).
If a random variable $X$ follows a binomial distribution with $n$ trials and probability of success $p$, we denote this by $X \sim B(n, p)$.
Probability Mass Function (PMF) of the Binomial Distribution
The Probability Mass Function (PMF) of a binomial random variable $X$ gives the probability of obtaining exactly $k$ successes in $n$ independent Bernoulli trials, where the probability of success in a single trial is $p$.
Consider a sequence of $n$ trials. If we want exactly $k$ successes, these $k$ successes can occur in any combination of the $n$ trials. The number of ways to choose $k$ positions for the successes out of $n$ trials is given by the binomial coefficient $\binom{n}{k}$.
For any specific sequence with exactly $k$ successes and $n-k$ failures (e.g., S S ... S F F ... F), the probability of this specific sequence occurring is $p^k q^{n-k}$ due to the independence of trials ($p$ for each success, $q$ for each failure, multiplied together).
Since there are $\binom{n}{k}$ such distinct sequences, and each has a probability of $p^k q^{n-k}$, the total probability of getting exactly $k$ successes is the sum of the probabilities of all these sequences.
The PMF of a binomial random variable $X \sim B(n, p)$ for obtaining exactly $k$ successes is:
$\mathbf{P(X=k) = \binom{n}{k} p^k q^{n-k}}$
for $k = 0, 1, 2, \dots, n$
... (1)
where:
- $k$ is the number of successes we are interested in ($0 \le k \le n$).
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient, representing the number of combinations of $n$ items taken $k$ at a time.
- $p$ is the probability of success in a single trial.
- $q = 1 - p$ is the probability of failure in a single trial.
- $n-k$ is the number of failures in $n$ trials.
The sum of probabilities for all possible values of $k$ must be 1, i.e., $\sum_{k=0}^n P(X=k) = \sum_{k=0}^n \binom{n}{k} p^k q^{n-k}$. By the Binomial Theorem, this sum is equal to $(p+q)^n = (p + (1-p))^n = 1^n = 1$. This confirms that it is a valid probability distribution.
Mean and Variance of the Binomial Distribution
For a binomial random variable $X \sim B(n, p)$, the mean, variance, and standard deviation have simple formulas.
Mean ($E(X)$ or $\mu$)
The expected number of successes in $n$ trials is given by the product of the number of trials and the probability of success in a single trial.
$\mathbf{\mu = E(X) = np}$
... (2)
The derivation of this formula can be done using the definition of expectation for a discrete variable $E(X) = \sum_{k=0}^n k P(X=k) = \sum_{k=0}^n k \binom{n}{k} p^k q^{n-k}$. This involves algebraic manipulation of summations and binomial coefficients.
Variance ($Var(X)$ or $\sigma^2$)
The variance measures the spread of the distribution around the mean. For a binomial distribution, the variance is given by the product of the number of trials, the probability of success, and the probability of failure.
$\mathbf{\sigma^2 = Var(X) = npq}$
... (3)
The derivation of the variance formula is more involved than that for the mean and typically uses $Var(X) = E(X^2) - [E(X)]^2$, requiring the calculation of $E(X^2) = \sum_{k=0}^n k^2 P(X=k)$.
Standard Deviation ($\sigma$)
The standard deviation is the positive square root of the variance:
$\mathbf{\sigma = \sqrt{npq}}$
... (4)
Examples
Example 1. A fair coin is tossed 5 times. Let $X$ be the number of heads. Find the probability of getting exactly 3 heads. Also, find the mean and variance of $X$.
Answer:
This problem describes a binomial experiment because:
- There is a fixed number of trials: $n = 5$ coin tosses.
- Each trial has two outcomes: Head (Success) or Tail (Failure).
- The probability of success (getting a head) is constant for each toss: $p = P(\text{Head}) = \frac{1}{2}$. The probability of failure is $q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$.
- The trials are independent (the outcome of one toss does not affect others).
The random variable $X$ is the number of heads in 5 tosses, so $X$ follows a binomial distribution with parameters $n=5$ and $p=\frac{1}{2}$. We write this as $X \sim B\left(5, \frac{1}{2}\right)$.
Probability of getting exactly 3 heads:
We need to find $P(X=3)$. Using the Binomial PMF formula (Equation 1) with $n=5$, $p=\frac{1}{2}$, $q=\frac{1}{2}$, and $k=3$:
$P(X=k) = \binom{n}{k} p^k q^{n-k}$
[Binomial PMF]
$P(X=3) = \binom{5}{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^{5-3}$
... (1)
First, calculate the binomial coefficient $\binom{5}{3}$:
$\binom{5}{3} = \frac{5!}{3!(5-3)!} = \frac{5!}{3!2!}$
[Definition of binomial coefficient]
$= \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times (2 \times 1)} = \frac{5 \times 4}{2 \times 1} = \frac{20}{2} = 10$
[Calculate factorial and simplify]
Now, substitute this back into equation (1):
$P(X=3) = 10 \times \left(\frac{1}{2}\right)^3 \times \left(\frac{1}{2}\right)^2$
$P(X=3) = 10 \times \left(\frac{1}{8}\right) \times \left(\frac{1}{4}\right)$
$P(X=3) = 10 \times \frac{1}{32} = \frac{10}{32}$
Simplify the fraction:
$P(X=3) = \frac{\cancel{10}^{5}}{\cancel{32}_{16}} = \frac{5}{16}$
... (2)
The probability of getting exactly 3 heads in 5 coin tosses is $\frac{5}{16}$.
Mean of $X$:
Using the formula for the mean of a binomial distribution (Equation 2):
$E(X) = np$
$E(X) = 5 \times \frac{1}{2} = \frac{5}{2}$
[Substitute $n=5, p=1/2$] ... (3)
The mean number of heads is $2.5$. This is the average number of heads expected over many repetitions of tossing a fair coin 5 times.
Variance of $X$:
Using the formula for the variance of a binomial distribution (Equation 3):
$Var(X) = npq$
$Var(X) = 5 \times \frac{1}{2} \times \frac{1}{2}$
[Substitute $n=5, p=1/2, q=1/2$]
$Var(X) = \frac{5}{4}$
... (4)
The variance is $\frac{5}{4}$ or $1.25$.
The standard deviation is $\sigma = \sqrt{Var(X)} = \sqrt{\frac{5}{4}} = \frac{\sqrt{5}}{2}$.
Summary: For tossing a fair coin 5 times, the probability of getting exactly 3 heads is $\frac{5}{16}$. The mean number of heads is 2.5, and the variance is 1.25.
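These values can be reproduced in a few lines of Python using the standard-library function `math.comb`; the sketch below is illustrative only.

```python
from math import comb, sqrt

n, p = 5, 0.5
q = 1 - p

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(3, n, p))   # 0.3125 = 5/16
print(n * p)                # mean = 2.5
print(n * p * q)            # variance = 1.25
print(sqrt(n * p * q))      # standard deviation ~ 1.118
```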
Example 2. A multiple-choice test has 10 questions, and each question has 4 options, with only one correct answer. If a student answers each question by guessing randomly, what is the probability of getting exactly 7 correct answers? What are the expected number of correct answers and the variance?
Answer:
This scenario fits the binomial distribution criteria:
- Fixed number of trials: $n = 10$ questions.
- Two outcomes per trial: Correct (Success) or Incorrect (Failure).
- Probability of success (guessing correctly): Since there are 4 options and 1 is correct, $p = \frac{1}{4}$. The probability of failure is $q = 1 - p = 1 - \frac{1}{4} = \frac{3}{4}$. These probabilities are constant for each question.
- The trials are independent: Guessing one question does not affect guessing another.
Let $X$ be the number of correct answers. $X$ follows a binomial distribution with $n=10$ and $p=\frac{1}{4}$. So, $X \sim B\left(10, \frac{1}{4}\right)$.
Probability of getting exactly 7 correct answers:
We need to find $P(X=7)$. Using the Binomial PMF formula (Equation 1) with $n=10$, $p=\frac{1}{4}$, $q=\frac{3}{4}$, and $k=7$:
$P(X=k) = \binom{n}{k} p^k q^{n-k}$
[Binomial PMF]
$P(X=7) = \binom{10}{7} \left(\frac{1}{4}\right)^7 \left(\frac{3}{4}\right)^{10-7}$
... (1)
Calculate the binomial coefficient $\binom{10}{7}$:
$\binom{10}{7} = \frac{10!}{7!(10-7)!} = \frac{10!}{7!3!}$
$= \frac{10 \times 9 \times 8 \times \cancel{7!}}{\cancel{7!} \times (3 \times 2 \times 1)} = \frac{10 \times 9 \times 8}{6}$
$= \frac{720}{6} = 120$
[Calculation of $\binom{10}{7}$]
Now, substitute this back into equation (1):
$P(X=7) = 120 \times \left(\frac{1}{4}\right)^7 \times \left(\frac{3}{4}\right)^3$
$P(X=7) = 120 \times \frac{1^7}{4^7} \times \frac{3^3}{4^3} = 120 \times \frac{1}{4^7} \times \frac{27}{4^3}$
$P(X=7) = 120 \times \frac{27}{4^{10}}$
[Using $a^m \times a^n = a^{m+n}$]
Calculate the powers of 4: $4^3 = 64$, $4^7 = 16384$, so $4^{10} = 4^7 \times 4^3 = 16384 \times 64 = 1048576$.
$P(X=7) = \frac{120 \times 27}{1048576} = \frac{3240}{1048576}$
Simplify the fraction by dividing the numerator and denominator by their greatest common divisor, which is 8:
$3240 \div 8 = 405$
$1048576 \div 8 = 131072$
$P(X=7) = \frac{405}{131072}$
... (2)
The probability of getting exactly 7 correct answers is $\frac{405}{131072} \approx 0.0031$. This is a very small probability, which is expected when most answers are guessed.
Expected number of correct answers (Mean):
Using the formula $E(X) = np$ (Equation 2) with $n=10$ and $p=\frac{1}{4}$:
$E(X) = 10 \times \frac{1}{4} = \frac{10}{4} = \frac{5}{2}$
[Substitute $n=10, p=1/4$] ... (3)
The expected number of correct answers by guessing is $2.5$.
Variance of the number of correct answers:
Using the formula $Var(X) = npq$ (Equation 3) with $n=10$, $p=\frac{1}{4}$, and $q=\frac{3}{4}$:
$Var(X) = 10 \times \frac{1}{4} \times \frac{3}{4}$
[Substitute $n=10, p=1/4, q=3/4$]
$Var(X) = \frac{10 \times 1 \times 3}{4 \times 4} = \frac{30}{16}$
Simplify the fraction:
$Var(X) = \frac{\cancel{30}^{15}}{\cancel{16}_{8}} = \frac{15}{8}$
... (4)
The variance is $\frac{15}{8}$ or $1.875$.
Summary: When guessing on a 10-question multiple-choice test with 4 options per question, the probability of getting exactly 7 correct is $\frac{405}{131072}$. The expected number of correct answers is 2.5, and the variance is 1.875.
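As a cross-check, SciPy's `binom` distribution object gives the same probability, mean, and variance (the sketch below assumes SciPy is installed).

```python
from scipy.stats import binom

n, p = 10, 0.25

print(binom.pmf(7, n, p))   # ~0.00309 = 405/131072
print(binom.mean(n, p))     # 2.5
print(binom.var(n, p))      # 1.875
```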
Poisson Distribution
The Poisson Distribution, named after the French mathematician Siméon Denis Poisson, is a discrete probability distribution that is used to model the probability of a given number of events occurring in a fixed interval of time or space, or in a fixed number of trials, provided these events occur with a known constant mean rate and independently of the time or space since the last event. It is particularly useful for describing the distribution of rare events.
For example, the number of traffic accidents at a specific intersection in a week, the number of typos on a page of a book, the number of customers arriving at a service counter in an hour, or the number of radioactive decays per second are often modeled using a Poisson distribution.
Poisson Process Conditions
A random phenomenon is said to follow a Poisson process if the occurrence of events satisfies the following conditions over a given interval (time, space, etc.):
- Events are independent: The number of events occurring in any fixed interval is independent of the number of events occurring in any other disjoint interval. The occurrence of an event does not affect the probability of another event occurring in a separate interval.
- Events occur singly: Events occur one at a time. It is not possible for two or more events to occur at exactly the same point in the interval.
- Constant average rate: The average rate at which events occur is constant over the interval. This constant average rate is the sole parameter of the Poisson distribution and is denoted by $\lambda$ (lambda). $\lambda$ represents the mean number of events in the specified interval.
The random variable $X$ in a Poisson distribution counts the number of events that occur in a fixed interval. The possible values of $X$ are non-negative integers: $0, 1, 2, 3, \dots$. Theoretically, there is no upper limit to the number of events, although the probability of a very large number of events becomes very small.
A random variable $X$ following a Poisson distribution with parameter $\lambda$ is denoted by $X \sim P(\lambda)$ or $X \sim \text{Poisson}(\lambda)$. The parameter $\lambda$ must be positive ($\lambda > 0$).
Probability Mass Function (PMF) of the Poisson Distribution
The Probability Mass Function (PMF) of a Poisson random variable $X$ with parameter $\lambda$ gives the probability of observing exactly $k$ events in the given interval.
The PMF is given by the formula:
$\mathbf{P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}}$
for $k = 0, 1, 2, \dots$
... (1)
where:
- $k$ is the number of events we are interested in (a non-negative integer).
- $\lambda$ is the average rate of events in the fixed interval ($\lambda > 0$).
- $e$ is the base of the natural logarithm, approximately 2.71828.
- $k!$ is the factorial of $k$, defined as $k! = k \times (k-1) \times \dots \times 2 \times 1$ for $k > 0$, and $0! = 1$.
The sum of probabilities for all possible values of $k$ must be 1, i.e., $\sum_{k=0}^{\infty} P(X=k) = \sum_{k=0}^{\infty} \frac{e^{-\lambda} \lambda^k}{k!}$. This sum is equal to $e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}$. The series $\sum_{k=0}^{\infty} \frac{\lambda^k}{k!}$ is the Taylor series expansion of $e^{\lambda}$. So, the sum is $e^{-\lambda} \cdot e^{\lambda} = e^0 = 1$. This confirms that the PMF is valid.
Mean and Variance of the Poisson Distribution
A distinguishing characteristic of the Poisson distribution is that its mean and variance are equal to its parameter $\lambda$.
Mean ($E(X)$ or $\mu$)
The expected number of events in the fixed interval is simply the average rate $\lambda$.
$\mathbf{\mu = E(X) = \lambda}$
... (2)
Note: The derivation of $E(X)=\lambda$ from $E(X) = \sum_{k=0}^\infty k P(X=k) = \sum_{k=0}^\infty k \frac{e^{-\lambda} \lambda^k}{k!}$ involves manipulating infinite series.
Variance ($Var(X)$ or $\sigma^2$)
The variance of a Poisson distributed variable is also equal to $\lambda$.
$\mathbf{\sigma^2 = Var(X) = \lambda}$
... (3)
Note: The derivation of $Var(X)=\lambda$ from $Var(X) = E(X^2) - [E(X)]^2$ is also involved and uses properties of infinite series.
Standard Deviation ($\sigma$)
The standard deviation is the positive square root of the variance:
$\mathbf{\sigma = \sqrt{\lambda}}$
... (4)
Poisson Distribution as a Limiting Case of Binomial Distribution
The Poisson distribution can be viewed as a special or limiting case of the binomial distribution. Consider a binomial distribution $X \sim B(n, p)$ where $n$ is the number of trials and $p$ is the probability of success. If we have a situation where the number of trials $n$ is very large, but the probability of success $p$ in any single trial is very small, such that the product $np$ remains constant and finite, say $np = \lambda$, then the binomial distribution approaches the Poisson distribution with parameter $\lambda$.
Formally, if $n \to \infty$ and $p \to 0$ such that $np \to \lambda$, then for any fixed non-negative integer $k$, the binomial probability $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ converges to the Poisson probability $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$.
This relationship is practically useful because calculating binomial probabilities $\binom{n}{k} p^k (1-p)^{n-k}$ can be computationally intensive for large $n$. If $n$ is large and $p$ is small, we can approximate the binomial probability using the Poisson formula with $\lambda = np$. A common rule of thumb for this approximation to be reasonably good is when $n \ge 50$ and $np \le 5$.
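The quality of this approximation is easy to inspect numerically. In the sketch below, the choices $n = 1000$ and $p = 0.003$ (so $\lambda = np = 3$) are our own illustrative values; the binomial and Poisson probabilities are compared for small $k$.

```python
from math import comb, exp, factorial

n, p = 1000, 0.003   # large n, small p
lam = n * p          # lambda = np = 3

for k in range(6):
    binom_prob = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson_prob = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom_prob, 5), round(poisson_prob, 5))
```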
Examples
Example 1. The average number of customers arriving at a bank per minute during peak hours is 3. Assuming that the number of arrivals follows a Poisson distribution, what is the probability that exactly 5 customers arrive in a given minute?
Answer:
Given:
The average number of customers arriving per minute is given as 3. In the context of a Poisson distribution, this average rate is the parameter $\lambda$.
$\lambda = 3$
[Average rate of arrival per minute]
We are asked to find the probability of exactly 5 customers arriving in a given minute.
We need to find $P(X=5)$
[Where $X$ is the number of arrivals in a minute]
To Find:
$P(X=5)$.
Solution:
The number of customer arrivals follows a Poisson distribution with parameter $\lambda=3$. We use the Poisson PMF (Equation 1) with $\lambda=3$ and $k=5$.
$P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$
[Poisson PMF]
$P(X=5) = \frac{e^{-3} 3^5}{5!}$
[Substitute $\lambda=3, k=5$] ... (1)
Now, we calculate the values: $3^5 = 3 \times 3 \times 3 \times 3 \times 3 = 243$. $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$. $e^{-3} \approx 0.049787$ (You would typically use a calculator or table for $e^{-3}$).
Substitute these values into equation (1):
$P(X=5) \approx \frac{0.049787 \times 243}{120}$
$P(X=5) \approx \frac{12.0982}{120}$
$P(X=5) \approx 0.10082$
... (2)
Rounding to four decimal places, the probability of exactly 5 customers arriving in a given minute is approximately 0.1008.
The mean number of arrivals is $E(X) = \lambda = 3$. The variance of the number of arrivals is $Var(X) = \lambda = 3$.
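The calculation above can be reproduced directly with Python's standard `math` module (illustrative sketch only):

```python
from math import exp, factorial

lam, k = 3, 5
prob = exp(-lam) * lam**k / factorial(k)   # Poisson PMF at k = 5
print(round(prob, 4))                      # ~0.1008
```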
Example 2. On average, a certain type of fabric has 0.5 defects per square metre. What is the probability that a 2-square metre piece of fabric has at most 1 defect? (Assume a Poisson distribution for the number of defects).
Answer:
Given:
The average rate of defects is 0.5 defects per square metre. We are considering a 2-square metre piece of fabric. The average number of defects for a 2-square metre piece is the rate per unit area multiplied by the area:
$\lambda = (\text{average rate per sq. metre}) \times (\text{area in sq. metres})$
$\lambda = 0.5 \times 2 = 1$
[Average defects in a 2 sq. metre piece] ... (1)
Let $X$ be the number of defects in a 2-square metre piece of fabric. $X$ follows a Poisson distribution with $\lambda = 1$. So, $X \sim P(1)$.
We are asked for the probability that the fabric has at most 1 defect, which means $P(X \le 1)$.
To Find:
$P(X \le 1)$.
Solution:
$P(X \le 1)$ means the probability that $X$ is less than or equal to 1. Since $X$ can only take non-negative integer values (0, 1, 2, ...), $P(X \le 1)$ is the sum of the probabilities of getting exactly 0 defects or exactly 1 defect.
$P(X \le 1) = P(X=0) + P(X=1)$
... (2)
We use the Poisson PMF (Equation 1) $P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}$ with $\lambda=1$.
For $k=0$:
$P(X=0) = \frac{e^{-1} 1^0}{0!} = \frac{e^{-1} \times 1}{1} = e^{-1}$
[Recall $0!=1$ and $a^0=1$]
For $k=1$:
$P(X=1) = \frac{e^{-1} 1^1}{1!} = \frac{e^{-1} \times 1}{1} = e^{-1}$
[Recall $1!=1$ and $a^1=a$]
Substitute these probabilities back into equation (2):
$P(X \le 1) = e^{-1} + e^{-1} = 2e^{-1}$
... (3)
Using the approximation $e^{-1} \approx 0.36788$:
$P(X \le 1) \approx 2 \times 0.36788 = 0.73576$
Rounding to four decimal places, the probability of having at most 1 defect in a 2-square metre piece is approximately 0.7358.
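A short Python check of this result (the helper name `poisson_pmf` is our own) sums the two PMF terms:

```python
from math import exp, factorial

lam = 1.0  # 0.5 defects per sq. metre * 2 sq. metres

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

p_at_most_1 = poisson_pmf(0, lam) + poisson_pmf(1, lam)
print(round(p_at_most_1, 4))  # ~0.7358
```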
Normal Distribution
The Normal Distribution is arguably the most significant probability distribution in the fields of statistics and probability theory. It is a continuous probability distribution characterized by its symmetrical, bell-shaped curve. Many natural phenomena and measurements (such as heights, weights, blood pressure, measurement errors, test scores) tend to follow a normal distribution. Furthermore, its theoretical importance is underscored by the Central Limit Theorem, which states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original population distribution.
A normal distribution is completely specified by its mean ($\mu$) and its variance ($\sigma^2$). We denote a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ as $X \sim N(\mu, \sigma^2)$. The standard deviation is $\sigma = \sqrt{\sigma^2}$.
Properties of the Normal Distribution
The normal distribution has several distinctive properties:
- Shape and Symmetry: The graph of the normal distribution is a symmetric, unimodal (single peak), and bell-shaped curve. It is perfectly symmetrical about its mean $\mu$.
- Central Tendency: For a normal distribution, the mean ($\mu$), median, and mode are all equal and are located at the center of the distribution.
- Asymptotic Nature: The normal curve extends infinitely in both directions along the horizontal axis. It approaches the x-axis but never touches it.
- Total Area Under the Curve: The total area under the probability density curve of a normal distribution is always equal to 1. This reflects that the sum of probabilities for all possible outcomes is 1.
- Parameters: The shape and position of the normal curve are determined solely by its two parameters: the mean ($\mu$) and the standard deviation ($\sigma$).
- The mean ($\mu$) determines the location of the center of the distribution on the horizontal axis.
- The standard deviation ($\sigma$) determines the spread or dispersion of the distribution. A larger $\sigma$ leads to a wider, flatter curve, indicating more variability. A smaller $\sigma$ leads to a narrower, taller curve, indicating less variability.
- Inflection Points: The points where the curve changes its curvature (from concave down to concave up) are called inflection points. For a normal distribution, these points occur exactly at one standard deviation away from the mean, i.e., at $x = \mu - \sigma$ and $x = \mu + \sigma$.
- Empirical Rule (68-95-99.7 Rule): This rule provides a quick estimate of the proportion of data that falls within certain standard deviation ranges from the mean in a normal distribution.
- Approximately 68% of the data falls within one standard deviation of the mean: $P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.68$.
- Approximately 95% of the data falls within two standard deviations of the mean: $P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95$.
- Approximately 99.7% of the data falls within three standard deviations of the mean: $P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997$.
Probability Density Function (PDF)
As the normal distribution is continuous, its probability distribution is described by a Probability Density Function (PDF), not a PMF. The PDF of a normal random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is given by the formula:
$\mathbf{f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}}$
for $-\infty < x < \infty$
... (1)
where:
- $x$ is any real number, representing a possible value of the random variable.
- $\mu$ is the mean of the distribution.
- $\sigma$ is the standard deviation of the distribution ($\sigma > 0$).
- $\pi \approx 3.14159$ and $e \approx 2.71828$ are mathematical constants.
The height of the curve at any point $x$ is given by $f(x)$. The probability of $X$ falling within an interval $[a, b]$ is the area under the curve between $a$ and $b$, calculated by the definite integral:
$P(a \le X \le b) = \int\limits_a^b f(x; \mu, \sigma) dx = \int\limits_a^b \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx$
... (2)
However, this integral does not have a simple closed-form solution and must be evaluated using numerical methods or by converting to the standard normal distribution.
Standard Normal Distribution
Since the integral of the normal PDF is difficult to compute directly for arbitrary $\mu$ and $\sigma$, probabilities for normal distributions are typically found by converting the problem to an equivalent one involving the Standard Normal Distribution.
The standard normal distribution is a special case of the normal distribution where the mean is 0 and the standard deviation is 1.
A random variable following the standard normal distribution is usually denoted by $Z$. $Z \sim N(\mu=0, \sigma^2=1)$.
The PDF of the standard normal distribution is obtained by setting $\mu=0$ and $\sigma=1$ in the general PDF formula:
$\mathbf{f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}}$
for $-\infty < z < \infty$
... (3)
Probabilities for the standard normal distribution, $P(Z \le z)$, have been extensively calculated and are available in standard normal tables (often called Z-tables). These tables typically provide the cumulative probability, i.e., the area under the standard normal curve to the left of a given Z-value.
Z-Score (Standardization)
Any value $x$ from a general normal distribution $X \sim N(\mu, \sigma^2)$ can be converted into a corresponding value $z$ from the standard normal distribution $Z \sim N(0, 1)$ using the Z-score formula:
$\mathbf{Z = \frac{X - \mu}{\sigma}}$
... (4)
The Z-score tells us how many standard deviations a particular value $x$ is away from the mean $\mu$. A positive Z-score means $x$ is above the mean, a negative Z-score means $x$ is below the mean, and a Z-score of 0 means $x$ is equal to the mean.
Using the Z-score transformation, probabilities for a general normal distribution can be found using the Z-tables:
$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$
... (5)
Let $z_1 = \frac{a - \mu}{\sigma}$ and $z_2 = \frac{b - \mu}{\sigma}$. Then $P(a \le X \le b) = P(z_1 \le Z \le z_2)$. This probability can be calculated using Z-tables as $P(Z \le z_2) - P(Z \le z_1)$.
Due to the symmetry of the standard normal distribution around 0:
- $P(Z > z) = 1 - P(Z \le z)$
- $P(Z < -z) = P(Z > z) = 1 - P(Z \le z)$
- $P(-z \le Z \le z) = P(Z \le z) - P(Z \le -z) = P(Z \le z) - (1 - P(Z \le z)) = 2 P(Z \le z) - 1$
Examples
Example 1. The heights of adult males in a city are normally distributed with a mean of 165 cm and a standard deviation of 5 cm. Find the probability that a randomly selected adult male is taller than 170 cm.
Answer:
Given:
The heights of adult males ($X$) follow a normal distribution with mean $\mu = 165$ cm and standard deviation $\sigma = 5$ cm.
$X \sim N(\mu = 165, \sigma^2 = 5^2)$
[Normal distribution parameters]
We want to find the probability that a randomly selected male is taller than 170 cm.
We need to find $P(X > 170)$
To Find:
$P(X > 170)$.
Solution:
To find this probability, we convert the value $x = 170$ to a Z-score using the formula $Z = \frac{X - \mu}{\sigma}$.
$Z = \frac{170 - 165}{5}$
[Substitute $X=170, \mu=165, \sigma=5$]
$Z = \frac{5}{5} = 1$
... (1)
So, the probability $P(X > 170)$ is equivalent to the probability $P(Z > 1)$ in the standard normal distribution.
$P(X > 170) = P(Z > 1)$
... (2)
Standard Z-tables typically provide the cumulative probability $P(Z \le z)$, which is the area to the left of $z$. To find the area to the right of $z=1$, we use the property that the total area under the curve is 1: $P(Z > z) = 1 - P(Z \le z)$.
$P(Z > 1) = 1 - P(Z \le 1)$
[Using complement rule]
From the standard normal distribution table (Z-table), the value for $P(Z \le 1.00)$ is approximately 0.8413.
$P(Z > 1) \approx 1 - 0.8413$
[From Z-table]
$P(Z > 1) \approx 0.1587$
... (3)
Thus, the probability that a randomly selected adult male is taller than 170 cm is approximately 0.1587.
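Rather than reading a Z-table, the same probability can be obtained from SciPy's normal distribution functions; the sketch below assumes SciPy is installed.

```python
from scipy.stats import norm

mu, sigma = 165, 5

# P(X > 170) = 1 - F(170); norm.sf is the survival function 1 - cdf
print(norm.sf(170, loc=mu, scale=sigma))        # ~0.1587
print(1 - norm.cdf(170, loc=mu, scale=sigma))   # same value
```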
Example 2. The marks in a test are normally distributed with a mean of 70 and a standard deviation of 10. Find the probability that a randomly selected student scored between 60 and 85 marks.
Answer:
Given:
The test marks ($X$) follow a normal distribution with mean $\mu = 70$ and standard deviation $\sigma = 10$.
$X \sim N(\mu = 70, \sigma^2 = 10^2)$
We want to find the probability that a student scored between 60 and 85 marks.
We need to find $P(60 \le X \le 85)$
To Find:
$P(60 \le X \le 85)$.
Solution:
We need to convert the $x$-values 60 and 85 into Z-scores using the formula $Z = \frac{X - \mu}{\sigma}$.
For $x_1 = 60$:
$z_1 = \frac{60 - 70}{10} = \frac{-10}{10} = -1$
... (1)
For $x_2 = 85$:
$z_2 = \frac{85 - 70}{10} = \frac{15}{10} = 1.5$
... (2)
So, the probability $P(60 \le X \le 85)$ is equivalent to the probability $P(-1 \le Z \le 1.5)$ in the standard normal distribution.
$P(60 \le X \le 85) = P(-1 \le Z \le 1.5)$
... (3)
To find $P(-1 \le Z \le 1.5)$, we use the property $P(a \le Z \le b) = P(Z \le b) - P(Z \le a)$.
$P(-1 \le Z \le 1.5) = P(Z \le 1.5) - P(Z \le -1)$
... (4)
From the standard Z-table, $P(Z \le 1.5) \approx 0.9332$. Many tables do not list negative values directly, but by symmetry $P(Z \le -z) = P(Z \ge z) = 1 - P(Z \le z)$. So $P(Z \le -1) = 1 - P(Z \le 1)$. From the Z-table, $P(Z \le 1) \approx 0.8413$, giving $P(Z \le -1) \approx 1 - 0.8413 = 0.1587$.
Substitute these values into equation (4):
$P(-1 \le Z \le 1.5) \approx 0.9332 - 0.1587$
[Substitute cumulative probabilities]
$P(-1 \le Z \le 1.5) \approx 0.7745$
... (5)
The probability that a randomly selected student scored between 60 and 85 marks is approximately 0.7745.
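This result, too, can be cross-checked with SciPy (assuming it is installed), either directly on $X$ or after standardizing to Z-scores:

```python
from scipy.stats import norm

mu, sigma = 70, 10

# P(60 <= X <= 85) = F(85) - F(60)
prob = norm.cdf(85, loc=mu, scale=sigma) - norm.cdf(60, loc=mu, scale=sigma)
print(round(prob, 4))  # ~0.7745

# Equivalent computation after standardizing to Z-scores
z1, z2 = (60 - mu) / sigma, (85 - mu) / sigma
print(round(norm.cdf(z2) - norm.cdf(z1), 4))  # same value
```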