Chapter 15 Statistics (Concepts)

Welcome to this advanced chapter on Statistics, where we significantly deepen our analysis of data distributions, moving beyond the central point summaries learned in Class 10. While measures of central tendency – the Mean, Median, and Mode – provide valuable information about the 'typical' value or the center of a dataset, they paint an incomplete picture. Imagine two different batsmen in cricket having the exact same average score; one might consistently score around the average, while the other might score very high in some innings and very low in others. Central tendency alone doesn't capture this difference in consistency or variability. This chapter introduces the crucial concept of Measures of Dispersion, which quantify the extent to which data points are spread out, scattered, or vary around a central value. Understanding dispersion is essential for assessing consistency, reliability, and the overall nature of a distribution.

We begin our exploration with the simplest measure of spread: the Range. Calculated merely as the difference between the maximum and minimum values observed in the dataset (Range = Maximum Value - Minimum Value), it provides a quick, though often crude, indication of the total spread. While easy to compute, the range is highly susceptible to the influence of extreme values (outliers) and ignores the distribution of data points between the extremes, making it a limited measure in many contexts.

To get a more representative measure of variability that considers every data point, we introduce the concept of Mean Deviation. This measures the average distance of the observations from a central point, typically either the mean or the median. Crucially, we use the absolute values of the deviations to ensure that positive and negative deviations don't cancel each other out, reflecting the total magnitude of variation. The formulas are:

Mean Deviation about the Mean ($\bar{x}$): $\frac{\sum\limits_{i=1}^{n} |x_i - \bar{x}|}{n}$ (for ungrouped data) or $\frac{\sum\limits_{i} f_i |x_i - \bar{x}|}{\sum\limits_{i} f_i}$ (for grouped data).
Mean Deviation about the Median (M): $\frac{\sum\limits_{i=1}^{n} |x_i - M|}{n}$ (for ungrouped data) or $\frac{\sum\limits_{i} f_i |x_i - M|}{\sum\limits_{i} f_i}$ (for grouped data).

While intuitive, the absolute value function can be mathematically inconvenient for further analysis.

This leads us to the most important and widely used measures of dispersion: Variance and Standard Deviation. Instead of using absolute values, variance overcomes the issue of deviation signs by squaring them. The Variance, denoted by $\sigma^2$ (sigma squared), is defined as the average of the squared deviations of each observation from the arithmetic mean ($\bar{x}$). Squaring not only eliminates negative signs but also gives greater weight to observations that are further away from the mean. The formulas are:

For ungrouped data: $\sigma^2 = \frac{\sum\limits_{i=1}^{n} (x_i - \bar{x})^2}{n}$
For grouped data: $\sigma^2 = \frac{\sum\limits_{i} f_i (x_i - \bar{x})^2}{\sum\limits_{i} f_i}$

While variance provides a good measure of spread, its units are the square of the original data units (e.g., $cm^2$ if data is in $cm$), making direct interpretation difficult.

To address the unit issue, we define the Standard Deviation, usually denoted by $\sigma$ (sigma), as the positive square root of the variance: $$ \mathbf{\sigma = \sqrt{\text{Variance}}} = \sqrt{\frac{\sum\limits_{i} f_i (x_i - \bar{x})^2}{N}} $$ where $N = \sum\limits_{i} f_i$ is the total frequency (for ungrouped data, every $f_i = 1$ and $N = n$). The standard deviation is expressed in the same units as the original data, making it much more interpretable as a typical deviation from the mean. It is the most commonly used measure of dispersion. Shortcut formulas, often involving $\sum\limits x_i^2$ or $\sum\limits f_i x_i^2$, are frequently derived and used to simplify the computation of variance and standard deviation, especially for large datasets.

We will practice calculating these measures – Range, Mean Deviation, Variance, and Standard Deviation – for both ungrouped (raw list of observations) and grouped frequency distributions. For grouped data, remember calculations typically involve using the class marks ($x_i$) as representative values for each interval, weighted by their frequencies ($f_i$) (potentially determined using tally marks like $||||$ or $\bcancel{||||}$ during data organization).

Finally, to compare the variability or consistency of two or more datasets that might have different units or vastly different means, we introduce a relative measure of dispersion called the Coefficient of Variation (CV). It expresses the standard deviation as a percentage of the mean: $$ \mathbf{CV = \left( \frac{\sigma}{\bar{x}} \right) \times 100} $$ Since the CV is a unitless ratio, it allows for meaningful comparison of dispersion across different datasets. A dataset with a lower CV is considered more consistent or stable (less variable relative to its mean) than a dataset with a higher CV. This chapter equips you with a comprehensive toolkit to not only describe the center of your data but also to quantify its spread, leading to a much more complete understanding of statistical distributions.

Describing the Dispersion

In statistics, when we analyze a set of data, our first step is often to find a "typical" or "central" value that represents the entire dataset. This is done using measures of central tendency, such as the mean, median, and mode. However, a measure of center alone provides an incomplete and sometimes misleading picture of the data.

To fully understand a dataset, we also need to know how the data values are spread out. Dispersion is the statistical term for the degree to which data points in a distribution are scattered or spread out. A measure of dispersion is a number that quantifies this spread, telling us whether the data is tightly clustered together or widely scattered.

Why is Measuring Dispersion So Important?

Imagine a doctor is comparing two different treatments for blood pressure. Both treatments result in an average (mean) blood pressure of 120 mmHg. Based on the mean alone, the treatments seem equally effective. But what if the individual results were:

Treatment A: 118, 120, 121, 122, 119
Treatment B: 90, 150, 120, 95, 145

While both have a mean of 120, Treatment A is clearly more reliable and consistent. Its results are tightly clustered around the mean. Treatment B is highly unpredictable, with some patients experiencing dangerously low pressure and others dangerously high pressure. Measuring dispersion allows us to see this crucial difference.

Key reasons to measure dispersion include:

To Judge Reliability: A small dispersion indicates that the central value is a good and reliable representative of the data. A large dispersion suggests that the central value is less representative.
To Compare Variability: We can compare the consistency of two or more datasets. For example, which of two cricket batsmen is more consistent? The one whose scores have a lower dispersion. Which of two stocks is riskier? The one whose returns have a higher dispersion.
To Control Quality: In manufacturing, the goal is to produce items that are as identical as possible. Measuring the dispersion of a product's dimensions helps to monitor and control the quality of the production process.
To Facilitate Further Analysis: Many advanced statistical techniques rely on measures of dispersion to work correctly.

An Illustrative Example

Let's formalize the concept with a simple example. Consider the performance of two students, Anjali and Bimal, in five math tests. Their scores are:

Anjali's Scores: 84, 85, 85, 86, 85

Bimal's Scores: 60, 100, 85, 70, 90

First, let's calculate the mean score for each student.

Mean for Anjali = $\frac{84 + 85 + 85 + 86 + 85}{5} = \frac{425}{5} = 85$

Mean for Bimal = $\frac{60 + 100 + 85 + 70 + 90}{5} = \frac{425}{5} = 85$

[Figure: two number lines, each marked with the mean of 85. The top line shows Anjali's scores clustered tightly around 85. The bottom line shows Bimal's scores spread out widely from 60 to 100.]

Observation:

Both students have the exact same average score of 85. If we only looked at the mean, we would think their performance is identical. However, by looking at the raw scores, it is obvious that Anjali is an extremely consistent student, with all her scores clustered tightly around the mean. Bimal, on the other hand, is highly inconsistent; his scores are spread out over a wide range.

This difference in consistency or variability is what measures of dispersion are designed to capture numerically. A simple measure like the range shows this clearly: Anjali's range is $86 - 84 = 2$, while Bimal's range is $100 - 60 = 40$. The larger range for Bimal immediately tells us that his scores are more dispersed.

In the following sections, we will explore more robust and widely used measures for quantifying dispersion, such as Mean Deviation and Standard Deviation.

Different Methods of Measuring Dispersion

To numerically capture the spread or variability within a dataset, statisticians use several different methods. Each method provides a single number that summarizes the dispersion, but they do so in different ways and are useful in different contexts. These methods can be grouped into two main categories: absolute measures and relative measures.

Absolute Measures of Dispersion

An absolute measure of dispersion describes the variability of a dataset in terms of the units of the original data. For example, if we are measuring the heights of students in centimeters (cm), an absolute measure of dispersion like the standard deviation will also be expressed in cm (the variance, as noted below, is expressed in the squared unit, cm²). This makes absolute measures easy to interpret in the context of the original data.

However, absolute measures are not suitable for comparing the variability of two datasets with different units (e.g., comparing the variability of students' heights in cm with their weights in kg) or with vastly different average values (e.g., comparing the variability in salaries at a small startup vs. a large corporation).

Common Absolute Measures:

Range: The simplest and quickest measure of dispersion. It is the difference between the maximum and minimum values in the dataset.
Formula: Range = Maximum Value – Minimum Value

Usefulness: Provides a quick, rough estimate of the total spread. It is highly sensitive to outliers (extreme values).
Quartile Deviation: A measure that focuses on the spread of the middle 50% of the data, making it resistant to outliers. It is half of the interquartile range (IQR).
Formula: Quartile Deviation = $\frac{Q_3 - Q_1}{2}$, where $Q_3$ is the third quartile and $Q_1$ is the first quartile.

Usefulness: Good for skewed distributions or data with extreme values.
Mean Deviation: This measure calculates the average distance of each data point from a central value (usually the mean or median). It considers every value in the dataset.
Formula: Mean Deviation = $\frac{\sum\limits |x_i - \text{Mean}|}{n}$

Usefulness: More comprehensive than the range as it uses all data points.
Variance ($\sigma^2$): One of the most important measures of dispersion. It is the average of the squared distances of each data point from the mean. Squaring the differences ensures that all values are positive and gives greater weight to points that are further from the mean.
Formula: Variance ($\sigma^2$) = $\frac{\sum\limits (x_i - \text{Mean})^2}{n}$

Usefulness: Crucial for many advanced statistical theories and tests. Its units are the square of the original data's units (e.g., cm²), making it hard to interpret directly.
Standard Deviation ($\sigma$): This is the most widely used and important measure of dispersion. It is simply the positive square root of the variance.
Formula: Standard Deviation ($\sigma$) = $\sqrt{\text{Variance}} = \sqrt{\frac{\sum\limits (x_i - \text{Mean})^2}{n}}$

Usefulness: By taking the square root, the standard deviation is expressed in the same units as the original data, making it much more interpretable than the variance.

Relative Measures of Dispersion

A relative measure of dispersion is a unit-free number, often expressed as a ratio or a percentage. It is designed to compare the variability of two or more datasets, especially when their units or average values are different.

For example, is a variation of 5 cm in the height of men more or less significant than a variation of 5 kg in their weight? A relative measure helps answer such questions by standardizing the dispersion.

Common Relative Measures:

Coefficient of Range: The range expressed as a fraction of the sum of the maximum and minimum values.
Formula: Coefficient of Range = $\frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}}$
Coefficient of Quartile Deviation: The quartile deviation expressed as a fraction of the average of the quartiles.
Formula: Coefficient of QD = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Coefficient of Variation (CV): This is the most common and important relative measure of dispersion. It expresses the standard deviation as a percentage of the mean.

CV = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$

A dataset with a higher CV is considered to have greater relative variability or be less consistent than a dataset with a lower CV. It is the standard tool for comparing consistency across different groups.

In the following sections, we will focus on the calculation and application of the most important measures: Range, Mean Deviation, Variance, and Standard Deviation.

Range

The Range is the most straightforward and intuitive measure of dispersion. It provides a quick snapshot of the total spread of a dataset by focusing only on its most extreme values.

Definition of Range

The Range is simply the difference between the highest value (maximum) and the lowest value (minimum) in a dataset.

If $X_{\text{max}}$ is the maximum value and $X_{\text{min}}$ is the minimum value, then the formula is:

Range $= X_{\text{max}} - X_{\text{min}}$

... (i)

The range is an absolute measure of dispersion, meaning its units are the same as the units of the data itself (e.g., if the data is in kilograms, the range is in kilograms).

Calculation of Range

For Ungrouped Data: The process is simple. First, scan the data to find the largest and smallest numbers. Then, subtract the smallest from the largest.
For Grouped Data: For data presented in a frequency distribution with class intervals, the range is calculated as the difference between the upper boundary of the highest class and the lower boundary of the lowest class.

Coefficient of Range

To compare the spread of two datasets with very different scales (e.g., salaries in thousands vs. pocket money in hundreds), we use the Coefficient of Range. This is a relative measure that expresses the range as a fraction of the sum of the extreme values, making it a unit-free number.

Coefficient of Range $= \frac{X_{\text{max}} - X_{\text{min}}}{X_{\text{max}} + X_{\text{min}}}$

... (ii)

Advantages and Disadvantages of Range

Advantages (Merits):

Easy to Calculate: It is the simplest measure of dispersion to compute.
Easy to Understand: Its meaning is very clear and intuitive.
Quick: It provides a very fast, though rough, idea of the data's spread.

Disadvantages (Demerits):

Affected by Outliers: The range's biggest weakness is its extreme sensitivity to outliers. A single unusually high or low value can dramatically inflate the range, giving a misleading impression of the overall variability.
Ignores Most Data: It is calculated using only two data points (the maximum and minimum) and completely ignores the distribution and clustering of all the data points in between.
Not Suitable for Further Analysis: Because it is not based on all observations, it is generally not used in more advanced statistical calculations.
Cannot be used for Open-Ended Classes: If a dataset has an open-ended class (e.g., "over 100"), the maximum value is unknown, and the range cannot be calculated.

Due to these significant limitations, the range is typically used for a quick preliminary look at the data or in specific applications like statistical quality control, but it is not considered a robust measure of dispersion.

Example 1. Find the range and the coefficient of range for the following dataset of daily temperatures (°C): 15, 25, 18, 32, 40, 28, 12, 35.

Answer:

Given:

The data is: 15, 25, 18, 32, 40, 28, 12, 35.

Solution:

Step 1: Identify the maximum and minimum values.

By inspecting the data, we find:

Maximum Value ($X_{\text{max}}$) = 40 °C

Minimum Value ($X_{\text{min}}$) = 12 °C

Step 2: Calculate the Range.

Using the formula Range $= X_{\text{max}} - X_{\text{min}}$:

Range = 40 – 12 = 28 °C

Step 3: Calculate the Coefficient of Range.

Using the formula Coefficient of Range $= \frac{X_{\text{max}} - X_{\text{min}}}{X_{\text{max}} + X_{\text{min}}}$:

Coefficient of Range = $\frac{40 - 12}{40 + 12} = \frac{28}{52}$

Simplifying the fraction by dividing the numerator and denominator by 4:

Coefficient of Range = $\frac{7}{13}$

(As a decimal, this is approximately 0.538).

The final answer is: Range = 28 °C, Coefficient of Range = $\frac{7}{13}$.
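The same steps can be reproduced in Python as a quick check. This is a minimal sketch: the variable names are illustrative, and `Fraction` from the standard library is used only to keep the coefficient as an exact, automatically reduced fraction.

```python
from fractions import Fraction

# Daily temperatures (°C) from Example 1.
temperatures = [15, 25, 18, 32, 40, 28, 12, 35]

x_max = max(temperatures)
x_min = min(temperatures)

data_range = x_max - x_min                            # formula (i)
coefficient = Fraction(x_max - x_min, x_max + x_min)  # formula (ii)

print(data_range)                    # 28
print(coefficient)                   # 7/13
print(round(float(coefficient), 3))  # 0.538
```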

Mean Deviation

In statistics, after understanding the central tendency (like mean, median, mode) of a dataset, the next important aspect is to understand its variability or dispersion. Dispersion measures the extent to which the values in a distribution are spread out or scattered from the average. A simple measure is the Range, which is the difference between the maximum and minimum values. However, the range is a crude measure as it only depends on two extreme values and ignores the distribution of the rest of the observations.

To overcome this limitation, we use measures that involve all the data points. One such measure is the Mean Deviation. It provides a more robust understanding of the spread by calculating the average distance of each observation from a central value.

Definition and Concept of Mean Deviation

The Mean Deviation (MD) is defined as the arithmetic mean of the absolute deviations of the observations from a suitable measure of central tendency. This central value can be the mean, median, or mode, but it is most commonly calculated with respect to the mean or the median.

Why Absolute Deviations?

A deviation is the difference between an observation and the central value (e.g., $x_i - \overline{x}$). Some of these deviations will be positive (for values greater than the mean), and some will be negative (for values less than the mean). A key property of the arithmetic mean is that the sum of these deviations is always zero, i.e., $\sum (x_i - \overline{x}) = 0$.

For example, for data {2, 4, 9}, the mean is $\overline{x} = 5$. The deviations are $(2-5) = -3$, $(4-5) = -1$, and $(9-5) = 4$. The sum is $-3 - 1 + 4 = 0$.

Because the sum is always zero, the average deviation would also be zero, which is not a useful measure of spread. To solve this, we take the absolute value of each deviation, i.e., $|x_i - \overline{x}|$. This makes all deviations positive, and their average gives a meaningful value representing the average distance of the data points from the center.
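
The zero-sum property used above follows directly from the definition of the mean, $\overline{x} = \frac{1}{n}\sum x_i$, which gives $\sum x_i = n\overline{x}$. A short derivation:

$\sum\limits_{i=1}^{n} (x_i - \overline{x}) = \sum\limits_{i=1}^{n} x_i - n\overline{x} = n\overline{x} - n\overline{x} = 0$

For the data {2, 4, 9}, $\sum x_i = 15$ and $n\overline{x} = 3 \times 5 = 15$, so the signed deviations sum to 0. The absolute deviations, in contrast, sum to $3 + 1 + 4 = 8$, giving an average distance of $\frac{8}{3} \approx 2.67$ from the mean.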

Mean Deviation for Ungrouped Data

Ungrouped data refers to data that is given as individual data points.

1. Mean Deviation about the Mean ($\text{MD}_{\overline{x}}$)

This measures the average absolute distance of each data point from the arithmetic mean of the dataset.

Formula and Derivation

Let the given data consist of $n$ observations $x_1, x_2, ..., x_n$ (repeated values are allowed).

Step 1: Calculate the mean of the data.

$\overline{x} = \frac{\sum\limits_{i=1}^{n} x_i}{n}$

Step 2: Find the deviation of each observation $x_i$ from the mean $\overline{x}$, which is $(x_i - \overline{x})$.

Step 3: Find the absolute value of these deviations, which is $|x_i - \overline{x}|$.

Step 4: Find the arithmetic mean of these absolute deviations. This is the Mean Deviation about the Mean.

$\text{MD}_{\overline{x}} = \frac{\sum\limits_{i=1}^{n} |x_i - \overline{x}|}{n}$

... (i)

Example 1. Find the mean deviation about the mean for the data: 6, 7, 10, 12, 13, 4, 8, 12.

Answer:

Given:

Data observations: $x_i$ = 6, 7, 10, 12, 13, 4, 8, 12.

Number of observations, $n=8$.

To Find:

Mean Deviation about the Mean ($\text{MD}_{\overline{x}}$).

Solution:

Step 1: Calculate the mean ($\overline{x}$).

Sum of observations = $6 + 7 + 10 + 12 + 13 + 4 + 8 + 12 = 72$.

Mean, $\overline{x} = \frac{\sum x_i}{n} = \frac{72}{8} = 9$.

Step 2: Calculate the absolute deviations from the mean, $|x_i - 9|$.

We create a table for clarity:

$x_i$	$|x_i - \overline{x}| = |x_i - 9|$
4	$|4 - 9| = 5$
6	$|6 - 9| = 3$
7	$|7 - 9| = 2$
8	$|8 - 9| = 1$
10	$|10 - 9| = 1$
12	$|12 - 9| = 3$
12	$|12 - 9| = 3$
13	$|13 - 9| = 4$
Total	$\sum\limits_{i=1}^{8} |x_i - \overline{x}| = 22$

Step 3: Calculate the mean deviation about the mean.

Using the formula (i):

$\text{MD}_{\overline{x}} = \frac{\sum\limits_{i=1}^{8} |x_i - \overline{x}|}{n} = \frac{22}{8} = 2.75$.

Thus, the mean deviation about the mean is 2.75.
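
The four steps behind formula (i) translate almost line by line into code. The following is a minimal sketch, with `mean_deviation_about_mean` as a name chosen for illustration, applied to the data of this example.

```python
def mean_deviation_about_mean(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n                           # Step 1: the mean
    abs_deviations = [abs(x - mean) for x in values] # Steps 2-3: |x_i - mean|
    return sum(abs_deviations) / n                   # Step 4: average them


data = [6, 7, 10, 12, 13, 4, 8, 12]
print(mean_deviation_about_mean(data))  # 2.75
```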

2. Mean Deviation about the Median ($\text{MD}_M$)

This measures the average absolute distance of each data point from the median of the dataset. An important property is that the mean deviation is minimum when calculated from the median.

Formula and Derivation

Let the given data consist of $n$ observations $x_1, x_2, ..., x_n$ (repeated values are allowed).

Step 1: Arrange the data in ascending order.

Step 2: Calculate the median ($M$) of the data.

$M = \begin{cases} \left(\frac{n+1}{2}\right)^{th} \text{observation} & , & \text{if } n \text{ is odd} \\ \frac{\left(\frac{n}{2}\right)^{th} \text{obs} + \left(\frac{n}{2}+1\right)^{th} \text{obs}}{2} & , & \text{if } n \text{ is even} \end{cases}$

Step 3: Find the absolute value of the deviations from the median, which is $|x_i - M|$.

Step 4: Find the arithmetic mean of these absolute deviations.

$\text{MD}_M = \frac{\sum\limits_{i=1}^{n} |x_i - M|}{n}$

... (ii)

Example 2. Find the mean deviation about the median for the data: 3, 9, 5, 3, 12, 10, 18, 4, 7, 19, 21.

Answer:

Given:

Data observations: $x_i$ = 3, 9, 5, 3, 12, 10, 18, 4, 7, 19, 21.

Number of observations, $n=11$.

To Find:

Mean Deviation about the Median ($\text{MD}_M$).

Solution:

Step 1: Arrange the data in ascending order.

3, 3, 4, 5, 7, 9, 10, 12, 18, 19, 21.

Step 2: Calculate the median ($M$).

Since $n = 11$ (odd), the median is the $\left(\frac{11+1}{2}\right)^{th}$ term, which is the 6th term.

Median, $M = 9$.

Step 3: Calculate the absolute deviations from the median, $|x_i - 9|$.

$x_i$	$|x_i - M| = |x_i - 9|$
3	$|3 - 9| = 6$
3	$|3 - 9| = 6$
4	$|4 - 9| = 5$
5	$|5 - 9| = 4$
7	$|7 - 9| = 2$
9	$|9 - 9| = 0$
10	$|10 - 9| = 1$
12	$|12 - 9| = 3$
18	$|18 - 9| = 9$
19	$|19 - 9| = 10$
21	$|21 - 9| = 12$
Total	$\sum\limits |x_i - M| = 58$

Step 4: Calculate the mean deviation about the median.

Using the formula (ii):

$\text{MD}_{M} = \frac{\sum\limits_{i=1}^{11} |x_i - M|}{n} = \frac{58}{11} \approx 5.27$.

Thus, the mean deviation about the median is approximately 5.27.
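
A short Python sketch can confirm this result and also illustrate the property stated earlier, that the mean deviation is smallest when taken about the median. The helper `mean_deviation` is a name introduced here; `median` and `mean` come from Python's standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average absolute distance of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)


data = [3, 9, 5, 3, 12, 10, 18, 4, 7, 19, 21]

m = median(data)  # sorts internally, then picks the middle value
md_about_median = mean_deviation(data, m)
md_about_mean = mean_deviation(data, mean(data))

print(m, round(md_about_median, 2))      # 9 5.27
print(md_about_median <= md_about_mean)  # True
```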

Mean Deviation for Grouped Data

Grouped data is data that has been organized into a frequency distribution.

1. Discrete Frequency Distribution

In this format, each observation $x_i$ has a corresponding frequency $f_i$.

(a) Mean Deviation about the Mean

The formula is an extension of the ungrouped data formula, where each absolute deviation is weighted by its frequency.

$\text{MD}_{\overline{x}} = \frac{\sum\limits_{i=1}^{k} f_i |x_i - \overline{x}|}{\sum\limits_{i=1}^{k} f_i} = \frac{\sum f_i |x_i - \overline{x}|}{N}$

... (iii)

where $k$ is the number of distinct observations, $N = \sum f_i$ is the total frequency, and the mean is $\overline{x} = \frac{\sum f_i x_i}{N}$.

Example 3. Find the mean deviation about the mean for the following data:

$x_i$	2	5	6	8	10	12
$f_i$	2	8	10	7	8	5

Answer:

Solution:

We first need to calculate the mean $\overline{x}$. We can do this in a tabular format, which also helps in calculating the mean deviation.

$x_i$	$f_i$	$f_i x_i$	$|x_i - \overline{x}| = |x_i - 7.5|$	$f_i |x_i - 7.5|$
2	2	4	$|2-7.5|=5.5$	$2 \times 5.5 = 11.0$
5	8	40	$|5-7.5|=2.5$	$8 \times 2.5 = 20.0$
6	10	60	$|6-7.5|=1.5$	$10 \times 1.5 = 15.0$
8	7	56	$|8-7.5|=0.5$	$7 \times 0.5 = 3.5$
10	8	80	$|10-7.5|=2.5$	$8 \times 2.5 = 20.0$
12	5	60	$|12-7.5|=4.5$	$5 \times 4.5 = 22.5$
Total	$N=40$	$\sum f_i x_i = 300$		$\sum f_i|x_i - \overline{x}|=92.0$

Step 1: Calculate the mean.

$\overline{x} = \frac{\sum f_i x_i}{N} = \frac{300}{40} = 7.5$.

Step 2: Calculate $\sum f_i|x_i - \overline{x}|$.

From the table, this sum is 92.0.

Step 3: Calculate the mean deviation.

$\text{MD}_{\overline{x}} = \frac{\sum f_i |x_i - \overline{x}|}{N} = \frac{92.0}{40} = 2.3$.

The mean deviation about the mean is 2.3.
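
Formula (iii) is a frequency-weighted version of the ungrouped calculation. A minimal sketch, assuming the observations and frequencies are held in two parallel lists (the names `xs` and `fs` are illustrative):

```python
# Discrete frequency distribution from Example 3.
xs = [2, 5, 6, 8, 10, 12]  # observations x_i
fs = [2, 8, 10, 7, 8, 5]   # frequencies f_i

N = sum(fs)                                       # total frequency
mean = sum(f * x for x, f in zip(xs, fs)) / N     # sum(f_i x_i) / N
weighted_abs_dev = sum(f * abs(x - mean) for x, f in zip(xs, fs))
md_mean = weighted_abs_dev / N                    # formula (iii)

print(N, mean, weighted_abs_dev, md_mean)  # 40 7.5 92.0 2.3
```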

(b) Mean Deviation about the Median

The formula is similar, using the median as the central value.

$\text{MD}_M = \frac{\sum\limits_{i=1}^{k} f_i |x_i - M|}{N}$

... (iv)

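To find the median for discrete data, we first arrange the observations in ascending order and find the cumulative frequencies (c.f.). When $N$ is odd, the median is the $\left(\frac{N+1}{2}\right)^{th}$ observation; when $N$ is even, it is the average of the $\left(\frac{N}{2}\right)^{th}$ and $\left(\frac{N}{2}+1\right)^{th}$ observations. In either case, the $k^{th}$ observation is the value $x_i$ whose cumulative frequency is the first to equal or exceed $k$.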

Example 4. Find the mean deviation about the median for the data in Example 3.

Answer:

Solution:

We first need to find the median. For this, we calculate the cumulative frequency (c.f.).

$x_i$	$f_i$	c.f.
2	2	2
5	8	10
6	10	20
8	7	27
10	8	35
12	5	40
Total	$N=40$	

Step 1: Find the median.

Here, $N=40$, so $\frac{N}{2} = \frac{40}{2} = 20$.

The cumulative frequency reaches exactly 20 at the observation $x_i = 6$, so the 20th observation is 6. The next cumulative frequency, 27, belongs to the observation 8, which therefore covers the 21st to 27th observations. Since $N$ is even, the median is the average of the 20th and 21st observations.

The 20th observation is 6.

The 21st observation is 8.

Median, $M = \frac{6+8}{2} = 7$.

Step 2: Calculate $\sum f_i|x_i - M|$ using $M = 7$.

$x_i$	$f_i$	$|x_i - M| = |x_i - 7|$	$f_i |x_i - 7|$
2	2	5	10
5	8	2	16
6	10	1	10
8	7	1	7
10	8	3	24
12	5	5	25
Total	$N=40$		$\sum f_i|x_i - M|=92$

From the table, the sum is 92. (Taking 6 or 8 as the central value instead of 7 gives the same total of 92, because exactly 20 observations lie at or below 6 and 20 lie at or above 8, so the sum of absolute deviations does not change anywhere between these two values.)

Step 3: Calculate the mean deviation.

$\text{MD}_{M} = \frac{\sum f_i |x_i - M|}{N} = \frac{92}{40} = 2.3$.

In this particular case, the mean deviation about the mean and median are the same. This is not always true.
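
The cumulative-frequency lookup used to find the median of a discrete distribution can be sketched in Python as follows. The function names `kth_observation` and `discrete_median` are hypothetical helpers written for this illustration; `accumulate` and `bisect_left` are from the standard library.

```python
from bisect import bisect_left
from itertools import accumulate


def kth_observation(xs, cum_freqs, k):
    """k-th smallest observation (1-indexed): the first x whose
    cumulative frequency equals or exceeds k."""
    return xs[bisect_left(cum_freqs, k)]


def discrete_median(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order)."""
    cum_freqs = list(accumulate(fs))
    n = cum_freqs[-1]
    if n % 2 == 1:
        return kth_observation(xs, cum_freqs, (n + 1) // 2)
    lower = kth_observation(xs, cum_freqs, n // 2)
    upper = kth_observation(xs, cum_freqs, n // 2 + 1)
    return (lower + upper) / 2


xs = [2, 5, 6, 8, 10, 12]
fs = [2, 8, 10, 7, 8, 5]

m = discrete_median(xs, fs)
md_median = sum(f * abs(x - m) for x, f in zip(xs, fs)) / sum(fs)  # formula (iv)
print(m, md_median)  # 7.0 2.3
```

Looking up the 20th and 21st observations separately is what distinguishes the even-$N$ case here: the 20th falls on 6 and the 21st on 8, so the median is 7.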

2. Continuous Frequency Distribution

In this format, data is given in class intervals. We use the mid-point (or class mark) of each interval as the representative value $x_i$ for that class.

Mid-point $x_i = \frac{\text{Lower limit} + \text{Upper limit}}{2}$.

(a) Mean Deviation about the Mean

The formula is the same as for the discrete distribution, but $x_i$ are now the mid-points of the classes.

$\text{MD}_{\overline{x}} = \frac{\sum f_i |x_i - \overline{x}|}{N}$

... (v)

where $\overline{x} = \frac{\sum f_i x_i}{N}$.

Example 5. Calculate the mean deviation about the mean for the following data:

Marks obtained	Number of students
10 - 20	2
20 - 30	3
30 - 40	8
40 - 50	14
50 - 60	8
60 - 70	3
70 - 80	2

Answer:

Solution:

We construct a table to calculate the necessary values.

Marks	$f_i$	Mid-point ($x_i$)	$f_i x_i$	$|x_i - \overline{x}| = |x_i - 45|$	$f_i |x_i - 45|$
10-20	2	15	30	30	60
20-30	3	25	75	20	60
30-40	8	35	280	10	80
40-50	14	45	630	0	0
50-60	8	55	440	10	80
60-70	3	65	195	20	60
70-80	2	75	150	30	60
Total	$N=40$		$\sum f_i x_i = 1800$		$\sum f_i|x_i - \overline{x}|=400$

Step 1: Calculate the mean.

$\overline{x} = \frac{\sum f_i x_i}{N} = \frac{1800}{40} = 45$.

Step 2: Calculate $\sum f_i|x_i - \overline{x}|$.

From the table, this sum is 400.

Step 3: Calculate the mean deviation.

$\text{MD}_{\overline{x}} = \frac{\sum f_i |x_i - \overline{x}|}{N} = \frac{400}{40} = 10$.

The mean deviation about the mean is 10.
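
For a continuous distribution, the only extra step is replacing each class by its mid-point. A minimal sketch, assuming each class is stored as a `(lower, upper)` pair (a representation chosen here for illustration):

```python
# Class intervals and frequencies from Example 5.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [2, 3, 8, 14, 8, 3, 2]

# Class marks: (lower limit + upper limit) / 2
mid_points = [(lower + upper) / 2 for lower, upper in classes]

N = sum(freqs)
mean = sum(f * x for x, f in zip(mid_points, freqs)) / N
md_mean = sum(f * abs(x - mean) for x, f in zip(mid_points, freqs)) / N  # formula (v)

print(mean, md_mean)  # 45.0 10.0
```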

(b) Mean Deviation about the Median

For a continuous distribution, we first find the median class and then calculate the median using a formula.

Step 1: Find the Median Class. It is the class interval whose cumulative frequency is just greater than or equal to $\frac{N}{2}$.

Step 2: Calculate Median ($M$).

$M = l + \frac{\frac{N}{2} - C}{f} \times h$

where,

$l$ = lower limit of the median class.
$N$ = sum of frequencies.
$C$ = cumulative frequency of the class preceding the median class.
$f$ = frequency of the median class.
$h$ = class size.

Step 3: Calculate Mean Deviation about the Median.

$\text{MD}_M = \frac{\sum f_i |x_i - M|}{N}$

... (vi)

Example 6. Calculate the mean deviation about the median for the data in Example 5.

Answer:

Solution:

First, we find the median by constructing a table with cumulative frequencies.

Marks	$f_i$	c.f.	Mid-point ($x_i$)	$|x_i - M| = |x_i - 45|$	$f_i |x_i - 45|$
10-20	2	2	15	30	60
20-30	3	5	25	20	60
30-40	8	13	35	10	80
40-50	14	27	45	0	0
50-60	8	35	55	10	80
60-70	3	38	65	20	60
70-80	2	40	75	30	60
Total	$N=40$				$\sum f_i|x_i - M|=400$

Step 1: Find the median class.

$N=40$, so $\frac{N}{2} = 20$.

The cumulative frequency just greater than 20 is 27. The corresponding class is 40-50. So, the Median Class is 40-50.

Step 2: Calculate the median.

$l = 40$, $N=40$, $C = 13$, $f = 14$, $h = 10$.

$M = 40 + \frac{20 - 13}{14} \times 10 = 40 + \frac{7}{14} \times 10 = 40 + \frac{1}{2} \times 10 = 40 + 5 = 45$.

Median $M = 45$.

Step 3: Calculate the mean deviation about the median.

Since the median (45) is the same as the mean in this case, the calculations for $|x_i - M|$ and $f_i |x_i - M|$ will be identical to the mean deviation calculation.

From the table, $\sum f_i|x_i - M|=400$.

$\text{MD}_{M} = \frac{\sum f_i |x_i - M|}{N} = \frac{400}{40} = 10$.

The mean deviation about the median is 10.
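
The median formula $M = l + \frac{\frac{N}{2} - C}{f} \times h$ and formula (vi) can be sketched together in Python. The function `grouped_median` is a hypothetical helper written for this illustration, and classes are assumed to be stored as `(lower, upper)` pairs in ascending order.

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution:
    M = l + ((N/2 - C) / f) * h
    """
    cum_freqs = list(accumulate(freqs))
    n = cum_freqs[-1]
    # Median class: first class whose cumulative frequency is >= N/2.
    idx = next(i for i, c in enumerate(cum_freqs) if c >= n / 2)
    lower, upper = classes[idx]
    c_prev = cum_freqs[idx - 1] if idx > 0 else 0  # C: c.f. of preceding class
    f = freqs[idx]                                 # frequency of median class
    h = upper - lower                              # class size
    return lower + (n / 2 - c_prev) / f * h


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [2, 3, 8, 14, 8, 3, 2]

M = grouped_median(classes, freqs)
mid_points = [(lower + upper) / 2 for lower, upper in classes]
md_median = sum(f * abs(x - M) for x, f in zip(mid_points, freqs)) / sum(freqs)

print(M, md_median)  # 45.0 10.0
```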

Merits and Demerits of Mean Deviation

Merits (Advantages):

Based on All Observations: Unlike range, it takes into account every single data point. This makes it a much more comprehensive and representative measure of dispersion.
Simple and Intuitive: The concept of an "average distance from the center" is straightforward to understand and explain, even to a non-technical audience.
Less Affected by Extreme Values: Compared to standard deviation (which squares the deviations), the mean deviation gives less weight to extreme observations (outliers), making it a more robust measure in their presence.

Demerits (Disadvantages):

Ignores Algebraic Signs: The use of absolute values ($|...|$) to make deviations positive is a mathematical inconvenience. The absolute value function is difficult to handle algebraically in further statistical theory (e.g., in inference or regression).
Not Mathematically Tractable: Because of the absolute value issue, it is rarely used in more advanced statistical analysis. The Standard Deviation, which overcomes this by squaring deviations, is mathematically more manageable and has better properties, making it the preferred measure of spread in higher statistics.
Depends on the Central Value Chosen: The mean deviation about the mean can differ from the mean deviation about the median, so the central value used must always be stated.

Variance and Standard Deviation

In the study of dispersion, Mean Deviation provides a good intuitive measure of spread by averaging the absolute distances from a central point. However, the use of the absolute value function ($|...|$) makes it mathematically inconvenient for more advanced statistical analysis and inference. To overcome this algebraic limitation, statisticians developed Variance and Standard Deviation. These are the most crucial and widely used measures of dispersion in statistics because they are based on squared deviations, which have more desirable mathematical properties.

Variance ($\sigma^2$)

The Variance is defined as the arithmetic mean of the squared deviations of the observations from their arithmetic mean. It quantifies the degree of spread in a set of data points.

The process involves:

Calculating the mean ($\overline{x}$) of the data.
Finding the deviation of each observation from the mean ($x_i - \overline{x}$).
Squaring each deviation ($(x_i - \overline{x})^2$).
Finding the average of these squared deviations.

Squaring the deviations accomplishes two important things:

Eliminates Negatives: It ensures that all the terms to be averaged are non-negative, so they don't cancel each other out (solving the same problem that absolute values solved for mean deviation).
Emphasizes Larger Deviations: By squaring, it gives more weight to values that are further away from the mean. A point that is 4 units away contributes $4^2=16$ to the sum, while a point 2 units away only contributes $2^2=4$. This makes variance highly sensitive to outliers.

Variance is denoted by the Greek letter sigma squared ($\sigma^2$). A significant drawback of variance is that its units are the square of the original data units (e.g., if the data is in centimetres, the variance is in square centimetres). This makes it difficult to interpret in the context of the original data.

Standard Deviation ($\sigma$)

The Standard Deviation is the measure of dispersion that resolves the unit-interpretation problem of variance. It is defined as the positive square root of the variance.

By taking the square root, the units of the standard deviation become the same as the units of the original data, making it directly comparable and interpretable. The Standard Deviation is the most common and important measure of dispersion.

It represents a "typical" or "standard" amount of deviation (distance) of a data point from the mean. It is denoted by $\sigma$ (sigma).

A small standard deviation indicates that the data points tend to be very close to the mean. The dataset shows low variability and high consistency.
A large standard deviation indicates that the data points are spread out over a wide range of values. The dataset shows high variability and low consistency.

Variance and Standard Deviation for Ungrouped Data

1. Definitional Formulas

For a set of $n$ observations $x_1, x_2, ..., x_n$ with mean $\overline{x}$:

The Variance is given by:

$\sigma^2 = \frac{\sum\limits_{i=1}^{n} (x_i - \overline{x})^2}{n}$

... (i)

The Standard Deviation is given by:

$\sigma = \sqrt{\frac{\sum\limits_{i=1}^{n} (x_i - \overline{x})^2}{n}}$

... (ii)

2. Shortcut (Computational) Formula and its Derivation

Calculating $(x_i - \overline{x})$ for every data point can be tedious, especially if the mean $\overline{x}$ is a decimal. A computationally simpler formula exists.

Derivation:

We start with the definitional formula for variance:

$\sigma^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i - \overline{x})^2$

Expanding the squared term:

$\sigma^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i^2 - 2x_i\overline{x} + \overline{x}^2)$

Distributing the summation:

$\sigma^2 = \frac{1}{n} \left[ \sum\limits_{i=1}^{n} x_i^2 - \sum\limits_{i=1}^{n} 2x_i\overline{x} + \sum\limits_{i=1}^{n} \overline{x}^2 \right]$

Since $2\overline{x}$ and $\overline{x}^2$ are constants with respect to the summation:

$\sigma^2 = \frac{1}{n} \left[ \sum x_i^2 - 2\overline{x}\sum x_i + n\overline{x}^2 \right]$

We know that the mean $\overline{x} = \frac{\sum x_i}{n}$, which implies $\sum x_i = n\overline{x}$. Substituting this:

$\sigma^2 = \frac{1}{n} \left[ \sum x_i^2 - 2\overline{x}(n\overline{x}) + n\overline{x}^2 \right]$

$\sigma^2 = \frac{1}{n} \left[ \sum x_i^2 - 2n\overline{x}^2 + n\overline{x}^2 \right]$

$\sigma^2 = \frac{1}{n} \left[ \sum x_i^2 - n\overline{x}^2 \right]$

Distributing the $\frac{1}{n}$ term gives the shortcut formula:

$\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 = \frac{\sum x_i^2}{n} - (\overline{x})^2$

... (iii)

This formula, "the mean of the squares minus the square of the mean," is often much faster for calculations.
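
As a quick numerical check of the derivation, the following Python sketch computes the variance both ways and compares the results. The dataset and the function names are illustrative assumptions, not taken from the text.

```python
def variance_definitional(xs):
    """Formula (i): mean of the squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_shortcut(xs):
    """Formula (iii): mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data with mean 5

v_def = variance_definitional(data)
v_short = variance_shortcut(data)
print(v_def, v_short)
```

Both functions agree on this data. On a computer, the shortcut form can lose precision when the mean is very large compared with the spread, because two nearly equal numbers are subtracted; for hand calculation that is not a concern.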

Example 1. Find the variance and standard deviation for the data: 6, 8, 10, 12, 14.

Answer:

Method 1: Using the Definitional Formula

Step 1: Calculate the mean ($\overline{x}$).

$\sum x_i = 6 + 8 + 10 + 12 + 14 = 50$

$\overline{x} = \frac{50}{5} = 10$

Step 2: Calculate the sum of squared deviations.

$x_i$	$(x_i - \overline{x})$	$(x_i - \overline{x})^2$
6	6 - 10 = -4	16
8	8 - 10 = -2	4
10	10 - 10 = 0	0
12	12 - 10 = 2	4
14	14 - 10 = 4	16
Total		$\sum (x_i - \overline{x})^2 = 40$

Step 3: Calculate the Variance ($\sigma^2$).

$\sigma^2 = \frac{\sum (x_i - \overline{x})^2}{n} = \frac{40}{5} = 8$

Method 2: Using the Shortcut Formula

Step 1: Calculate $\sum x_i$ and $\sum x_i^2$.

$\sum x_i = 50$, so $\overline{x} = 10$ and $(\overline{x})^2 = 100$.

$\sum x_i^2 = 6^2 + 8^2 + 10^2 + 12^2 + 14^2 = 36 + 64 + 100 + 144 + 196 = 540$.

Step 2: Apply the shortcut formula.

$\sigma^2 = \frac{\sum x_i^2}{n} - (\overline{x})^2 = \frac{540}{5} - (10)^2 = 108 - 100 = 8$.

Conclusion

Both methods give the same variance, $\sigma^2 = 8$.

Final Step: Calculate the Standard Deviation ($\sigma$).

$\sigma = \sqrt{\text{Variance}} = \sqrt{8} = 2\sqrt{2} \approx 2.828$.

The final answer is: Variance = 8, Standard Deviation $\approx 2.828$.
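
The result can be cross-checked with Python's standard `statistics` module. This is a minimal sketch: `pvariance` and `pstdev` divide by $n$, which matches formulas (i) and (ii), whereas `variance` and `stdev` divide by $n - 1$ and would give a different answer.

```python
import math
import statistics

data = [6, 8, 10, 12, 14]

# Population variance and standard deviation (divide by n, as in the text).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Closed form from the worked example: sqrt(8) = 2 * sqrt(2).
expected_sd = 2 * math.sqrt(2)

print(variance, round(std_dev, 3))
```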

Variance and Standard Deviation for Grouped Data

1. Discrete Frequency Distribution

For data with observations $x_i$ having corresponding frequencies $f_i$:

Variance: $\sigma^2 = \frac{1}{N} \sum\limits_{i=1}^{k} f_i(x_i - \overline{x})^2$, where $N=\sum f_i$ and $\overline{x}=\frac{\sum f_i x_i}{N}$.

Standard Deviation: $\sigma = \sqrt{\frac{1}{N} \sum\limits_{i=1}^{k} f_i(x_i - \overline{x})^2}$.

Shortcut Formula: $\sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2 = \frac{\sum f_i x_i^2}{N} - (\overline{x})^2$.

Example 2. Find the variance and standard deviation for the following data:

$x_i$	4	8	11	17	20	24	32
$f_i$	3	5	9	5	4	3	1

Answer:

We will use a table to organize the calculations for the shortcut formula.

$x_i$	$f_i$	$f_i x_i$	$x_i^2$	$f_i x_i^2$
4	3	12	16	48
8	5	40	64	320
11	9	99	121	1089
17	5	85	289	1445
20	4	80	400	1600
24	3	72	576	1728
32	1	32	1024	1024
Total	$N=30$	$\sum f_i x_i=420$		$\sum f_i x_i^2=7254$

Step 1: Calculate the mean ($\overline{x}$).

$\overline{x} = \frac{\sum f_i x_i}{N} = \frac{420}{30} = 14$.

Step 2: Calculate the Variance ($\sigma^2$) using the shortcut formula.

$\sigma^2 = \frac{\sum f_i x_i^2}{N} - (\overline{x})^2$

$\sigma^2 = \frac{7254}{30} - (14)^2 = 241.8 - 196 = 45.8$.

Step 3: Calculate the Standard Deviation ($\sigma$).

$\sigma = \sqrt{45.8} \approx 6.77$.

The final answer is: Variance $= 45.8$, Standard Deviation $\approx 6.77$.
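
The table-based calculation above can be sketched in Python as follows. The function name `grouped_variance` is an illustrative assumption; it applies the shortcut formula for a discrete frequency distribution.

```python
import math


def grouped_variance(values, freqs):
    """Shortcut formula: (sum f*x^2)/N - ((sum f*x)/N)^2."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n_total
    return mean_of_squares - mean ** 2


x = [4, 8, 11, 17, 20, 24, 32]
f = [3, 5, 9, 5, 4, 3, 1]

variance = grouped_variance(x, f)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```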

2. Continuous Frequency Distribution and Shortcut Methods

For continuous distributions, we use the mid-point of each class interval as $x_i$. The formulas are the same as for discrete distributions. However, when the mid-points ($x_i$) or frequencies ($f_i$) are large, calculations become tedious. We use the Deviation Method or Step-Deviation Method to simplify them.

Derivation of Variance Formulas for Grouped Data

(a) Deviation Method Formula Derivation

This method aims to simplify calculations by shifting the origin of the data to an 'assumed mean' ($a$). The relationship between the actual deviations from the mean ($x_i - \overline{x}$) and the new deviations from the assumed mean ($d_i = x_i - a$) is used to derive the formula.

Derivation:

We begin with the fundamental formula for variance of a discrete frequency distribution:

$\sigma^2 = \frac{1}{N} \sum\limits_{i=1}^{k} f_i(x_i - \overline{x})^2$

... (A)

We know the relationship between the true mean ($\overline{x}$) and the assumed mean ($a$) is given by $\overline{x} = a + \overline{d}$, where $\overline{d} = \frac{\sum f_i d_i}{N}$ and $d_i = x_i - a$.

Let's substitute this into the term $(x_i - \overline{x})$:

$x_i - \overline{x} = x_i - (a + \overline{d})$

Since $d_i = x_i - a$, we can rewrite the above as:

$x_i - \overline{x} = (x_i - a) - \overline{d} = d_i - \overline{d}$

Now, substitute this back into the variance formula (A):

$\sigma^2 = \frac{1}{N} \sum\limits_{i=1}^{k} f_i(d_i - \overline{d})^2$

This formula is identical in structure to the original definition of variance, just with $d_i$ replacing $x_i$ and $\overline{d}$ replacing $\overline{x}$. We can now apply the same logic used to derive the shortcut formula.

Expand the squared term:

$\sigma^2 = \frac{1}{N} \sum\limits_{i=1}^{k} f_i(d_i^2 - 2d_i\overline{d} + \overline{d}^2)$

Distribute the summation and the frequency $f_i$:

$\sigma^2 = \frac{1}{N} \left[ \sum f_i d_i^2 - \sum 2f_i d_i\overline{d} + \sum f_i \overline{d}^2 \right]$

Since $\overline{d}$ is a constant, it can be taken out of the summation:

$\sigma^2 = \frac{1}{N} \left[ \sum f_i d_i^2 - 2\overline{d} \sum f_i d_i + \overline{d}^2 \sum f_i \right]$

We know that $\sum f_i = N$ and by definition, $\overline{d} = \frac{\sum f_i d_i}{N}$. Substitute these in:

$\sigma^2 = \frac{1}{N} \left[ \sum f_i d_i^2 - 2\overline{d} (N\overline{d}) + \overline{d}^2 (N) \right]$

$\sigma^2 = \frac{1}{N} \left[ \sum f_i d_i^2 - 2N\overline{d}^2 + N\overline{d}^2 \right]$

$\sigma^2 = \frac{1}{N} \left[ \sum f_i d_i^2 - N\overline{d}^2 \right]$

Distribute the $\frac{1}{N}$ term:

$\sigma^2 = \frac{\sum f_i d_i^2}{N} - \overline{d}^2$

Finally, substitute back the expression for $\overline{d}$:

$\sigma^2 = \frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2$

... (iv)

This is the required formula for variance using the deviation method.
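
Formula (iv) implies that the variance does not depend on which assumed mean $a$ is chosen. The Python sketch below illustrates this on a small made-up distribution; the data and the function name `variance_deviation_method` are assumptions for illustration only.

```python
def variance_deviation_method(values, freqs, a):
    """Formula (iv): (sum f*d^2)/N - ((sum f*d)/N)^2 with d = x - a."""
    n_total = sum(freqs)
    d = [x - a for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return sum_fd2 / n_total - (sum_fd / n_total) ** 2


values = [10, 20, 30, 40]  # illustrative observations
freqs = [1, 2, 3, 4]       # illustrative frequencies

# Try several assumed means, including a = 0 (no shift) and a = 30 (the true mean).
results = [variance_deviation_method(values, freqs, a) for a in (0, 20, 30)]
print(results)
```

Every choice of $a$ gives the same variance, so $a$ can be picked purely to keep the numbers small.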

(b) Step-Deviation Method Formula Derivation

This method further simplifies the deviation method by scaling down the deviations ($d_i$) by a common factor, $h$ (usually the class size). This results in smaller numbers ($u_i$) that are easier to work with.

Derivation:

We start with the derived formula for the deviation method (iv):

$\sigma^2 = \frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2$

... (B)

The step-deviation, $u_i$, is defined as:

$u_i = \frac{d_i}{h} = \frac{x_i - a}{h}$

From this definition, we can express the deviation $d_i$ in terms of the step-deviation $u_i$:

$d_i = h u_i$

Now, we substitute $d_i = h u_i$ into the variance formula (B):

$\sigma^2 = \frac{\sum f_i (h u_i)^2}{N} - \left(\frac{\sum f_i (h u_i)}{N}\right)^2$

Since $h$ is a constant for all classes, $h^2$ is also a constant. We can factor these constants out of the summation signs:

$\sigma^2 = \frac{h^2 \sum f_i u_i^2}{N} - \left(\frac{h \sum f_i u_i}{N}\right)^2$

Apply the square to the second term:

$\sigma^2 = \frac{h^2 \sum f_i u_i^2}{N} - \frac{h^2 \left(\sum f_i u_i\right)^2}{N^2}$

Now, we can factor out the common term $h^2$ from the entire expression:

$\sigma^2 = h^2 \left[ \frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2 \right]$

... (v)

This is the required formula for variance using the step-deviation method.

The Standard Deviation is simply the positive square root of this variance:

$\sigma = \sqrt{h^2 \left[ \frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2 \right]}$

Since $h > 0$, we have $\sqrt{h^2} = h$, so taking $h$ outside the square root gives the final formula for standard deviation:

$\sigma = h \sqrt{ \frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2 }$

... (vi)

Example 3. Calculate the standard deviation for the following data using the step-deviation method.

Class	Frequency ($f_i$)
30 - 40	3
40 - 50	7
50 - 60	12
60 - 70	15
70 - 80	8
80 - 90	3
90 - 100	2

Answer:

Let's use the step-deviation method. We choose the assumed mean $a=65$ (mid-point of the class with highest frequency) and the common factor $h=10$ (the class size).

Class	$f_i$	Mid-point ($x_i$)	$u_i = \frac{x_i - 65}{10}$	$f_i u_i$	$u_i^2$	$f_i u_i^2$
30-40	3	35	-3	-9	9	27
40-50	7	45	-2	-14	4	28
50-60	12	55	-1	-12	1	12
60-70	15	65 (a)	0	0	0	0
70-80	8	75	1	8	1	8
80-90	3	85	2	6	4	12
90-100	2	95	3	6	9	18
Total	$N=50$			$\sum f_i u_i = -15$		$\sum f_i u_i^2=105$

Step 1: Identify the values from the table.

$N=50$, $\sum f_i u_i = -15$, $\sum f_i u_i^2=105$, $h=10$.

Step 2: Calculate the Variance ($\sigma^2$) using formula (v).

$\sigma^2 = h^2 \left[ \frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2 \right]$

$\sigma^2 = 10^2 \left[ \frac{105}{50} - \left(\frac{-15}{50}\right)^2 \right]$

$\sigma^2 = 100 \left[ 2.1 - (-0.3)^2 \right]$

$\sigma^2 = 100 \left[ 2.1 - 0.09 \right]$

$\sigma^2 = 100 [2.01] = 201$.

Step 3: Calculate the Standard Deviation ($\sigma$).

$\sigma = \sqrt{201} \approx 14.18$.

The standard deviation for the given data is approximately 14.18.
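
The same calculation can be sketched in Python, assuming formula (v) with $a = 65$ and $h = 10$ as chosen above. The function name `step_deviation_stats` is illustrative.

```python
import math


def step_deviation_stats(midpoints, freqs, a, h):
    """Variance and standard deviation via u = (x - a) / h, formula (v)."""
    n_total = sum(freqs)
    u = [(x - a) / h for x in midpoints]
    sum_fu = sum(f * ui for f, ui in zip(freqs, u))
    sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))
    variance = h ** 2 * (sum_fu2 / n_total - (sum_fu / n_total) ** 2)
    return variance, math.sqrt(variance)


midpoints = [35, 45, 55, 65, 75, 85, 95]  # class mid-points x_i
freqs = [3, 7, 12, 15, 8, 3, 2]           # frequencies f_i

variance, sigma = step_deviation_stats(midpoints, freqs, a=65, h=10)
print(round(variance, 2), round(sigma, 2))
```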

Coefficient of Variation

Measures of dispersion like Standard Deviation and Variance give us an understanding of the absolute spread or variability within a single dataset. For instance, a standard deviation of 10 cm tells us how much the heights in a group typically vary. However, what if we want to compare the variability of two different groups? The standard deviation alone can be misleading in such cases, especially if:

The groups have different units of measurement (e.g., comparing height in cm to weight in kg).
The groups have the same units but their average values (means) are significantly different.

To perform a meaningful comparison, we need a relative measure of dispersion. The most important and widely used relative measure is the Coefficient of Variation (CV).

Definition and Formula of Coefficient of Variation

The Coefficient of Variation (CV) is a standardized, relative measure of dispersion. It expresses the standard deviation as a percentage of the arithmetic mean. In simple terms, it measures the "scatter per unit of the mean," allowing for a fair comparison of variability between different datasets.

The formula is given by:

CV = $\frac{\sigma}{\overline{x}} \times 100\%$

... (i)

Where:

$\sigma$ is the standard deviation of the dataset.
$\overline{x}$ is the arithmetic mean of the dataset (note: $\overline{x}$ must not be zero).

A key feature of the CV is that it is a pure number without any units. Because both the standard deviation ($\sigma$) and the mean ($\overline{x}$) have the same units, these units cancel out when we divide them. Multiplying by 100 simply presents this ratio as an easy-to-interpret percentage.
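
The unit-free property can be illustrated with a small Python sketch. The height values and the function name `coefficient_of_variation` are assumptions for illustration; the function uses the population standard deviation (dividing by $n$), as in this chapter, and rejects a zero mean.

```python
import statistics


def coefficient_of_variation(data):
    """CV as a percentage: (population SD / mean) * 100."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(data) / mean * 100


heights_cm = [160, 165, 170, 175, 180]     # illustrative heights in cm
heights_m = [x / 100 for x in heights_cm]  # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(round(cv_cm, 2), round(cv_m, 2))
```

Changing the unit rescales the mean and the standard deviation by the same factor, so the two CV values agree up to floating-point rounding.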

Interpretation and Application of CV

The Coefficient of Variation is the primary tool for comparing the consistency, stability, or uniformity of two or more groups. The interpretation is straightforward:

A lower CV indicates that the data points are more tightly clustered around the mean. This implies greater consistency, higher stability, or less relative variability.
A higher CV indicates that the data points are more spread out relative to their mean. This implies less consistency, lower stability, or greater relative variability.

Use Case 1: Comparing Data with Different Units

Imagine a health study where we want to compare the variability of patients' heights (in cm) with the variability of their weights (in kg). Suppose we find:

For Heights: Standard Deviation = 10 cm
For Weights: Standard Deviation = 8 kg

We cannot conclude that heights are more variable just because 10 is greater than 8. The units are different, so a direct comparison is meaningless. The CV solves this. If the mean height is 170 cm and the mean weight is 70 kg:

CV$_{\text{height}} = \frac{10}{170} \times 100\% \approx 5.9\%$

CV$_{\text{weight}} = \frac{8}{70} \times 100\% \approx 11.4\%$

Now we can make a fair comparison: The variability in weight (11.4%) is relatively much greater than the variability in height (5.9%) for this group.

Use Case 2: Comparing Data with Widely Different Means

Consider two cricket batsmen, Virat and Rohit, with the following statistics for a season:

Virat: Mean Score ($\overline{x}$) = 75 runs, Standard Deviation ($\sigma$) = 15 runs.
Rohit: Mean Score ($\overline{x}$) = 40 runs, Standard Deviation ($\sigma$) = 12 runs.

Looking only at the standard deviation, Virat (15) appears more variable than Rohit (12). However, this is misleading because their average scores are very different. A deviation of 15 runs from an average of 75 is less significant than a deviation of 12 runs from an average of 40. The CV provides the correct perspective:

CV for Virat = $\frac{15}{75} \times 100\% = 20\%$

CV for Rohit = $\frac{12}{40} \times 100\% = 30\%$

The CV reveals that Rohit's scoring is relatively more variable (30%) than Virat's (20%). Therefore, Virat is the more consistent batsman.

Example 1. The mean and standard deviation of the salaries of two firms, A and B, are given below:

Firm	Mean Salary (₹)	Standard Deviation (₹)
A	₹ 25,000	₹ 3,000
B	₹ 28,000	₹ 3,500

Which firm has greater variability in individual salaries? Which firm has more consistent salaries?

Answer:

Given:

For Firm A: Mean $\overline{x}_A = 25000$, Standard Deviation $\sigma_A = 3000$.

For Firm B: Mean $\overline{x}_B = 28000$, Standard Deviation $\sigma_B = 3500$.

Solution:

To compare the variability of salaries, we must calculate the Coefficient of Variation (CV) for each firm, as their mean salaries are different.

For Firm A:

CV$_A = \frac{\sigma_A}{\overline{x}_A} \times 100\% = \frac{3000}{25000} \times 100\% = 0.12 \times 100\% = 12\%$

For Firm B:

CV$_B = \frac{\sigma_B}{\overline{x}_B} \times 100\% = \frac{3500}{28000} \times 100\% = \frac{1}{8} \times 100\% = 0.125 \times 100\% = 12.5\%$

Conclusion:

1. Variability: Since CV$_B$ (12.5%) > CV$_A$ (12%), Firm B shows greater relative variability in its individual salaries.

2. Consistency: Since Firm A has a lower CV, it means its salaries are more tightly clustered around the mean. Therefore, Firm A has more consistent salaries.

Example 2. An investor is considering two stocks, Stock X and Stock Y. The mean annual return and standard deviation for the past five years are given below.

Stock	Mean Annual Return	Standard Deviation of Return
X	18%	6%
Y	12%	5%

In finance, standard deviation is a measure of risk. Which stock is considered more risky or volatile?

Answer:

Given:

For Stock X: Mean $\overline{x}_X = 18$, Standard Deviation $\sigma_X = 6$.

For Stock Y: Mean $\overline{x}_Y = 12$, Standard Deviation $\sigma_Y = 5$.

Solution:

To compare the risk relative to the average return, we calculate the CV for each stock. A higher CV implies higher risk for every unit of return.

For Stock X:

CV$_X = \frac{\sigma_X}{\overline{x}_X} \times 100\% = \frac{6}{18} \times 100\% = \frac{1}{3} \times 100\% \approx 33.33\%$

For Stock Y:

CV$_Y = \frac{\sigma_Y}{\overline{x}_Y} \times 100\% = \frac{5}{12} \times 100\% \approx 0.4167 \times 100\% \approx 41.67\%$

Conclusion:

Since the CV of Stock Y (41.67%) is greater than the CV of Stock X (33.33%), Stock Y is considered more risky or volatile relative to its average return.
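
Both worked examples follow the same pattern: compute the CV for each group and pick the larger one. A minimal Python sketch of that comparison, with the hypothetical helper names `cv_percent` and `most_variable`:

```python
def cv_percent(mean, sd):
    """Coefficient of variation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean


def most_variable(groups):
    """Return the name of the group with the highest CV.

    groups maps a name to a (mean, standard deviation) pair.
    """
    return max(groups, key=lambda name: cv_percent(*groups[name]))


firms = {"A": (25000, 3000), "B": (28000, 3500)}  # Example 1
stocks = {"X": (18, 6), "Y": (12, 5)}             # Example 2

print(most_variable(firms), most_variable(stocks))
```

The group that is not returned has the lower CV and is therefore the more consistent one.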

$x_i$	$|x_i - \overline{x}| = |x_i - 9|$
4	$|4 - 9| = 5$
6	$|6 - 9| = 3$
7	$|7 - 9| = 2$
8	$|8 - 9| = 1$
10	$|10 - 9| = 1$
12	$|12 - 9| = 3$
12	$|12 - 9| = 3$
13	$|13 - 9| = 4$
Total	$\sum\limits_{i=1}^{8} |x_i - \overline{x}| = 22$

$x_i$	$|x_i - M| = |x_i - 9|$
3	$|3 - 9| = 6$
3	$|3 - 9| = 6$
4	$|4 - 9| = 5$
5	$|5 - 9| = 4$
7	$|7 - 9| = 2$
9	$|9 - 9| = 0$
10	$|10 - 9| = 1$
12	$|12 - 9| = 3$
18	$|18 - 9| = 9$
19	$|19 - 9| = 10$
21	$|21 - 9| = 12$
Total	$\sum\limits_{i=1}^{11} |x_i - M| = 58$