Measures of Central Tendency
Introduction
After collecting, organising, and presenting data, the next step in statistical analysis is to summarise it using a single, representative value. It is often impossible to work with or remember a large set of individual observations. We need a single value that represents the central point or the "center of gravity" of the entire dataset. This representative value is known as a measure of central tendency or an average.
An average helps in condensing a vast amount of data into a single figure, making it easier to comprehend and compare different datasets. For example, to compare the academic performance of students in two different colleges, we can compare their average marks. There are several types of averages, each with its own specific use and characteristics. The three most commonly used measures of central tendency are:
- Arithmetic Mean
- Median
- Mode
This chapter will delve into the calculation, properties, and applications of these three fundamental statistical measures.
Arithmetic Mean
The Arithmetic Mean (A.M.), often simply called the mean, is the most common and widely used measure of central tendency. It is defined as the sum of the values of all observations in a dataset divided by the total number of observations.
How Arithmetic Mean Is Calculated
The calculation of the arithmetic mean depends on the type of data series: whether it is an individual series (ungrouped data) or a frequency distribution (grouped data).
Arithmetic Mean For Series Of Ungrouped Data
Ungrouped data refers to a series of individual observations. Two methods of calculating the mean for this type of data are described below: the direct method and the assumed mean method.
1. Direct Method
This is the simplest method. If we have $n$ individual observations $x_1, x_2, \dots, x_n$, the arithmetic mean ($\bar{x}$) is calculated by summing all the observations and dividing by the number of observations.
Formula:
$ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} $
Example 1. The monthly pocket money (in ₹) of 5 students is 500, 650, 700, 450, and 900. Calculate the mean pocket money.
Answer:
Sum of observations ($ \sum x_i $) = $500 + 650 + 700 + 450 + 900 = 3200$
Number of observations ($n$) = 5
$ \bar{x} = \frac{\sum x_i}{n} = \frac{3200}{5} = 640 $
The mean monthly pocket money is ₹ 640.
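The direct method can be sketched in a few lines of Python. The function name `arithmetic_mean` is an illustrative choice, not part of the chapter; the data is that of Example 1.

```python
def arithmetic_mean(values):
    """Direct method: sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


pocket_money = [500, 650, 700, 450, 900]  # data from Example 1
print(arithmetic_mean(pocket_money))  # 640.0
```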
2. Assumed Mean Method (Short-cut Method)
When the data values are large, the direct method can be time-consuming. The assumed mean method simplifies the calculation. In this method, we assume a value as the mean (A), preferably one from the middle of the data range. We then calculate the deviations of each observation from this assumed mean ($d_i = x_i - A$).
Derivation:
We know that $d_i = x_i - A$, so $x_i = A + d_i$.
The sum of all observations is $ \sum x_i = \sum (A + d_i) = \sum A + \sum d_i $.
Since A is a constant, $ \sum A = nA $. Thus, $ \sum x_i = nA + \sum d_i $.
Dividing by $n$, we get $ \frac{\sum x_i}{n} = \frac{nA}{n} + \frac{\sum d_i}{n} $.
Therefore, $ \bar{x} = A + \frac{\sum d_i}{n} $.
Formula:
$ \bar{x} = A + \frac{\sum d_i}{n} $, where $d_i = x_i - A$.
Example 2. Using the data from Example 1, calculate the mean using the assumed mean method. Let the assumed mean (A) be 700.
Answer:
$x_i$ | $d_i = x_i - 700$ |
---|---|
500 | -200 |
650 | -50 |
700 | 0 |
450 | -250 |
900 | +200 |
Total | $ \sum d_i = -300 $ |
$ \bar{x} = A + \frac{\sum d_i}{n} = 700 + \frac{-300}{5} = 700 - 60 = 640 $.
The mean monthly pocket money is ₹ 640.
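As a quick check on the derivation, the sketch below applies the assumed mean formula to the Example 1 data. The function name `assumed_mean_method` and the second assumed mean of 600 are illustrative assumptions; the point is that any choice of A should give the same result.

```python
def assumed_mean_method(values, a):
    """Assumed mean method: x_bar = A + (sum of d_i) / n, with d_i = x_i - A."""
    deviations = [x - a for x in values]
    return a + sum(deviations) / len(values)


pocket_money = [500, 650, 700, 450, 900]  # data from Example 1
print(assumed_mean_method(pocket_money, 700))  # 640.0
print(assumed_mean_method(pocket_money, 600))  # 640.0 (a different A, same mean)
```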
Calculation Of Arithmetic Mean For Grouped Data
Discrete Series
A discrete series is where data is presented with its corresponding frequencies ($x_i, f_i$).
Direct Method:
$ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{N} $
Assumed Mean Method:
$ \bar{x} = A + \frac{\sum f_i d_i}{N} $, where $d_i = x_i - A$.
Step Deviation Method:
This method further simplifies the calculation if the deviations ($d_i$) have a common factor ($c$).
$ \bar{x} = A + \frac{\sum f_i d'_i}{N} \times c $, where $d'_i = \frac{d_i}{c} = \frac{x_i - A}{c}$.
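The formulas for a discrete series can be compared in a short Python sketch. The values and frequencies below are hypothetical, chosen only for illustration, and the function names are likewise assumptions. Passing `c=1` to the step deviation function reduces it to the assumed mean method.

```python
def discrete_mean_direct(values, freqs):
    """Direct method: sum(f_i * x_i) / N."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n


def discrete_mean_step_deviation(values, freqs, a, c):
    """Step deviation method: A + (sum(f_i * d'_i) / N) * c, d'_i = (x_i - A) / c."""
    n = sum(freqs)
    total = sum(f * (x - a) / c for x, f in zip(values, freqs))
    return a + c * total / n


# Hypothetical discrete series (not taken from the chapter).
values = [10, 20, 30, 40]
freqs = [2, 3, 4, 1]

print(discrete_mean_direct(values, freqs))                      # 24.0
print(discrete_mean_step_deviation(values, freqs, a=20, c=10))  # 24.0
```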
Continuous Series
In a continuous series, data is grouped into class intervals. The first step is to find the class mark or mid-point ($m_i$) for each class interval. The mid-point is then treated as $x_i$ and the problem is solved like a discrete series.
$ m_i = \frac{\text{Lower Limit} + \text{Upper Limit}}{2} $
Direct Method:
$ \bar{x} = \frac{\sum f_i m_i}{N} $
Step Deviation Method (Most Common):
$ \bar{x} = A + \frac{\sum f_i d'_i}{N} \times c $, where $d'_i = \frac{m_i - A}{c}$ and $c$ is the class interval width.
Example 3. Calculate the mean marks of students from the following data using the Step Deviation Method.
Marks | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 |
---|---|---|---|---|---|
No. of Students | 5 | 12 | 15 | 10 | 8 |
Answer:
Let's take Assumed Mean (A) = 25 and class width (c) = 10.
Marks | Frequency ($f_i$) | Mid-point ($m_i$) | $d_i = m_i - 25$ | $d'_i = d_i/10$ | $f_i d'_i$ |
---|---|---|---|---|---|
0-10 | 5 | 5 | -20 | -2 | -10 |
10-20 | 12 | 15 | -10 | -1 | -12 |
20-30 | 15 | 25 | 0 | 0 | 0 |
30-40 | 10 | 35 | 10 | 1 | 10 |
40-50 | 8 | 45 | 20 | 2 | 16 |
Total | $N=50$ | | | | $ \sum f_i d'_i = 4 $ |
$ \bar{x} = A + \frac{\sum f_i d'_i}{N} \times c = 25 + \frac{4}{50} \times 10 = 25 + \frac{40}{50} = 25 + 0.8 = 25.8 $
The mean marks are 25.8.
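The calculation in Example 3 can be reproduced with a small Python sketch. It assumes each class is given as a `(lower, upper)` pair and that all classes share the same width `c`; the function name is illustrative.

```python
def grouped_mean_step_deviation(classes, freqs, a, c):
    """Mean of a continuous series by the step deviation method.

    classes: list of (lower, upper) limits; freqs: matching frequencies;
    a: assumed mean; c: common class width.
    """
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    total = sum(f * (m - a) / c for f, m in zip(freqs, midpoints))
    return a + c * total / n


# Data from Example 3.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 12, 15, 10, 8]

print(round(grouped_mean_step_deviation(classes, freqs, a=25, c=10), 2))  # 25.8
```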
Two Interesting Properties Of A.M.
- The sum of deviations of all observations from their arithmetic mean is always zero.
Proof: $ \sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x} $. Since $\bar{x}$ is a constant, $\sum \bar{x} = n\bar{x}$. And we know $ \bar{x} = \frac{\sum x_i}{n} \implies n\bar{x} = \sum x_i $. So, $ \sum (x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0 $.
- The arithmetic mean is affected by extreme values (outliers). A very large or very small value in the dataset can significantly pull the mean towards it.
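Both properties can be checked numerically. The sketch below uses the Example 1 data; the outlier of 9000 is a hypothetical value introduced only to show how strongly a single extreme observation moves the mean. With non-integer data, floating-point rounding may leave a tiny non-zero remainder in the sum of deviations.

```python
def mean(values):
    return sum(values) / len(values)


def sum_of_deviations(values):
    """Sum of (x_i - x_bar); zero by the first property."""
    m = mean(values)
    return sum(x - m for x in values)


pocket_money = [500, 650, 700, 450, 900]  # data from Example 1
print(sum_of_deviations(pocket_money))    # 0.0

# Hypothetical outlier: replace 900 with 9000.
with_outlier = [500, 650, 700, 450, 9000]
print(mean(pocket_money))   # 640.0
print(mean(with_outlier))   # 2260.0
```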
Weighted Arithmetic Mean
Sometimes, different observations in a dataset have different levels of importance. In such cases, we use the weighted arithmetic mean, where each observation ($x_i$) is assigned a weight ($w_i$) according to its significance.
Formula:
$ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} $
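A minimal sketch of the weighted mean follows. The scores and weights are hypothetical (for instance, three tests of increasing importance), and the function name is an assumption made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: three test scores with weights 1, 2 and 3.
scores = [70, 80, 90]
weights = [1, 2, 3]

print(round(weighted_mean(scores, weights), 2))  # 83.33
print(weighted_mean(scores, [1, 1, 1]))          # 80.0 (equal weights give the simple mean)
```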
Median
The Median is a positional average. It is the value of the middle-most observation in a dataset when the data is arranged in ascending or descending order. It divides the dataset into two equal halves: 50% of the observations are below the median, and 50% are above it. The median is not affected by extreme values, which makes it a better measure of central tendency than the mean for skewed data.
Computation Of Median
Ungrouped Data
- Arrange the data in ascending or descending order.
- If the number of observations ($n$) is odd, Median = Value of $ \left(\frac{n+1}{2}\right)^{th} $ item.
- If the number of observations ($n$) is even, Median = Average of the $ \left(\frac{n}{2}\right)^{th} $ and $ \left(\frac{n}{2} + 1\right)^{th} $ items.
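The steps above can be sketched in Python as follows. The function name `median_ungrouped` is illustrative. The first call uses the Example 1 data; the extra observation of 1000 in the second call is hypothetical and is added only to show the even case.

```python
def median_ungrouped(values):
    """Median of individual observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item, which is index mid (0-based).
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([500, 650, 700, 450, 900]))        # 650
print(median_ungrouped([500, 650, 700, 450, 900, 1000]))  # 675.0
```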
Discrete Series
- Arrange the data and find the cumulative frequencies (cf). Let $N = \sum f_i$.
- Find the position of the median item using $ \left(\frac{N+1}{2}\right) $.
- Locate the value of the variable ($x$) corresponding to the cumulative frequency that is equal to or just greater than $ \left(\frac{N+1}{2}\right) $.
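The procedure for a discrete series can be sketched as below. It follows the rule exactly as stated above (the first value whose cumulative frequency reaches $(N+1)/2$). The data is hypothetical and the function name is an assumption.

```python
def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))       # arrange the values in ascending order
    n = sum(f for _, f in pairs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    position = (n + 1) / 2                   # position of the median item
    cumulative = 0
    for x, f in pairs:
        cumulative += f
        if cumulative >= position:           # cf equal to or just greater than (N + 1) / 2
            return x


# Hypothetical discrete series with N = 11, so the median is the 6th item.
values = [10, 20, 30, 40]
freqs = [2, 3, 5, 1]
print(median_discrete(values, freqs))  # 30
```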
Continuous Series
- Find the cumulative frequencies (cf).
- Find the median class: the class whose cumulative frequency is equal to or just greater than $ \frac{N}{2} $, i.e. the class containing the $ \left(\frac{N}{2}\right)^{th} $ item.
- Apply the following formula:
Formula:
$ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times h $
where,
- $l$ = lower limit of the median class
- $N$ = total frequency ($\sum f_i$)
- $cf$ = cumulative frequency of the class preceding the median class
- $f$ = frequency of the median class
- $h$ = class width of the median class
Example 4. Using the data from Example 3, calculate the median marks.
Answer:
Marks | Frequency ($f_i$) | Cumulative Frequency (cf) |
---|---|---|
0-10 | 5 | 5 |
10-20 | 12 | 17 |
20-30 | 15 | 32 |
30-40 | 10 | 42 |
40-50 | 8 | 50 |
Here, $N=50$. Median item = $N/2 = 50/2 = 25^{th}$ item.
The smallest cumulative frequency equal to or greater than 25 is 32, so the $25^{th}$ item lies in the class 20-30. This is the median class.
$l=20$, $N=50$, $cf=17$ (cf of class preceding median class), $f=15$, $h=10$.
$ \text{Median} = 20 + \frac{25 - 17}{15} \times 10 = 20 + \frac{8}{15} \times 10 = 20 + \frac{80}{15} = 20 + 5.33 = 25.33 $
The median marks are 25.33.
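The calculation in Example 4 can be reproduced with a short Python sketch. It assumes classes are given as `(lower, upper)` pairs in ascending order; the function name is illustrative.

```python
def median_grouped(classes, freqs):
    """Median of a continuous series: l + ((N/2 - cf) / f) * h."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0                            # cf of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= half:  # this is the median class
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f


# Data from Example 3.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 12, 15, 10, 8]
print(round(median_grouped(classes, freqs), 2))  # 25.33
```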
Quartiles
Quartiles are values that divide the data into four equal parts. There are three quartiles:
- First Quartile ($Q_1$) or Lower Quartile: The value that has 25% of the items below it.
- Second Quartile ($Q_2$): The value that has 50% of the items below it. It is the same as the Median.
- Third Quartile ($Q_3$) or Upper Quartile: The value that has 75% of the items below it.
Calculation for Continuous Series:
$ Q_1 = l + \frac{\frac{N}{4} - cf}{f} \times h \quad \text{and} \quad Q_3 = l + \frac{\frac{3N}{4} - cf}{f} \times h $
(where l, cf, f, h correspond to the respective quartile class)
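Since the quartile formulas differ from the median formula only in the target position ($N/4$, $N/2$ or $3N/4$), one helper can cover all three. The sketch below applies it to the Example 3 data; the function name `grouped_quantile` and its `fraction` parameter are illustrative assumptions, not notation from the chapter.

```python
def grouped_quantile(classes, freqs, fraction):
    """Value below which `fraction` of the items lie, for a continuous series.

    fraction = 0.25 gives Q1, 0.5 gives the median (Q2), 0.75 gives Q3.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = fraction * n
    cumulative = 0                              # cf of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cumulative + f >= target:  # the quartile class
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f


# Data from Example 3.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 12, 15, 10, 8]

print(round(grouped_quantile(classes, freqs, 0.25), 2))  # 16.25
print(round(grouped_quantile(classes, freqs, 0.50), 2))  # 25.33
print(round(grouped_quantile(classes, freqs, 0.75), 2))  # 35.5
```

Here $Q_1$ falls in the class 10-20 ($l=10$, $cf=5$, $f=12$) and $Q_3$ in the class 30-40 ($l=30$, $cf=32$, $f=10$).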
Mode
The Mode is the value that occurs most frequently in a dataset. It is the value with the highest frequency. A dataset can have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all.
Computation Of Mode
Discrete Series
The mode can be found by simple inspection. It is the value of the variable ($x$) that corresponds to the highest frequency.
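Inspection amounts to picking the value with the largest frequency. The sketch below returns a list so that bimodal or multimodal series are handled too; the data is hypothetical and the function name is an assumption.

```python
def modes_discrete(values, freqs):
    """All values of a discrete series that share the highest frequency."""
    highest = max(freqs)
    return [x for x, f in zip(values, freqs) if f == highest]


# Hypothetical discrete series.
values = [10, 20, 30, 40]
print(modes_discrete(values, [2, 3, 5, 1]))  # [30]      (unimodal)
print(modes_discrete(values, [2, 5, 5, 1]))  # [20, 30]  (bimodal)
```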
Continuous Series
- Identify the modal class, which is the class interval with the highest frequency.
- Apply the following formula:
Formula:
$ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h $
where,
- $l$ = lower limit of the modal class
- $f_1$ = frequency of the modal class
- $f_0$ = frequency of the class preceding the modal class
- $f_2$ = frequency of the class succeeding the modal class
- $h$ = class width
Example 5. Using the data from Example 3, calculate the mode.
Answer:
From the table, the highest frequency is 15, which corresponds to the class interval 20-30. This is the modal class.
$l=20$, $f_1=15$ (frequency of modal class), $f_0=12$ (frequency of preceding class), $f_2=10$ (frequency of succeeding class), $h=10$.
$ \text{Mode} = 20 + \frac{15 - 12}{2(15) - 12 - 10} \times 10 = 20 + \frac{3}{30 - 22} \times 10 = 20 + \frac{3}{8} \times 10 $
$ \text{Mode} = 20 + \frac{30}{8} = 20 + 3.75 = 23.75 $
The modal marks are 23.75.
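The calculation in Example 5 can be reproduced with the sketch below. It assumes equal class widths and a single modal class. Treating a missing neighbouring class as having frequency 0 when the modal class is the first or last one is an assumption of this sketch, not something the chapter specifies.

```python
def mode_grouped(classes, freqs):
    """Mode of a continuous series: l + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class and both neighbours have equal frequency")
    lower, upper = classes[i]
    return lower + (f1 - f0) / denominator * (upper - lower)


# Data from Example 3.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 12, 15, 10, 8]
print(mode_grouped(classes, freqs))  # 23.75
```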
Relative Position Of Arithmetic Mean, Median And Mode
The relationship between mean, median, and mode depends on the shape of the frequency distribution.
- Symmetrical Distribution: In a perfectly symmetrical distribution, the values of mean, median, and mode are equal. (Mean = Median = Mode).
- Asymmetrical (Skewed) Distribution:
  - Positively Skewed Distribution: The distribution has a long tail to the right. The relationship is: Mean > Median > Mode.
  - Negatively Skewed Distribution: The distribution has a long tail to the left. The relationship is: Mean < Median < Mode.
For moderately asymmetrical distributions, there is an empirical relationship between the three measures, given by Karl Pearson:
$ \text{Mode} = 3 \times \text{Median} - 2 \times \text{Mean} $
Using the values from our examples: Mean=25.8, Median=25.33, Mode=23.75. $ 3(25.33) - 2(25.8) = 75.99 - 51.6 = 24.39 $. This is reasonably close to our calculated mode of 23.75.
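The same check can be written as a small Python sketch. The function name `empirical_mode` is illustrative; the inputs are the rounded results of Examples 3, 4 and 5.

```python
def empirical_mode(mean, median):
    """Karl Pearson's empirical relationship: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


estimate = empirical_mode(mean=25.8, median=25.33)
print(round(estimate, 2))          # 24.39
print(round(estimate - 23.75, 2))  # 0.64 (gap from the mode found in Example 5)
```

The relationship is only approximate, so a gap of this size between the estimate and the mode from the grouped-data formula is expected.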
Conclusion
Measures of central tendency are indispensable tools in statistics for summarising a dataset with a single value. Each measure—Arithmetic Mean, Median, and Mode—provides a different perspective on the "center" of the data.
- The Mean is a comprehensive measure as it uses every value in the dataset, but it is sensitive to outliers.
- The Median is a robust measure of position, ideal for skewed distributions or data with extreme values.
- The Mode identifies the most typical or frequent value in a dataset and is the only one of the three measures that can be used for nominal (unordered categorical) data.
The choice of which measure to use depends on the nature of the data and the objective of the analysis. A thorough understanding of these measures allows a researcher to accurately describe and compare datasets, forming the foundation for more advanced statistical inference.