Percentiles and Quartiles
Percentiles: Definition and Calculation
Definition and Concept
Percentiles are positional measures used in statistics to indicate the value below which a specified percentage of observations in a dataset falls. The $k^{\text{th}}$ percentile, denoted as $P_k$, is the value such that $k$ percent of the observations are less than or equal to this value, and $(100-k)$ percent of the observations are greater than or equal to this value.
Percentiles divide an ordered dataset into 100 equal parts. For example, the 10th percentile separates the lowest 10% of the data from the highest 90%. The 99th percentile separates the highest 1% of the data from the lowest 99%.
Percentiles provide a way to understand the distribution of data and the relative standing of individual observations within that distribution. They are particularly useful for comparing values from different datasets (e.g., comparing a student's score on one test to their score on another test by looking at their percentile ranks).
Examples:
- $P_{25}$ (25th percentile) is the value below which 25% of the data lies.
- $P_{50}$ (50th percentile) is the value below which 50% of the data lies. This is the **Median**.
- $P_{75}$ (75th percentile) is the value below which 75% of the data lies.
Calculation for Ungrouped Data
To find the $k^{\text{th}}$ percentile ($P_k$) for ungrouped data (a simple list of individual observations), follow these steps:
- Order the Data:
Arrange the $n$ observations in ascending order (from smallest to largest). Let the ordered observations be $x_{(1)}, x_{(2)}, \dots, x_{(n)}$, where $x_{(1)}$ is the smallest and $x_{(n)}$ is the largest.
- Calculate the Position:
Calculate the position or rank ($L_k$) of the $k^{\text{th}}$ percentile value in the ordered list. There are different formulas for calculating percentile position, which can lead to slightly different results, especially for small datasets. A commonly used formula is:
Position $L_k = \frac{k}{100}(n+1)$
... (1)
Where $k$ is the desired percentile (e.g., 70 for the 70th percentile) and $n$ is the total number of observations.
Note: Other methods exist, e.g., $L_k = \frac{k}{100}n$ or methods involving different interpolation rules (like the method used in some statistical software). The $(n+1)$ method is often used in introductory texts.
- Determine the Percentile Value:
Based on whether the calculated position $L_k$ is an integer or not:
- If $L_k$ is an Integer: The $k^{\text{th}}$ percentile $P_k$ is simply the value of the observation located at the $L_k^{\text{th}}$ position in the ordered data. $P_k = x_{(L_k)}$.
- If $L_k$ is not an Integer: The percentile value lies between two observations. Let $L_k = I + F$, where $I$ is the integer part and $F$ is the fractional part ($0 < F < 1$). The $k^{\text{th}}$ percentile $P_k$ is found by linear interpolation between the values at position $I$ ($x_{(I)}$) and position $I+1$ ($x_{(I+1)}$):
$P_k = x_{(I)} + F \times (x_{(I+1)} - x_{(I)})$
... (2)
This formula adds a fraction of the difference between the $(I+1)^{\text{th}}$ and $I^{\text{th}}$ values to the $I^{\text{th}}$ value.
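The three steps above can be sketched in Python as follows. This is a minimal illustration of the $(n+1)$ method only, not a reference implementation; the function name `percentile_n_plus_1` is hypothetical, and clamping positions that fall outside $1 \dots n$ is an assumption the text does not cover.

```python
import math

def percentile_n_plus_1(data, k):
    """Return P_k using position L_k = k/100 * (n + 1) and linear interpolation."""
    if not data:
        raise ValueError("data must contain at least one observation")
    xs = sorted(data)                  # Step 1: order the data
    n = len(xs)
    pos = k * (n + 1) / 100            # Step 2: 1-based position L_k
    # Assumption (not from the text): clamp positions outside 1..n
    pos = min(max(pos, 1), n)
    i = math.floor(pos)                # integer part I
    frac = pos - i                     # fractional part F
    if frac == 0:                      # Step 3a: L_k is an integer
        return xs[i - 1]
    # Step 3b: interpolate between x_(I) and x_(I+1)
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

scores = [15, 20, 25, 18, 22, 12, 30, 19, 28]
print(percentile_n_plus_1(scores, 70))
```

Other position conventions, such as those mentioned in the note on formula (1), would change only the line that computes `pos`.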
Example
Example 1. Find the 70th percentile ($P_{70}$) for the dataset: 15, 20, 25, 18, 22, 12, 30, 19, 28.
Answer:
Given: Dataset: 15, 20, 25, 18, 22, 12, 30, 19, 28.
To Find: The 70th percentile ($P_{70}$).
Solution:
- Order the data in ascending order:
12, 15, 18, 19, 20, 22, 25, 28, 30
- Count the number of observations:
$n = 9$
- Calculate the position ($L_k$) for $k=70$:
$L_{70} = \frac{70}{100}(n+1)$
$L_{70} = \frac{70}{100}(9+1) = \frac{70}{100}(10) = 7$
... (i)
The position is the 7th observation.
- Determine the percentile value:
Since $L_{70} = 7$ is an integer, the 70th percentile $P_{70}$ is the value of the observation at the 7th position in the ordered list.
Ordered list:
Position | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th
---|---|---|---|---|---|---|---|---|---
Value | 12 | 15 | 18 | 19 | 20 | 22 | 25 | 28 | 30

The value at the 7th position is 25.
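Therefore, $P_{70} = 25$. This means that roughly 70% of the observations in this dataset are less than or equal to 25. With only nine observations the match is approximate: 7 of the 9 values (about 77.8%) are less than or equal to 25.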
Calculation for Grouped Data
For data presented in a grouped frequency distribution with class intervals, individual values are not known. We estimate percentiles using a formula similar to the median formula. This involves locating the percentile class and then interpolating within that class.
- Calculate Cumulative Frequencies:
Prepare a 'less than' cumulative frequency (cf) column for the distribution. Find the total frequency $N = \sum f_i$.
- Find the Percentile Position:
Calculate the position of the $k^{\text{th}}$ percentile in the cumulative frequency distribution using the formula $\frac{kN}{100}$.
- Locate the Percentile Class:
Find the class interval whose 'less than' cumulative frequency is just greater than or equal to the value $\frac{kN}{100}$. This class interval is called the **percentile class** for $P_k$. The $k^{\text{th}}$ percentile value lies within this class.
- Determine Values for the Formula:
From the percentile class and the cumulative frequency table, identify the following values:
- $l$: The lower class boundary of the percentile class.
- $N$: The total frequency.
- $cf$: The cumulative frequency of the class immediately preceding the percentile class.
- $f$: The frequency of the percentile class itself.
- $h$: The class width (size) of the percentile class (assuming equal widths).
- Apply the Percentile Formula:
The formula for calculating the $k^{\text{th}}$ percentile ($P_k$) for grouped data is:
$P_k = l + \left( \frac{\frac{kN}{100} - cf}{f} \right) \times h$
... (3)
This formula is a generalization of the median formula for grouped data, where the median is $P_{50}$ (so $k=50$).
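As a rough sketch of how formula (3) can be applied, the Python function below walks the 'less than' cumulative frequencies to find the percentile class and then interpolates inside it. The function name `grouped_percentile` and the frequency table are hypothetical, invented purely for illustration; class boundaries are assumed to be continuous and listed in ascending order.

```python
def grouped_percentile(classes, k):
    """Estimate P_k from (lower_boundary, upper_boundary, frequency) tuples."""
    total = sum(freq for _, _, freq in classes)   # N
    target = k * total / 100                      # kN/100
    cum_before = 0                                # cf of the preceding classes
    for lower, upper, freq in classes:
        # Percentile class: first class whose cumulative frequency >= kN/100
        if freq > 0 and cum_before + freq >= target:
            width = upper - lower                 # h, width of this class
            return lower + (target - cum_before) / freq * width
        cum_before += freq
    raise ValueError("could not locate a percentile class; check k and the table")

# Hypothetical distribution: N = 40, cumulative frequencies 5, 13, 25, 35, 40
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print(grouped_percentile(table, 70))
```

For this invented table, $\frac{kN}{100} = 28$ falls in the 30-40 class, so the estimate is $30 + \frac{28 - 25}{10} \times 10 = 33$.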
Quartiles: Definition and Calculation
Definition and Relationship to Percentiles
Quartiles are specific percentiles that are widely used to divide an ordered dataset into four equal parts. They are particularly useful for understanding the spread and distribution of the data, especially the central portion.
There are three main quartiles:
First Quartile ($Q_1$):
Also known as the **lower quartile**. It is the value below which the lowest 25% of the observations lie. $Q_1$ is equivalent to the **25th percentile ($P_{25}$)**.
Second Quartile ($Q_2$):
This is the value below which 50% of the observations lie. $Q_2$ is identical to the **Median** and the **50th percentile ($P_{50}$)**. It divides the data into two equal halves.
Third Quartile ($Q_3$):
Also known as the **upper quartile**. It is the value below which 75% of the observations lie. $Q_3$ is equivalent to the **75th percentile ($P_{75}$)**.
The distance between the first and third quartiles ($Q_3 - Q_1$) is called the Interquartile Range (IQR), which is a measure of dispersion for the central 50% of the data.
Calculation for Ungrouped Data
Calculating quartiles for ungrouped data (a simple list of observations) follows the same procedure as calculating percentiles, using the corresponding percentile values ($k=25, 50, 75$).
- Order the Data: Arrange the $n$ observations in ascending order.
- Calculate Positions: Use the percentile position formula $L_k = \frac{k}{100}(n+1)$ with $k=25, 50, 75$.
- $Q_1$ position: $L_{25} = \frac{25}{100}(n+1) = \frac{n+1}{4}$
- $Q_2$ position: $L_{50} = \frac{50}{100}(n+1) = \frac{n+1}{2}$ (Median position)
- $Q_3$ position: $L_{75} = \frac{75}{100}(n+1) = \frac{3(n+1)}{4}$
- Determine Quartile Values: Find the value at each calculated position.
- If the position is an integer, the quartile is the value at that integer position in the ordered data.
- If the position is not an integer, interpolate between the values at the adjacent integer positions using the method described for percentiles.
Note: As with percentiles, different conventions for calculating quartile positions and values exist. The $(n+1)$ method shown above is one common approach.
Example
Example 1. Find the first quartile ($Q_1$) and third quartile ($Q_3$) for the dataset: 12, 15, 18, 19, 20, 22, 25, 28, 30.
Answer:
Given: Dataset: 12, 15, 18, 19, 20, 22, 25, 28, 30.
To Find: The first quartile ($Q_1$) and third quartile ($Q_3$).
Solution:
- The data is already ordered in ascending order:
12, 15, 18, 19, 20, 22, 25, 28, 30
- Count the number of observations:
$n = 9$
- Calculate the positions for $Q_1$ and $Q_3$:
Calculate $Q_1$ (k=25):
Position $L_{25} = \frac{n+1}{4} = \frac{9+1}{4} = \frac{10}{4} = 2.5$.
$L_{25} = 2.5$
... (i)
The position is not an integer (Integer part $I=2$, Fractional part $F=0.5$). We need to interpolate between the value at the 2nd position ($x_{(2)}$) and the value at the $(2+1)=3$rd position ($x_{(3)}$).
Ordered list: 12, 15, 18, 19, 20, 22, 25, 28, 30.
$x_{(2)} = 15$, $x_{(3)} = 18$.
Using the interpolation formula $P_k = x_{(I)} + F \times (x_{(I+1)} - x_{(I)})$:
$Q_1 = x_{(2)} + 0.5 \times (x_{(3)} - x_{(2)})$
$Q_1 = 15 + 0.5 \times (18 - 15)$
$Q_1 = 15 + 0.5 \times 3$
$Q_1 = 15 + 1.5$
$Q_1 = 16.5$
... (ii)
Calculate $Q_3$ (k=75):
Position $L_{75} = \frac{3(n+1)}{4} = \frac{3(9+1)}{4} = \frac{3(10)}{4} = \frac{30}{4} = 7.5$.
$L_{75} = 7.5$
... (iii)
The position is not an integer (Integer part $I=7$, Fractional part $F=0.5$). We need to interpolate between the value at the 7th position ($x_{(7)}$) and the value at the $(7+1)=8$th position ($x_{(8)}$).
Ordered list: 12, 15, 18, 19, 20, 22, 25, 28, 30.
$x_{(7)} = 25$, $x_{(8)} = 28$.
Using the interpolation formula:
$Q_3 = x_{(7)} + 0.5 \times (x_{(8)} - x_{(7)})$
$Q_3 = 25 + 0.5 \times (28 - 25)$
$Q_3 = 25 + 0.5 \times 3$
$Q_3 = 25 + 1.5$
$Q_3 = 26.5$
... (iv)
The first quartile ($Q_1$) is 16.5 and the third quartile ($Q_3$) is 26.5.
Note on Median ($Q_2$):
Let's also calculate the median ($Q_2$) for verification. Position $L_{50} = \frac{n+1}{2} = \frac{9+1}{2} = 5$. The 5th value is 20. So, $Q_2 = 20$. This is consistent with roughly 25% of the data lying below 16.5, 50% below 20, and 75% below 26.5; with only nine observations these proportions hold only approximately.
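The hand calculation can be cross-checked with Python's standard library. The sketch below uses `statistics.quantiles` with `method="exclusive"`, which, as far as its documented behaviour goes, places the quartiles at positions based on $n+1$ and interpolates linearly, so it should agree with the $(n+1)$ method for this dataset. Other tools may use different conventions and return slightly different values.

```python
from statistics import quantiles

scores = [12, 15, 18, 19, 20, 22, 25, 28, 30]

# n=4 asks for the three cut points that split the data into quarters
quartiles = quantiles(scores, n=4, method="exclusive")
q1, q2, q3 = quartiles

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

Switching to `method="inclusive"` would give different quartiles for the same data, which illustrates the point about competing conventions.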
Calculation for Grouped Data
For grouped data in a frequency distribution with class intervals, quartiles are calculated using the same formula as the $k^{\text{th}}$ percentile formula (Formula 3 from Section I1), by substituting the appropriate value of $k$ (25 for $Q_1$, 50 for $Q_2$, and 75 for $Q_3$) and identifying the corresponding quartile class and its values ($l, cf, f, h$).
Formula for $Q_1$ (First Quartile):
First, find the $Q_1$ class: the class interval whose cumulative frequency is just greater than or equal to $\frac{N}{4}$. Then use the formula:
$Q_1 = l + \left( \frac{\frac{N}{4} - cf}{f} \right) \times h$
... (5)
Where $l$ is the lower boundary of the $Q_1$ class, $cf$ is the cumulative frequency of the class preceding the $Q_1$ class, $f$ is the frequency of the $Q_1$ class, and $h$ is the class width of the $Q_1$ class.
Formula for $Q_2$ (Second Quartile / Median):
First, find the $Q_2$ class (Median class): the class interval whose cumulative frequency is just greater than or equal to $\frac{N}{2}$. Then use the formula:
$Q_2 = \text{Median} = l + \left( \frac{\frac{N}{2} - cf}{f} \right) \times h$
... (6)
Where $l$ is the lower boundary of the $Q_2$ class, $cf$ is the cumulative frequency of the class preceding the $Q_2$ class, $f$ is the frequency of the $Q_2$ class, and $h$ is the class width of the $Q_2$ class.
Formula for $Q_3$ (Third Quartile):
First, find the $Q_3$ class: the class interval whose cumulative frequency is just greater than or equal to $\frac{3N}{4}$. Then use the formula:
$Q_3 = l + \left( \frac{\frac{3N}{4} - cf}{f} \right) \times h$
... (7)
Where $l$ is the lower boundary of the $Q_3$ class, $cf$ is the cumulative frequency of the class preceding the $Q_3$ class, $f$ is the frequency of the $Q_3$ class, and $h$ is the class width of the $Q_3$ class.
For each quartile calculation, you need to identify the correct class interval ($Q_1$ class, $Q_2$ class, or $Q_3$ class) based on the cumulative frequency position, and then use the specific $l, cf, f,$ and $h$ values associated with that particular class in the formula.
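To make the three formulas concrete, here is a worked illustration on a hypothetical frequency distribution. The numbers are invented for this sketch and do not come from the text.

Class interval | Frequency ($f$) | Cumulative frequency ($cf$)
---|---|---
0-10 | 5 | 5
10-20 | 8 | 13
20-30 | 12 | 25
30-40 | 10 | 35
40-50 | 5 | 40

Here $N = 40$, so $\frac{N}{4} = 10$, $\frac{N}{2} = 20$, and $\frac{3N}{4} = 30$. Each class has width $h = 10$.
- $Q_1$ class is 10-20 (the first cumulative frequency $\ge 10$ is 13), with $l = 10$, $cf = 5$, $f = 8$:
$Q_1 = 10 + \left( \frac{10 - 5}{8} \right) \times 10 = 16.25$
- $Q_2$ class is 20-30 (the first cumulative frequency $\ge 20$ is 25), with $l = 20$, $cf = 13$, $f = 12$:
$Q_2 = 20 + \left( \frac{20 - 13}{12} \right) \times 10 \approx 25.83$
- $Q_3$ class is 30-40 (the first cumulative frequency $\ge 30$ is 35), with $l = 30$, $cf = 25$, $f = 10$:
$Q_3 = 30 + \left( \frac{30 - 25}{10} \right) \times 10 = 35$

Note that each quartile uses the $l$, $cf$, and $f$ of its own class; only $N$ is shared.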
Percentile Rank and Quartile Rank
Percentile Rank
While a percentile ($P_k$) is a specific data value that divides the dataset at a certain percentage, the **Percentile Rank** of a specific value $x$ is the percentage of observations in the dataset that are less than or equal to that value $x$. It provides the relative standing of a particular observation or score within its dataset.
If an observation has a percentile rank of $PR$, it means that $PR\%$ of the observations in the dataset are less than or equal to that observation's value.
For instance, if a student's score of 75 has a percentile rank of 90, it means 90% of the students scored 75 or less. Note that some definitions calculate the percentage of values *strictly less than* $x$, while others include values equal to $x$. Using "less than or equal to" is a common convention.
Calculation for Ungrouped Data:
To calculate the percentile rank (PR) of a specific value $x$ in a dataset of $n$ ungrouped observations:
- Order the data in ascending order.
- Count the number of observations that are less than or equal to the value $x$. Let this count be $C$.
- Calculate the percentile rank using the formula:
$\text{PR} = \frac{C}{n} \times 100$
... (1)
Where $n$ is the total number of observations.
Note: If using the definition of percentile rank as the percentage of values *strictly less than* $x$, the numerator would be $L$ (number of values less than $x$). Some formulas also add $0.5S$ (half the number of values equal to $x$) to the numerator: $PR = \frac{L + 0.5 S}{n} \times 100$. The first formula $PR = \frac{C}{n} \times 100$ is conceptually simpler and commonly used.
Example
Example 1. For the dataset: 12, 15, 18, 19, 20, 22, 25, 28, 30, find the percentile rank of the score 22.
Answer:
Given: Dataset: 12, 15, 18, 19, 20, 22, 25, 28, 30. Value $x = 22$.
To Find: Percentile rank of 22.
Solution:
- The data is already ordered: 12, 15, 18, 19, 20, 22, 25, 28, 30.
- Count the number of observations less than or equal to 22. These are 12, 15, 18, 19, 20, and 22. The count $C = 6$.
- The total number of observations is $n = 9$.
- Calculate the Percentile Rank using the formula $PR = \frac{C}{n} \times 100$:
$\text{PR} = \frac{6}{9} \times 100$
... (i)
$\text{PR} = \frac{2}{3} \times 100$
(Simplifying the fraction)
$\text{PR} \approx 0.666... \times 100$
$\text{PR} \approx 66.7$ (rounded to one decimal place)
... (ii)
The percentile rank of the score 22 is approximately 66.7. This means that about 66.7% of the scores in this dataset are less than or equal to 22.
Note: If using the "strictly less than" definition (L=5) and the $L/n$ formula, $PR = (5/9) \times 100 \approx 55.6$. If using the $L+0.5S$ formula (L=5, S=1), $PR = (5+0.5 \times 1)/9 \times 100 = 5.5/9 \times 100 \approx 61.1$. This highlights how the definition of percentile rank can vary. The $C/n$ method is often the most straightforward.
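The three conventions discussed in this example can be compared side by side with a short Python sketch. The function name `percentile_rank` and the `method` labels (`"le"`, `"lt"`, `"mid"`) are hypothetical names chosen for this illustration.

```python
def percentile_rank(data, x, method="le"):
    """Percentile rank of x under one of three conventions.

    "le":  C / n * 100            (values less than or equal to x)
    "lt":  L / n * 100            (values strictly less than x)
    "mid": (L + 0.5 * S) / n * 100
    """
    n = len(data)
    less = sum(1 for v in data if v < x)     # L
    equal = sum(1 for v in data if v == x)   # S
    if method == "le":
        count = less + equal                 # C = L + S
    elif method == "lt":
        count = less
    elif method == "mid":
        count = less + 0.5 * equal
    else:
        raise ValueError(f"unknown method: {method!r}")
    return count / n * 100

scores = [12, 15, 18, 19, 20, 22, 25, 28, 30]
for method in ("le", "lt", "mid"):
    print(method, round(percentile_rank(scores, 22, method), 1))
```

For the score 22 this reproduces the values 66.7, 55.6, and 61.1 computed by hand.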
Quartile Rank
The term "Quartile Rank" is not a standard statistical term like "percentile rank". It is sometimes used in a less formal sense to indicate which quarter of the ordered dataset a particular value falls into, based on the calculated quartiles ($Q_1, Q_2, Q_3$).
Given a dataset and its calculated quartiles, a value $x$ can be assigned a 'quartile rank' based on its position relative to $Q_1, Q_2$, and $Q_3$:
- If $x \le Q_1$, the value is in the **first quarter** (or Quartile Rank 1).
- If $Q_1 < x \le Q_2$, the value is in the **second quarter** (or Quartile Rank 2).
- If $Q_2 < x \le Q_3$, the value is in the **third quarter** (or Quartile Rank 3).
- If $x > Q_3$, the value is in the **fourth quarter** (or Quartile Rank 4).
Alternatively, "quartile rank" might simply refer to the percentile rank corresponding to a specific quartile value (e.g., the quartile rank of $Q_1$ is 25%, of $Q_2$ is 50%, and of $Q_3$ is 75%). However, this is just restating the definition of quartiles as percentiles.
To determine the 'quartile rank' of a specific value $x$ in the first sense, you need to calculate the quartiles ($Q_1, Q_2, Q_3$) for the dataset and then compare the value $x$ to these calculated quartile values.
Example
Example 2. For the dataset from Example 1 (12, 15, 18, 19, 20, 22, 25, 28, 30), determine the quartile rank for the score 19. From Example 1, Section I2, we found $Q_1=16.5$, $Q_2=20$ (Median), and $Q_3=26.5$.
Answer:
Given: Dataset: 12, 15, 18, 19, 20, 22, 25, 28, 30. Value $x=19$. Quartiles $Q_1=16.5$, $Q_2=20$, $Q_3=26.5$.
To Determine: The quartile rank for the score 19.
Solution:
We compare the value $x=19$ with the calculated quartile values:
- Is $19 \le Q_1$ (16.5)? No, $19 > 16.5$.
- Is $16.5 < 19 \le Q_2$ (20)? Yes, $16.5 < 19 \le 20$.
Since the score 19 falls in the range $(Q_1, Q_2]$, which is the second quarter of the data (between the 25th and 50th percentiles), it belongs to the **second quartile range**.
Therefore, the 'quartile rank' for the score 19 is considered to be 2.
Note: This interpretation means the value falls within the second 25% of the ordered data (specifically, between the 25th and 50th percentile values). Its percentile rank gives a more precise measure of its standing: four observations (12, 15, 18, 19) are less than or equal to 19, so by the $C/n$ formula $\text{PR} = \frac{4}{9} \times 100 \approx 44.4$, which lies between 25 and 50 as expected.
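The comparison rules for the informal 'quartile rank' can be sketched in Python with a binary search over the three quartiles. The function name `quartile_rank` is hypothetical, and the sketch assumes the first interpretation given in this section (which quarter a value falls into), with values equal to a quartile assigned to the lower quarter.

```python
from bisect import bisect_left

def quartile_rank(x, q1, q2, q3):
    """Return 1-4 according to x <= Q1, Q1 < x <= Q2, Q2 < x <= Q3, x > Q3."""
    # bisect_left counts how many quartiles are strictly less than x
    return bisect_left([q1, q2, q3], x) + 1

# Quartiles found earlier for 12, 15, 18, 19, 20, 22, 25, 28, 30
q1, q2, q3 = 16.5, 20, 26.5
for score in (15, 19, 20, 22, 30):
    print(score, quartile_rank(score, q1, q2, q3))
```

Using `bisect_left` rather than `bisect_right` is what makes a value exactly equal to a quartile fall into the lower quarter, matching the $\le$ signs in the rules.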
Interquartile Range and Quartile Deviation (Implicit)
Measures of Dispersion based on Quartiles
While the Range is simple, its sensitivity to outliers is a major drawback. Measures of dispersion based on quartiles provide robust alternatives that focus on the spread of the central portion of the data, making them resistant to extreme values.
These measures are calculated using the first quartile ($Q_1$) and the third quartile ($Q_3$), which are largely unaffected by how extreme the values in the lowest 25% and highest 25% of the data are.
1. Interquartile Range (IQR)
The **Interquartile Range (IQR)** is a measure of dispersion that represents the range covered by the middle 50% of the data. It is simply the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).
Definition:
The IQR is the distance between the 75th percentile and the 25th percentile.
Formula:
IQR $= Q_3 - Q_1$
... (1)
Interpretation:
The IQR tells us the spread of the central half of the observations. A small IQR indicates that the middle 50% of the data values are clustered closely together around the median. A large IQR suggests that the middle half of the data is more spread out. It provides a clearer picture of the spread for the majority of the data compared to the full range.
Robustness:
Because the IQR is calculated using only $Q_1$ and $Q_3$, it is not directly affected by the minimum or maximum values, making it a resistant measure of dispersion, especially useful for skewed distributions or data with outliers.
2. Quartile Deviation (QD) or Semi-Interquartile Range
The **Quartile Deviation (QD)** is another measure of dispersion derived from quartiles. It is defined as half of the Interquartile Range.
Definition:
The QD is the average of the distances of the first and third quartiles from the median, since $\frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2}$. It represents half the range of the central 50% of the data.
Formula:
QD $= \frac{\text{Interquartile Range}}{2} = \frac{Q_3 - Q_1}{2}$
... (2)
Interpretation:
The QD gives an idea of the spread around the median. If the distribution were symmetric, the distance from the median to $Q_1$ would equal the distance from the median to $Q_3$, and the QD would be equal to this distance. Like the IQR, the QD is a resistant measure of dispersion because it is not affected by extreme values. It is particularly useful for skewed distributions or when the median is the preferred measure of central tendency.
Example
Example 1. For the dataset used in Example 1, Section I2 (12, 15, 18, 19, 20, 22, 25, 28, 30), we found the first quartile $Q_1=16.5$ and the third quartile $Q_3=26.5$. Calculate the Interquartile Range (IQR) and the Quartile Deviation (QD).
Answer:
Given: $Q_1 = 16.5$ and $Q_3 = 26.5$ for the dataset.
To Calculate: Interquartile Range (IQR) and Quartile Deviation (QD).
Solution:
Calculate IQR:
Using the formula IQR $= Q_3 - Q_1$:
IQR $= 26.5 - 16.5$
... (i)
IQR $= 10$
... (ii)
The Interquartile Range is 10.
This means the middle 50% of the scores in this dataset span a range of 10 units.
Calculate QD:
Using the formula QD $= \frac{Q_3 - Q_1}{2}$:
QD $= \frac{26.5 - 16.5}{2}$
... (iii)
QD $= \frac{10}{2}$
QD $= 5$
... (iv)
The Quartile Deviation is 5.
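The calculation, and the robustness claim made earlier in this section, can be illustrated with a short Python sketch. The function name `iqr_and_qd` is hypothetical, the quartiles come from `statistics.quantiles` with `method="exclusive"` (assumed here to match the $(n+1)$ method), and the second dataset with the value 300 is invented to show the effect of an outlier.

```python
from statistics import quantiles

def iqr_and_qd(data):
    """Return (IQR, QD) computed from Q1 and Q3."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return iqr, iqr / 2

scores = [12, 15, 18, 19, 20, 22, 25, 28, 30]
with_outlier = [12, 15, 18, 19, 20, 22, 25, 28, 300]  # hypothetical: 30 replaced by 300

for label, data in (("original", scores), ("with outlier", with_outlier)):
    iqr, qd = iqr_and_qd(data)
    data_range = max(data) - min(data)
    print(f"{label}: range = {data_range}, IQR = {iqr}, QD = {qd}")
```

Replacing the maximum with an extreme value inflates the range from 18 to 288, while the IQR and QD stay at 10 and 5 because $Q_1$ and $Q_3$ do not depend on the largest observation here.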