Applied Mathematics for Class 11th & 12th (Concepts and Questions)


Chapter 10 Descriptive Statistics (Q & A)

Welcome to this dedicated Question and Answer platform, designed as an essential practice resource for Chapter 10: Descriptive Statistics. This chapter equips you with the fundamental tools necessary to navigate the initial, crucial stages of any data-driven inquiry within Applied Mathematics: the ability to effectively organize, visually represent, summarize, and analyze raw data. In a world awash with information, descriptive statistics provides the methods to distill meaning from numbers, identify key patterns, and communicate findings clearly. This Q&A collection offers a wide array of questions aimed at rigorously testing and reinforcing your practical skills in applying various statistical measures and graphical techniques, moving beyond mere formula memorization to genuine data comprehension and interpretation.

The questions herein cover the essential steps in handling data. You will practice organizing raw information into manageable formats like frequency distribution tables, for both ungrouped and grouped data, requiring careful consideration of class intervals and calculation of class marks. Furthermore, the resource tests your ability to construct and, equally importantly, interpret various graphical representations used to visualize data distributions. This includes common tools like histograms (paying attention to potentially unequal class widths), frequency polygons, and cumulative frequency curves known as ogives ('less than' and 'more than' types). Understanding how to extract meaningful information from these visual aids is a key skill emphasized.

A major focus lies in calculating and understanding Measures of Central Tendency – numerical values that describe the 'center' or typical value of a dataset. The Q&A provides ample practice on measures such as the mean, median, and mode.

Questions will challenge you to select the appropriate measure for a given scenario and compute it accurately from various data presentations.

Beyond the center, understanding the data's spread is vital. Therefore, Measures of Dispersion (Variability) are assessed extensively. You will practice calculating measures such as the range, quartile deviation, mean deviation, variance, and standard deviation.

Finally, the concept of relative dispersion is addressed through the Coefficient of Variation (CV), calculated as $CV = \frac{\sigma}{|\bar{x}|} \times 100$. Questions will require you to compute the CV to compare the consistency or variability of different datasets, even if they have different means or units. The detailed answers accompanying the diverse question formats (MCQs, Fill-in-the-Blanks, True/False, Short/Long Answer) provide meticulous calculations (often using tables), clear formula applications, and necessary interpretations, ensuring you master both the computational and analytical aspects of descriptive statistics.
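As a rough illustration of the CV formula above, the following Python sketch compares two made-up datasets. The function name `coefficient_of_variation` and the sample values are assumptions for illustration only, and the population standard deviation (`statistics.pstdev`) is assumed for $\sigma$.

```python
import statistics

def coefficient_of_variation(data):
    """Return CV = (sigma / |mean|) * 100, using the population standard deviation."""
    mean = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sigma / abs(mean) * 100

# Hypothetical datasets (not from the chapter): same absolute spread, different means.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Both lists have the same standard deviation, yet the weights have the larger CV because their mean is smaller, which is why the CV is useful for comparing datasets with different means or units.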



Objective Type Questions

Question 1. Which of the following best describes Data Interpretation?

(A) Collecting raw data.

(B) Organizing and presenting data in tables or charts.

(C) Analyzing presented data to draw conclusions or make decisions.

(D) Performing complex statistical calculations.

Answer:

Solution:


Data Interpretation is the process of reviewing data through some predefined processes which will aid in assigning meaning to the data and arriving at a relevant conclusion.

Let's analyze the given options:


(A) Collecting raw data:

This is the initial step of gathering data, not interpreting it.


(B) Organizing and presenting data in tables or charts:

This is known as data organization and presentation. It's a step that prepares data for analysis and interpretation, but it is not the interpretation itself.


(C) Analyzing presented data to draw conclusions or make decisions:

This option accurately describes the core process of Data Interpretation, where the organized and presented data is examined to derive insights, make inferences, and inform decisions.


(D) Performing complex statistical calculations:

While statistical calculations are often used as tools in the process of analyzing data, they are a part of the analysis phase, not the entirety of data interpretation. Interpretation involves understanding the meaning of these calculations in the context of the data.


Therefore, the option that best describes Data Interpretation is analyzing the presented data to draw conclusions or make decisions based on it.


The final answer is $\boxed{C}$.

Question 2. A pie chart is best suited for representing:

(A) Trends over time.

(B) Distribution of parts within a whole.

(C) Relationship between two variables.

(D) Frequency distribution of a single variable.

Answer:

Solution:


A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportion.

In a pie chart, the arc length of each slice, and consequently its area and central angle, is proportional to the quantity it represents.

Pie charts are primarily used to show the composition of a whole, with each slice representing a category's contribution to the total.


Let's examine the given options:


(A) Trends over time:

Trends over time are best represented by line graphs, which show how a variable changes continuously over a period.


(B) Distribution of parts within a whole:

This is the fundamental purpose of a pie chart. It clearly shows how a total is divided into different categories and the proportion each category represents relative to the whole.


(C) Relationship between two variables:

The relationship between two variables is typically shown using scatter plots or line graphs (when one variable is dependent on the other).


(D) Frequency distribution of a single variable:

While a pie chart can show the frequency distribution if the total frequency is considered the whole, histograms or bar charts are generally more versatile and informative for representing the frequency distribution of a single variable, especially for different types of data (like continuous data in a histogram).

A pie chart is specifically designed to illustrate the proportion of each category relative to the sum of all categories (the 'whole').


Therefore, a pie chart is best suited for representing the distribution of parts within a whole.


The final answer is $\boxed{B}$.
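To make the "parts within a whole" idea concrete, here is a minimal sketch that converts category totals into central angles of a pie chart. The budget categories and amounts are hypothetical and chosen only for illustration.

```python
# Hypothetical monthly budget in rupees (illustrative values only).
budget = {"Rent": 12000, "Food": 9000, "Transport": 3000, "Savings": 6000}

total = sum(budget.values())

# Each slice's central angle is proportional to its share of the whole (360 degrees).
angles = {category: amount / total * 360 for category, amount in budget.items()}

for category, angle in angles.items():
    share = budget[category] / total * 100
    print(f"{category}: {share:.1f}% of total, central angle {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which reflects that the slices together make up the whole.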

Question 3. Consider the following data representing the daily sales (in $\textsf{₹}$) of a shop for a week: 1500, 1200, 1800, 2000, 1600, 2200, 1900.

What was the average daily sale for the week?

(A) $\textsf{₹}\,1700$

(B) $\textsf{₹}\,1742.86$ (approx)

(C) $\textsf{₹}\,1800$

(D) $\textsf{₹}\,2200$

Answer:

Given:

The daily sales for a week are $\textsf{₹}\,1500$, $\textsf{₹}\,1200$, $\textsf{₹}\,1800$, $\textsf{₹}\,2000$, $\textsf{₹}\,1600$, $\textsf{₹}\,2200$, and $\textsf{₹}\,1900$.

Number of days in a week is 7.


To Find:

The average daily sale for the week.


Solution:

The average of a set of numbers is calculated by summing all the numbers and then dividing the sum by the count of the numbers.

The formula for the average is:

$ \text{Average} = \frac{\text{Sum of values}}{\text{Number of values}} $


First, let's find the sum of the daily sales:

Sum of sales $= 1500 + 1200 + 1800 + 2000 + 1600 + 2200 + 1900$

Sum of sales $= 12200$


Now, we divide the sum of sales by the number of days (which is 7):

Average daily sale $= \frac{12200}{7}$


Let's perform the division:

$ \frac{12200}{7} \approx 1742.857 $

Rounding to two decimal places, we get $\textsf{₹}\,1742.86$ (approx).


Comparing this value with the given options, we find that option (B) matches our result.


The final answer is $\boxed{\textsf{₹}\,1742.86 \text{ (approx)}}$.
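The same calculation can be checked with a short Python sketch. The variable names are chosen here for illustration; the sales figures are those given in the question.

```python
# Daily sales (in rupees) for the week, as given in Question 3.
daily_sales = [1500, 1200, 1800, 2000, 1600, 2200, 1900]

total_sales = sum(daily_sales)                 # sum of values
average_sale = total_sales / len(daily_sales)  # sum divided by number of values

print("Total sales:", total_sales)
print("Average daily sale:", round(average_sale, 2))
```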

Question 4. A histogram is used to visually represent the frequency distribution of ________ data.

(A) qualitative

(B) discrete

(C) continuous

(D) categorical

Answer:

Solution:


A histogram is a graphical representation that organizes a group of data points into user-specified ranges or bins. It is similar in appearance to a bar graph, but a key difference is that in a histogram, the bars are typically adjacent with no gaps between them (unless a bin is empty), reflecting the continuous nature of the data.

Histograms are primarily used to show the distribution of numerical data.


Let's consider the types of data mentioned in the options:


(A) qualitative data:

Qualitative data, also known as categorical data, describes qualities or characteristics that cannot be measured numerically (e.g., colors, types of fruit, city names). Histograms are not suitable for qualitative data. Bar charts are typically used for qualitative data.


(B) discrete data:

Discrete data can only take specific, distinct values (e.g., the number of students in a class, the number of cars in a parking lot). While a bar chart can represent the frequency of discrete data points, a histogram with adjacent bars is specifically designed for data that falls into continuous ranges or intervals.


(C) continuous data:

Continuous data can take any value within a given range (e.g., height, weight, time, temperature). Histograms group continuous data into bins (intervals) and the height of each bar represents the frequency of data points falling within that bin. The continuous nature of the data is represented by the adjacent bars.


(D) categorical data:

Categorical data is another term for qualitative data, discussed in option (A). Bar charts are used for categorical data, not histograms.


Therefore, a histogram is specifically used to represent the frequency distribution of continuous data.


The final answer is $\boxed{C}$.

Question 5. Assertion (A): Data presented in a frequency distribution table is easier to interpret than raw data.

Reason (R): Frequency distribution summarizes data into groups and shows the frequency of each group, making patterns clearer.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:

Solution:


Let's analyze the given Assertion and Reason.


Assertion (A): Data presented in a frequency distribution table is easier to interpret than raw data.

Raw data is a collection of unsorted and unorganized data points. Interpreting patterns, trends, or the spread of data directly from a large list of raw numbers is very difficult and time-consuming.

A frequency distribution table organizes this raw data by grouping it into classes or categories and showing the number of times each value or range of values occurs (its frequency). This organization provides a structured view of the data, making it much easier to see how the data is distributed.

Therefore, the assertion is true.


Reason (R): Frequency distribution summarizes data into groups and shows the frequency of each group, making patterns clearer.

This statement describes the primary function of a frequency distribution. It takes the raw data and condenses it by grouping similar values or values within specific ranges. For each group, it counts how many data points fall into that group (the frequency).

By summarizing the data in this way, the frequency distribution helps to reveal underlying patterns such as the most common values, the range of values, the shape of the distribution (e.g., skewed, symmetric), and the presence of outliers. These patterns are not readily apparent in raw data.

Therefore, the reason is also true.


Now, let's consider if the Reason is the correct explanation for the Assertion.

The reason states that frequency distribution summarizes data and shows frequencies, which makes patterns clearer. This process of summarizing and showing frequencies is exactly *why* data in a frequency distribution table is easier to interpret than raw data.

The clarity of patterns achieved through summarization (R) directly leads to easier interpretation (A).

Thus, the Reason (R) correctly explains the Assertion (A).


Based on the analysis, both the Assertion and the Reason are true, and the Reason is the correct explanation for the Assertion.


The final answer is $\boxed{A}$.
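The grouping step described in the Reason can be sketched in Python as follows. The raw marks, the class width of 10, and the helper name `class_interval` are all assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical raw marks of 12 students (illustrative only).
raw_marks = [23, 45, 37, 41, 29, 35, 48, 32, 27, 44, 39, 31]
class_width = 10

def class_interval(value, width=class_width):
    """Return the (lower, upper) class interval containing value; upper limit excluded."""
    lower = (value // width) * width
    return (lower, lower + width)

# Count how many observations fall in each class interval.
frequency_table = Counter(class_interval(mark) for mark in raw_marks)

for (lower, upper), frequency in sorted(frequency_table.items()):
    print(f"{lower}-{upper}: {frequency}")
```

Reading three lines of class frequencies shows at a glance where the marks concentrate, which is hard to see in the unsorted list.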

Question 6. Case Study: The table below shows the number of cars sold by a dealership over 5 months.

| Month | Cars Sold |
| --- | --- |
| January | 45 |
| February | 52 |
| March | 60 |
| April | 58 |
| May | 65 |

In which month were the highest number of cars sold?

(A) January

(B) March

(C) April

(D) May

Answer:

Solution:


We are given a table showing the number of cars sold by a dealership over 5 months.

To find the month with the highest number of cars sold, we need to look at the 'Cars Sold' column and find the maximum value among the given months.


Let's list the number of cars sold for each month:

January: 45

February: 52

March: 60

April: 58

May: 65


Comparing these numbers, we find the maximum value:

$ \max(45, 52, 60, 58, 65) = 65 $


The month corresponding to the sale of 65 cars is May.


Therefore, the highest number of cars were sold in May.


The final answer is $\boxed{D}$.

Question 7. Case Study: (Same setup as Q6)

What was the percentage increase in sales from January to February?

(A) Approx. 15.56%

(B) Approx. 13.33%

(C) Approx. 7%

(D) Approx. 20%

Answer:

To find the percentage increase in sales from January to February, we use the sales figures for both months from the table in Question 6.

From the table, the number of cars sold in January was $45$ and the number of cars sold in February was $52$.


The formula for percentage increase is:

$\text{Percentage Increase} = \frac{\text{Sales in February} - \text{Sales in January}}{\text{Sales in January}} \times 100\%$


Let $S_{Jan}$ be the sales in January and $S_{Feb}$ be the sales in February.

Given values: $S_{Jan} = 45$ and $S_{Feb} = 52$.


Now, we substitute these values into the formula:

Percentage Increase $= \frac{52 - 45}{45} \times 100\%$

Percentage Increase $= \frac{7}{45} \times 100\%$

Percentage Increase $= \frac{700}{45}\%$

Percentage Increase $= \frac{140}{9}\%$

Percentage Increase $\approx 15.56\%$


Comparing this result with the given options:

(A) Approx. 15.56%

(B) Approx. 13.33%

(C) Approx. 7%

(D) Approx. 20%


The calculated percentage increase is approximately $15.56\%$, which matches option (A).


Therefore, the percentage increase in sales from January to February was approximately $15.56\%$.


The correct option is (A) Approx. 15.56%.
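As a cross-check, the percentage increase can be computed directly from the figures in the Question 6 table (January: 45 cars, February: 52 cars). The helper name `percentage_increase` is an assumption made for this sketch.

```python
def percentage_increase(old_value, new_value):
    """Return the percentage change from old_value to new_value."""
    return (new_value - old_value) / old_value * 100

# Figures taken from the table in Question 6.
january_sales = 45
february_sales = 52

increase = percentage_increase(january_sales, february_sales)
print(f"Percentage increase: {increase:.2f}%")
```

With these table values the result rounds to 15.56%.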

Question 8. Measures of dispersion tell us about:

(A) The central value of a dataset.

(B) The spread or variability of data points.

(C) The shape of the distribution.

(D) The relationship between two variables.

Answer:

In statistics, different measures are used to describe the characteristics of a dataset. These include measures of central tendency, measures of dispersion, measures of shape, and measures of relationship.


Measures of central tendency (like mean, median, mode) give us an idea of the center or typical value of the dataset.


Measures of dispersion (like range, variance, standard deviation, interquartile range) tell us how spread out or variable the data points are from the center or from each other.


Measures of shape (like skewness and kurtosis) describe the form of the distribution, such as its symmetry or peakedness.


Measures of relationship (like correlation and regression) quantify the association between two or more variables.


Based on the definitions, measures of dispersion are specifically designed to quantify the spread or variability of data points within a dataset.


Therefore, measures of dispersion tell us about the spread or variability of data points.


Comparing this with the given options:

(A) The central value of a dataset is described by measures of central tendency.

(B) The spread or variability of data points is described by measures of dispersion.

(C) The shape of the distribution is described by measures of shape.

(D) The relationship between two variables is described by measures of relationship.


The correct option is (B) The spread or variability of data points.

Question 9. What is the range of the following dataset: 15, 20, 12, 18, 25, 16?

(A) 10

(B) 13

(C) 25

(D) 12

Answer:

The range of a dataset is the difference between the highest (maximum) value and the lowest (minimum) value in the dataset.


The given dataset is: $15, 20, 12, 18, 25, 16$.


To find the range, first, we need to identify the maximum and minimum values in this dataset.

The maximum value in the dataset is the largest number among $15, 20, 12, 18, 25, 16$, which is 25.

The minimum value in the dataset is the smallest number among $15, 20, 12, 18, 25, 16$, which is 12.


The formula for the range is:

Range = Maximum Value - Minimum Value


Substituting the identified maximum and minimum values:

Range = $25 - 12$

Range = $13$


Comparing the calculated range with the given options:

(A) 10

(B) 13

(C) 25

(D) 12


The calculated range is $13$, which matches option (B).


The correct option is (B) 13.
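A minimal Python check of this calculation, using the dataset from the question:

```python
# Dataset from Question 9.
data = [15, 20, 12, 18, 25, 16]

# Range = maximum value - minimum value.
data_range = max(data) - min(data)
print("Maximum:", max(data), "Minimum:", min(data), "Range:", data_range)
```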

Question 10. Which measure of dispersion is least affected by extreme values (outliers)?

(A) Range

(B) Quartile Deviation

(C) Mean Deviation

(D) Standard Deviation

Answer:

Measures of dispersion quantify the spread or variability of a dataset. Outliers are extreme values that are significantly different from other data points.

Let's consider how each measure of dispersion is affected by outliers:


(A) Range: The range is calculated as the difference between the maximum and minimum values in the dataset ($\text{Range} = \text{Maximum} - \text{Minimum}$). If there is an outlier at either the highest or lowest end of the dataset, it will directly and significantly impact the range. Thus, the Range is highly sensitive to outliers.


(B) Quartile Deviation: The quartile deviation (also known as the semi-interquartile range) is calculated as half of the Interquartile Range (IQR), where $IQR = Q_3 - Q_1$. The quartiles ($Q_1$ and $Q_3$) are positional measures that divide the data into four equal parts. They are based on the values at the $25^{\text{th}}$ and $75^{\text{th}}$ percentiles, respectively. Since quartiles depend on the central portion of the data and are not influenced by the extreme values in the tails, the IQR and consequently the Quartile Deviation are less affected by outliers compared to measures that use all data points or the extreme values.


(C) Mean Deviation: The mean deviation is the average of the absolute deviations of each data point from the mean or median. While using the median can make it more robust than using the mean, if calculated from the mean, it is influenced by outliers because the mean itself is sensitive to outliers. Even if calculated from the median, the deviation of an outlier will contribute significantly to the sum of absolute deviations.


(D) Standard Deviation: The standard deviation is the square root of the variance, which is the average of the squared deviations from the mean. Both the mean and the process of squaring deviations make the standard deviation very sensitive to outliers. Outliers produce large deviations from the mean, and squaring these large deviations gives them a disproportionately large weight in the calculation of variance and standard deviation.


Comparing these measures, the **Quartile Deviation** is the least affected by extreme values because it is based on quartiles, which represent the spread of the middle $50\%$ of the data and are not directly influenced by the values of the outliers in the extreme tails.


The correct option is (B) Quartile Deviation.
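The effect of an outlier on each measure can be explored with a small experiment. This is only a sketch under stated assumptions: the two datasets are made up, the helper names are hypothetical, quartiles are taken with `statistics.quantiles(..., method="inclusive")` (textbook conventions for quartiles may differ slightly), and the mean deviation is taken about the mean.

```python
import statistics

def quartile_deviation(data):
    """Half of the interquartile range, QD = (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2

def mean_deviation(data):
    """Average absolute deviation from the mean."""
    centre = statistics.mean(data)
    return sum(abs(x - centre) for x in data) / len(data)

def summary(data):
    """Collect the four measures of dispersion discussed in this question."""
    return {
        "range": max(data) - min(data),
        "quartile_deviation": quartile_deviation(data),
        "mean_deviation": mean_deviation(data),
        "standard_deviation": statistics.pstdev(data),
    }

# Hypothetical data: the second list replaces the largest value with an outlier.
clean = [10, 12, 14, 16, 18, 20, 22, 24]
with_outlier = [10, 12, 14, 16, 18, 20, 22, 240]

clean_summary = summary(clean)
outlier_summary = summary(with_outlier)

for name in clean_summary:
    print(f"{name}: {clean_summary[name]:.2f} -> {outlier_summary[name]:.2f}")
```

In this example the quartile deviation is unchanged by the outlier, while the range, mean deviation, and standard deviation all increase.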

Question 11. The formula for Quartile Deviation (QD) is:

(A) $Q_3 - Q_1$

(B) $(Q_3 - Q_1)/2$

(C) $(Q_3 + Q_1)/2$

(D) $Q_2 - Q_1$

Answer:

The Quartile Deviation (QD), also known as the semi-interquartile range, is a measure of dispersion.


It is calculated as half of the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$). The difference between the third and first quartiles is called the Interquartile Range (IQR).


The formula for the Interquartile Range (IQR) is:

$IQR = Q_3 - Q_1$


The formula for the Quartile Deviation (QD) is half of the Interquartile Range:

$QD = \frac{IQR}{2}$

Substituting the formula for IQR, we get:

$QD = \frac{Q_3 - Q_1}{2}$


Let's examine the given options:

(A) $Q_3 - Q_1$: This is the formula for the Interquartile Range (IQR).

(B) $(Q_3 - Q_1)/2$: This is the formula for the Quartile Deviation (QD).

(C) $(Q_3 + Q_1)/2$: This is the formula for the Mid-Quartile Range or Quartile Average.

(D) $Q_2 - Q_1$: This represents the difference between the median ($Q_2$) and the first quartile, which is part of the calculation of the IQR, but not the QD itself.


Therefore, the correct formula for Quartile Deviation (QD) is $\frac{Q_3 - Q_1}{2}$.


The correct option is (B) $(Q_3 - Q_1)/2$.

Question 12. The variance of a dataset is the square of the ________.

(A) Range

(B) Mean Deviation

(C) Standard Deviation

(D) Quartile Deviation

Answer:

Let's recall the definitions of Variance and Standard Deviation, which are common measures of dispersion.


The Variance is a measure of how spread out a set of data is from its mean. It is calculated as the average of the squared differences from the Mean.

For a population, the variance is denoted by $\sigma^2$.

For a sample, the variance is denoted by $s^2$.


The Standard Deviation is another measure of the spread of data. It is defined as the positive square root of the variance.

For a population, the standard deviation is denoted by $\sigma$, where $\sigma = \sqrt{\sigma^2}$.

For a sample, the standard deviation is denoted by $s$, where $s = \sqrt{s^2}$.


From these definitions, we can see the relationship between variance and standard deviation:

If Standard Deviation = $\sigma$, then squaring the standard deviation gives $\sigma^2$, which is the Variance.

Similarly, if Standard Deviation = $s$, then squaring the standard deviation gives $s^2$, which is the Variance.

Therefore, the variance is the square of the Standard Deviation.


Let's look at the options:

(A) Range is the difference between the maximum and minimum value.

(B) Mean Deviation is the average of the absolute deviations from the mean or median.

(C) Standard Deviation is the square root of the variance.

(D) Quartile Deviation is half of the Interquartile Range ($Q_3 - Q_1$).


Based on the definitions and the relationship established, the variance is the square of the Standard Deviation.


The correct option is (C) Standard Deviation.
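The relationship can be verified numerically with Python's `statistics` module. The dataset below is reused from Question 9 purely as an example.

```python
import math
import statistics

# Example data (the dataset from Question 9).
data = [15, 20, 12, 18, 25, 16]

# Population measures: sigma and sigma squared.
sigma = statistics.pstdev(data)
population_variance = statistics.pvariance(data)

# Sample measures: s and s squared.
s = statistics.stdev(data)
sample_variance = statistics.variance(data)

print("Population:", sigma ** 2, "vs", population_variance)
print("Sample:", s ** 2, "vs", sample_variance)
```

Up to floating-point rounding, squaring the standard deviation reproduces the variance in both the population and the sample case.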

Question 13. A higher value of standard deviation indicates:

(A) Data points are clustered closely around the mean.

(B) Data points are widely spread from the mean.

(C) The distribution is symmetric.

(D) The distribution is skewed.

Answer:

The Standard Deviation ($\sigma$ or $s$) is a measure of dispersion that quantifies the amount of variation or dispersion of a set of data values. It is the square root of the variance.


A low standard deviation indicates that the data points tend to be close to the mean (the expected value) of the set.


A high standard deviation indicates that the data points are spread out over a wider range of values, further away from the mean.


In essence, standard deviation measures the typical distance between each data point and the mean of the dataset.

If the standard deviation is higher, it means, on average, the data points are further away from the mean.


Let's examine the options based on this understanding:

(A) Data points are clustered closely around the mean: This is indicated by a lower value of standard deviation, not a higher one.

(B) Data points are widely spread from the mean: This is precisely what a higher value of standard deviation indicates.

(C) The distribution is symmetric: Standard deviation measures spread, not symmetry. Symmetric distributions can have either low or high standard deviations depending on the data's spread.

(D) The distribution is skewed: Standard deviation measures spread, not skewness. Skewness describes the asymmetry of the distribution. Skewed distributions can also have varying standard deviations.


Therefore, a higher value of standard deviation signifies that the data points are more spread out or dispersed relative to the mean.


The correct option is (B) Data points are widely spread from the mean.

Question 14. Which measure of dispersion is based on the absolute deviations from a central value (usually mean or median)?

(A) Standard Deviation

(B) Variance

(C) Mean Deviation

(D) Quartile Deviation

Answer:

Measures of dispersion describe the spread of data. Some measures are calculated based on the difference between each data point and a central value (like the mean or median).


Let's consider how the options are calculated:

(A) Standard Deviation: This is calculated using the square root of the average of the squared deviations from the mean ($\sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$ for a sample or $\sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ for a population). It does not use absolute deviations.


(B) Variance: This is the average of the squared deviations from the mean ($\frac{\sum (x_i - \bar{x})^2}{n-1}$ for sample variance or $\frac{\sum (x_i - \mu)^2}{N}$ for population variance). It does not use absolute deviations.


(C) Mean Deviation: This measure is defined as the average of the absolute values of the deviations of each observation from the mean or the median.

Mean Deviation from Mean $= \frac{\sum |x_i - \bar{x}|}{n}$

Mean Deviation from Median $= \frac{\sum |x_i - \text{Median}|}{n}$

This definition directly uses absolute deviations from a central value.


(D) Quartile Deviation: This is half of the Interquartile Range ($Q_3 - Q_1$). It is based on the difference between the first and third quartiles, which are positional values, not deviations from a single central value for all data points.


Based on the formulas and definitions, the Mean Deviation is the measure of dispersion that is based on the absolute deviations from a central value (mean or median).


The correct option is (C) Mean Deviation.
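The two mean deviation formulas above can be sketched in Python as follows. The dataset and the helper name `mean_deviation` are assumptions chosen for illustration; the data are picked so that the mean and the median differ.

```python
import statistics

def mean_deviation(data, centre):
    """Average of the absolute deviations of the observations from centre."""
    return sum(abs(x - centre) for x in data) / len(data)

# Hypothetical data (illustrative only).
data = [2, 4, 6, 8, 20]

md_about_mean = mean_deviation(data, statistics.mean(data))
md_about_median = mean_deviation(data, statistics.median(data))

print("Mean deviation about the mean:", md_about_mean)
print("Mean deviation about the median:", md_about_median)
```

Here the mean is 8 and the median is 6, so the two versions of the mean deviation give different values (4.8 and 4.4 respectively).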

Question 15. Assertion (A): Standard deviation is a better measure of dispersion than range.

Reason (R): Standard deviation considers all data points, whereas range only considers the two extreme values.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:

Let's evaluate the Assertion (A) and the Reason (R) separately.


Assertion (A): Standard deviation is a better measure of dispersion than range.

The range is calculated as the difference between the maximum and minimum values ($\text{Range} = \text{Maximum} - \text{Minimum}$). It is the simplest measure of dispersion but has limitations. It is highly affected by outliers because it depends only on the two extreme values and ignores the distribution of data points in between.

The standard deviation ($\sigma$ or $s$) is calculated based on the deviations of all data points from the mean. It gives a measure of the typical spread around the mean. Because it uses all data points, it provides a more comprehensive picture of the variability of the entire dataset. In most statistical analyses, the standard deviation is preferred over the range as a measure of dispersion because it reflects the variation of every observation rather than depending solely on the two extreme values. Note that the standard deviation is still influenced by outliers, but it is not determined by them alone.

Therefore, Assertion (A) is generally True.


Reason (R): Standard deviation considers all data points, whereas range only considers the two extreme values.

The formula for the Range involves only the maximum and minimum values in the dataset. For example, if the dataset is $x_1, x_2, ..., x_n$, the range is $\max(x_i) - \min(x_i)$.

The formula for the Standard Deviation (for a sample) is $s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}$. This formula involves summing up a quantity calculated for each data point $x_i$ (its squared deviation from the mean $\bar{x}$).

Thus, the Reason (R) accurately describes the calculation methods of both measures: standard deviation uses all data points, while range uses only the two extreme values.

Therefore, Reason (R) is True.


Now, let's determine if Reason (R) is the correct explanation for Assertion (A).

The fact that standard deviation considers all data points, whereas range only considers the two extreme values, is the fundamental reason why standard deviation is considered a better measure of dispersion. Because standard deviation is influenced by every value in the dataset, it reflects the overall variability more accurately than the range, which can be drastically altered by a single outlier.

So, Reason (R) provides the correct explanation for why Assertion (A) is true.


Based on the analysis, both the Assertion (A) and the Reason (R) are true, and Reason (R) is the correct explanation for Assertion (A).


The correct option is (A) Both A and R are true and R is the correct explanation of A.

Question 16. Case Study: The scores of two students in 5 tests are given below:

Student A: 80, 85, 82, 88, 90

Student B: 70, 95, 60, 100, 80

Calculate the range of scores for Student A and Student B.

(A) A: 10, B: 40

(B) A: 88, B: 100

(C) A: 80, B: 60

(D) A: 90, B: 70

Answer:

To Find: The range of scores for Student A and Student B.


Given:

Scores of Student A: $80, 85, 82, 88, 90$

Scores of Student B: $70, 95, 60, 100, 80$


Solution:

The formula for the Range of a dataset is:

Range = Maximum Value - Minimum Value


For Student A:

The scores are: $80, 85, 82, 88, 90$.

The maximum score for Student A is $90$.

The minimum score for Student A is $80$.

Range for Student A = Maximum Score - Minimum Score

Range for Student A = $90 - 80$

Range for Student A = $10$


For Student B:

The scores are: $70, 95, 60, 100, 80$.

The maximum score for Student B is $100$.

The minimum score for Student B is $60$.

Range for Student B = Maximum Score - Minimum Score

Range for Student B = $100 - 60$

Range for Student B = $40$


The range for Student A is $10$, and the range for Student B is $40$.


Comparing the calculated ranges with the given options:

(A) A: 10, B: 40

(B) A: 88, B: 100

(C) A: 80, B: 60

(D) A: 90, B: 70


The calculated ranges match option (A).


The correct option is (A) A: 10, B: 40.

Question 17. Case Study: (Same setup as Q16)

Which student's scores show greater variability?

(A) Student A

(B) Student B

(C) Both have the same variability

(D) Cannot be determined by range alone

Answer:

To Find: Which student's scores show greater variability.


Given:

Scores of Student A: $80, 85, 82, 88, 90$

Scores of Student B: $70, 95, 60, 100, 80$


Solution:

Variability in a dataset refers to how spread out the data points are. Measures of dispersion like range, variance, and standard deviation quantify variability.

From Question 16, we calculated the range for both students:

Range for Student A = $10$

Range for Student B = $40$


Comparing the ranges, the range of Student B's scores ($40$) is significantly larger than the range of Student A's scores ($10$).

A larger range indicates a greater difference between the highest and lowest values, suggesting greater spread or variability in the dataset.


In this case, since Range(B) > Range(A), Student B's scores show greater variability according to the range measure.


The range is a simple measure that depends only on the two extreme values, so it can be distorted by outliers (not a major issue in this dataset). To confirm the result, let's also calculate the sample standard deviation, which uses every score.

Scores for Student A: $80, 85, 82, 88, 90$. Mean $\bar{x}_A = \frac{80+85+82+88+90}{5} = \frac{425}{5} = 85$.

Sample Variance $s_A^2 = \frac{\sum (x_i - \bar{x}_A)^2}{n-1} = \frac{(80-85)^2 + (85-85)^2 + (82-85)^2 + (88-85)^2 + (90-85)^2}{5-1}$

$s_A^2 = \frac{(-5)^2 + 0^2 + (-3)^2 + 3^2 + 5^2}{4} = \frac{25 + 0 + 9 + 9 + 25}{4} = \frac{68}{4} = 17$

Sample Standard Deviation $s_A = \sqrt{17} \approx 4.123$


Scores for Student B: $70, 95, 60, 100, 80$. Mean $\bar{x}_B = \frac{70+95+60+100+80}{5} = \frac{405}{5} = 81$.

Sample Variance $s_B^2 = \frac{\sum (x_i - \bar{x}_B)^2}{n-1} = \frac{(70-81)^2 + (95-81)^2 + (60-81)^2 + (100-81)^2 + (80-81)^2}{5-1}$

$s_B^2 = \frac{(-11)^2 + 14^2 + (-21)^2 + 19^2 + (-1)^2}{4} = \frac{121 + 196 + 441 + 361 + 1}{4} = \frac{1120}{4} = 280$

Sample Standard Deviation $s_B = \sqrt{280} \approx 16.733$


Since $s_B \approx 16.733 > s_A \approx 4.123$, the standard deviation also indicates that Student B's scores have greater variability.


Both the range and standard deviation indicate that Student B's scores are more spread out than Student A's scores.
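The standard deviation comparison can also be checked numerically. The sketch below follows the same sample formula (divisor $n-1$) used in the working above; the helper name `sample_std` is an assumption made for illustration.

```python
import math

def sample_std(scores):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(squared_deviations / (n - 1))

s_a = sample_std([80, 85, 82, 88, 90])   # sqrt(68 / 4) = sqrt(17)
s_b = sample_std([70, 95, 60, 100, 80])  # sqrt(1120 / 4) = sqrt(280)
print(round(s_a, 3), round(s_b, 3))  # prints: 4.123 16.733
```

Because the divisor is $n-1$, the function needs at least two scores.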


Therefore, Student B's scores show greater variability.


The correct option is (B) Student B.

Question 18. Skewness measures the degree of _________ of a distribution.

(A) central tendency

(B) dispersion

(C) symmetry

(D) peakedness

Answer:

In the study of descriptive statistics, different measures are used to describe the characteristics of a distribution.


Measures of central tendency describe the center of the distribution (e.g., mean, median, mode).

Measures of dispersion describe the spread or variability of the distribution (e.g., range, variance, standard deviation).

Measures of shape describe the form or shape of the distribution. These include skewness and kurtosis.


Skewness is a measure that describes the asymmetry of a probability distribution. A symmetric distribution has zero skewness. A positively skewed distribution has a long tail extending to the right, while a negatively skewed distribution has a long tail extending to the left.

Therefore, skewness measures the extent to which the distribution deviates from symmetry, that is, the degree of its asymmetry. In the wording of the question, this is the degree of symmetry (or lack of it).


Kurtosis is another measure of shape that describes the 'tailedness' or 'peakedness' of a distribution compared to a normal distribution.


Based on these definitions, skewness specifically relates to the symmetry of the distribution.


Comparing this with the given options:

(A) central tendency - Measured by mean, median, mode.

(B) dispersion - Measured by range, variance, standard deviation, etc.

(C) symmetry - Measured by skewness (or more accurately, asymmetry, but "symmetry" is the concept directly assessed).

(D) peakedness - Measured by kurtosis.


The correct option is (C) symmetry (meaning the degree to which it deviates from symmetry).

Question 19. If a distribution is symmetric, the skewness is:

(A) Positive

(B) Negative

(C) Zero

(D) Undefined

Answer:

Skewness is a measure of the asymmetry of a probability distribution.


A distribution is considered symmetric if its left and right sides are mirror images of each other. In a perfectly symmetric distribution, the data points are balanced around the center.


In a symmetric distribution with a single peak, the mean, median, and mode are located at the same point.


The numerical value of skewness indicates the direction and degree of asymmetry:

  • If the distribution is symmetric, the skewness is zero.
  • If the distribution is positively skewed (or right-skewed), the tail is longer on the right side, and the skewness is positive. This often happens when the mean is greater than the median.
  • If the distribution is negatively skewed (or left-skewed), the tail is longer on the left side, and the skewness is negative. This often happens when the mean is less than the median.

Therefore, if a distribution is symmetric, its skewness is zero.
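To see the sign behaviour on actual numbers, here is a minimal Python sketch. It assumes the moment coefficient of skewness, $m_3 / m_2^{3/2}$, which is one common formula and is not stated in the text above; the function name and the three small datasets are hypothetical examples.

```python
def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # mirror image around 3
right_tailed = [1, 2, 3, 4, 10]      # one large value stretches the right tail
left_tailed = [-10, -4, -3, -2, -1]  # one small value stretches the left tail

print(moment_skewness(symmetric))         # 0.0
print(moment_skewness(right_tailed) > 0)  # True
print(moment_skewness(left_tailed) < 0)   # True
```

The function requires data with some spread, since $m_2 = 0$ would cause a division by zero.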


Let's examine the options:

(A) Positive: Indicates a positively skewed distribution.

(B) Negative: Indicates a negatively skewed distribution.

(C) Zero: Indicates a symmetric distribution.

(D) Undefined: Skewness can be calculated for most distributions; it's not typically undefined for a standard symmetric distribution.


The correct option is (C) Zero.

Question 20. In a positively skewed distribution, the tail is longer on the _________ side.

(A) left

(B) right

(C) both

(D) center

Answer:

Skewness measures the asymmetry of a distribution.


There are three main types of skewness:

1. Symmetric Distribution: The distribution is balanced on both sides of the center. The skewness is zero.

2. Positively Skewed Distribution: Also known as right-skewed distribution. The tail of the distribution extends towards the higher values, i.e., to the right. In such a distribution, the mean is typically greater than the median.

3. Negatively Skewed Distribution: Also known as left-skewed distribution. The tail of the distribution extends towards the lower values, i.e., to the left. In such a distribution, the mean is typically less than the median.


The question asks about a positively skewed distribution.

In a positively skewed distribution, the frequency tapers off more slowly on the right side of the peak than on the left side. This elongated part on the right is referred to as the "tail".


Therefore, in a positively skewed distribution, the tail is longer on the right side.


Comparing this with the given options:

(A) left: This is for a negatively skewed distribution.

(B) right: This is for a positively skewed distribution.

(C) both: Tails of equal length on both sides would suggest a symmetric distribution.

(D) center: The concept of a tail refers to the extremities of the distribution, not the center.


The correct option is (B) right.

Question 21. For a symmetric distribution, which of the following is typically true?

(A) Mean > Median > Mode

(B) Mean < Median < Mode

(C) Mean = Median = Mode

(D) Mean $\neq$ Median $\neq$ Mode

Answer:

The relationship between the Mean, Median, and Mode depends on the shape of the distribution.


A symmetric distribution is one where the data is evenly distributed around the center. If you draw a vertical line down the middle of the distribution, the left side is a mirror image of the right side.


In a perfectly symmetric distribution with a single peak, the Mean, Median, and Mode are located at exactly the same point in the center of the distribution.

Mean = Median = Mode


For contrast, consider skewed distributions:

  • In a positively skewed (right-skewed) distribution, the tail is longer on the right. The typical relationship is Mode < Median < Mean.
  • In a negatively skewed (left-skewed) distribution, the tail is longer on the left. The typical relationship is Mean < Median < Mode.

Since the question asks about a symmetric distribution, the typical relationship is that the Mean, Median, and Mode are equal.


Let's compare this with the given options:

(A) Mean > Median > Mode: This is typical for a positively skewed distribution.

(B) Mean < Median < Mode: This is typical for a negatively skewed distribution.

(C) Mean = Median = Mode: This is typical for a symmetric distribution.

(D) Mean $\neq$ Median $\neq$ Mode: This indicates that the measures of central tendency are different, which is true for skewed distributions, but not typically for symmetric ones.


Therefore, for a symmetric distribution, the mean, median, and mode are typically equal.


The correct option is (C) Mean = Median = Mode.

Question 22. Kurtosis measures the _________ of a distribution.

(A) symmetry

(B) central location

(C) dispersion

(D) peakedness and tail heaviness

Answer:

In statistics, Kurtosis is a measure that describes the shape of a probability distribution. While skewness measures the asymmetry of the distribution, kurtosis measures the "tailedness" of the distribution.


Specifically, kurtosis indicates how heavily the tails of a distribution differ from the tails of a normal distribution. It also provides information about the concentration of data around the mean, which can be related to the peakedness of the distribution.


A distribution with high kurtosis (leptokurtic) has heavy tails (more outliers) and is often more peaked around the mean compared to a normal distribution.

A distribution with low kurtosis (platykurtic) has light tails (fewer outliers) and is often flatter around the mean compared to a normal distribution.

A normal distribution has a kurtosis of 3 (or 0, depending on whether excess kurtosis is used). This is called mesokurtic.


Let's consider the other options to clarify what they measure:

  • Symmetry is measured by Skewness.
  • Central location is measured by measures of Central Tendency (like mean, median, mode).
  • Dispersion (or variability) is measured by measures of Dispersion (like range, variance, standard deviation).
  • Peakedness and tail heaviness are measured by Kurtosis.

Therefore, kurtosis measures the peakedness and tail heaviness of a distribution.


The correct option is (D) peakedness and tail heaviness.

Question 23. A leptokurtic distribution is more peaked and has heavier tails compared to a normal distribution.

(A) True

(B) False

(C) True, but only if it's also skewed.

(D) False, it is less peaked.

Answer:

To determine: Whether the statement about a leptokurtic distribution is true or false.


Kurtosis is a statistical measure that describes the shape of the probability distribution of a random variable. It characterizes the degree of "tailedness" and "peakedness" of the distribution compared to a normal distribution.


Distributions are classified based on their kurtosis relative to the kurtosis of a normal distribution (which has a kurtosis of 3, or an excess kurtosis of 0):

  • Mesokurtic: A distribution with kurtosis similar to that of a normal distribution (kurtosis = 3, or excess kurtosis = 0).
  • Leptokurtic: A distribution with kurtosis greater than that of a normal distribution (kurtosis > 3, or excess kurtosis > 0). Leptokurtic distributions are characterized by a higher peak around the mean and heavier tails than a normal distribution. This means there is a greater probability of extreme values (outliers).
  • Platykurtic: A distribution with kurtosis less than that of a normal distribution (kurtosis < 3, or excess kurtosis < 0). Platykurtic distributions are characterized by a lower peak around the mean and lighter tails than a normal distribution. The data is spread out more evenly.

The statement says: "A leptokurtic distribution is more peaked and has heavier tails compared to a normal distribution."

Based on the definition of a leptokurtic distribution, this statement accurately describes its characteristics relative to a normal distribution.


Therefore, the statement is True.
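The three-way classification above can be illustrated with a short Python sketch. It assumes the moment-based excess kurtosis $m_4 / m_2^2 - 3$, a standard formula that the text does not spell out; the function names, the tolerance, and both datasets are hypothetical choices for illustration.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

def classify_kurtosis(data, tolerance=1e-9):
    """Label data by the sign of its excess kurtosis."""
    value = excess_kurtosis(data)
    if value > tolerance:
        return "leptokurtic"
    if value < -tolerance:
        return "platykurtic"
    return "mesokurtic"

heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # mass at the centre plus two extremes
evenly_spread = list(range(1, 11))                 # values 1 to 10, spread uniformly

print(classify_kurtosis(heavy_tailed))   # leptokurtic
print(classify_kurtosis(evenly_spread))  # platykurtic
```

For small samples like these, the sign of the computed value is only a rough guide to the shape of the underlying distribution.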


Comparing this with the given options:

(A) True: This matches our conclusion.

(B) False: This contradicts the definition.

(C) True, but only if it's also skewed: Skewness and kurtosis describe different aspects of shape, although they can co-exist. The definition of leptokurtosis does not depend on skewness.

(D) False, it is less peaked: This describes a platykurtic distribution.


The correct option is (A) True.

Question 24. Assertion (A): Skewness and Kurtosis are measures of shape of a distribution.

Reason (R): Skewness indicates asymmetry, and Kurtosis indicates peakedness and tail heaviness.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:

Let's evaluate the Assertion (A) and the Reason (R).


Assertion (A): Skewness and Kurtosis are measures of shape of a distribution.

Measures in statistics are used to describe different aspects of a dataset or its distribution. These include measures of central tendency, dispersion, and shape. Skewness and Kurtosis are indeed categorized as measures that describe the form or shape of a probability distribution.

Therefore, Assertion (A) is True.


Reason (R): Skewness indicates asymmetry, and Kurtosis indicates peakedness and tail heaviness.

As previously discussed, Skewness quantifies the degree of asymmetry in a distribution. A symmetric distribution has zero skewness, while asymmetric distributions are positively or negatively skewed.

Kurtosis quantifies the 'tailedness' and 'peakedness' of a distribution relative to a normal distribution. Higher kurtosis implies heavier tails and often a more acute peak.

Therefore, Reason (R) accurately describes what Skewness and Kurtosis measure.

Reason (R) is True.


Now, let's assess if Reason (R) is the correct explanation for Assertion (A).

The shape of a distribution is described by characteristics such as its symmetry and its peakedness/tail heaviness, as distinct from its central location and spread. Skewness and Kurtosis specifically measure asymmetry and peakedness/tail heaviness, respectively. These aspects are fundamental components that define the overall shape of a distribution. By measuring these characteristics, Skewness and Kurtosis provide information about the shape.

Thus, Reason (R) provides the specific properties measured by Skewness and Kurtosis that qualify them as measures of shape.

So, Reason (R) is the correct explanation for Assertion (A).


Based on the analysis, both Assertion (A) and Reason (R) are true, and Reason (R) correctly explains Assertion (A).


The correct option is (A) Both A and R are true and R is the correct explanation of A.

Question 25. Case Study: A dataset of student heights has a mean of 165 cm, median of 168 cm, and mode of 170 cm.

Based on the relationship between mean, median, and mode, what is the likely skewness of the distribution?

(A) Positive skewness (Right-skewed)

(B) Negative skewness (Left-skewed)

(C) Zero skewness (Symmetric)

(D) Cannot be determined

Answer:

To Find: The likely skewness of the distribution based on the given mean, median, and mode.


Given:

Mean height = $165$ cm

Median height = $168$ cm

Mode height = $170$ cm


Solution:

The relationship between the mean, median, and mode provides an indication of the skewness of a distribution.


  • For a symmetric distribution, Mean $\approx$ Median $\approx$ Mode. In a perfectly symmetric distribution, Mean = Median = Mode.
  • For a positively skewed (right-skewed) distribution, the tail is longer on the right. The typical relationship is Mean > Median > Mode. The mean is pulled towards the higher values in the right tail.
  • For a negatively skewed (left-skewed) distribution, the tail is longer on the left. The typical relationship is Mean < Median < Mode. The mean is pulled towards the lower values in the left tail.

In the given case, we have:

Mean = $165$ cm

Median = $168$ cm

Mode = $170$ cm


Let's compare these values:

$165 < 168 < 170$

This shows the relationship:

Mean < Median < Mode


This relationship (Mean < Median < Mode) is characteristic of a negatively skewed distribution (left-skewed). The lower mean relative to the median and mode suggests that there is a tail extending towards the lower values on the left side of the distribution.
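The comparison rule can be written as a small Python function. This is a rule-of-thumb sketch only: the function name `likely_skewness` and its return labels are assumptions for illustration, and any ordering not covered by the three typical cases is reported as inconclusive.

```python
def likely_skewness(mean, median, mode):
    """Rule-of-thumb skewness direction from the ordering of mean, median, mode."""
    if mean < median < mode:
        return "negative (left-skewed)"
    if mean > median > mode:
        return "positive (right-skewed)"
    if mean == median == mode:
        return "zero (symmetric)"
    return "inconclusive"

# Heights from the case study: mean 165 cm, median 168 cm, mode 170 cm.
print(likely_skewness(165, 168, 170))  # negative (left-skewed)
```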


Comparing this with the given options:

(A) Positive skewness (Right-skewed): This is indicated by Mean > Median > Mode.

(B) Negative skewness (Left-skewed): This is indicated by Mean < Median < Mode.

(C) Zero skewness (Symmetric): This is indicated by Mean $\approx$ Median $\approx$ Mode.

(D) Cannot be determined: The relationship between mean, median, and mode is a standard way to infer skewness.


Based on the relationship Mean < Median < Mode, the distribution is likely negatively skewed (Left-skewed).


The correct option is (B) Negative skewness (Left-skewed).

Question 26. Case Study: (Same setup as Q25)

If the distribution is negatively skewed, the tail is on the ________ side of the distribution.

(A) left

(B) right

(C) center

(D) both

Answer:

To Find: The side where the tail is located in a negatively skewed distribution.


Given: The context is a negatively skewed distribution (as determined in Q25 based on the relationship between mean, median, and mode).


Solution:

As discussed in previous questions (like Q20), skewness describes the asymmetry of a distribution and the direction of its tail.


  • A symmetric distribution has no prominent tail on either side.
  • A positively skewed (right-skewed) distribution has a tail extending towards the higher values on the right side.
  • A negatively skewed (left-skewed) distribution has a tail extending towards the lower values on the left side.

The question specifically refers to a negatively skewed distribution.

By definition, a negatively skewed distribution has its tail on the side of the lower values, which is the left side of the distribution.


Comparing this with the given options:

(A) left: This is consistent with the definition of a negatively skewed distribution.

(B) right: This is characteristic of a positively skewed distribution.

(C) center: Tails are located at the extremities, away from the center.

(D) both: Tails of equal length on both sides are characteristic of a symmetric distribution.


Therefore, if a distribution is negatively skewed, the tail is on the left side.


The correct option is (A) left.

Question 27. Which percentile corresponds to the median of a dataset?

(A) 25th percentile

(B) 50th percentile

(C) 75th percentile

(D) 100th percentile

Answer:

The Median is a measure of central tendency. In a dataset that is ordered from lowest to highest, the median is the middle value that separates the lower half from the upper half of the data.


A Percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls.

The $p$-th percentile is the value below which $p\%$ of the data falls.


By definition, the median is the point in the ordered dataset where $50\%$ of the data falls below it and $50\%$ of the data falls above it.


Since the median is the value below which $50\%$ of the data lies, it corresponds to the $50^{\text{th}}$ percentile.

In other words, the $50^{\text{th}}$ percentile is equal to the median.


Let's look at the other options:

  • The $25^{\text{th}}$ percentile is also known as the First Quartile ($Q_1$). $25\%$ of the data falls below $Q_1$.
  • The $75^{\text{th}}$ percentile is also known as the Third Quartile ($Q_3$). $75\%$ of the data falls below $Q_3$.
  • The $100^{\text{th}}$ percentile is the maximum value in the dataset.

Therefore, the percentile that corresponds to the median of a dataset is the $50^{\text{th}}$ percentile.


The correct option is (B) 50th percentile.

Question 28. The first quartile ($Q_1$) of a dataset is equivalent to the ________ percentile.

(A) 10th

(B) 25th

(C) 50th

(D) 75th

Answer:

To Find: The percentile equivalent of the first quartile ($Q_1$).


Quartiles are measures that divide an ordered dataset into four equal parts. There are three main quartiles:

  • The First Quartile ($Q_1$): This is the value below which $25\%$ of the data falls.
  • The Second Quartile ($Q_2$): This is the value below which $50\%$ of the data falls. The second quartile is also the Median.
  • The Third Quartile ($Q_3$): This is the value below which $75\%$ of the data falls.

Percentiles are measures that divide an ordered dataset into 100 equal parts. The $p$-th percentile is the value below which $p\%$ of the data falls.


Comparing the definition of the first quartile with the definition of a percentile:

The First Quartile ($Q_1$) is the value below which $25\%$ of the data falls.

The $p^{\text{th}}$ percentile is the value below which $p\%$ of the data falls.


Therefore, the first quartile ($Q_1$) is equivalent to the percentile where $p=25$. This means $Q_1$ is the $25^{\text{th}}$ percentile.

First Quartile ($Q_1$) = $25^{\text{th}}$ Percentile


Let's also note the equivalence for the other quartiles:

Second Quartile ($Q_2$) = $50^{\text{th}}$ Percentile = Median

Third Quartile ($Q_3$) = $75^{\text{th}}$ Percentile


Comparing our finding for $Q_1$ with the given options:

(A) 10th

(B) 25th

(C) 50th

(D) 75th


The first quartile ($Q_1$) is equivalent to the $25^{\text{th}}$ percentile.


The correct option is (B) 25th.

Question 29. If a student scored at the 80th percentile on a test, it means:

(A) They scored 80% of the marks.

(B) 80% of the students scored below or equal to their score.

(C) 80% of the students scored above their score.

(D) They were in the top 20% of the class.

Answer:

To determine: The meaning of a student scoring at the 80th percentile.


Percentiles are measures of relative standing within a dataset. The $p^{\text{th}}$ percentile is the value below which $p\%$ of the observations in a dataset fall.


If a student scored at the 80th percentile, it means that their score is equal to or higher than the scores of $80\%$ of the students who took the test (using the inclusive "at or below" convention adopted in the options).

Equivalently, $80\%$ of the students scored below or equal to that student's score.


Let's analyze the given options based on this definition:

(A) They scored 80% of the marks: Percentile is not related to the percentage of total marks obtained. A student could score 95% of the marks and be at the 80th percentile if many other students also scored very high, or score 60% of the marks and be at the 90th percentile if most other students scored much lower.

(B) 80% of the students scored below or equal to their score: This is the standard definition of the 80th percentile. It means the score is at the point where 80% of the data values are at or below it.

(C) 80% of the students scored above their score: If 80% scored below or equal, then $100\% - 80\% = 20\%$ scored above their score. This statement is incorrect.

(D) They were in the top 20% of the class: If 80% of students scored at or below their score, then the remaining $100\% - 80\% = 20\%$ of students scored above their score. Students scoring above the 80th percentile are in the top 20%, and a student exactly at the 80th percentile sits at the threshold of this group. Option (B), however, is the direct and precise definition of what the 80th percentile value represents in terms of the data distribution.


The most accurate and direct meaning of scoring at the 80th percentile is that 80% of the students scored at or below that score.


The correct option is (B) 80% of the students scored below or equal to their score.

Question 30. For a dataset, the Interquartile Range (IQR) is calculated as:

(A) $Q_3 - Q_1$

(B) $Q_3 + Q_1$

(C) $Q_2 - Q_1$

(D) Range / 2

Answer:

To Find: The formula for the Interquartile Range (IQR).


The Interquartile Range (IQR) is a measure of dispersion that describes the spread of the middle $50\%$ of an ordered dataset.

It is defined as the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).


The formula for the Interquartile Range (IQR) is:

$IQR = Q_3 - Q_1$
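As a hedged illustration, the formula can be applied with Python's standard `statistics.quantiles` function. The sample dataset below is a hypothetical example, and the function's default `'exclusive'` method is only one of several quartile conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

data = [10, 15, 20, 25, 30, 35, 40]  # hypothetical sample, already ordered

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # Interquartile Range = Q3 - Q1
print(iqr)  # prints: 20.0
```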


Let's review the terms:

  • $Q_1$ is the first quartile (25th percentile). It is the value below which $25\%$ of the data falls.
  • $Q_2$ is the second quartile (50th percentile). It is the median, the value below which $50\%$ of the data falls.
  • $Q_3$ is the third quartile (75th percentile). It is the value below which $75\%$ of the data falls.

Now, let's examine the given options:

(A) $Q_3 - Q_1$: This is the difference between the third and first quartiles, which is the definition of the Interquartile Range.

(B) $Q_3 + Q_1$: This is the sum of the third and first quartiles, which is not a standard measure of dispersion.

(C) $Q_2 - Q_1$: This is the difference between the median ($Q_2$) and the first quartile ($Q_1$). It covers only the second quarter of the data, not the full middle $50\%$, so it is not the IQR.

(D) Range / 2: The Range is the difference between the maximum and minimum values. Dividing the range by 2 does not give the IQR.


Based on the definition, the Interquartile Range (IQR) is calculated as $Q_3 - Q_1$.


The correct option is (A) $Q_3 - Q_1$.

Question 31. Consider the ordered dataset: 10, 15, 20, 25, 30, 35, 40.

What is the median ($Q_2$) of this dataset?

(A) 20

(B) 25

(C) 30

(D) 35

Answer:

To Find: The median ($Q_2$) of the given ordered dataset.


Given: The ordered dataset is $10, 15, 20, 25, 30, 35, 40$.


Solution:

The Median is the middle value in an ordered dataset. It divides the dataset into two equal halves.

First, count the number of observations ($n$) in the dataset.

The dataset is $10, 15, 20, 25, 30, 35, 40$.

There are $n = 7$ observations.


Since the number of observations ($n=7$) is odd, the median is the value at the position $\frac{n+1}{2}$ in the ordered dataset.

Position of the median = $\frac{7+1}{2} = \frac{8}{2} = 4^{\text{th}}$ position.


Now, find the value at the $4^{\text{th}}$ position in the ordered dataset:

1st position: 10

2nd position: 15

3rd position: 20

4th position: 25

5th position: 30

6th position: 35

7th position: 40


The value at the $4^{\text{th}}$ position is $25$.

Therefore, the median ($Q_2$) of the dataset is $25$.
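The position-based method above can be sketched in Python. The function name `median_by_position` is an assumption for illustration; the even-$n$ branch (averaging the two middle values) is the usual convention and is included only for completeness, since this question has an odd number of observations.

```python
def median_by_position(ordered):
    """Median of an already ordered list, located by position."""
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based position of the middle value
        return ordered[position - 1]  # Python lists are 0-based
    # Even n: average the two middle values (not needed for this question).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

data = [10, 15, 20, 25, 30, 35, 40]
print(median_by_position(data))  # prints: 25
```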


Comparing this with the given options:

(A) 20

(B) 25

(C) 30

(D) 35


The calculated median is $25$, which matches option (B).


The correct option is (B) 25.

Question 32. Consider the ordered dataset: 10, 15, 20, 25, 30, 35, 40.

What is the first quartile ($Q_1$) of this dataset?

(A) 10

(B) 15

(C) 20

(D) 25

Answer:

To Find: The first quartile ($Q_1$) of the given ordered dataset.


Given: The ordered dataset is $10, 15, 20, 25, 30, 35, 40$.


Solution:

The First Quartile ($Q_1$) is a measure of position in a dataset. It is the value that separates the lowest $25\%$ of the data from the highest $75\%$ of the data.


First, we need to determine the number of observations ($n$) in the dataset.

The dataset is $10, 15, 20, 25, 30, 35, 40$.

There are $n = 7$ observations.


To find the position of the first quartile in an ordered dataset, we can use the formula:

Position of $Q_1 = \frac{1}{4}(n+1)$


Substitute the value of $n=7$ into the formula:

Position of $Q_1 = \frac{1}{4}(7+1) = \frac{1}{4}(8) = 2$


The position of the first quartile is the $2^{\text{nd}}$ position in the ordered dataset.

Now, we find the value at the $2^{\text{nd}}$ position in the dataset $10, 15, 20, 25, 30, 35, 40$:

  • 1st value: 10
  • 2nd value: 15
  • 3rd value: 20
  • 4th value: 25
  • ... and so on.

The value at the $2^{\text{nd}}$ position is $15$.

Therefore, the first quartile ($Q_1$) of the dataset is $15$.
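The same steps can be expressed as a short Python sketch. It only handles the case used here, where the position $\frac{1}{4}(n+1)$ is a whole number; the function name `first_quartile` is hypothetical, and the sketch deliberately raises an error for fractional positions rather than assume an interpolation rule the solution does not describe.

```python
def first_quartile(ordered):
    """Q1 of an ordered list using the position (n + 1) / 4."""
    n = len(ordered)
    position = (n + 1) / 4  # 1-based position of Q1
    if position != int(position):
        # A fractional position needs an interpolation rule, which is
        # outside the scope of this sketch.
        raise ValueError("position is fractional; an interpolation rule is needed")
    return ordered[int(position) - 1]  # convert to a 0-based index

data = [10, 15, 20, 25, 30, 35, 40]
print(first_quartile(data))  # prints: 15
```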


Comparing this result with the given options:

(A) 10

(B) 15

(C) 20

(D) 25


The calculated first quartile is $15$, which matches option (B).


The correct option is (B) 15.

Question 33. Assertion (A): Quartiles divide a dataset into four equal parts.

Reason (R): $Q_1$, $Q_2$, and $Q_3$ mark the points below which 25%, 50%, and 75% of the data fall, respectively.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) A is false but R is true.

Answer:

Let's evaluate the Assertion (A) and the Reason (R) separately.


Assertion (A): Quartiles divide a dataset into four equal parts.

When a dataset is ordered from least to greatest, quartiles are values that divide the data into quarters. There are three quartiles: $Q_1$, $Q_2$, and $Q_3$. These three values divide the dataset into four segments:

  • The segment below $Q_1$ (the lowest $25\%$ of data).
  • The segment between $Q_1$ and $Q_2$ (the next $25\%$ of data).
  • The segment between $Q_2$ and $Q_3$ (the next $25\%$ of data).
  • The segment above $Q_3$ (the highest $25\%$ of data).

Thus, the quartiles ($Q_1$, $Q_2$, $Q_3$) mark the boundaries that divide the dataset into four approximately equal parts (each containing about $25\%$ of the data).

Therefore, Assertion (A) is True.


Reason (R): $Q_1$, $Q_2$, and $Q_3$ mark the points below which 25%, 50%, and 75% of the data fall, respectively.

By definition:

  • $Q_1$ (the first quartile) is the value below which $25\%$ of the data falls.
  • $Q_2$ (the second quartile) is the value below which $50\%$ of the data falls. $Q_2$ is also the median.
  • $Q_3$ (the third quartile) is the value below which $75\%$ of the data falls.

This statement accurately describes the definition of $Q_1$, $Q_2$, and $Q_3$ in terms of percentiles.

Therefore, Reason (R) is True.


Now, let's determine if Reason (R) is the correct explanation for Assertion (A).

The reason why quartiles divide a dataset into four equal parts is precisely because $Q_1$, $Q_2$, and $Q_3$ are defined as the values that separate the lowest $25\%$, the lowest $50\%$, and the lowest $75\%$ of the data, respectively. These divisions inherently create four segments, each containing approximately $25\%$ of the data.

So, Reason (R) correctly explains how quartiles function to divide the dataset into four equal parts.


Based on the analysis, both Assertion (A) and Reason (R) are true, and Reason (R) is the correct explanation for Assertion (A).


The correct option is (A) Both A and R are true and R is the correct explanation of A.

Question 34. Case Study: The marks of 10 students in a test are: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.

What is the 75th percentile ($Q_3$) of this dataset?

(A) 85

(B) 87.5

(C) 90

(D) 92.5

Answer:

To Find: The 75th percentile ($Q_3$) of the given dataset.


Given: The marks of 10 students in a test are: $55, 60, 65, 70, 75, 80, 85, 90, 95, 100$.


Solution:

The 75th percentile ($Q_3$) is the value in an ordered dataset below which $75\%$ of the data falls.

First, we confirm that the dataset is ordered. The given dataset is already ordered from smallest to largest: $55, 60, 65, 70, 75, 80, 85, 90, 95, 100$.

The number of observations in the dataset is $n = 10$.


To find the position of the $p^{\text{th}}$ percentile ($P_p$) in an ordered dataset, a common method uses the formula:

$L = \frac{p}{100} \times n$

where $L$ is the calculated position in the ordered dataset.


For the 75th percentile ($Q_3$), we set $p=75$ and $n=10$.

Position of $Q_3 = \frac{75}{100} \times 10$

Position of $Q_3 = 0.75 \times 10$

Position of $Q_3 = 7.5$


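When the calculated position $L$ is not an integer, the convention used here (often called the nearest-rank method) takes the $p^{\text{th}}$ percentile to be the value at the position given by the ceiling of $L$ ($\lceil L \rceil$). The ceiling of a number is the smallest integer greater than or equal to the number. Other conventions, such as interpolating between neighbouring values, can give a slightly different result.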

In this case, $L = 7.5$.

Position of $Q_3 = \lceil 7.5 \rceil = 8^{\text{th}}$ position.


Now, we find the value at the $8^{\text{th}}$ position in the ordered dataset $55, 60, 65, 70, 75, 80, 85, 90, 95, 100$:

The values are:

  • 1st value: 55
  • ...
  • 7th value: 85
  • 8th value: 90
  • 9th value: 95
  • 10th value: 100

The value at the $8^{\text{th}}$ position is $90$.

Therefore, the 75th percentile ($Q_3$) of the dataset is $90$.
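The ceiling rule used in this solution can be sketched in Python as follows. The function name `percentile_nearest_rank` is an assumption for illustration. When $L$ is already a whole number, this sketch simply takes the value at position $L$, which is a choice made here and not something the solution specifies.

```python
import math

def percentile_nearest_rank(ordered, p):
    """p-th percentile by the ceiling rule: position = ceil(p / 100 * n)."""
    n = len(ordered)
    position = math.ceil(p * n / 100)  # 1-based position in the ordered list
    position = max(position, 1)        # guard so that p = 0 maps to the first value
    return ordered[position - 1]       # convert to a 0-based index

marks = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_nearest_rank(marks, 75))  # L = 7.5, ceiling 8, prints: 90
```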


Comparing this calculated value with the given options:

(A) 85

(B) 87.5

(C) 90

(D) 92.5


The calculated 75th percentile is $90$, which matches option (C).


The correct option is (C) 90.

Question 35. Case Study: (Same setup as Q34)

What is the percentile rank of a student who scored 70 marks?

(A) 30th percentile

(B) 35th percentile

(C) 40th percentile

(D) 45th percentile

Answer:

To Find: The percentile rank of a student who scored 70 marks.


Given: The marks of 10 students in a test are: $55, 60, 65, 70, 75, 80, 85, 90, 95, 100$.

The student's score is $70$ marks.


Solution:

The percentile rank of a score is the percentage of scores in the dataset that are less than or equal to that score.


First, ensure the dataset is ordered. The given dataset is already ordered: $55, 60, 65, 70, 75, 80, 85, 90, 95, 100$.

The total number of scores in the dataset is $n = 10$.


Next, we need to count how many scores in the dataset are less than or equal to the student's score of $70$.

Looking at the ordered dataset, the scores less than or equal to 70 are: $55, 60, 65, 70$.

There are $4$ scores that are less than or equal to $70$. Let's call this number $k = 4$.


The formula for the percentile rank of a value $x$ is often given by:

Percentile Rank $= \frac{\text{Number of values less than or equal to } x}{\text{Total number of values}} \times 100$


Substituting the values for a score of $70$:

Percentile Rank of $70 = \frac{4}{10} \times 100$

Percentile Rank of $70 = 0.4 \times 100$

Percentile Rank of $70 = 40$


So, a student who scored 70 marks is at the 40th percentile.
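
As a quick check, the "less than or equal to" definition used above can be written as a short Python sketch. The function name `percentile_rank` is illustrative. Some textbooks count only the scores strictly below $x$, which would give a different figure.

```python
def percentile_rank(data, x):
    """Percentage of values in data that are less than or equal to x."""
    at_or_below = sum(1 for value in data if value <= x)  # k in the solution
    return 100 * at_or_below / len(data)                  # (k / n) * 100

marks = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank_70 = percentile_rank(marks, 70)  # 4 of the 10 scores are <= 70
print(rank_70)  # 40.0
```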


Comparing this calculated percentile rank with the given options:

(A) 30th percentile

(B) 35th percentile

(C) 40th percentile

(D) 45th percentile


The calculated percentile rank is 40th percentile, which matches option (C).


The correct option is (C) 40th percentile.

Question 36. Correlation measures the strength and direction of the linear relationship between:

(A) A single variable.

(B) Two or more variables.

(C) Two variables.

(D) Variables and constants.

Answer:

Correlation is a statistical measure that describes the extent to which two or more variables are related. It quantifies the degree to which they change together.


Specifically, correlation coefficients (like Pearson correlation coefficient, denoted by $r$) measure the strength and direction of the linear relationship between two quantitative variables.

For example, we might look at the correlation between height and weight, or between hours of study and exam scores.


While there are concepts like multiple correlation which involve the relationship between one variable and a set of other variables, the fundamental concept of 'correlation' typically refers to the bivariate relationship, i.e., the relationship between two variables.


Let's examine the options:

(A) A single variable: Correlation by definition describes the relationship *between* variables, so it requires at least two variables.

(B) Two or more variables: While statistical methods can analyze relationships among multiple variables (e.g., multiple regression, partial correlation), the primary measure of simple correlation, which is usually implied unless otherwise specified, deals with the relationship between *two* variables. Option (C) is more specific to the core concept of a single correlation coefficient.

(C) Two variables: This accurately describes what a standard correlation coefficient measures - the linear relationship between one variable and another.

(D) Variables and constants: Constants do not vary, so correlation cannot be calculated between a variable and a constant.


Given the standard usage and definition of 'correlation' as a measure of the linear association between two variables, option (C) is the most appropriate answer.


The correct option is (C) Two variables.

Question 37. If two variables have a positive correlation, it means that as one variable increases, the other tends to:

(A) Decrease.

(B) Increase.

(C) Remain constant.

(D) Vary randomly.

Answer:

Correlation measures the strength and direction of the linear relationship between two variables. The direction of the relationship is indicated by the sign of the correlation coefficient.


  • Positive Correlation: If two variables have a positive correlation (correlation coefficient $r > 0$), it means that they tend to move in the same direction. As one variable increases, the other variable also tends to increase. Similarly, as one variable decreases, the other variable also tends to decrease.
  • Negative Correlation: If two variables have a negative correlation (correlation coefficient $r < 0$), it means that they tend to move in opposite directions. As one variable increases, the other variable tends to decrease, and vice versa.
  • Zero Correlation: If two variables have zero correlation (correlation coefficient $r \approx 0$), it means there is no linear relationship between them. The variables do not show a consistent tendency to increase or decrease together or in opposition. They might vary randomly with respect to each other, or they might have a non-linear relationship.

The question specifically asks about a positive correlation.

In a positive correlation, as one variable increases, the other tends to increase.


Comparing this with the given options:

(A) Decrease: This describes a negative correlation.

(B) Increase: This describes a positive correlation.

(C) Remain constant: A quantity that stays constant does not vary with the other variable, so it cannot show a positive correlation with it.

(D) Vary randomly: This would suggest little to no linear correlation.


Therefore, if two variables have a positive correlation, it means that as one variable increases, the other tends to increase.


The correct option is (B) Increase.

Question 38. The value of the correlation coefficient ($r$) can range from:

(A) 0 to 1

(B) -1 to 1

(C) $-\infty$ to $\infty$

(D) 0 to $\infty$

Answer:

The correlation coefficient ($r$), typically referring to the Pearson product-moment correlation coefficient, is a statistical measure of the strength and direction of a linear relationship between two variables.


The value of the correlation coefficient $r$ is always between $-1$ and $+1$, inclusive.

  • A value of $r = +1$ indicates a perfect positive linear relationship. As one variable increases, the other increases proportionally, and all data points lie exactly on a straight line with a positive slope.
  • A value of $r = -1$ indicates a perfect negative linear relationship. As one variable increases, the other decreases proportionally, and all data points lie exactly on a straight line with a negative slope.
  • A value of $r = 0$ indicates no linear relationship between the two variables. There is no consistent pattern of one variable increasing or decreasing as the other changes.
  • Values between $0$ and $+1$ indicate a positive linear relationship, with stronger relationships closer to $+1$.
  • Values between $-1$ and $0$ indicate a negative linear relationship, with stronger relationships closer to $-1$.

Therefore, the range of the correlation coefficient ($r$) is from $-1$ to $1$, inclusive.

This can be represented mathematically as $-1 \leq r \leq 1$.
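
The bounds can be illustrated with a minimal Python sketch of the Pearson coefficient. The formula used, $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, is the standard definition and is not written out in this answer. The function name `pearson_r` and the three small datasets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either list is constant

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exact line with positive slope
falling = [10 - 3 * x for x in xs]  # exact line with negative slope
mixed = [2, 1, 4, 3, 5]             # increasing trend with some scatter

r_rising = pearson_r(xs, rising)    # equals +1 up to rounding
r_falling = pearson_r(xs, falling)  # equals -1 up to rounding
r_mixed = pearson_r(xs, mixed)      # strictly between 0 and 1
print(r_rising, r_falling, r_mixed)
```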


Comparing this range with the given options:

(A) 0 to 1: This range only includes non-negative correlations, excluding negative relationships.

(B) -1 to 1: This range correctly includes all possible values for the correlation coefficient, covering perfect negative correlation, no correlation, and perfect positive correlation, as well as all strengths in between.

(C) $-\infty$ to $\infty$: This represents the range of all real numbers, which is not the range of the correlation coefficient. A measure such as covariance is unbounded and can take any real value, but correlation is bounded between $-1$ and $1$.

(D) 0 to $\infty$: This range is incorrect as correlation can be negative and is bounded above by 1.


The correct option is (B) -1 to 1.


Question 39. A correlation coefficient of $r = 0.9$ indicates:

(A) A weak positive linear relationship.

(B) A strong positive linear relationship.

(C) A weak negative linear relationship.

(D) No linear relationship.

Answer:

The correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables. The value of $r$ ranges from $-1$ to $+1$.


The strength of the linear relationship is indicated by the absolute value of $r$, $|r|$.

  • $|r| = 1$: Perfect linear relationship.
  • $|r|$ close to 1 (e.g., 0.8, 0.9): Strong linear relationship.
  • $|r|$ from about 0.4 to 0.7: Moderate linear relationship.
  • $|r|$ above 0 but below about 0.4: Weak linear relationship.
  • $|r| = 0$: No linear relationship.

The direction of the linear relationship is indicated by the sign of $r$.

  • $r > 0$: Positive linear relationship (as one variable increases, the other tends to increase).
  • $r < 0$: Negative linear relationship (as one variable increases, the other tends to decrease).

The given correlation coefficient is $r = 0.9$.

The value $0.9$ is positive ($0.9 > 0$), indicating a positive linear relationship.

The absolute value is $|0.9| = 0.9$. This value is very close to 1.

A value of $0.9$ typically indicates a strong linear relationship.


Therefore, a correlation coefficient of $r = 0.9$ indicates a strong positive linear relationship.


Let's examine the given options:

(A) A weak positive linear relationship: Incorrect, 0.9 is strong.

(B) A strong positive linear relationship: Correct.

(C) A weak negative linear relationship: Incorrect, the sign is positive, and the strength is strong.

(D) No linear relationship: Incorrect, $r=0$ indicates no linear relationship.


The correct option is (B) A strong positive linear relationship.

Question 40. A correlation coefficient of $r = -0.7$ indicates:

(A) A strong positive linear relationship.

(B) A moderate positive linear relationship.

(C) A moderate negative linear relationship.

(D) A strong negative linear relationship.

Answer:

The correlation coefficient ($r$) is a numerical measure that indicates the strength and direction of the linear relationship between two variables.


The value of $r$ always lies between $-1$ and $+1$, inclusive: $-1 \leq r \leq 1$.


The interpretation of the correlation coefficient depends on both its sign and its absolute value ($|r|$).


1. Direction:

  • If $r$ is positive ($r > 0$), the relationship is positive: as one variable increases, the other tends to increase.
  • If $r$ is negative ($r < 0$), the relationship is negative: as one variable increases, the other tends to decrease.
  • If $r$ is zero ($r = 0$), there is no linear relationship.

2. Strength: The strength of the linear relationship is indicated by how close $|r|$ is to 1.

  • $|r| = 1$: Perfect linear relationship.
  • $|r|$ close to 1 (e.g., 0.8, 0.9, 0.95): Strong linear relationship.
  • $|r|$ from about 0.4 up to just under 0.7: Moderate linear relationship.
  • $|r|$ close to 0 (e.g., 0.1, 0.2): Weak linear relationship.
  • $|r| = 0$: No linear relationship.

Note that the thresholds for weak, moderate, and strong can sometimes vary slightly depending on the specific field of study or convention used, but values of $|r| \geq 0.7$ are very commonly considered indicative of a strong linear relationship.


Now, let's analyze the given correlation coefficient: $r = -0.7$.

  • The sign of $r$ is negative ($-0.7 < 0$). This indicates a negative linear relationship.
  • The absolute value is $|r| = |-0.7| = 0.7$. Based on common conventions, a value of 0.7 indicates a strong linear relationship.

Combining the direction and strength, a correlation coefficient of $r = -0.7$ indicates a strong negative linear relationship.
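
The two-step reading (sign for direction, $|r|$ for strength) can be sketched as a small Python helper. The cut-offs of 0.7 for "strong" and 0.4 for "moderate" are assumptions that follow the convention adopted in this answer. They are not universal, and the function name `describe_correlation` is illustrative.

```python
def describe_correlation(r, strong=0.7, moderate=0.4):
    """Describe r in words; the strength cut-offs are conventions, not fixed rules."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"  # sign gives the direction
    size = abs(r)                                    # |r| gives the strength
    if size == 1:
        strength = "perfect"
    elif size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} linear relationship"

print(describe_correlation(-0.7))  # strong negative linear relationship
```

Passing a different `strong` threshold, for example `describe_correlation(-0.7, strong=0.8)`, returns "moderate negative linear relationship", which shows why option (C) depends on the scale in use.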


Let's compare this conclusion with the given options:

(A) A strong positive linear relationship: Incorrect (sign is negative).

(B) A moderate positive linear relationship: Incorrect (sign is negative, and strength is typically considered strong).

(C) A moderate negative linear relationship: Possible, depending on the exact threshold used for "moderate" vs. "strong". Some scales might consider 0.7 as the upper bound of moderate.

(D) A strong negative linear relationship: This aligns with the common interpretation of $|r|=0.7$ indicating a strong relationship and the negative sign indicating the direction.


Given the options, option (D) "A strong negative linear relationship" is the most appropriate description for a correlation coefficient of $r = -0.7$ based on standard statistical interpretations where $|r| \geq 0.7$ signifies strength.


The correct option is (D) A strong negative linear relationship.

Question 41. If the correlation coefficient $r = 0$, it means there is:

(A) A perfect linear relationship.

(B) No linear relationship.

(C) A strong non-linear relationship.

(D) An error in calculation.

Answer:

The correlation coefficient ($r$), specifically the Pearson product-moment correlation coefficient, measures the strength and direction of the linear relationship between two quantitative variables.


The value of the correlation coefficient $r$ ranges from $-1$ to $+1$, inclusive (i.e., $-1 \leq r \leq 1$).


Interpretation of different values of $r$:

  • If $r = +1$, there is a perfect positive linear relationship.
  • If $r = -1$, there is a perfect negative linear relationship.
  • If $r$ is between 0 and +1 (exclusive), there is a positive linear relationship (stronger as $r$ approaches 1).
  • If $r$ is between -1 and 0 (exclusive), there is a negative linear relationship (stronger as $r$ approaches -1).
  • If $r = 0$, there is no linear relationship between the two variables.

A correlation coefficient of $r=0$ indicates that there is no tendency for the two variables to increase or decrease together in a linear fashion.

It is important to note that $r=0$ only implies the absence of a *linear* relationship. The variables might still be related in a non-linear way (e.g., quadratic, exponential). For example, data points might form a perfect parabola, but the Pearson correlation coefficient could be 0.
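
The parabola case can be checked numerically. The sketch below uses the standard Pearson formula, with a helper name `pearson_r` chosen for illustration, on made-up points lying exactly on $y = x^2$ and placed symmetrically about $x = 0$.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]     # y is completely determined by x
r_parabola = pearson_r(xs, ys)
print(r_parabola)             # zero: the positive and negative products cancel
```

Here $y$ depends perfectly on $x$, yet the coefficient is zero because the left and right halves of the curve contribute equal and opposite products.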


Let's evaluate the given options based on this understanding:

(A) A perfect linear relationship: Incorrect. A perfect linear relationship is indicated by $r=+1$ or $r=-1$.

(B) No linear relationship: Correct. This is the direct meaning of $r=0$ in the context of the Pearson correlation coefficient.

(C) A strong non-linear relationship: Incorrect. A correlation coefficient of 0 does not tell us anything about the presence or strength of a non-linear relationship. There might be a strong non-linear relationship, but $r=0$ doesn't confirm or deny it.

(D) An error in calculation: Incorrect. A value of 0 is a valid result for the correlation coefficient when there is no linear association between the variables.


Therefore, if the correlation coefficient $r = 0$, it means there is no linear relationship.


The correct option is (B) No linear relationship.

Question 42. Assertion (A): Correlation implies causation.

Reason (R): If two variables are correlated, it means that one variable directly causes the change in the other.

(A) Both A and R are true and R is the correct explanation of A.

(B) Both A and R are true but R is not the correct explanation of A.

(C) A is true but R is false.

(D) Both A and R are false.

Answer:

Let's evaluate the Assertion (A) and the Reason (R).


Assertion (A): Correlation implies causation.

Correlation measures the strength and direction of a linear relationship between two variables. It indicates whether two variables tend to change together. Causation means that one variable directly causes a change in another variable.

A fundamental principle in statistics is: "Correlation does not imply causation". Just because two variables are correlated does not mean that one causes the other. There could be other explanations for the observed correlation, such as a confounding variable influencing both, or the relationship could be coincidental.

For example, there might be a positive correlation between ice cream sales and crime rates in a city. However, this does not mean that eating ice cream causes crime. A more likely explanation is a confounding variable, such as warm weather, which simultaneously increases ice cream sales and outdoor activities, potentially leading to more opportunities for crime.

Therefore, Assertion (A) is False.


Reason (R): If two variables are correlated, it means that one variable directly causes the change in the other.

This statement directly links correlation to causation. As explained above, correlation only indicates a relationship or association between variables; it does not prove a cause-and-effect link. Causation requires stronger evidence, often obtained through controlled experiments, to establish that a change in one variable directly leads to a change in another, while ruling out alternative explanations.

Therefore, Reason (R) is False.


Since both the Assertion (A) and the Reason (R) are false, we look for the option that reflects this.


Comparing our findings with the given options:

(A) Both A and R are true and R is the correct explanation of A: Incorrect (both are false).

(B) Both A and R are true but R is not the correct explanation of A: Incorrect (both are false).

(C) A is true but R is false: Incorrect (A is false).

(D) Both A and R are false: Correct.


The correct option is (D) Both A and R are false.

Question 43. Case Study: A researcher studies the relationship between hours of study (X) and exam scores (Y) for 5 students. The data is:

Student Hours of Study (X) Exam Score (Y)
1 3 60
2 5 80
3 2 50
4 6 90
5 4 70

Based on this data, is the relationship between hours of study and exam scores likely positive or negative?

(A) Positive

(B) Negative

(C) Zero

(D) Cannot be determined without calculation

Answer:

To Determine: The likely direction of the relationship between Hours of Study (X) and Exam Scores (Y).


Given Data:

Hours of Study (X): $3, 5, 2, 6, 4$

Exam Score (Y): $60, 80, 50, 90, 70$


Solution:

To understand the relationship between two variables, we can observe how one variable changes as the other changes.

Let's sort the data points based on the Hours of Study (X):

Student Hours of Study (X) Exam Score (Y)
3 2 50
1 3 60
5 4 70
2 5 80
4 6 90

Now, we can observe the trend:

  • When Hours of Study (X) is 2, the Exam Score (Y) is 50.
  • When Hours of Study (X) is 3, the Exam Score (Y) is 60.
  • When Hours of Study (X) is 4, the Exam Score (Y) is 70.
  • When Hours of Study (X) is 5, the Exam Score (Y) is 80.
  • When Hours of Study (X) is 6, the Exam Score (Y) is 90.

As the Hours of Study (X) increase, the Exam Score (Y) consistently increases.

This pattern indicates that there is a tendency for the two variables to move in the same direction.

A relationship where both variables tend to increase together is called a positive relationship or positive correlation.


Therefore, based on the provided data, the relationship between hours of study and exam scores is likely positive.
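
The direction can also be confirmed by calculation. The sketch below applies the standard Pearson formula to the five data points, with the helper name `pearson_r` chosen for illustration. The sign of the numerator (the sum of products of deviations) already gives the direction.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [3, 5, 2, 6, 4]        # X, in the original student order
scores = [60, 80, 50, 90, 70]  # Y
r_study = pearson_r(hours, scores)
print(r_study > 0)             # True: the relationship is positive
```

For these particular points every score equals $10 \times \text{hours} + 30$, so the computed coefficient is 1 (up to rounding). Question 44 takes $r \approx 0.98$ as a given premise.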


Comparing this with the given options:

(A) Positive: This matches our observation.

(B) Negative: This would mean that as study hours increase, scores tend to decrease, which is not the case here.

(C) Zero: This would mean no linear relationship, which is also not the case here as there's a clear increasing trend.

(D) Cannot be determined without calculation: While calculating the exact correlation coefficient would confirm the strength, the direction (positive or negative) can often be determined by simply observing the trend in the data, especially for small datasets with clear patterns like this one.


The relationship is likely positive.


The correct option is (A) Positive.

Question 44. Case Study: (Same setup as Q43)

If the calculated correlation coefficient for this data is approximately 0.98, it indicates:

(A) A perfect positive correlation.

(B) A very strong positive linear relationship.

(C) A weak positive linear relationship.

(D) The exam score causes the hours of study.

Answer:

To Interpret: The meaning of a correlation coefficient $r \approx 0.98$ for the given data.


Given: The calculated correlation coefficient is approximately $r = 0.98$.


Solution:

The correlation coefficient ($r$) measures the strength and direction of the linear relationship between two variables.

The value of $r$ ranges from $-1$ to $+1$ (inclusive).


Interpretation based on the value of $r$:

  • The sign of $r$ indicates the direction of the relationship:
    • $r > 0$ indicates a positive linear relationship (variables increase/decrease together).
    • $r < 0$ indicates a negative linear relationship (as one increases, the other decreases).
    • $r = 0$ indicates no linear relationship.
  • The absolute value $|r|$ indicates the strength of the relationship:
    • $|r| = 1$ indicates a perfect linear relationship.
    • $|r|$ close to 1 indicates a strong linear relationship.
    • $|r|$ close to 0 indicates a weak linear relationship.

The given correlation coefficient is $r = 0.98$.

1. The sign is positive ($0.98 > 0$), which means there is a positive linear relationship.

2. The absolute value is $|0.98| = 0.98$. This value is extremely close to $1$. Values like $0.98$ are indicative of a very strong linear relationship, approaching a perfect one.


Combining the direction and strength, a correlation coefficient of $r = 0.98$ indicates a very strong positive linear relationship.


Let's evaluate the given options:

(A) A perfect positive correlation: A perfect positive correlation is when $r = +1$. While $0.98$ is very close, it's not exactly 1.

(B) A very strong positive linear relationship: This accurately describes the positive direction (due to the positive sign) and the very high strength (due to the absolute value being close to 1).

(C) A weak positive linear relationship: Incorrect, $0.98$ is very close to 1, not 0.

(D) The exam score causes the hours of study: Correlation does not imply causation. A high correlation indicates an association, but it does not prove that one variable causes the other. Establishing causation requires different research methods (like experiments).


Based on the interpretation of the correlation coefficient, $r = 0.98$ indicates a very strong positive linear relationship.


The correct option is (B) A very strong positive linear relationship.

Question 45. Match the measure with what it quantifies:

(i) Mean

(ii) Standard Deviation

(iii) Skewness

(iv) Correlation Coefficient

(a) Linear relationship between two variables.

(b) Central tendency.

(c) Spread of data points.

(d) Asymmetry of distribution.

(A) (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)

(B) (i)-(b), (ii)-(d), (iii)-(c), (iv)-(a)

(C) (i)-(a), (ii)-(c), (iii)-(d), (iv)-(b)

(D) (i)-(b), (ii)-(c), (iii)-(a), (iv)-(d)

Answer:

We need to match each statistical measure listed in (i) to (iv) with the concept it quantifies from list (a) to (d).


Let's consider each measure:

(i) Mean: The mean is a measure of the average value in a dataset. It represents the typical or central value. Therefore, the Mean quantifies Central tendency.

(i) Mean

(b) Central tendency


(ii) Standard Deviation: The standard deviation is a measure of the amount of variation or dispersion of a set of values. It indicates how spread out the data points are around the mean. Therefore, Standard Deviation quantifies the Spread of data points.

(ii) Standard Deviation

(c) Spread of data points


(iii) Skewness: Skewness is a measure of the asymmetry of a distribution. It indicates the extent to which the distribution is skewed to the left or right. Therefore, Skewness quantifies the Asymmetry of distribution.

(iii) Skewness

(d) Asymmetry of distribution


(iv) Correlation Coefficient: The correlation coefficient measures the strength and direction of the linear relationship between two variables. Therefore, Correlation Coefficient quantifies the Linear relationship between two variables.

(iv) Correlation Coefficient

(a) Linear relationship between two variables


Based on these matches, we have:

  • (i) matches with (b)
  • (ii) matches with (c)
  • (iii) matches with (d)
  • (iv) matches with (a)

Comparing this matching with the given options:

(A) (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a)

(B) (i)-(b), (ii)-(d), (iii)-(c), (iv)-(a)

(C) (i)-(a), (ii)-(c), (iii)-(d), (iv)-(b)

(D) (i)-(b), (ii)-(c), (iii)-(a), (iv)-(d)


Option (A) correctly represents the established matches.


The correct option is (A) (i)-(b), (ii)-(c), (iii)-(d), (iv)-(a).

Question 46. Which of the following is a measure of relative dispersion?

(A) Standard Deviation

(B) Variance

(C) Coefficient of Variation

(D) Range

Answer:

Measures of dispersion describe the spread or variability of a dataset. These measures can be classified into two types:


1. Absolute Measures of Dispersion: These measures are expressed in the units of the original data (or, in the case of variance, in squared units). They quantify the actual amount of variability. Examples include:

  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation
  • Variance

2. Relative Measures of Dispersion: These measures are expressed as a ratio or percentage, making them unitless. They are used to compare the variability of different datasets, even if they have different units or different scales of measurement (i.e., different means). Examples include:

  • Coefficient of Range
  • Coefficient of Quartile Deviation
  • Coefficient of Mean Deviation
  • Coefficient of Variation (CV)


The question asks for a measure of relative dispersion.

Let's examine the options:

(A) Standard Deviation: This is an absolute measure of dispersion.

(B) Variance: This is an absolute measure of dispersion.

(C) Coefficient of Variation: This is a ratio of the standard deviation to the mean (usually expressed as a percentage), making it a relative measure of dispersion. The formula for the Coefficient of Variation is:

$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$

(D) Range: This is an absolute measure of dispersion.


Therefore, the Coefficient of Variation is a measure of relative dispersion.


The correct option is (C) Coefficient of Variation.

Question 47. Coefficient of Variation is calculated as:

(A) $(\text{Standard Deviation} / \text{Mean}) \times 100$

(B) $(\text{Mean} / \text{Standard Deviation}) \times 100$

(C) Standard Deviation / Mean

(D) Mean / Standard Deviation

Answer:

To Find: The formula for the Coefficient of Variation.


The Coefficient of Variation (CV) is a measure of relative dispersion. It expresses the standard deviation as a percentage of the mean. This allows for the comparison of variability between datasets with different scales or means.


The formula for the Coefficient of Variation is:

$CV = \frac{\sigma}{\mu} \times 100\%$ (for population data)

or

$CV = \frac{s}{\bar{x}} \times 100\%$ (for sample data)

where $\sigma$ and $s$ are the population and sample standard deviations, respectively, and $\mu$ and $\bar{x}$ are the population and sample means, respectively.


The formula shows that the Coefficient of Variation is calculated by dividing the Standard Deviation by the Mean and then multiplying by 100 to express it as a percentage.
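
A minimal Python sketch of both versions of the formula, using the standard library's `statistics` module, is given here. The function name `coefficient_of_variation` and the two small datasets are hypothetical and serve only to show how the CV compares spread across different units.

```python
import statistics

def coefficient_of_variation(data, population=True):
    """CV as a percentage: (standard deviation / mean) * 100."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # pstdev is the population standard deviation (sigma); stdev is the sample one (s)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return sd / mean * 100

heights_cm = [150, 160, 170]  # hypothetical values
weights_kg = [50, 60, 70]     # hypothetical values

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Both lists have the same standard deviation, but the weights have the smaller mean and therefore the larger CV, which is the kind of comparison an absolute measure cannot provide.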


Let's compare this formula with the given options:

(A) $(\text{Standard Deviation} / \text{Mean}) \times 100$: This matches the standard formula for calculating the Coefficient of Variation as a percentage.

(B) $(\text{Mean} / \text{Standard Deviation}) \times 100$: This is the reciprocal of the standard formula multiplied by 100.

(C) Standard Deviation / Mean: This gives the Coefficient of Variation as a plain ratio (a decimal) rather than as a percentage, which is how it is usually reported.

(D) Mean / Standard Deviation: This is the reciprocal of the ratio in option (C).


While option (C) is also a form of relative dispersion (the ratio), option (A) provides the calculation for the Coefficient of Variation as it is most commonly used and presented, i.e., as a percentage.


The correct option is (A) $(\text{Standard Deviation} / \text{Mean}) \times 100$.

Question 48. A dataset with a mean of 50 and a standard deviation of 10 has a Coefficient of Variation of:

(A) 10%

(B) 20%

(C) 50%

(D) 200%

Answer:

To Find: The Coefficient of Variation (CV) for the given dataset.


Given:

Mean ($\bar{x}$ or $\mu$) = $50$

Standard Deviation ($s$ or $\sigma$) = $10$


Solution:

The Coefficient of Variation (CV) is a measure of relative dispersion. It is calculated as the ratio of the standard deviation to the mean, usually expressed as a percentage.


The formula for the Coefficient of Variation is:

$\text{CV} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$


Substitute the given values into the formula:

$\text{CV} = \frac{10}{50} \times 100\%$


Now, perform the calculation:

$\text{CV} = \frac{\cancel{10}^{1}}{\cancel{50}_{5}} \times 100\%$

$\text{CV} = \frac{1}{5} \times 100\%$

$\text{CV} = 0.2 \times 100\%$

$\text{CV} = 20\%$


The Coefficient of Variation is $20\%$.


Comparing this calculated value with the given options:

(A) 10%

(B) 20%

(C) 50%

(D) 200%


The calculated Coefficient of Variation is $20\%$, which matches option (B).


The correct option is (B) 20%.

Question 49. Which of the following is NOT a measure of central tendency?

(A) Mean

(B) Median

(C) Mode

(D) Standard Deviation

Answer:

Statistical measures are used to summarize and describe the characteristics of a dataset. They can be broadly categorized into measures of central tendency, measures of dispersion, measures of shape, etc.


Measures of central tendency are values that describe the center point or typical value of a dataset. The most common measures of central tendency are:

  • Mean: The average of all values.
  • Median: The middle value in an ordered dataset.
  • Mode: The value that appears most frequently in a dataset.

Measures of dispersion are values that describe the spread or variability of a dataset. They indicate how spread out the data points are from the center or from each other. Examples include:

  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation: The square root of the variance, measuring the typical deviation from the mean.
  • Variance

The question asks which of the given options is NOT a measure of central tendency.

Let's examine each option:

(A) Mean: This is a measure of central tendency.

(B) Median: This is a measure of central tendency.

(C) Mode: This is a measure of central tendency.

(D) Standard Deviation: This is a measure of dispersion, not central tendency. It quantifies the spread of the data.


Therefore, Standard Deviation is not a measure of central tendency.


The correct option is (D) Standard Deviation.

Question 50. If the mean and median of a dataset are equal, the distribution is likely:

(A) Positively skewed.

(B) Negatively skewed.

(C) Symmetric.

(D) Bimodal.

Answer:

To Determine: The likely shape of a distribution when the mean and median are equal.


Solution:

The relationship between the measures of central tendency (Mean, Median, and Mode) can provide insights into the shape and skewness of a distribution.


Let's recall the typical relationships for different types of distributions:

  • For a Symmetric distribution, the data is balanced around the center. In a perfectly symmetric distribution, the Mean, Median, and Mode are all equal and located at the center. ($Mean = Median = Mode$)
  • For a Positively skewed distribution (right-skewed), the tail extends to the right. The mean is typically greater than the median and mode. ($Mean > Median > Mode$)
  • For a Negatively skewed distribution (left-skewed), the tail extends to the left. The mean is typically less than the median and mode. ($Mean < Median < Mode$)

The question states that the mean and median of the dataset are equal ($Mean = Median$).

This equality is a characteristic feature of a Symmetric distribution.

While it is possible for the mean and median to be equal in some specific asymmetric cases (though less common in typical real-world data), the equality of the mean and median strongly suggests that the distribution is symmetric or very close to symmetric.

A bimodal distribution can be symmetric or skewed; the number of modes doesn't directly dictate the relationship between mean and median in the way that skewness does.


Therefore, if the mean and median of a dataset are equal, the distribution is likely symmetric.


Comparing this conclusion with the given options:

(A) Positively skewed: This is indicated by Mean > Median.

(B) Negatively skewed: This is indicated by Mean < Median.

(C) Symmetric: This is indicated by Mean $\approx$ Median (or Mean = Median in a perfectly symmetric distribution).

(D) Bimodal: This refers to the number of peaks, not the relationship between mean and median or the skewness.


The correct option is (C) Symmetric.
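The mean/median comparison above can be checked numerically. The following is a minimal sketch, assuming three small hypothetical datasets and a helper named `compare_mean_median` (neither comes from the question itself):

```python
from statistics import mean, median

# Hypothetical datasets, chosen only to illustrate the three shapes.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 3, 4, 5, 40]   # long tail towards high values
left_skewed = [-30, 5, 6, 7, 8]   # long tail towards low values


def compare_mean_median(data):
    """Report how the mean compares with the median for the given data."""
    m, md = mean(data), median(data)
    if m == md:
        return "equal"
    return "mean > median" if m > md else "mean < median"


for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, compare_mean_median(data))
```

With these particular values the symmetric set gives equal mean and median, while each skewed set pulls the mean towards its long tail.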

Question 51. A platykurtic distribution is less peaked and has lighter tails than a normal distribution.

(A) True

(B) False

(C) True, but only if it is symmetric.

(D) False, it has heavier tails.

Answer:

To determine: Whether the statement about a platykurtic distribution is true or false.


Kurtosis is a measure that describes the shape of a probability distribution, specifically its 'tailedness' and 'peakedness' relative to a normal distribution (which is mesokurtic). The kurtosis of a normal distribution is 3 (or excess kurtosis is 0).


Types of distributions based on kurtosis:

  • Mesokurtic: Kurtosis is 3 (excess kurtosis is 0). Examples include the normal distribution.
  • Leptokurtic: Kurtosis is greater than 3 (excess kurtosis > 0). These distributions have a higher peak around the mean and heavier tails than a normal distribution.
  • Platykurtic: Kurtosis is less than 3 (excess kurtosis < 0). These distributions have a lower peak around the mean and lighter tails than a normal distribution. The data is more spread out towards the shoulders of the distribution rather than being concentrated at the peak or in the extreme tails.

The statement says: "A platykurtic distribution is less peaked and has lighter tails than a normal distribution."

Based on the definition of a platykurtic distribution, this statement accurately describes its characteristics relative to a normal distribution.


Therefore, the statement is True.


Comparing this with the given options:

(A) True: This matches our conclusion.

(B) False: This contradicts the definition.

(C) True, but only if it is symmetric: Kurtosis is a measure of tail heaviness and peakedness, independent of symmetry (skewness). A platykurtic distribution can be symmetric or skewed.

(D) False, it has heavier tails: Having heavier tails is characteristic of a leptokurtic distribution, not a platykurtic one.


The correct option is (A) True.
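As an illustration of the kurtosis thresholds described above, here is a small sketch. It assumes the moment-based definition $m_4 / m_2^2$ (which equals 3 for a normal distribution) and two hypothetical datasets; the function name `kurtosis` is introduced only for this example.

```python
from statistics import fmean


def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (not the excess kurtosis)."""
    mu = fmean(data)
    m2 = fmean([(x - mu) ** 2 for x in data])  # second central moment
    m4 = fmean([(x - mu) ** 4 for x in data])  # fourth central moment
    return m4 / m2 ** 2


# Hypothetical data: evenly spread values vs. values with two extreme outliers.
flat = [1, 2, 3, 4, 5]
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print("flat:", kurtosis(flat))                   # below 3, platykurtic
print("heavy-tailed:", kurtosis(heavy_tailed))   # above 3, leptokurtic
```

Such small samples only illustrate the direction of the comparison with 3; they are not reliable estimates of the kurtosis of an underlying population.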

Question 52. If the 25th percentile of a dataset is 40 and the 75th percentile is 60, what is the Interquartile Range (IQR)?

(A) 10

(B) 20

(C) 30

(D) 100

Answer:

To Find: The Interquartile Range (IQR) of the dataset.


Given:

25th percentile = $40$

75th percentile = $60$


Solution:

The Interquartile Range (IQR) is a measure of dispersion that represents the spread of the middle $50\%$ of the data. It is calculated as the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).


We know the relationship between quartiles and percentiles:

  • The First Quartile ($Q_1$) is equivalent to the 25th percentile.
  • The Third Quartile ($Q_3$) is equivalent to the 75th percentile.

From the given information:

First Quartile ($Q_1$) = 25th percentile = $40$

Third Quartile ($Q_3$) = 75th percentile = $60$


The formula for the Interquartile Range (IQR) is:

$IQR = Q_3 - Q_1$


Substitute the values of $Q_1$ and $Q_3$ into the formula:

$IQR = 60 - 40$

$IQR = 20$


The Interquartile Range (IQR) is $20$.


Comparing this calculated value with the given options:

(A) 10

(B) 20

(C) 30

(D) 100


The calculated IQR is $20$, which matches option (B).


The correct option is (B) 20.
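The same subtraction can be written as a short sketch. The helper name `interquartile_range` is an assumption made for this example; the values 40 and 60 are the percentiles given in the question.

```python
def interquartile_range(q1, q3):
    """Return Q3 - Q1, where q1 and q3 are the 25th and 75th percentiles."""
    if q3 < q1:
        raise ValueError("Q3 cannot be smaller than Q1")
    return q3 - q1


iqr = interquartile_range(q1=40, q3=60)
print(iqr)  # 20
```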

Question 53. The percentile rank of a value $X$ in a dataset is the percentage of values in the dataset that are _________ X.

(A) exactly equal to

(B) greater than

(C) less than

(D) less than or equal to

Answer:

To complete: The definition of percentile rank.


The percentile rank of a score or value $X$ in a dataset is a measure of its relative standing. It tells us the percentage of values in the dataset that fall below or at that specific value.


By definition, the percentile rank of a value $X$ is the percentage of scores in the dataset that are less than or equal to $X$.

The formula used is often:

Percentile Rank $= \frac{\text{Number of values less than or equal to } X}{\text{Total number of values}} \times 100$


Let's examine the options based on this definition:

(A) exactly equal to: This is incorrect. Percentile rank considers values less than or equal to X.

(B) greater than: This is incorrect. Percentile rank considers values less than or equal to X.

(C) less than: This is close, but usually, the definition includes values equal to X as well.

(D) less than or equal to: This precisely matches the standard definition of percentile rank.


Therefore, the percentile rank of a value $X$ in a dataset is the percentage of values in the dataset that are less than or equal to X.


The correct option is (D) less than or equal to.
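The formula quoted in this answer can be sketched as follows. The function name `percentile_rank` and the list of scores are hypothetical, and the sketch uses the "less than or equal to" convention adopted above (other textbooks use slightly different conventions).

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)


scores = [40, 50, 60, 70, 80]  # hypothetical dataset
print(percentile_rank(scores, 60))  # 60.0, since 3 of the 5 scores are <= 60
```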

Question 54. A dataset has a skewness coefficient close to 0. This suggests the distribution is:

(A) Symmetric.

(B) Highly dispersed.

(C) Highly concentrated.

(D) Bimodal.

Answer:

The skewness coefficient is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.


A skewness coefficient tells us about the shape of the distribution:

If skewness is positive, the distribution is positively skewed or right-skewed (tail is longer on the right side).

If skewness is negative, the distribution is negatively skewed or left-skewed (tail is longer on the left side).

If skewness is close to 0, the distribution is said to be approximately symmetric.


In a symmetric distribution, the mean, median, and mode are often close to each other, and the distribution looks similar on both sides of the center.


Let's examine the options:

(A) Symmetric. A skewness coefficient close to 0 is the characteristic property of a symmetric distribution.

(B) Highly dispersed. Dispersion is measured by variance or standard deviation, not skewness. A distribution can be symmetric but highly dispersed (e.g., a wide normal distribution) or symmetric and not highly dispersed (e.g., a narrow normal distribution).

(C) Highly concentrated. Concentration is related to low dispersion. As with dispersion, concentration is not directly indicated by the skewness coefficient.

(D) Bimodal. Bimodality refers to a distribution having two modes (peaks). Skewness measures asymmetry, not the number of peaks. A bimodal distribution can be symmetric or skewed.


Therefore, a skewness coefficient close to 0 suggests that the distribution is symmetric.


The correct option is (A).


The final answer is $\boxed{\text{Symmetric}}$.
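To see the sign behaviour of a skewness coefficient in practice, here is a minimal sketch. It assumes the moment coefficient of skewness $m_3 / m_2^{3/2}$, which is only one of several coefficients in use (the question does not say which one is meant), and the datasets are hypothetical.

```python
from statistics import fmean


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    mu = fmean(data)
    m2 = fmean([(x - mu) ** 2 for x in data])  # second central moment
    m3 = fmean([(x - mu) ** 3 for x in data])  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 20]    # one large value stretches the right tail
left_tailed = [-14, 2, 3, 4, 5]    # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("right-tailed", right_tailed),
                   ("left-tailed", left_tailed)]:
    print(name, skewness(data))
```

The symmetric set gives a coefficient of 0, the right-tailed set a positive value, and the left-tailed set a negative value.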

Question 55. Which of the following describes perfect negative linear correlation?

(A) $r = 1$

(B) $r = -1$

(C) $r = 0$

(D) $r = \pm 1$

Answer:

The correlation coefficient, often denoted by $r$, measures the strength and direction of a linear relationship between two variables.


The value of the correlation coefficient $r$ is always between -1 and +1, inclusive. That is, $ -1 \leq r \leq 1 $.


Interpretation of the value of $r$:

  • If $r = 1$, there is a perfect positive linear correlation. The points lie exactly on a straight line with a positive slope.
  • If $r = -1$, there is a perfect negative linear correlation. The points lie exactly on a straight line with a negative slope.
  • If $r = 0$, there is no linear correlation. There is no linear relationship between the variables.
  • If $r$ is close to 1 (e.g., 0.8, 0.95), there is a strong positive linear correlation.
  • If $r$ is close to -1 (e.g., -0.8, -0.95), there is a strong negative linear correlation.
  • If $r$ is close to 0 (e.g., 0.1, -0.1), there is a weak linear correlation.

The question asks for the value of $r$ that describes perfect negative linear correlation.


Based on the interpretation above, perfect negative linear correlation occurs when $r = -1$.


Let's look at the options:

(A) $r = 1$: Perfect positive linear correlation.

(B) $r = -1$: Perfect negative linear correlation.

(C) $r = 0$: No linear correlation.

(D) $r = \pm 1$: Perfect linear correlation (either positive or negative), but not specifically negative.


Thus, the option that describes perfect negative linear correlation is $r = -1$.


The correct option is (B).


The final answer is $\boxed{-1}$.
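A quick numerical check of the two perfect cases is sketched below. It assumes the usual Pearson formula and uses hypothetical points that lie exactly on straight lines; `pearson_r` is a name chosen for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
falling = [10 - 2 * x for x in xs]  # exact line with negative slope
rising = [3 + 2 * x for x in xs]    # exact line with positive slope

print(pearson_r(xs, falling))  # perfect negative correlation
print(pearson_r(xs, rising))   # perfect positive correlation
```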

Question 56. Case Study: An investment fund's annual returns (in %) for 5 years are: 8, -2, 15, 5, 10.

What is the range of the annual returns?

(A) 17%

(B) 10%

(C) 20%

(D) 13%

Answer:

The given dataset of annual returns is: $\{8, -2, 15, 5, 10\}$.


The range of a dataset is defined as the difference between the maximum value and the minimum value in the dataset.


First, we need to find the maximum and minimum values from the given data.

The values are 8, -2, 15, 5, and 10.


The maximum value is $15$.

The minimum value is $-2$.


Now, we calculate the range:

Range = Maximum Value - Minimum Value

Range = $15 - (-2)$

Range = $15 + 2$

Range = $17$


Since the returns are in percentage, the range is 17%.


Comparing the calculated range with the given options:

(A) 17%

(B) 10%

(C) 20%

(D) 13%


The calculated range matches option (A).


The final answer is $\boxed{17\%}$.
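The calculation can be reproduced with a short sketch; `data_range` is a helper name assumed for this example, and the returns are the five values from the case study.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


returns = [8, -2, 15, 5, 10]  # annual returns in %
print(data_range(returns))  # 17
```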

Question 57. Case Study: (Same setup as Q56)

If the average return is 7.2%, which measure of dispersion (Standard Deviation or Range) gives a better idea of the typical deviation from the average?

(A) Range

(B) Standard Deviation

(C) Both are equally good

(D) Neither are suitable

Answer:

The dataset of annual returns is $\{8, -2, 15, 5, 10\}$. The average return is given as 7.2%.


We are asked to determine which measure of dispersion, Standard Deviation or Range, gives a better idea of the typical deviation from the average.


Let's consider what each measure tells us:

The Range is the difference between the maximum and minimum values in the dataset. For this dataset, the range is $15 - (-2) = 17\%$. The range gives the total spread of the data but is based only on the two extreme values. It does not provide information about how the other data points are distributed or how they deviate from the mean.


The Standard Deviation is a measure that quantifies the amount of variation or dispersion of a set of data values around the mean. It indicates, on average, how much each data point deviates from the mean.


Since the question specifically asks for a measure that gives a better idea of the typical deviation from the average (or mean), the standard deviation is the more appropriate measure. It takes into account the deviation of every data point from the mean, whereas the range only considers the distance between the two most extreme points.


Therefore, the Standard Deviation provides a better understanding of the typical spread of the data around the average.


Comparing this conclusion with the given options:

(A) Range

(B) Standard Deviation

(C) Both are equally good

(D) Neither are suitable


The correct option is (B).


The final answer is $\boxed{\text{Standard Deviation}}$.
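For comparison, both measures can be computed for the case-study returns. This is a sketch only: the question does not say whether the five years are to be treated as a whole population or as a sample, so both versions of the standard deviation are shown.

```python
from statistics import mean, pstdev, stdev

returns = [8, -2, 15, 5, 10]  # annual returns in %

average = mean(returns)                      # 7.2, as stated in the question
spread_range = max(returns) - min(returns)   # uses only the two extremes
population_sd = pstdev(returns)              # divides by n
sample_sd = stdev(returns)                   # divides by n - 1

print("mean:", average)
print("range:", spread_range)
print("population SD:", round(population_sd, 2))
print("sample SD:", round(sample_sd, 2))
```

Either standard deviation (roughly 5.6 or 6.3 percentage points) describes how far a typical year lies from the 7.2% average, whereas the range of 17 only reports the gap between the best and worst year.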

Question 58. If the correlation coefficient between hours of sleep and exam scores is $r = 0.8$, which statement is true?

(A) More sleep causes higher scores.

(B) Higher scores cause more sleep.

(C) There is a strong positive linear association between sleep and scores.

(D) There is a weak positive linear association between sleep and scores.

Answer:

The correlation coefficient, denoted by $r$, measures the strength and direction of a linear relationship between two variables.


The value of $r$ is given as $0.8$.


Let's interpret this value:

  • The sign of $r$ indicates the direction of the linear association. Since $r = 0.8$ is positive, it indicates a positive linear association. This means that as the hours of sleep increase, the exam scores tend to increase.
  • The magnitude (absolute value) of $r$ indicates the strength of the linear association. The value $0.8$ is relatively close to $1$. Conventionally, an absolute value of $r$ between 0.7 and 1.0 (or -0.7 and -1.0) suggests a strong linear association.

Therefore, $r = 0.8$ indicates a strong positive linear association between hours of sleep and exam scores.


Now let's evaluate the given options:

(A) More sleep causes higher scores. Correlation shows association, but it does not prove causation. There might be other factors influencing both sleep and scores. Therefore, we cannot conclude causation solely from the correlation coefficient.

(B) Higher scores cause more sleep. Similar to option (A), correlation does not imply causation, and the direction of any potential causal relationship cannot be determined from correlation alone.

(C) There is a strong positive linear association between sleep and scores. As established from the interpretation of $r = 0.8$, the association is positive (sign is positive) and strong (magnitude is close to 1). This statement accurately describes the relationship indicated by the correlation coefficient.

(D) There is a weak positive linear association between sleep and scores. A correlation coefficient of $0.8$ is generally considered to represent a strong linear association, not a weak one.


Based on the interpretation of the correlation coefficient, the statement that is true is that there is a strong positive linear association between sleep and scores.


The correct option is (C).


The final answer is $\boxed{\text{There is a strong positive linear association between sleep and scores.}}$.

Question 59. Measures of dispersion are always:

(A) Positive or zero.

(B) Negative or zero.

(C) Can be positive or negative.

(D) Equal to the mean.

Answer:

Solution:


Measures of dispersion are statistical values that describe the spread or variability of a dataset. Common measures include range, variance, standard deviation, mean absolute deviation, and interquartile range.


Let's consider some common measures:

1. Range: The difference between the maximum and minimum values in a dataset. Since the maximum value is always greater than or equal to the minimum value, the range is always non-negative (positive or zero).

2. Variance ($\sigma^2$ or $s^2$): The average of the squared differences from the mean. Squaring the differences ensures that all values are non-negative, so the sum of squares is non-negative, and thus the variance is non-negative.

3. Standard Deviation ($\sigma$ or $s$): The square root of the variance. Since the variance is non-negative, its principal (non-negative) square root is taken, making the standard deviation always non-negative.

4. Mean Absolute Deviation (MAD): The average of the absolute differences from the mean or median. Using absolute values ensures that all terms in the sum are non-negative, resulting in a non-negative MAD.

5. Interquartile Range (IQR): The difference between the third quartile ($Q_3$) and the first quartile ($Q_1$). Since $Q_3 \ge Q_1$, the IQR is always non-negative.


In general, measures of dispersion are designed to quantify variability, which is inherently a non-negative concept. The range, variance, standard deviation, and mean absolute deviation are zero only if all data points in the dataset are identical (i.e., there is no dispersion or spread); if there is any variability among the data points, they are positive. The IQR can additionally be zero whenever $Q_1 = Q_3$, even if some values outside the middle half differ.


Therefore, measures of dispersion are always positive or zero.


Comparing this conclusion with the given options:

(A) Positive or zero.

(B) Negative or zero.

(C) Can be positive or negative.

(D) Equal to the mean.

The correct option is (A).


The final answer is $\boxed{\text{Positive or zero}}$.

Question 60. Which statistical tool would you use to investigate if there is a linear relationship between the amount spent on advertising and the volume of sales?

(A) Mean

(B) Standard Deviation

(C) Skewness

(D) Correlation

Answer:

Solution:


The question asks for a statistical tool to investigate the existence of a linear relationship between two variables: the amount spent on advertising and the volume of sales. We need to examine the given options to determine which tool is appropriate for this purpose.


Let's analyze each option:

(A) Mean: The mean is a measure of central tendency, representing the average value of a single dataset. It does not provide information about the relationship between two different variables.

(B) Standard Deviation: The standard deviation is a measure of dispersion or variability within a single dataset. It quantifies how spread out the data points are around the mean. It does not measure the relationship between two variables.

(C) Skewness: Skewness is a measure of the asymmetry of the probability distribution of a random variable about its mean. It describes the shape of a single distribution, indicating whether it is skewed to the left or right. It does not measure the relationship between two variables.

(D) Correlation: Correlation is a statistical measure that describes the extent to which two variables are linearly related. The correlation coefficient (commonly denoted by $r$) measures the strength and direction of a linear relationship between two quantitative variables. A value close to $+1$ indicates a strong positive linear relationship, a value close to $-1$ indicates a strong negative linear relationship, and a value close to $0$ indicates a weak or no linear relationship.


Since the objective is to investigate if there is a linear relationship between the amount spent on advertising and the volume of sales, the appropriate statistical tool is correlation.


Therefore, the correct option is (D).


The final answer is $\boxed{\text{Correlation}}$.

Question 61. If all values in a dataset are the same (e.g., 10, 10, 10, 10), what is the standard deviation?

(A) 10

(B) 1

(C) 0

(D) Undefined

Answer:

Solution:


The standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.


In the given dataset, all values are the same (10, 10, 10, 10). This means there is no variation or dispersion among the values.

Let the dataset be $X = \{10, 10, 10, 10\}$.

The mean of the dataset is $\bar{x} = \frac{10+10+10+10}{4} = \frac{40}{4} = 10$.

The standard deviation is calculated based on the differences between each value and the mean.

For each value $x_i$, the difference from the mean is $x_i - \bar{x}$. In this case, for every value, the difference is $10 - 10 = 0$.

Since every value is equal to the mean, the deviation of each value from the mean is 0.

The standard deviation is a measure of the average size of these deviations. If all deviations are 0, then the standard deviation must also be 0.


Therefore, if all values in a dataset are the same, the standard deviation is 0.


Comparing this conclusion with the given options:

(A) 10

(B) 1

(C) 0

(D) Undefined

The correct option is (C).


The final answer is $\boxed{0}$.
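A brief check with Python's standard library, as a sketch: the first list is the dataset from the question, and the second is a hypothetical list added only for contrast.

```python
from statistics import pstdev

identical = [10, 10, 10, 10]  # dataset from the question
varied = [8, 10, 10, 12]      # hypothetical dataset with the same mean

sd_identical = pstdev(identical)  # every deviation from the mean is 0
sd_varied = pstdev(varied)        # some deviations are non-zero

print(sd_identical)
print(sd_varied)
```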

Question 62. The sum of deviations of data points from their mean is always:

(A) Positive

(B) Negative

(C) Zero

(D) Equal to the standard deviation

Answer:

Solution:


Let the dataset be denoted by $x_1, x_2, ..., x_n$, where $n$ is the number of data points.

Let $\bar{x}$ be the mean of the dataset.

The deviation of a data point $x_i$ from the mean $\bar{x}$ is given by $(x_i - \bar{x})$.

We are asked to find the sum of these deviations for all data points in the dataset.

The sum of deviations is $\sum_{i=1}^{n} (x_i - \bar{x})$.


We can expand the summation:

$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}$

The term $\sum_{i=1}^{n} x_i$ is the sum of all data points.

The term $\sum_{i=1}^{n} \bar{x}$ means adding the constant value $\bar{x}$ $n$ times, which is equal to $n \times \bar{x}$.

So, the sum of deviations is $\sum_{i=1}^{n} x_i - n\bar{x}$.


The mean $\bar{x}$ is defined as the sum of all data points divided by the number of data points:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

...(i)

From equation (i), we can write the sum of data points as:

$\sum_{i=1}^{n} x_i = n\bar{x}$

...(ii)


Now substitute equation (ii) back into the expression for the sum of deviations:

Sum of deviations $= (n\bar{x}) - n\bar{x}$

Sum of deviations $= 0$


This shows that the sum of the deviations of data points from their mean is always zero, regardless of the dataset, as long as the mean is correctly calculated.


Comparing this conclusion with the given options:

(A) Positive

(B) Negative

(C) Zero

(D) Equal to the standard deviation

The correct option is (C).


The final answer is $\boxed{\text{Zero}}$.
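The identity can also be verified numerically. The sketch below uses exact fractions so that floating-point rounding does not obscure the result; the function name `sum_of_deviations` and the sample data are assumptions made for illustration.

```python
from fractions import Fraction


def sum_of_deviations(data):
    """Sum of (x_i - mean) over all data points, computed exactly."""
    n = len(data)
    mean = Fraction(sum(data), n)  # exact mean of integer data
    return sum(x - mean for x in data)


data = [8, -2, 15, 5, 10]  # hypothetical integer dataset
print(sum_of_deviations(data))  # 0
```

With ordinary floats the sum may come out as a tiny non-zero number such as 1e-16; that is a rounding effect, not a counterexample.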

Question 63. If the Mean Deviation from the median is calculated, it represents the average absolute deviation of observations from the ________.

(A) mean

(B) median

(C) mode

(D) range

Answer:

Solution:


Mean Deviation is a measure of dispersion. It is defined as the average of the absolute deviations of observations from a central value.

The central value used can be the mean, median, or mode.


When the Mean Deviation is calculated from the mean, it is the average absolute deviation of observations from the mean.

Formula: $\text{MD}_{\text{mean}} = \frac{\sum |x_i - \bar{x}|}{n}$


When the Mean Deviation is calculated from the median, it is the average absolute deviation of observations from the median.

Formula: $\text{MD}_{\text{median}} = \frac{\sum |x_i - M|}{n}$, where $M$ is the median.


When the Mean Deviation is calculated from the mode, it is the average absolute deviation of observations from the mode.

Formula: $\text{MD}_{\text{mode}} = \frac{\sum |x_i - Z|}{n}$, where $Z$ is the mode.


The question specifically asks about the Mean Deviation calculated "from the median". Therefore, it represents the average absolute deviation of observations from the median.


Comparing this conclusion with the given options:

(A) mean

(B) median

(C) mode

(D) range

The correct option is (B).


The final answer is $\boxed{\text{median}}$.
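The formulas above differ only in the central value used, which a single helper can capture. This is a sketch with a hypothetical dataset; `mean_deviation` is a name chosen for this example.

```python
from statistics import mean, median


def mean_deviation(data, centre):
    """Average absolute deviation of the observations from `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)


data = [3, 5, 7, 9, 16]  # hypothetical observations

md_median = mean_deviation(data, median(data))  # deviations taken from 7
md_mean = mean_deviation(data, mean(data))      # deviations taken from 8

print("MD about median:", md_median)
print("MD about mean:", md_mean)
```

For any dataset the mean deviation about the median is never larger than the mean deviation about the mean, because the median minimises the sum of absolute deviations.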

Question 64. A distribution with negative skewness has a tail extending towards the lower values.

(A) True

(B) False

(C) True, but only for discrete data.

(D) False, the tail is towards higher values.

Answer:

Solution:


Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.

There are three types of skewness:

1. Positive Skewness (Right Skewness): The tail on the right side of the distribution is longer or fatter than the tail on the left side. The peak of the distribution is shifted to the left. Typically, Mean $\ge$ Median $\ge$ Mode.

2. Negative Skewness (Left Skewness): The tail on the left side of the distribution is longer or fatter than the tail on the right side. The peak of the distribution is shifted to the right. Typically, Mean $\le$ Median $\le$ Mode.

3. Zero Skewness: The distribution is symmetric about its mean. The tails on both sides are roughly equal in length. Examples include the normal distribution. In this case, Mean = Median = Mode.


The question states that a distribution with negative skewness has a tail extending towards the lower values.

Negative skewness means the left tail is longer. The left side of a distribution corresponds to lower values on the horizontal axis.

Therefore, a distribution with negative skewness indeed has a tail extending towards the lower values.

This property holds true for both discrete and continuous data distributions that exhibit skewness.


The statement "A distribution with negative skewness has a tail extending towards the lower values" is a correct description of negative skewness.


Comparing this conclusion with the given options:

(A) True

(B) False

(C) True, but only for discrete data.

(D) False, the tail is towards higher values.

The correct option is (A).


The final answer is $\boxed{\text{True}}$.

Question 65. In a box plot, the box represents the Interquartile Range (IQR), extending from:

(A) Minimum value to Maximum value.

(B) $Q_1$ to $Q_3$.

(C) Median to $Q_3$.

(D) $Q_1$ to Median.

Answer:

Solution:


A box plot (also known as a box and whisker plot) is a graphical representation of the distribution of a dataset based on five key summary statistics: the minimum value, the first quartile ($Q_1$), the median ($Q_2$), the third quartile ($Q_3$), and the maximum value.


The box in a box plot represents the Interquartile Range (IQR).

The Interquartile Range (IQR) is defined as the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$).

$\text{IQR} = Q_3 - Q_1$

The first quartile ($Q_1$) is the value below which 25% of the data falls.

The third quartile ($Q_3$) is the value below which 75% of the data falls (or above which 25% of the data falls).


Graphically, the box in a box plot spans from the first quartile ($Q_1$) to the third quartile ($Q_3$). The median ($Q_2$) is marked by a line inside the box.


Therefore, the box represents the central 50% of the data, which is bounded by the first quartile and the third quartile.


Comparing this description with the given options:

(A) Minimum value to Maximum value. This represents the range of the entire dataset, which is typically shown by the whiskers.

(B) $Q_1$ to $Q_3$. This correctly describes the boundaries of the box and represents the IQR.

(C) Median to $Q_3$. This is only the upper part of the box; it covers the 25% of the data lying between the median and $Q_3$, not the entire IQR.

(D) $Q_1$ to Median. This is only the lower part of the box; it covers the 25% of the data lying between $Q_1$ and the median, not the entire IQR.

The correct option is (B).


The final answer is $\boxed{Q_1 \text{ to } Q_3}$.
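The five summary statistics behind a box plot can be computed as sketched below. The dataset is hypothetical, and the quartiles rely on `statistics.quantiles` with `method="inclusive"`; other quartile conventions exist and can give slightly different values for $Q_1$ and $Q_3$.

```python
from statistics import quantiles


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum of the data."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": q2,
            "q3": q3, "max": max(data)}


data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15]  # hypothetical observations
summary = five_number_summary(data)

# The box spans Q1 to Q3, so its length is the IQR.
box_length = summary["q3"] - summary["q1"]

print(summary)
print("IQR (length of the box):", box_length)
```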

Question 66. If the price of a commodity increases and its demand decreases, the correlation between price and demand is likely:

(A) Positive.

(B) Negative.

(C) Zero.

(D) Cannot be determined.

Answer:

Solution:


Correlation measures the strength and direction of a linear relationship between two quantitative variables. The direction of the relationship is indicated by the sign of the correlation coefficient.


1. Positive Correlation: Occurs when two variables tend to increase or decrease together. If one variable goes up, the other also tends to go up. If one variable goes down, the other also tends to go down.

2. Negative Correlation: Occurs when one variable tends to increase while the other tends to decrease. If one variable goes up, the other tends to go down. If one variable goes down, the other tends to go up.

3. Zero Correlation: Indicates that there is no linear relationship between the two variables.


In the given scenario, we are told that if the price of a commodity increases, its demand decreases. This is an example of an inverse relationship, where a change in one variable is associated with an opposite change in the other variable.

Specifically, as price goes up, demand goes down. This pattern corresponds to a negative relationship.

In economics, this relationship is described by the Law of Demand, which states that, all else being equal, as the price of a good or service increases, consumer demand for the good or service will decrease, and vice versa.


Since an increase in price is associated with a decrease in demand, and a decrease in price would be associated with an increase in demand (moving in opposite directions), the correlation between price and demand is likely negative.


Comparing this conclusion with the given options:

(A) Positive.

(B) Negative.

(C) Zero.

(D) Cannot be determined.

The correct option is (B).


The final answer is $\boxed{\text{Negative}}$.

Question 67. The coefficient of correlation is a unitless measure.

(A) True

(B) False

(C) True, only if the variables have the same units.

(D) False, its unit is the product of the units of the two variables.

Answer:

The coefficient of correlation, denoted by '$r$', measures the linear relationship between two variables.


It is calculated as the covariance of the two variables divided by the product of their standard deviations.

$r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$


The covariance has units equal to the product of the units of the two variables, and so does the product of the standard deviations $\sigma_X \sigma_Y$, so the units cancel in the division.


For example, if X and Y are both measured in dollars, then Cov(X, Y) has units of dollars squared, and $\sigma_X \sigma_Y$ also has units of dollars squared. Thus, their ratio is unitless. The same cancellation happens when X and Y have different units, which rules out option (C).


Therefore, the statement "The coefficient of correlation is a unitless measure" is True.


The correct option is (A).

Question 68. A percentile rank of 50 means that the value is equal to the _________.

(A) Mean

(B) Median

(C) Mode

(D) Quartile Deviation

Answer:

The percentile rank of a value in a dataset indicates the percentage of scores in its frequency distribution that are less than or equal to that value.

A value with a percentile rank of 50 is the 50th percentile ($P_{50}$): the value at or below which 50% of the observations in the dataset fall. This makes it the central point of the distribution.

Let's evaluate the given options:

(A) Mean: The mean is the arithmetic average. It is not necessarily the same as the 50th percentile, especially in a skewed distribution.

(B) Median: The median is defined as the middle value of a dataset when it is sorted in ascending or descending order. By its very definition, the median is the value that splits the dataset into two equal halves, with 50% of the data points below it and 50% above it. This is identical to the definition of the 50th percentile.

(C) Mode: The mode is the value that appears most frequently in a dataset. It can be different from the median.

(D) Quartile Deviation: This is a measure of data spread or dispersion, not a measure of central tendency. It is calculated from the first and third quartiles.


Therefore, a percentile rank of 50 is, by definition, equal to the median of the dataset.

The correct option is (B).

Question 69. Which of the following is considered an absolute measure of dispersion?

(A) Coefficient of Variation

(B) Coefficient of Skewness

(C) Standard Deviation

(D) Correlation Coefficient

Answer:

Measures of dispersion describe the spread or variability of a dataset. They can be classified into two types: absolute measures and relative measures.

An absolute measure of dispersion is expressed in the units of the original data (or, in the case of the variance, in the square of those units). It indicates the amount of variation on its own. Examples include the range, quartile deviation, variance, and standard deviation.

A relative measure of dispersion is a unit-free ratio or percentage. It is used to compare the variability of two or more datasets with different units or different average values. Examples include the coefficient of variation and the coefficient of range.

Let's analyze the given options:

(A) Coefficient of Variation: This is calculated as the ratio of the standard deviation to the mean ($CV = \frac{\sigma}{\mu}$). Since it is a ratio, it is unitless and thus a relative measure of dispersion. It is used for comparing the variability of different series.

(B) Coefficient of Skewness: This measures the asymmetry of the data distribution, not its spread or dispersion. It is a measure of shape.

(C) Standard Deviation: This is a measure of how much the data points deviate from the mean. It is expressed in the same units as the original data (e.g., if data is in kilograms, the standard deviation is also in kilograms). Therefore, it is an absolute measure of dispersion.

(D) Correlation Coefficient: This measures the strength and direction of the relationship between two variables, not the dispersion of a single dataset.


Based on the definitions, the Standard Deviation is the only absolute measure of dispersion among the choices.

The correct option is (C).
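The distinction between absolute and relative measures can be checked numerically. The sketch below is only an illustration: the weights are hypothetical values (not from the question), expressed first in kilograms and then in grams. The standard deviation changes with the unit, while the coefficient of variation does not.

```python
from statistics import mean, pstdev

# Hypothetical weights in kilograms (illustrative values only).
weights_kg = [50, 55, 60, 65, 70]
# The same weights converted to grams.
weights_g = [w * 1000 for w in weights_kg]


def coefficient_of_variation(data):
    """Population standard deviation divided by the mean (unit-free)."""
    return pstdev(data) / mean(data)


sd_kg = pstdev(weights_kg)  # absolute measure, expressed in kilograms
sd_g = pstdev(weights_g)    # absolute measure, expressed in grams
cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print("SD in kg:", sd_kg, "SD in g:", sd_g)
print("CV in kg:", cv_kg, "CV in g:", cv_g)
```

The standard deviation in grams is 1000 times the value in kilograms, whereas the two coefficients of variation agree.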

Question 70. If a constant value is added to every observation in a dataset, which measure of dispersion remains unchanged?

(A) Standard Deviation

(B) Variance

(C) Range

(D) All of the above.

Answer:

This question asks about the effect of a change of origin on measures of dispersion. Adding a constant value to every observation in a dataset is known as a change of origin (it shifts the entire dataset along the number line). Measures of dispersion quantify the spread or variability of data, which is concerned with the distances between data points, not their absolute location.

Let the original dataset be $X = \{x_1, x_2, \dots, x_n\}$.

Let a new dataset $Y$ be formed by adding a constant 'c' to each observation: $Y = \{x_1+c, x_2+c, \dots, x_n+c\}$.

Let's analyze how this change affects each of the given measures:

(A) Standard Deviation and (B) Variance:

The mean of the new dataset, $\mu_Y$, will be $\mu_X + c$. The standard deviation and variance are calculated based on the deviation of each point from the mean, i.e., $(x_i - \mu)$. For the new dataset, the deviation of a point $y_i$ from the new mean $\mu_Y$ is:

$y_i - \mu_Y = (x_i + c) - (\mu_X + c)$

$ = x_i + c - \mu_X - c$

$ = x_i - \mu_X$

Since the individual deviations from the mean do not change, the sum of squared deviations, $\sum (x_i - \mu_X)^2$, remains the same. Consequently, both the variance ($\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$) and its square root, the standard deviation ($\sigma$), remain unchanged.

(C) Range:

The range is the difference between the maximum and minimum values in the dataset. Let $x_{max}$ and $x_{min}$ be the maximum and minimum values of the original dataset.

The new maximum value will be $x_{max} + c$.

The new minimum value will be $x_{min} + c$.

The new range will be:

New Range $= (x_{max} + c) - (x_{min} + c)$

$ = x_{max} - x_{min}$

This is the same as the original range. Thus, the range also remains unchanged.


Since the Standard Deviation, Variance, and Range all remain unchanged when a constant is added to every observation, the correct option is (D) All of the above.
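The change-of-origin argument above can be verified with a short sketch. The dataset and the constant `c` are hypothetical values chosen only for illustration, and `data_range` is a helper name introduced here.

```python
from statistics import pstdev, pvariance


def data_range(values):
    """Difference between the largest and smallest observation."""
    return max(values) - min(values)


x = [4, 7, 9, 12, 18]      # hypothetical original observations
c = 10                     # hypothetical constant added to each observation
y = [xi + c for xi in x]   # shifted dataset (change of origin)

print("SD:      ", pstdev(x), pstdev(y))
print("Variance:", pvariance(x), pvariance(y))
print("Range:   ", data_range(x), data_range(y))
```

Each pair of printed values should agree, because shifting every observation by the same amount leaves all distances between observations unchanged.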

Question 71. If every observation in a dataset is multiplied by a constant $k$ ($k>0$), the standard deviation of the new dataset will be:

(A) The same as the original.

(B) $k$ times the original standard deviation.

(C) $k^2$ times the original standard deviation.

(D) $k$ plus the original standard deviation.

Answer:

This question explores the effect of a change of scale on the standard deviation. Multiplying every observation in a dataset by a constant value is known as a change of scale.

Let the original dataset be $X = \{x_1, x_2, \dots, x_n\}$, with an original mean $\mu_X$ and original standard deviation $\sigma_X$.

The formula for the original standard deviation is:

$\sigma_X = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu_X)^2}{n}}$

Now, let's create a new dataset, $Y$, by multiplying each observation by a constant $k$ ($k>0$):

$Y = \{kx_1, kx_2, \dots, kx_n\}$

First, the mean of the new dataset, $\mu_Y$, will be:

$\mu_Y = \frac{\sum (kx_i)}{n} = k \left(\frac{\sum x_i}{n}\right) = k \mu_X$

Next, let's find the new standard deviation, $\sigma_Y$. The deviation of a new data point $kx_i$ from the new mean $k\mu_X$ is $kx_i - k\mu_X = k(x_i - \mu_X)$.

The formula for the new standard deviation is:

$\sigma_Y = \sqrt{\frac{\sum_{i=1}^{n} (kx_i - k\mu_X)^2}{n}}$

$\sigma_Y = \sqrt{\frac{\sum_{i=1}^{n} [k(x_i - \mu_X)]^2}{n}}$

$\sigma_Y = \sqrt{\frac{\sum_{i=1}^{n} k^2(x_i - \mu_X)^2}{n}}$

Since $k^2$ is a constant, we can factor it out of the summation:

$\sigma_Y = \sqrt{k^2 \left(\frac{\sum_{i=1}^{n} (x_i - \mu_X)^2}{n}\right)}$

We recognize the term inside the parenthesis as the original variance, $\sigma_X^2$.

$\sigma_Y = \sqrt{k^2 \sigma_X^2}$

Since it is given that $k>0$, the square root of $k^2$ is simply $k$.

$\sigma_Y = k \sigma_X$

This shows that if every observation is multiplied by a constant $k$, the new standard deviation is $k$ times the original standard deviation.


Therefore, the correct option is (B).
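As a numerical check of the derivation above, the following sketch multiplies a hypothetical dataset by a hypothetical constant $k = 3$ and compares the standard deviations. By the same derivation, the variance is multiplied by $k^2$.

```python
from statistics import pstdev, pvariance

x = [4, 7, 9, 12, 18]      # hypothetical original observations
k = 3                      # hypothetical positive scale factor
y = [k * xi for xi in x]   # rescaled dataset (change of scale)

sd_x, sd_y = pstdev(x), pstdev(y)
var_x, var_y = pvariance(x), pvariance(y)

print("sd_y / sd_x   =", sd_y / sd_x)    # ratio of standard deviations
print("var_y / var_x =", var_y / var_x)  # ratio of variances
```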

Question 72. A dataset is symmetric if its distribution shape is a mirror image on either side of the center.

(A) True

(B) False

(C) True, only if the mean and median are equal.

(D) False, symmetry is measured by kurtosis.

Answer:

The question provides a definition for a symmetric distribution and asks if it is correct.

A distribution is said to be symmetric if its left half is a mirror image of its right half. Imagine folding the distribution's graph (like a histogram or a density curve) along a central vertical line; if the two halves match perfectly, the distribution is symmetric.

In a perfectly symmetric distribution, the measures of central tendency coincide. This means:

Mean = Median = Mode

Let's evaluate the given options based on this definition:

(A) True: The statement "A dataset is symmetric if its distribution shape is a mirror image on either side of the center" is the exact definition of a symmetric distribution. This is correct.

(B) False: This is incorrect as the statement is the definition of symmetry.

(C) True, only if the mean and median are equal: While it is a property of a symmetric distribution that the mean and median are equal, the statement in the question is the fundamental definition of symmetry itself (based on shape). The equality of mean and median is a consequence, not a pre-condition for the definition. Thus, the definition is true on its own.

(D) False, symmetry is measured by kurtosis: This is incorrect. The lack of symmetry is measured by skewness. Kurtosis measures the "peakedness" or "tailedness" of a distribution, not its symmetry.


The statement provided in the question is the correct and fundamental definition of a symmetric dataset. Therefore, the statement is true.

The correct option is (A).

Question 73. If the coefficient of skewness is positive, the distribution is skewed to the ________.

(A) left

(B) right

(C) center

(D) both sides

Answer:

Correct Option: (B) right


Explanation:

Skewness measures the asymmetry of a probability distribution. A positively skewed distribution means that the right tail (larger values) is longer or fatter than the left tail (smaller values).

In such a distribution:

  • The mean is greater than the median
  • The tail is stretched more to the right

Mathematical Interpretation:

Karl Pearson's coefficient of skewness is calculated using:

$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$

If skewness $>$ 0, then the distribution is positively skewed.


Conclusion:

Since the coefficient of skewness is positive, the distribution is skewed to the right.
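The formula above can be tried on a small example. The sketch below assumes a hypothetical dataset with a single mode and a few large values on the right; the data are not from the question.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical scores with a long right tail (illustrative values only).
scores = [2, 3, 3, 3, 4, 5, 6, 9, 12]

# Pearson's mode-based coefficient: (Mean - Mode) / Standard Deviation.
skewness = (mean(scores) - mode(scores)) / pstdev(scores)

print("mean   =", mean(scores))
print("median =", median(scores))
print("mode   =", mode(scores))
print("skewness =", skewness)
```

For this dataset the mean exceeds the median, which exceeds the mode, and the coefficient comes out positive, consistent with a right-skewed distribution.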

Question 74. The correlation coefficient measures the degree of _________ association between two variables.

(A) causal

(B) non-linear

(C) linear

(D) indirect

Answer:

Correct Option: (C) linear


Explanation:

The correlation coefficient, usually denoted by $r$, is a statistical measure that calculates the strength and direction of a linear relationship between two variables.

Its values lie between $-1$ and $1$:

  • $r = 1$: Perfect positive linear correlation
  • $r = -1$: Perfect negative linear correlation
  • $r = 0$: No linear correlation

Important Note:

The correlation coefficient does not imply a causal relationship, nor does it effectively measure non-linear associations.


Conclusion:

Therefore, the correlation coefficient specifically measures the degree of linear association between two variables.

Question 75. If two variables have a perfect non-linear relationship (e.g., $y = x^2$), their linear correlation coefficient ($r$) might be close to zero.

(A) True

(B) False

(C) True, unless the non-linear relationship is monotonic.

(D) False, $r$ will always be $\pm 1$ for any relationship.

Answer:

Correct Option: (A) True


Explanation:

The linear correlation coefficient $r$ measures the strength and direction of a linear relationship between two variables.

When the relationship between variables is perfectly non-linear (e.g., $y = x^2$), the data may still show a strong association, but not a linear one.

As a result, the value of $r$ can be close to zero, indicating no linear correlation despite a clear functional relationship.


Example:

For $y = x^2$ with $x$ values spread symmetrically over $[-2, 2]$, the graph is symmetric about the y-axis. The values of $y$ increase in both directions away from $x = 0$, so there is no consistent linear trend: the positive and negative contributions to the covariance cancel.

Hence, $r \approx 0$ even though $y$ is completely determined by $x$.


Conclusion:

The statement is true — for a perfect non-linear relationship, the linear correlation coefficient ($r$) can indeed be close to zero.
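A minimal sketch of this example follows. The helper `pearson_r` is written out by hand for illustration, and the integer grids of $x$ values are assumptions chosen to keep the arithmetic simple. The second dataset restricts $x$ to non-negative values to show why the question says $r$ "might" be close to zero rather than "must" be.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# x values symmetric about zero: y = x^2 has no linear trend.
xs_symmetric = [-2, -1, 0, 1, 2]
r_symmetric = pearson_r(xs_symmetric, [x ** 2 for x in xs_symmetric])

# x values restricted to one side: y = x^2 is increasing there.
xs_one_sided = [0, 1, 2, 3, 4]
r_one_sided = pearson_r(xs_one_sided, [x ** 2 for x in xs_one_sided])

print("symmetric x: r =", r_symmetric)
print("one-sided x: r =", r_one_sided)
```

On the symmetric grid the coefficient is zero, while on the one-sided grid it is close to 1 even though the relationship is still $y = x^2$.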

Question 76. The second quartile ($Q_2$) is also known as the _________.

(A) Mean

(B) Mode

(C) Median

(D) Range

Answer:

Correct Option: (C) Median


Explanation:

Quartiles divide a data set into four equal parts.

  • First quartile ($Q_1$): 25th percentile
  • Second quartile ($Q_2$): 50th percentile
  • Third quartile ($Q_3$): 75th percentile

The second quartile ($Q_2$) is the value that separates the lower 50% of the data from the upper 50%, which is the definition of the median.


Conclusion:

Hence, $Q_2$ is also known as the Median.

Question 77. If the Coefficient of Variation of dataset A is 15% and dataset B is 20%, which dataset shows greater relative variability?

(A) Dataset A

(B) Dataset B

(C) Both show the same variability

(D) Cannot be determined without the means.

Answer:

Correct Option: (B) Dataset B


Explanation:

The Coefficient of Variation (CV) is a statistical measure of the relative dispersion of data points around the mean. It is calculated as:

$\text{CV} = \frac{\sigma}{\mu} \times 100\%$

where $\sigma$ is the standard deviation and $\mu$ is the mean.

The higher the coefficient of variation, the greater the relative variability in the data set.


Application:

Dataset A has a CV of 15%, while Dataset B has a CV of 20%.

Since $20\% > 15\%$, Dataset B has a greater degree of relative variability compared to Dataset A.


Conclusion:

Dataset B shows greater relative variability.

Question 78. Which measure of dispersion is most appropriate when the data contains extreme outliers?

(A) Mean Deviation

(B) Standard Deviation

(C) Range

(D) Quartile Deviation

Answer:

Correct Option: (D) Quartile Deviation


Explanation:

The Quartile Deviation, also known as the semi-interquartile range (half of the Interquartile Range, IQR), is based on the middle 50% of the data, between the first quartile ($Q_1$) and the third quartile ($Q_3$).

It is calculated as:

$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$

Since it does not depend on the extreme values, it is largely unaffected by outliers and is well suited to skewed data.


Why not others?

  • Range uses only the extreme values (maximum and minimum), so it is highly affected by outliers.
  • Standard Deviation and Mean Deviation consider all values, so they are also sensitive to extreme data points.

Conclusion:

Quartile Deviation is the most appropriate measure of dispersion when the data contains extreme outliers.
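The effect of an outlier on each measure can be seen in a short sketch. The data are hypothetical, and the quartiles are computed with `statistics.quantiles` using its default "exclusive" method, which is one of several quartile conventions and is assumed here for illustration.

```python
from statistics import pstdev, quantiles


def quartile_deviation(data):
    """Half the distance between the third and first quartiles."""
    q1, _, q3 = quantiles(data, n=4)  # default method is 'exclusive'
    return (q3 - q1) / 2


def data_range(data):
    """Difference between the largest and smallest observation."""
    return max(data) - min(data)


# Hypothetical dataset and a copy whose largest value is an extreme outlier.
clean = [12, 14, 15, 17, 18, 20, 21, 23, 25, 26, 28]
with_outlier = clean[:-1] + [280]

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "| QD =", quartile_deviation(data),
          "| range =", data_range(data),
          "| SD =", round(pstdev(data), 2))
```

The quartile deviation is the same for both datasets, while the range and the standard deviation grow sharply once the outlier is present.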

Question 79. If the data is nominal (categorical), which measures of dispersion, skewness, or correlation are typically meaningful?

(A) Standard Deviation, Skewness, Correlation

(B) Range, Quartile Deviation, Mean Deviation

(C) None of these measures are directly applicable.

(D) Only Correlation (using specific methods like Spearman's rank correlation).

Answer:

Correct Option: (C) None of these measures are directly applicable.


Explanation:

Nominal data refers to categorical data that has no inherent order or numerical meaning (e.g., colors, gender, types of fruit).

Most statistical measures such as:

  • Standard Deviation – requires numerical values
  • Skewness – assumes an ordered numerical scale
  • Correlation – needs numerical variables or at least ordinal ranking

None of these are suitable for nominal data, which lacks both order and scale.


Clarification on Option D:

Spearman's Rank Correlation is used for ordinal data, not nominal. Nominal data cannot be ranked meaningfully, so even that method doesn't apply here.


Conclusion:

For nominal data, none of the listed statistical measures of dispersion, skewness, or correlation are directly applicable.

Question 80. The shape of a distribution is sometimes described using the term 'tail heaviness'. This relates to:

(A) Skewness

(B) Kurtosis

(C) Range

(D) Mean Deviation

Answer:

Correct Option: (B) Kurtosis


Explanation:

Kurtosis is a statistical measure used to describe the distribution of data points in a dataset, especially in terms of:

  • Tail heaviness: how heavily the tails of a distribution differ from the tails of a normal distribution
  • Peakedness: how sharp or flat the peak of the distribution is

It does not describe the symmetry (which is handled by skewness), but rather how likely extreme values (outliers) are, which relates directly to tail heaviness.


Clarification of other options:

  • Skewness measures asymmetry, not tail heaviness.
  • Range shows the spread between max and min but not the tail structure.
  • Mean Deviation gives average deviation from the mean, but ignores shape characteristics like tails or peaks.

Conclusion:

The term 'tail heaviness' specifically relates to kurtosis.

Question 81. If the sum of squares of deviations from the mean is 100 for a dataset of 10 observations, the variance is:

(A) 10

(B) 100

(C) $\sqrt{10}$

(D) 1

Answer:

Correct Option: (A) 10


Given:

Sum of squares of deviations from the mean = $100$

Number of observations, $n = 10$


To Find:

Variance of the dataset.


Solution:

Variance ($\sigma^2$) is defined as the average of the squared deviations from the mean. For a population variance,

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

... (i)

Substituting the given values,

$\sigma^2 = \frac{100}{10} = 10$

... (ii)


Conclusion:

The variance of the dataset is 10.

Question 82. What is the range of the following data array?

5, 12, 8, 15, 20, 18, 11, 14, 16, 9
19, 7, 10, 13, 17, 6, 21, 4, 22, 3

(A) 22

(B) 3

(C) 19

(D) 25

Answer:

Correct Option: (C) 19


Given:

Data array:

5, 12, 8, 15, 20, 18, 11, 14, 16, 9, 19, 7, 10, 13, 17, 6, 21, 4, 22, 3


To Find:

The range of the data set.


Solution:

The range is calculated as the difference between the maximum and minimum values in the dataset.

Maximum value = 22

Minimum value = 3

$\text{Range} = \text{Max} - \text{Min} = 22 - 3$

... (i)

So,

$\text{Range} = 19$

... (ii)


Note:

The range is 19, which corresponds to option (C). Option (A), 22, is the maximum value of the data, not the range.
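The calculation can be confirmed with a few lines of code, using the twenty values listed in the question.

```python
# Data array from the question.
data = [5, 12, 8, 15, 20, 18, 11, 14, 16, 9,
        19, 7, 10, 13, 17, 6, 21, 4, 22, 3]

# Range = maximum value - minimum value.
data_range = max(data) - min(data)

print(max(data), min(data), data_range)  # prints: 22 3 19
```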

Question 83. If the correlation coefficient between height and weight is $r=0.75$, and height is measured in meters and weight in kilograms, what would be the correlation coefficient if height were measured in centimeters and weight in grams?

(A) $0.75 \times 100 \times 1000$

(B) $0.75 / (100 \times 1000)$

(C) $0.75$

(D) Cannot be determined without more information.

Answer:

Correct Option: (C) $0.75$


Given:

Correlation coefficient between height (meters) and weight (kilograms) is $r = 0.75$.

New units: height in centimeters and weight in grams.


To Find:

The correlation coefficient after changing the units of measurement.


Explanation:

The correlation coefficient $r$ is a unitless measure of linear association between two variables.

Changing the units of measurement (scaling by constant factors) does not affect the correlation coefficient.

This is because correlation is computed from standardized values, so it is unchanged when each variable is multiplied by a positive constant (or shifted by a constant).


Therefore,

The correlation coefficient remains the same:

$r = 0.75$

... (i)


Conclusion:

The correlation coefficient when height is measured in centimeters and weight in grams is still 0.75.
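The invariance can be illustrated with a small sketch. The heights and weights below are hypothetical values (the question does not give data), and `pearson_r` is a hand-written helper used only for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical measurements (illustrative values only).
heights_m = [1.50, 1.60, 1.65, 1.70, 1.80]
weights_kg = [50, 58, 57, 68, 75]

# The same measurements after a change of units.
heights_cm = [h * 100 for h in heights_m]
weights_g = [w * 1000 for w in weights_kg]

r_original = pearson_r(heights_m, weights_kg)
r_converted = pearson_r(heights_cm, weights_g)

print(r_original, r_converted)
```

The two printed values agree up to floating-point rounding, because the scale factors cancel between the numerator and the denominator of $r$.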

Question 84. A distribution where the kurtosis is less than that of a normal distribution is called:

(A) Leptokurtic

(B) Mesokurtic

(C) Platykurtic

(D) Skewed

Answer:

Correct Option: (C) Platykurtic


Given:

Kurtosis of a distribution is less than the kurtosis of a normal distribution.


To Find:

The name of the distribution with kurtosis less than that of a normal distribution.


Explanation:

Kurtosis measures the "tailedness" or the peak sharpness of a distribution.

  • Mesokurtic: Kurtosis equal to that of the normal distribution (reference point).
  • Leptokurtic: Kurtosis greater than that of the normal distribution; distribution has heavier tails and a sharper peak.
  • Platykurtic: Kurtosis less than that of the normal distribution; distribution has lighter tails and a flatter peak.


Therefore,

A distribution with kurtosis less than a normal distribution is called Platykurtic.

Question 85. If the 90th percentile of scores on a test is 85, it means 90% of the scores are:

(A) Exactly 85.

(B) Below or equal to 85.

(C) Above 85.

(D) Exactly 90.

Answer:

Correct Option: (B) Below or equal to 85.


Given:

90th percentile score = 85.


To Find:

The meaning of the 90th percentile score being 85.


Explanation:

The 90th percentile means that 90% of the scores lie below or are equal to the value at the 90th percentile.

In this case, 90% of the scores are less than or equal to 85.


Therefore,

90% of the test scores are below or equal to 85.

Question 86. Which of the following is NOT a measure of dispersion?

(A) Interquartile Range

(B) Median

(C) Mean Absolute Deviation

(D) Coefficient of Range

Answer:

Correct Option: (B) Median


Given:

Four options are provided, and we need to identify which is not a measure of dispersion.


To Find:

The option that does not represent a measure of dispersion.


Explanation:

Measures of Dispersion describe the spread or variability of data. Common measures include:

  • Interquartile Range (IQR): Difference between the third and first quartiles.
  • Mean Absolute Deviation (MAD): Average of absolute deviations from the mean.
  • Coefficient of Range: Ratio of the difference to the sum of the maximum and minimum values.

The Median, however, is a measure of central tendency, not dispersion.


Therefore,

Median is NOT a measure of dispersion.



Short Answer Type Questions

Question 1. Explain the importance of data interpretation in statistics. Give one real-life example where data interpretation is crucial.

Answer:

Importance of Data Interpretation in Statistics:

Data interpretation is a vital step in statistics that involves analyzing, summarizing, and making sense of raw data to extract meaningful insights. It helps in transforming numerical figures and patterns into understandable information for decision-making.

Proper data interpretation enables:

  • Identification of trends and patterns.
  • Informed decision-making based on evidence.
  • Verification or rejection of hypotheses.
  • Effective communication of statistical findings.

Real-life Example:

Healthcare Sector: In medical research, interpreting clinical trial data is crucial to determine the effectiveness and safety of a new drug. Accurate interpretation guides doctors and policymakers to approve or reject treatments, directly impacting patient health and safety.

Question 2. The number of students present in a class on each day of a week is as follows: 35, 38, 40, 36, 39, 37. What is the average attendance for the week?

Answer:

Given:

Attendance numbers over the week: 35, 38, 40, 36, 39, 37


To Find:

Average attendance for the week.


Solution:

The average attendance is the sum of daily attendance divided by the number of days.

Sum of attendance = 35 + 38 + 40 + 36 + 39 + 37

$= 225$

... (i)

Number of days = 6

Average attendance, $ \bar{x} = \frac{\text{Sum of attendance}}{\text{Number of days}} = \frac{225}{6} = 37.5 $

$\bar{x} = 37.5$

... (ii)


Therefore, the average attendance for the week is 37.5 students.

Question 3. Define 'Measure of Dispersion'. Why is it important to study dispersion along with central tendency?

Answer:

Definition of Measure of Dispersion:

A Measure of Dispersion describes the extent to which data values are spread out or scattered around a central value (such as the mean or median). It quantifies the variability or diversity within a dataset.


Importance of Studying Dispersion Along with Central Tendency:

While measures of central tendency (mean, median, mode) provide a summary of the central point of data, they do not tell us how the data values are distributed around that center.

Studying dispersion is important because:

  • It reveals the degree of variation or consistency within the data.
  • Helps in understanding the reliability of the central tendency measures.
  • Allows comparison between different datasets having similar central values but different spreads.
  • Assists in identifying outliers or extreme values affecting the dataset.

Thus, both central tendency and dispersion together provide a more complete picture of the data distribution.

Question 4. Calculate the Range for the following data: 15, 25, 12, 30, 18, 20, 22.

Answer:

Given Data:

15, 25, 12, 30, 18, 20, 22


Solution:

The Range is defined as the difference between the maximum and minimum values in the dataset.

Maximum value = 30

Minimum value = 12

Therefore,

$\text{Range} = 30 - 12 = 18$

... (i)

Hence, the range of the given data is 18.

Question 5. Find the Quartile Deviation for the following data: 8, 10, 12, 15, 18, 20, 25, 30, 35.

Answer:

Given Data:

8, 10, 12, 15, 18, 20, 25, 30, 35


Solution:

The Quartile Deviation (QD) is defined as half the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$):

$\text{QD} = \frac{Q_3 - Q_1}{2}$

... (i)

First, arrange data in ascending order (already given):

8, 10, 12, 15, 18, 20, 25, 30, 35

Number of observations, $n = 9$

Find $Q_1$ (the first quartile):

$Q_1$ is the median of the lower half of the data (the overall median, 18, is excluded because the number of observations is odd):

First half: 8, 10, 12, 15

Median of these four values is average of 10 and 12, so

$Q_1 = \frac{10 + 12}{2} = 11$

... (ii)

Find $Q_3$ (the third quartile):

Third quartile is median of upper half:

Upper half: 20, 25, 30, 35

Median of these is average of 25 and 30, so

$Q_3 = \frac{25 + 30}{2} = 27.5$

... (iii)

Now, calculate Quartile Deviation:

$\text{QD} = \frac{27.5 - 11}{2} = \frac{16.5}{2} = 8.25$

... (iv)

Therefore, the Quartile Deviation of the data is 8.25.
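The median-of-halves method used above can be sketched as follows. The function name `quartiles_by_halves` is introduced here for illustration, and the sketch assumes at least two observations. Other quartile conventions exist and may give different values for some datasets.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the number of observations is odd, the overall median is
    excluded from both halves. Assumes at least two observations.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # observations below the median position
    upper = values[-half:]   # observations above the median position
    return median(lower), median(upper)


data = [8, 10, 12, 15, 18, 20, 25, 30, 35]
q1, q3 = quartiles_by_halves(data)
quartile_deviation = (q3 - q1) / 2

print(q1, q3, quartile_deviation)  # prints: 11.0 27.5 8.25
```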

Question 6. Calculate the Mean Deviation about the Mean for the data: 5, 7, 8, 10, 15.

Answer:

Given: The data set is 5, 7, 8, 10, 15.


To Find: Mean Deviation about the Mean.


Solution:

Step 1: Calculate the mean ($\bar{x}$) of the data.

$\bar{x} = \frac{5 + 7 + 8 + 10 + 15}{5}$

... (i)

Calculating the sum in the numerator:

$5 + 7 + 8 + 10 + 15 = 45$

(Sum of data)

$\therefore \bar{x} = \frac{45}{5} = 9$

(Mean)


Step 2: Calculate the absolute deviations from the mean.

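| Data, $x_i$ | Deviation, $\lvert x_i - \bar{x} \rvert$ |
| --- | --- |
| 5 | $\lvert 5 - 9 \rvert = 4$ |
| 7 | $\lvert 7 - 9 \rvert = 2$ |
| 8 | $\lvert 8 - 9 \rvert = 1$ |
| 10 | $\lvert 10 - 9 \rvert = 1$ |
| 15 | $\lvert 15 - 9 \rvert = 6$ |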

Step 3: Calculate the mean of these absolute deviations.

Mean Deviation $= \frac{4 + 2 + 1 + 1 + 6}{5} = \frac{14}{5} = 2.8$

... (ii)


Final Answer: The Mean Deviation about the Mean is 2.8.
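The three steps above translate directly into code. This is a minimal sketch using the data from the question.

```python
data = [5, 7, 8, 10, 15]

# Step 1: mean of the data.
mean_value = sum(data) / len(data)

# Step 2: absolute deviation of each observation from the mean.
absolute_deviations = [abs(x - mean_value) for x in data]

# Step 3: mean of the absolute deviations.
mean_deviation = sum(absolute_deviations) / len(data)

print(mean_value, absolute_deviations, mean_deviation)
```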

Question 7. Calculate the Variance for the following data: 2, 4, 6, 8.

Answer:

Given: The data set is 2, 4, 6, 8.


To Find: Variance of the data.


Solution:

Step 1: Calculate the mean ($\bar{x}$) of the data.

$\bar{x} = \frac{2 + 4 + 6 + 8}{4}$

... (i)

Calculate the sum in the numerator:

$2 + 4 + 6 + 8 = 20$

(Sum of data)

$\therefore \bar{x} = \frac{20}{4} = 5$

(Mean)


Step 2: Calculate the squared deviations from the mean.

| Data, $x_i$ | Deviation, $x_i - \bar{x}$ | Squared Deviation, $(x_i - \bar{x})^2$ |
| --- | --- | --- |
| 2 | $2 - 5 = -3$ | $(-3)^2 = 9$ |
| 4 | $4 - 5 = -1$ | $(-1)^2 = 1$ |
| 6 | $6 - 5 = 1$ | $1^2 = 1$ |
| 8 | $8 - 5 = 3$ | $3^2 = 9$ |

Step 3: Calculate the variance.

Variance, $ \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$

... (ii)


Final Answer: The Variance of the data is 5.
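The result can be checked with Python's `statistics` module. The answer above divides by $n$, which corresponds to `pvariance` (population variance). The sample variance, which divides by $n - 1$, is shown only for contrast and is not the convention used in this answer.

```python
from statistics import pvariance, variance

data = [2, 4, 6, 8]

population_variance = pvariance(data)  # sum of squared deviations / n
sample_variance = variance(data)       # sum of squared deviations / (n - 1)

print("population variance:", population_variance)
print("sample variance:    ", sample_variance)
```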

Question 8. If the Standard Deviation of a dataset is 5, what is its Variance?

Answer:

Given: Standard Deviation ($\sigma$) of the dataset is 5.


To Find: Variance ($\sigma^2$) of the dataset.


Solution:

Recall the relationship between Standard Deviation and Variance:

$\sigma^2 = (\text{Standard Deviation})^2 = 5^2$

... (i)

Calculating the square:

$\sigma^2 = 25$

(Variance)


Final Answer: The Variance of the dataset is 25.

Question 9. Define 'Skewness'. What does it indicate about a distribution?

Answer:

Definition of Skewness:

Skewness is a statistical measure that describes the degree of asymmetry or departure from symmetry in the distribution of data values. It indicates how much the distribution deviates from a perfectly symmetrical bell-shaped curve (normal distribution).


What Skewness Indicates About a Distribution:

Positive Skewness: When skewness is positive, the tail on the right side of the distribution is longer or fatter than the left side. This indicates that the majority of data values are concentrated on the left, with a few extreme large values stretching the distribution to the right.

Negative Skewness: When skewness is negative, the tail on the left side of the distribution is longer or fatter than the right side. This means most data values are concentrated on the right, with some extreme small values pulling the distribution to the left.

Zero Skewness: When skewness is zero or very close to zero, the distribution is approximately symmetric, indicating that the data is evenly distributed around the mean.


Summary: Skewness helps in understanding the shape and direction of the distribution’s deviation from symmetry, which is important for selecting appropriate statistical methods and interpreting data behavior.

Question 10. What is the shape of a distribution if the coefficient of skewness is positive? Draw a rough sketch to illustrate.

Answer:

Given:

The coefficient of skewness is positive.


To Find:

The shape of the distribution when the coefficient of skewness is positive.


Solution:

The coefficient of skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean.

If the coefficient of skewness is positive, the distribution is said to be right-skewed or positively skewed. This means the tail on the right side (higher values) is longer or fatter than the left side.

In a positively skewed distribution with a single peak, the mean is typically greater than the median, which is typically greater than the mode:

$\text{Mean} > \text{Median} > \text{Mode}$

(Relation in positive skewness)           ... (i)


Rough Sketch:

The distribution curve looks like this:

Figure: Positively Skewed Distribution


Explanation:

The bulk of the data is concentrated on the left side, with a long tail extending to the right, representing higher values that occur less frequently.

Question 11. Define 'Kurtosis'. What aspect of a distribution does it measure?

Answer:

Definition of Kurtosis:

Kurtosis is a statistical measure that describes the shape of the tails of a probability distribution. It indicates whether the data are heavy-tailed or light-tailed relative to a normal distribution.


What Aspect of Distribution Kurtosis Measures:

Kurtosis measures the peakedness or flatness of the distribution.

A distribution with:

  • High kurtosis (leptokurtic) has heavy tails and a sharper peak compared to the normal distribution.
  • Low kurtosis (platykurtic) has light tails and a flatter peak.
  • Mesokurtic distribution has kurtosis similar to the normal distribution.

Summary:

Kurtosis helps to understand the likelihood of extreme values (outliers) in the data by indicating tail thickness and peak sharpness.

Question 12. Explain the difference between a leptokurtic and a platykurtic distribution.

Answer:

Given:

Two types of distributions: leptokurtic and platykurtic.


To Find:

The difference between leptokurtic and platykurtic distributions.


Solution:

| Feature | Leptokurtic Distribution | Platykurtic Distribution |
| --- | --- | --- |
| Shape | More peaked than normal distribution | Flatter than normal distribution |
| Kurtosis Value | Greater than 3 (Kurtosis > 3) | Less than 3 (Kurtosis < 3) |
| Tails | Heavy tails (more extreme values/outliers) | Light tails (fewer extreme values) |
| Peak | Sharper and higher peak | Lower and broader peak |

Summary:

Leptokurtic distributions indicate a higher probability of extreme values due to heavy tails and a sharper peak, whereas platykurtic distributions show fewer outliers with lighter tails and a flatter shape.
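The threshold of 3 refers to the moment coefficient of kurtosis, $\beta_2 = \frac{m_4}{m_2^2}$, where $m_2$ and $m_4$ are the second and fourth central moments. This formula is not stated in the answer and is assumed here. The sketch below applies it to two hypothetical datasets, one evenly spread and one with most values at the centre and two extreme values.

```python
def moment_kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (equals 3 for a normal curve)."""
    n = len(data)
    mean_value = sum(data) / n
    m2 = sum((x - mean_value) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean_value) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


# Hypothetical datasets (illustrative values only).
evenly_spread = [1, 2, 3, 4, 5]                     # flat shape, light tails
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # sharp centre, extreme tails

print("evenly spread:", moment_kurtosis(evenly_spread))
print("heavy tailed: ", moment_kurtosis(heavy_tailed))
```

The first value falls below 3 (platykurtic) and the second above 3 (leptokurtic).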

Question 13. Define 'Percentile Rank'. If a student scores in the 80th percentile on a test, what does it mean?

Answer:

Definition of Percentile Rank:

The percentile rank of a score is the percentage of scores in its frequency distribution that are equal to or less than that score.


Interpretation of 80th Percentile Score:

If a student scores in the 80th percentile on a test, it means that 80% of all the students who took the test scored at or below this student's score.

In other words, the student's performance is higher than or equal to that of 80% of the group.

Question 14. Find the 50th percentile (median) for the data: 11, 13, 15, 17, 19, 21.

Answer:

Given:

Data set: 11, 13, 15, 17, 19, 21


To Find:

The 50th percentile (Median) of the given data set.


Solution:

The 50th percentile corresponds to the median of the data.

Step 1: Arrange the data in ascending order (already sorted):

11, 13, 15, 17, 19, 21

Step 2: Since the number of observations $n = 6$ (even), the median is the average of the middle two values.

The middle two values are the 3rd and 4th terms: 15 and 17.

$\text{Median} = \frac{15 + 17}{2} = \frac{32}{2} = 16$

... (i)


Answer:

The 50th percentile (median) of the data set is 16.

Question 15. Find the first quartile ($Q_1$) for the data: 2, 5, 7, 8, 10, 12, 15.

Answer:

Given:

Data set: 2, 5, 7, 8, 10, 12, 15


To Find:

The first quartile ($Q_1$) of the given data.


Solution:

Step 1: Arrange the data in ascending order (already sorted):

2, 5, 7, 8, 10, 12, 15

Step 2: Find the position of $Q_1$, which is the 25th percentile.

Position of $Q_1 = \frac{1(n+1)}{4} = \frac{1(7+1)}{4} = \frac{8}{4} = 2$

Step 3: The 2nd value in the data is 5.


Answer:

The first quartile $Q_1$ is 5.

Question 16. Define 'Correlation'. What does a positive correlation coefficient indicate?

Answer:

Definition of Correlation:

Correlation is a statistical measure that describes the degree and direction of the linear relationship between two variables.


Interpretation of Positive Correlation Coefficient:

A positive correlation coefficient indicates that as one variable increases, the other variable also tends to increase.

In other words, both variables move in the same direction.

Question 17. What does a correlation coefficient of -0.9 indicate about the relationship between two variables?

Answer:

Given:

Correlation coefficient, $r = -0.9$


To Find:

Interpretation of the correlation coefficient value $-0.9$.


Solution:

A correlation coefficient of -0.9 indicates a strong negative linear relationship between the two variables.

This means:

  • When one variable increases, the other variable tends to decrease.
  • The variables move in opposite directions.
  • The value $-0.9$ being close to $-1$ indicates the relationship is very strong.

Summary:

A correlation coefficient of $-0.9$ means a strong negative correlation exists between the two variables.

Question 18. Draw a rough scatter plot for a dataset showing a strong negative correlation.

Answer:

Given:

A dataset with a strong negative correlation.


To Find:

A rough scatter plot illustrating a strong negative correlation.


Solution:

A strong negative correlation means as one variable increases, the other decreases in a nearly linear fashion.

The scatter plot points will lie close to a line with a negative slope.

Here is a rough sketch of such a scatter plot:

Figure: Scatter Plot Showing Strong Negative Correlation

Question 19. The monthly electricity bills (in $\textsf{₹}$) for 6 households are: 850, 920, 780, 1050, 900, 1120. Calculate the median electricity bill.

Answer:

Given:

Monthly electricity bills (in $\textsf{₹}$): 850, 920, 780, 1050, 900, 1120


To Find:

The median electricity bill.


Solution:

Step 1: Arrange the data in ascending order:

780, 850, 900, 920, 1050, 1120

Step 2: Number of observations $n = 6$ (even)

Median is the average of the middle two values, i.e., 3rd and 4th terms.

$\text{Median} = \frac{900 + 920}{2} = \frac{1820}{2} = 910$

... (i)


Answer:

The median monthly electricity bill is $\textsf{₹}$ 910.
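
As a quick check, the same result can be reproduced with Python's standard `statistics` module, which sorts the data and averages the two middle values when $n$ is even. The variable names are illustrative only.

```python
import statistics

bills = [850, 920, 780, 1050, 900, 1120]   # monthly bills from the question
median_bill = statistics.median(bills)     # average of the 3rd and 4th sorted values
print(median_bill)  # 910.0
```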

Question 20. A company's sales figures (in $\textsf{₹}$ lakhs) over 5 years are: 10, 12, 15, 11, 13. Calculate the mean sales figure.

Answer:

Given:

Sales figures (in $\textsf{₹}$ lakhs) over 5 years: 10, 12, 15, 11, 13


To Find:

The mean sales figure.


Solution:

Mean is calculated as the sum of all sales figures divided by the number of years.

$\text{Mean} = \frac{10 + 12 + 15 + 11 + 13}{5} = \frac{61}{5} = 12.2$

... (i)


Answer:

The mean sales figure over 5 years is $\textsf{₹}$ 12.2 lakhs.

Question 21. Calculate the coefficient of Range for the data: 45, 52, 38, 60, 40, 55.

Answer:

Given:

Data set: 45, 52, 38, 60, 40, 55


To Find:

The coefficient of range for the given data.


Solution:

Step 1: Find the maximum and minimum values in the data.

Maximum value, $X_{\max} = 60$

Minimum value, $X_{\min} = 38$

Step 2: Use the formula for coefficient of range:

$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}} = \frac{60 - 38}{60 + 38} = \frac{22}{98} = 0.2245$

... (i)


Answer:

The coefficient of range for the data is 0.2245.
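
A minimal Python sketch of the same formula follows; the function name `coefficient_of_range` is an assumption chosen for this illustration.

```python
def coefficient_of_range(values):
    """(max - min) / (max + min) for a list of numbers."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


coeff = coefficient_of_range([45, 52, 38, 60, 40, 55])
print(round(coeff, 4))  # 0.2245
```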

Question 22. If $Q_1 = 20$ and $Q_3 = 40$, calculate the Coefficient of Quartile Deviation.

Answer:

Given:

First quartile, $Q_1 = 20$

Third quartile, $Q_3 = 40$


To Find:

The Coefficient of Quartile Deviation.


Solution:

The formula for the Coefficient of Quartile Deviation is:

$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

... (i)

Substitute the values of $Q_1$ and $Q_3$:

$= \frac{40 - 20}{40 + 20} = \frac{20}{60} = \frac{1}{3} \approx 0.3333$

... (ii)


Answer:

The Coefficient of Quartile Deviation is approximately 0.3333.

Question 23. Calculate the Mean Deviation about the Median for the data: 1, 2, 3, 4, 5.

Answer:

Given:

Data: 1, 2, 3, 4, 5


To Find:

Mean Deviation about the Median.


Solution:

Step 1: Find the median of the data.

Since $n=5$ (odd), median is the middle value:

Median = 3


Step 2: Find the absolute deviations from the median:

Data Point ($x_i$)    $|x_i - \text{Median}|$
1                     $|1 - 3| = 2$
2                     $|2 - 3| = 1$
3                     $|3 - 3| = 0$
4                     $|4 - 3| = 1$
5                     $|5 - 3| = 2$

Step 3: Calculate the mean of these absolute deviations:

$\text{Mean Deviation} = \frac{2 + 1 + 0 + 1 + 2}{5} = \frac{6}{5} = 1.2$

... (i)


Answer:

The Mean Deviation about the Median is 1.2.
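
The three steps above can be sketched in Python as follows. The function name is hypothetical; the median comes from the standard `statistics` module.

```python
import statistics


def mean_deviation_about_median(values):
    """Average absolute distance of the values from their median."""
    med = statistics.median(values)
    return sum(abs(x - med) for x in values) / len(values)


md = mean_deviation_about_median([1, 2, 3, 4, 5])
print(md)  # 1.2
```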

Question 24. If the mean of a dataset is 50 and the standard deviation is 10, calculate the Coefficient of Variation.

Answer:

Given:

Mean, $\bar{x} = 50$

Standard deviation, $s = 10$


To Find:

The Coefficient of Variation (CV).


Solution:

The formula for Coefficient of Variation is:

$\text{CV} = \frac{s}{\bar{x}} \times 100 = \frac{10}{50} \times 100 = 20\%$

... (i)


Answer:

The Coefficient of Variation is 20%.

Question 25. What is a mesokurtic distribution? Give the value of the kurtosis coefficient ($\beta_2$) for such a distribution.

Answer:

Definition:

A mesokurtic distribution is a type of probability distribution that has a kurtosis similar to that of a normal distribution. It represents a moderate level of peakedness or tail thickness.


Kurtosis Coefficient ($\beta_2$):

For a mesokurtic distribution, the kurtosis coefficient is:

$\beta_2 = 3$

... (i)


Explanation:

This value indicates that the tails and peak of the mesokurtic distribution are similar to the normal distribution, neither too sharp nor too flat.

Question 26. If the mean, median, and mode of a distribution are equal, what can you say about its skewness?

Answer:

Given:

Mean = Median = Mode


To Find:

The skewness of the distribution.


Solution:

When the mean, median, and mode of a distribution are equal, it indicates that the distribution is perfectly symmetric.

Interpretation:

The coefficient of skewness is zero, meaning the distribution is not skewed (it has zero skewness).


Answer:

The distribution is symmetrical with a coefficient of skewness equal to 0.

Question 27. In a dataset of 25 values, a specific value has a percentile rank of 76. Approximately how many values are less than or equal to this value?

Answer:

Given:

Number of values in dataset, $N = 25$

Percentile rank, $P = 76$


To Find:

Number of values less than or equal to the value with percentile rank 76.


Solution:

The number of values less than or equal to the given value can be approximated by:

$\text{Number of values} = \frac{P}{100} \times N = \frac{76}{100} \times 25 = 19$

... (i)


Answer:

Approximately 19 values are less than or equal to the value with the 76th percentile rank.

Question 28. If the third quartile ($Q_3$) of a dataset is 75, what is its percentile equivalent?

Answer:

Given:

Third quartile, $Q_3 = 75$


To Find:

The percentile equivalent of the third quartile.


Solution:

The third quartile ($Q_3$) corresponds to the 75th percentile of a dataset.

This means:

Percentile equivalent of $Q_3 = 75^\text{th}$ percentile

... (i)


Answer:

The percentile equivalent of the third quartile is 75th percentile.

Question 29. Give two limitations of using Range as a measure of dispersion.

Answer:

Limitations of Using Range as a Measure of Dispersion:

1. The range considers only the two extreme values (minimum and maximum) and ignores all other data points, which may not reflect the true spread of the dataset.

2. It is highly sensitive to outliers or extreme values, which can greatly distort the range and give a misleading impression of variability.

Question 30. Explain the term 'rank correlation'. When is it preferred over Pearson correlation?

Answer:

Definition of Rank Correlation:

Rank correlation measures the degree of association between two variables based on the rankings of the data rather than their actual numerical values.


When is Rank Correlation Preferred over Pearson Correlation?

Rank correlation is preferred when:

1. The data are ordinal or rankings rather than interval or ratio scale.

2. The relationship between variables is not linear.

3. The data contains outliers or is not normally distributed, making Pearson’s correlation less reliable.

Question 31. For a symmetrical distribution, what are the values of the first, second, and third quartiles relative to the median?

Answer:

Given:

A symmetrical distribution.


To Find:

Values of the first quartile ($Q_1$), second quartile ($Q_2$ or median), and third quartile ($Q_3$) relative to the median.


Solution:

In a symmetrical distribution:

$Q_2$ (Median) divides the dataset into two equal halves.

The first quartile $Q_1$ is the median of the lower half of the data.

The third quartile $Q_3$ is the median of the upper half of the data.

Since the distribution is symmetrical, the distances from the median to $Q_1$ and $Q_3$ are equal on either side:

$Q_2 = \text{Median}$

... (i)

$Q_3 - Q_2 = Q_2 - Q_1$

... (ii)


Answer:

For a symmetrical distribution, the median ($Q_2$) lies exactly midway between the first ($Q_1$) and third quartiles ($Q_3$), with equal spacing:

$Q_1$ and $Q_3$ are equidistant from the median $Q_2$.

Question 32. If the Standard Deviation of a series is 8 and the Mean is 40, find the Coefficient of Standard Deviation.

Answer:

Given:

Standard Deviation, $\sigma = 8$

Mean, $\bar{x} = 40$


To Find:

Coefficient of Standard Deviation (C.S.D.)


Solution:

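The Coefficient of Standard Deviation is the ratio of the standard deviation to the mean:

$\text{C.S.D.} = \frac{\sigma}{\bar{x}}$

... (i)

Substituting the values:

$\text{C.S.D.} = \frac{8}{40} = 0.2$

... (ii)

Multiplying this ratio by 100 gives the Coefficient of Variation: $0.2 \times 100 = 20\%$.


Answer:

The Coefficient of Standard Deviation is 0.2 (equivalently, a Coefficient of Variation of 20%).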

Question 33. What does a correlation coefficient of 0 indicate?

Answer:

Definition:

The correlation coefficient measures the strength and direction of a linear relationship between two variables.


Interpretation of Correlation Coefficient = 0:

A correlation coefficient of 0 indicates no linear relationship between the two variables.

This means that changes in one variable do not predict changes in the other in a linear manner.

However, it does not necessarily imply that the variables are completely independent; there may be a non-linear relationship.
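
To illustrate the last point, here is a small sketch with made-up data in which $y = x^2$ exactly, yet the correlation coefficient is 0. The data and the helper `pearson_r` are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]          # hypothetical values, symmetric about 0
ys = [x ** 2 for x in xs]       # y depends on x perfectly, but not linearly
r = pearson_r(xs, ys)
print(r)  # 0.0
```

Here $y$ is completely determined by $x$, so the variables are clearly not independent even though $r = 0$.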

Question 34. Calculate the Standard Deviation for the data: 1, 1, 1, 1, 1.

Answer:

Given:

Data set: 1, 1, 1, 1, 1


To Find:

Standard Deviation ($\sigma$)


Solution:

Step 1: Calculate the mean ($\bar{x}$):

$\bar{x} = \frac{1 + 1 + 1 + 1 + 1}{5} = \frac{5}{5} = 1$

... (i)

Step 2: Calculate the squared deviations from the mean:

Each value is 1, so each deviation is $1 - 1 = 0$.

Squared deviations: $0^2 = 0$ for each value.

Step 3: Calculate variance ($\sigma^2$):

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{0 + 0 + 0 + 0 + 0}{5} = 0$

... (ii)

Step 4: Calculate standard deviation ($\sigma$):

$\sigma = \sqrt{\sigma^2} = \sqrt{0} = 0$

... (iii)


Answer:

The standard deviation of the data set is 0, indicating no variability in the data.

Question 35. The scores of 5 students in a test are: 60, 75, 80, 65, 70. Calculate the Median score.

Answer:

Given:

Scores of 5 students: 60, 75, 80, 65, 70


To Find:

Median score


Solution:

Step 1: Arrange the data in ascending order:

60, 65, 70, 75, 80

Step 2: Find the median position:

Median position = $\frac{n+1}{2} = \frac{5+1}{2} = 3$

... (i)

Step 3: Identify the median score:

The 3rd value in the ordered list is 70.


Answer:

The median score is 70.

Question 36. Define the interquartile range. How is it related to the Quartile Deviation?

Answer:

Definition of Interquartile Range (IQR):

The interquartile range is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$) of a dataset. It measures the spread of the middle 50% of the data.

Mathematically,

$\text{IQR} = Q_3 - Q_1$

... (i)


Relation to Quartile Deviation:

The Quartile Deviation (QD) is half of the interquartile range. It measures the average deviation of the quartiles from the median, providing a measure of dispersion.

Mathematically,

$\text{QD} = \frac{Q_3 - Q_1}{2} = \frac{\text{IQR}}{2}$

... (ii)


Summary:

The interquartile range measures the range between the first and third quartiles, while the quartile deviation is half of this range and represents the spread of the central half of the data.

Question 37. If $Q_1 = 15$ and $Q_3 = 35$, what is the interquartile range?

Answer:

Given:

$Q_1 = 15$, $Q_3 = 35$


To Find:

Interquartile Range (IQR)


Solution:

The interquartile range is calculated by the formula:

$\text{IQR} = Q_3 - Q_1$

... (i)

Substituting the given values,

$\text{IQR} = 35 - 15 = 20$

... (ii)


Answer:

The interquartile range is 20.

Question 38. What are the minimum and maximum possible values for the coefficient of correlation?

Answer:

The coefficient of correlation, denoted by $r$, measures the strength and direction of a linear relationship between two variables.

Minimum value: $-1$ (indicating a perfect negative linear relationship)

Maximum value: $+1$ (indicating a perfect positive linear relationship)

If $r = 0$, it indicates no linear correlation between the variables.


Summary:

Coefficient of Correlation (r)    Interpretation
$-1$                              Perfect negative correlation
0                                 No linear correlation
$+1$                              Perfect positive correlation

Question 39. If the Mean Deviation about the Mean is 4 for a dataset with mean 20, find the Coefficient of Mean Deviation.

Answer:

Given:

Mean Deviation about the Mean (MD) = 4

Mean ($\bar{x}$) = 20


To Find: Coefficient of Mean Deviation


Solution:

The coefficient of mean deviation is calculated as:

$\text{Coefficient of Mean Deviation} = \frac{\text{Mean Deviation about Mean}}{\text{Mean}}$

... (i)

Substituting the given values:

$= \frac{4}{20}$

... (ii)

Therefore,

Coefficient of Mean Deviation = 0.2

Question 40. What does a large value of Standard Deviation indicate about a dataset?

Answer:

A large value of Standard Deviation indicates that the data points in the dataset are spread out widely from the mean. It shows high variability or dispersion within the data, meaning the values are more scattered and less consistent.


In contrast, a small standard deviation suggests that the data points are closely clustered around the mean, indicating low variability.

Question 41. Explain briefly, with a diagram, what a negatively skewed distribution looks like.

Answer:

Definition: A negatively skewed distribution is a type of distribution where the tail on the left side (lower values) is longer or fatter than the right side. It indicates that the majority of data values are concentrated on the right, with a few unusually low values stretching the distribution to the left.

Shape Characteristics:

  • The mean is less than the median.
  • The distribution has a long tail on the left side.
  • The mode is the highest point on the right side.

Rough Sketch Illustration:

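[Figure: rough sketch of a negatively skewed curve with a long left tail; the Mean, Median, and Mode are marked on the horizontal axis.]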

Summary: In a negatively skewed distribution, the mean < median < mode and the tail extends towards the lower values on the left side.

Question 42. Define the Coefficient of Kurtosis ($\beta_2$). What is its value for a normal distribution?

Answer:

Coefficient of Kurtosis ($\beta_2$):

The coefficient of kurtosis, denoted by \(\beta_2\), is a statistical measure that describes the "tailedness" or the peakedness of a probability distribution compared to a normal distribution. It is defined as the fourth standardized moment of the distribution:

\[ \beta_2 = \frac{\mu_4}{\sigma^4} \]

where:

  • \(\mu_4\) = fourth central moment (the average of \((x - \mu)^4\))
  • \(\sigma\) = standard deviation of the distribution

Interpretation: A higher value of \(\beta_2\) indicates a distribution with heavier tails and a sharper peak (leptokurtic), while a lower value indicates lighter tails and a flatter peak (platykurtic).

Value for Normal Distribution:

For a normal distribution, the coefficient of kurtosis \(\beta_2\) is exactly 3.

Question 43. In a class of 50 students, a student's score corresponds to the 70th percentile. How many students scored below this student?

Answer:

The 70th percentile means that the student scored better than 70% of the students in the class.

Number of students below this student = 70% of 50 = \(0.70 \times 50 = 35\)

Answer: 35 students scored below this student.

Question 44. What is the relationship between the Quartile Deviation and the Standard Deviation for a normal distribution?

Answer:

Quartile Deviation (QD) measures the spread of the middle 50% of the data and is calculated as:

\( \text{QD} = \frac{Q_3 - Q_1}{2} \)

Standard Deviation (SD) measures the overall spread of the data around the mean.

For a normal distribution, there is a fixed relationship between QD and SD:

  • The interquartile range \(IQR = Q_3 - Q_1\) covers approximately the middle 50% of the data.
  • For a normal distribution, the interquartile range corresponds to approximately 1.35 times the standard deviation.

Therefore, the Quartile Deviation is approximately:

\( \text{QD} = \frac{IQR}{2} \approx \frac{1.35 \times \text{SD}}{2} = 0.675 \times \text{SD} \)

In summary:

\( \text{QD} \approx 0.675 \times \text{SD} \) for a normal distribution.
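
The constant can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This sketch uses the standard normal distribution, so the result is the ratio QD / SD.

```python
from statistics import NormalDist

z = NormalDist()                 # standard normal: mean 0, standard deviation 1
q1 = z.inv_cdf(0.25)             # first quartile
q3 = z.inv_cdf(0.75)             # third quartile
qd = (q3 - q1) / 2               # quartile deviation when SD = 1
print(round(q3 - q1, 2))         # 1.35
print(round(qd, 4))              # 0.6745
```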

Question 45. Can the coefficient of correlation be greater than 1? Justify your answer.

Answer:

No, the coefficient of correlation cannot be greater than 1.

Justification:

  • The coefficient of correlation, usually denoted by \( r \), measures the strength and direction of a linear relationship between two variables.
  • By definition, its value ranges between \(-1\) and \(+1\).
  • A value of \(+1\) indicates a perfect positive linear relationship, while \(-1\) indicates a perfect negative linear relationship.
  • Values beyond this range are mathematically impossible because correlation is derived from standardized covariance, which is constrained by the Cauchy–Schwarz inequality.

Therefore, correlation values greater than 1 or less than -1 do not exist.

Question 46. Calculate the Standard Deviation for the data: 10, 10, 10, 10, 10.

Answer:

Given data: 10, 10, 10, 10, 10

Step 1: Calculate the mean (\(\bar{x}\)):

\[ \bar{x} = \frac{10 + 10 + 10 + 10 + 10}{5} = \frac{50}{5} = 10 \]

Step 2: Calculate each deviation from the mean and square it:

\[ (10 - 10)^2 = 0, \quad (10 - 10)^2 = 0, \quad (10 - 10)^2 = 0, \quad (10 - 10)^2 = 0, \quad (10 - 10)^2 = 0 \]

Step 3: Find the variance (mean of squared deviations):

\[ \text{Variance} = \frac{0 + 0 + 0 + 0 + 0}{5} = 0 \]

Step 4: Calculate the standard deviation (square root of variance):

\[ \text{Standard Deviation} = \sqrt{0} = 0 \]

Conclusion: The standard deviation of the data is 0, indicating no variation in the dataset as all values are identical.

Question 47. If the Mean Deviation about the Median is 6 and the Median is 30, find the Coefficient of Mean Deviation.

Answer:

Given:

Mean Deviation about the Median = 6

Median = 30


To Find:

Coefficient of Mean Deviation


Solution:

The Coefficient of Mean Deviation about the median is given by the formula:

Coefficient of Mean Deviation = $\frac{\text{Mean Deviation about Median}}{\text{Median}} \times 100$

… (i)

Substituting the given values,

Coefficient of Mean Deviation = $\frac{6}{30} \times 100$

… (ii)

Calculating,

Coefficient of Mean Deviation = 20% (as a plain ratio, $\frac{6}{30} = 0.2$)

Question 48. What is the main advantage of using Standard Deviation over Mean Deviation?

Answer:

Standard Deviation is generally preferred over Mean Deviation because it takes into account the square of the deviations from the mean, rather than the absolute deviations. This leads to several important advantages:


1. Mathematical Rigor and Applicability:

Standard Deviation is algebraically more tractable and is compatible with further statistical operations like correlation, regression, and hypothesis testing. It is based on squared deviations, which allows for easier manipulation using calculus and algebraic formulas.


2. Sensitivity to Extreme Values:

Since Standard Deviation squares the deviations, it gives more weight to extreme values (outliers). This is helpful in understanding the spread of data more precisely, especially in datasets where variability is critical.


3. Theoretical Justification:

Standard Deviation is based on the concept of variance, which is a central concept in probability theory and statistical inference. It has strong theoretical foundations and plays a key role in the formulation of many statistical models.


4. Use in Inferential Statistics:

In advanced statistics, especially in inferential statistics, Standard Deviation is indispensable. It is used in confidence intervals, z-scores, and in defining the shape of a normal distribution (bell curve).


Conclusion:

Thus, the main advantage of using Standard Deviation over Mean Deviation lies in its mathematical properties and broader applicability in theoretical and applied statistics.

Question 49. If the sum of squares of deviations from the mean is 100 for 10 observations, find the Standard Deviation.

Answer:

Given:

Sum of squares of deviations from the mean = 100

Number of observations = 10


To Find:

Standard Deviation


Formula:

Standard Deviation, $\sigma = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}}$


Solution:

$\sum (x_i - \bar{x})^2 = 100$

(Given)

$n = 10$

(Given)

Using the formula:

$\sigma = \sqrt{\dfrac{100}{10}}$

… (i)

Simplifying:

$\sigma = \sqrt{10} \approx 3.16$

… (ii)

∴ Standard Deviation = $\sqrt{10} \approx 3.16$

Question 50. Define the term 'Coefficient of Variation'. When is it used?

Answer:

Definition:

The Coefficient of Variation (C.V.) is a statistical measure of the relative variability. It is defined as the ratio of the standard deviation to the mean, expressed as a percentage.

$\text{C.V.} = \dfrac{\sigma}{\bar{x}} \times 100$

… (i)


Where,

$\sigma$ = Standard Deviation

$\bar{x}$ = Mean


Usage:

The Coefficient of Variation is used when we want to compare the degree of variation between two or more data sets that have different units or widely different means. It helps determine which data set is more consistent or has less relative variability.


Conclusion:

Thus, C.V. is especially useful in comparing the consistency of two or more data series irrespective of their units or magnitude.

Question 51. For a positively skewed distribution, how does the Mean, Median, and Mode relate to each other?

Answer:

In a Positively Skewed Distribution:

The values of Mean, Median, and Mode are not equal and follow a particular order due to the longer tail on the right-hand side of the distribution.


Relationship:

$\text{Mean} > \text{Median} > \text{Mode}$

… (i)


Explanation:

In a positively skewed distribution, the presence of higher extreme values increases the value of the Mean, pulling it to the right. The Median lies between the Mean and Mode, and the Mode occurs at the peak or most frequent value of the distribution.


Conclusion:

Therefore, in a positively skewed distribution: Mean is the greatest, followed by Median, then Mode.

Question 52. What is a Causal Correlation? Give an example.

Answer:

Definition:

Causal Correlation refers to a type of correlation in which a change in one variable directly causes a change in the other variable.


Explanation:

In causal correlation, there is not just an association between variables, but a cause-and-effect relationship. That is, the variation in one variable is responsible for the variation in the other.


Example:

There is a causal correlation between the number of hours studied and the marks obtained in an exam. An increase in study hours directly leads to an increase in marks (to a certain extent).


Conclusion:

Thus, in causal correlation, the relation is not only statistical but also involves direct influence or causation between the variables.

Question 53. Calculate the Quartile Deviation for the data: 10, 15, 8, 20, 12, 25, 18.

Answer:

Given: The data set is: 10, 15, 8, 20, 12, 25, 18


Step 1: Arrange the data in ascending order:

8, 10, 12, 15, 18, 20, 25


Step 2: Find the quartiles.

Number of observations ($n$) = 7 (odd number)

Median (Q2) = Middle term = 4th term = 15

Q1 = Median of lower half (8, 10, 12)

Q1 = 10

Q3 = Median of upper half (18, 20, 25)

Q3 = 20


Step 3: Apply formula for Quartile Deviation

$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$

…(i)

Substitute the values:

$\text{Quartile Deviation} = \frac{20 - 10}{2} = \frac{10}{2} = 5$


Final Answer: The Quartile Deviation is 5.
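
The median-of-halves method used above can be sketched in Python. The function name `quartile_deviation` is an assumption; for an odd number of observations the middle value is left out of both halves, as in the solution.

```python
import statistics


def quartile_deviation(values):
    """Quartile deviation using medians of the lower and upper halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]            # values below the median position
    upper = xs[-half:]           # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return (q3 - q1) / 2


qd = quartile_deviation([10, 15, 8, 20, 12, 25, 18])
print(qd)  # 5.0
```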

Question 54. If the Mean Deviation about the Mean is 5 for a dataset, can you determine the Standard Deviation? If yes, how? If no, why not?

Answer:

Given: Mean Deviation about the Mean = 5


Explanation:

The Mean Deviation and the Standard Deviation are both measures of dispersion, but they are calculated differently.

Mean Deviation is the average of the absolute deviations from the mean, while Standard Deviation is based on the square of the deviations from the mean:

$\text{M.D. (about Mean)} = \frac{1}{n} \sum |x_i - \bar{x}|$

$\text{S.D.} = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}$


Conclusion:

No, it is not possible to determine the Standard Deviation from the Mean Deviation alone, because:

  • The two formulas involve different operations (absolute value vs. square).
  • Standard Deviation requires knowledge of the squared deviations of individual data points, which is not available from just the Mean Deviation.
  • There is no fixed ratio or formula that connects Mean Deviation and Standard Deviation for all datasets.

Final Answer: No, we cannot determine the Standard Deviation from the Mean Deviation alone, as both are calculated using different methods and require different data characteristics.

Question 55. Explain how outliers affect the Range compared to the Interquartile Range.

Answer:

Definition:

Range is the difference between the largest and smallest value in a dataset.

Interquartile Range (IQR) is the difference between the third quartile ($Q_3$) and the first quartile ($Q_1$), i.e., $IQR = Q_3 - Q_1$.


Effect of Outliers:

  • Range is highly sensitive to outliers.
  • This is because Range is calculated using the extreme values in the dataset. Even a single unusually high or low value (outlier) can drastically increase the Range.

  • Interquartile Range is resistant to outliers.
  • IQR focuses on the middle 50% of the data and ignores the smallest 25% and largest 25% of values. Therefore, extreme values (outliers) do not significantly affect it.


Example:

Consider two datasets:

Without outlier: 10, 12, 14, 15, 16, 17, 18

With outlier: 10, 12, 14, 15, 16, 17, 50

Range:

Without outlier: $18 - 10 = 8$

With outlier: $50 - 10 = 40$

IQR: Using the $\frac{n+1}{4}$ position rule, $Q_1 = 12$ and $Q_3 = 17$ in both datasets, so $IQR = 17 - 12 = 5$ with or without the outlier.


Conclusion: Outliers significantly affect the Range, but have minimal effect on the Interquartile Range. Hence, IQR is a more robust measure of dispersion when data includes extreme values.
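
The comparison can be reproduced with a short Python sketch. It relies on `statistics.quantiles`, whose default "exclusive" method places the quartiles at positions $\frac{k(n+1)}{4}$; the helper names are illustrative.

```python
import statistics


def value_range(values):
    return max(values) - min(values)


def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)   # default method: 'exclusive'
    return q3 - q1


without = [10, 12, 14, 15, 16, 17, 18]
with_outlier = [10, 12, 14, 15, 16, 17, 50]

print(value_range(without), value_range(with_outlier))  # 8 40
print(iqr(without), iqr(with_outlier))                  # 5.0 5.0
```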

Question 56. What is the minimum number of data points required to calculate Standard Deviation?

Answer:

Minimum Number of Data Points:

The minimum number of data points required to calculate Standard Deviation is two.


Explanation:

Standard Deviation measures the dispersion or spread of data points from the mean. With only one data point, the value coincides with the mean, so there is no variability to measure: the population formula gives 0, and the sample formula is undefined because it divides by $n - 1 = 0$.

With two or more values, the deviations from the mean can be computed, which is essential in calculating the standard deviation using the formula:

$\sigma = \sqrt{\dfrac{\sum\limits_{i=1}^{n} (x_i - \bar{x})^2}{n}}$

… (for population data)

$s = \sqrt{\dfrac{\sum\limits_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

… (for sample data)


Conclusion: A minimum of 2 data points is needed to compute standard deviation meaningfully.
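
Python's `statistics` module behaves the same way, as this small sketch shows. The single observation is a hypothetical value chosen for illustration.

```python
import statistics

single = [7]                              # hypothetical dataset with one observation

print(statistics.pstdev(single))          # 0.0  (population formula, divides by n)

try:
    statistics.stdev(single)              # sample formula, divides by n - 1
    sample_sd_defined = True
except statistics.StatisticsError:
    sample_sd_defined = False

print(sample_sd_defined)                  # False
```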



Long Answer Type Questions

Question 1. Calculate the Mean Deviation about the Mean for the following data:

Marks Number of Students
0-10 5
10-20 8
20-30 15
30-40 16
40-50 6

Answer:

Step 1: Construct the table with midpoints and deviations

Class Interval    Frequency (f)    Midpoint (x)    fx
0 - 10            5                5               25
10 - 20           8                15              120
20 - 30           15               25              375
30 - 40           16               35              560
40 - 50           6                45              270
Total             50                               1350

Step 2: Calculate the Mean ($\bar{x}$)

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1350}{50}$

= 27


Step 3: Calculate $|x - \bar{x}|$ and $f|x - \bar{x}|$

Class Interval    Frequency (f)    Midpoint (x)    $|x - \bar{x}|$    $f|x - \bar{x}|$
0 - 10            5                5               22                 110
10 - 20           8                15              12                 96
20 - 30           15               25              2                  30
30 - 40           16               35              8                  128
40 - 50           6                45              18                 108
Total             50                                                  472

Step 4: Calculate Mean Deviation about Mean

Mean Deviation = $\dfrac{\sum f|x - \bar{x}|}{\sum f} = \dfrac{472}{50}$

= 9.44


Final Answer: Mean Deviation about Mean = 9.44
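
The tabular working above can be verified with a few lines of Python. This is a minimal sketch; the list names are illustrative, and each class is represented by its midpoint.

```python
midpoints = [5, 15, 25, 35, 45]     # class marks of 0-10, 10-20, ..., 40-50
freqs = [5, 8, 15, 16, 6]           # number of students in each class

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
md = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n

print(mean, md)  # 27.0 9.44
```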

Question 2. Calculate the Standard Deviation and Coefficient of Variation for the following data: 10, 12, 15, 18, 20, 22, 25, 28, 30, 32.

Answer:

Step 1: Organize the Data

Given data (x): 10, 12, 15, 18, 20, 22, 25, 28, 30, 32


Step 2: Calculate Mean ($\bar{x}$)

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{10 + 12 + 15 + 18 + 20 + 22 + 25 + 28 + 30 + 32}{10}$

= $\dfrac{212}{10} = 21.2$


Step 3: Calculate $(x - \bar{x})^2$ for each value

x       $x - \bar{x}$    $(x - \bar{x})^2$
10      -11.2            125.44
12      -9.2             84.64
15      -6.2             38.44
18      -3.2             10.24
20      -1.2             1.44
22      0.8              0.64
25      3.8              14.44
28      6.8              46.24
30      8.8              77.44
32      10.8             116.64
Total                    515.6

Step 4: Calculate Standard Deviation ($\sigma$)

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{515.6}{10}}$

= $\sqrt{51.56} \approx 7.18$


Step 5: Calculate Coefficient of Variation (CV)

CV = $\dfrac{\sigma}{\bar{x}} \times 100 = \dfrac{7.18}{21.2} \times 100$

≈ 33.87%


Final Answers:

Standard Deviation = 7.18

Coefficient of Variation = 33.87%
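
As a cross-check, the same quantities can be computed with the standard `statistics` module. `pstdev` divides by $n$, matching the formula used in Step 4.

```python
import statistics

data = [10, 12, 15, 18, 20, 22, 25, 28, 30, 32]

mean = statistics.mean(data)
sd = statistics.pstdev(data)        # population standard deviation (divides by n)
cv = sd / mean * 100                # coefficient of variation, in percent

print(round(sd, 2), round(cv, 2))   # 7.18 33.87
```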

Question 3. Calculate the Standard Deviation for the following frequency distribution:

Class Interval Frequency
10-20 6
20-30 10
30-40 15
40-50 10
50-60 9

Answer:

Step 1: Create the working table

Class Interval    Frequency (f)    Midpoint (x)    fx      x²      fx²
10–20             6                15              90      225     1350
20–30             10               25              250     625     6250
30–40             15               35              525     1225    18375
40–50             10               45              450     2025    20250
50–60             9                55              495     3025    27225
Total             50                               1810            73450

Step 2: Calculate Mean ($\bar{x}$)

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{1810}{50} = 36.2$


Step 3: Calculate Standard Deviation (σ)

Formula: $\sigma = \sqrt{ \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2 }$

$= \sqrt{ \dfrac{73450}{50} - (36.2)^2 } = \sqrt{1469 - 1310.44} = \sqrt{158.56} \approx 12.59$


Final Answer: Standard Deviation = 12.59
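
To double-check the column totals and the final value, the shortcut formula from Step 3 can be evaluated in Python. This is a minimal sketch; the variable names are illustrative and each class is represented by its midpoint.

```python
from math import sqrt

midpoints = [15, 25, 35, 45, 55]    # class marks of 10-20, ..., 50-60
freqs = [6, 10, 15, 10, 9]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
sd = sqrt(sum_fx2 / n - mean ** 2)  # sqrt(sum(fx^2)/N - mean^2)

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(sd, 2))
```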

Question 4. For a distribution, the first quartile is 30, the median is 45, and the third quartile is 70. Calculate the Bowley's Coefficient of Skewness and comment on the nature of skewness.

Answer:

Given:

  • Q1 = 30
  • Q2 (Median) = 45
  • Q3 = 70

Step 1: Apply Bowley's Coefficient of Skewness Formula

Bowley’s Skewness = $\dfrac{(Q_3 + Q_1 - 2Q_2)}{(Q_3 - Q_1)}$

= $\dfrac{(70 + 30 - 2 \times 45)}{70 - 30} = \dfrac{(100 - 90)}{40} = \dfrac{10}{40} = 0.25$


Step 2: Interpretation

Since Bowley’s Coefficient of Skewness is positive (0.25), the distribution is positively skewed. This indicates that the right tail is longer, and a larger number of observations lie on the lower side of the distribution.


Final Answer:

Bowley's Coefficient of Skewness = 0.25

Nature of Skewness: Positively Skewed
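
Bowley's formula is easy to express as a small Python function; the name `bowley_skewness` is an assumption used for this sketch.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


sk = bowley_skewness(30, 45, 70)
print(sk)  # 0.25
```

The coefficient always lies between $-1$ and $+1$, and it is 0 when the median is exactly midway between the quartiles.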

Question 5. Calculate the Karl Pearson's Coefficient of Skewness for the following data: 15, 20, 25, 25, 30, 35, 40, 40, 40, 45, 50. (First calculate Mean, Median/Mode, and Standard Deviation).

Answer:

Step 1: Arrange Data and Find Required Statistics

Ordered Data: 15, 20, 25, 25, 30, 35, 40, 40, 40, 45, 50


Mean ($\bar{x}$):

Sum = 15 + 20 + 25 + 25 + 30 + 35 + 40 + 40 + 40 + 45 + 50 = 365

Number of observations = 11

$\bar{x} = \dfrac{365}{11} \approx 33.18$


Median (Q2):

Since $n = 11$ (odd), the median is the 6th value: 35


Mode:

Mode = Most frequent value = 40


Step 2: Calculate Standard Deviation (σ)

x       $x - \bar{x}$    $(x - \bar{x})^2$
15      -18.18           330.58
20      -13.18           173.76
25      -8.18            66.94
25      -8.18            66.94
30      -3.18            10.12
35      1.82             3.31
40      6.82             46.49
40      6.82             46.49
40      6.82             46.49
45      11.82            139.67
50      16.82            282.85
Total                    1213.64

(The squares are computed from the unrounded deviations and then rounded to two decimals.)

$\sigma = \sqrt{ \dfrac{\sum (x - \bar{x})^2}{n} } = \sqrt{\dfrac{1213.64}{11}} \approx \sqrt{110.33} \approx 10.50$


Step 3: Apply Karl Pearson’s Coefficient of Skewness Formula

Karl Pearson’s Skewness (using Mode): $Sk = \dfrac{\bar{x} - \text{Mode}}{\sigma}$

$Sk = \dfrac{33.18 - 40}{10.50} = \dfrac{-6.82}{10.50} \approx -0.65$


Final Answer:

Karl Pearson’s Coefficient of Skewness = -0.65

Since it is negative, the distribution is negatively skewed.
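
The arithmetic in this solution can be re-checked with Python's `statistics` module. This is only a sketch of the mode-based formula; `pstdev` divides by $n$, as in Step 2.

```python
import statistics

data = [15, 20, 25, 25, 30, 35, 40, 40, 40, 45, 50]

mean = statistics.mean(data)
mode = statistics.mode(data)        # most frequent value
sd = statistics.pstdev(data)        # population standard deviation (divides by n)
sk = (mean - mode) / sd             # Karl Pearson's coefficient using the mode

print(sum(data), round(mean, 2), mode)
print(round(sd, 2), round(sk, 2))
```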

Question 6. The marks of 10 students in Mathematics (X) and Statistics (Y) are given below:

X    52    63    45    36    72    65    47    25    59    69
Y    48    55    40    30    60    58    43    20    50    60
Calculate the Spearman's Rank Correlation Coefficient between the marks.

Answer:

Step 1: Assign Ranks to X and Y

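| X | Rank (X) | Y | Rank (Y) | $d = R_X - R_Y$ | $d^2$ |
|---|---|---|---|---|---|
| 52 | 6 | 48 | 6 | 0 | 0 |
| 63 | 4 | 55 | 4 | 0 | 0 |
| 45 | 8 | 40 | 8 | 0 | 0 |
| 36 | 9 | 30 | 9 | 0 | 0 |
| 72 | 1 | 60 | 1.5 | -0.5 | 0.25 |
| 65 | 3 | 58 | 3 | 0 | 0 |
| 47 | 7 | 43 | 7 | 0 | 0 |
| 25 | 10 | 20 | 10 | 0 | 0 |
| 59 | 5 | 50 | 5 | 0 | 0 |
| 69 | 2 | 60 | 1.5 | 0.5 | 0.25 |
| $\sum d^2$ | | | | | 0.5 |

Note: Ranks are assigned from highest (rank 1) to lowest, and tied ranks are averaged. The two highest Y values (both 60) would occupy ranks 1 and 2, so each gets rank $\dfrac{1+2}{2} = 1.5$. There are no ties in X.


Step 2: Apply Spearman's Rank Correlation Coefficient Formula

$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$

$r_s = 1 - \dfrac{6 \times 0.5}{10(10^2 - 1)} = 1 - \dfrac{3}{990} = 1 - 0.0030$

$r_s \approx 0.997$


Final Answer:

The Spearman's Rank Correlation Coefficient is approximately 0.997, indicating a very strong positive correlation.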
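The ranking step is where slips are most likely, so it can help to automate it. The sketch below is one possible implementation: `average_ranks` and `spearman_simple` are hypothetical helper names, ranks run from highest (rank 1) to lowest, and the formula is the simplified one used in the answer, without the tie-correction term that some textbooks add.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1        # first position occupied by v
        count = ordered.count(v)            # how many items are tied at v
        ranks[v] = first + (count - 1) / 2  # mean of positions first..first+count-1
    return [ranks[v] for v in values]

def spearman_simple(x, y):
    """r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), with no tie correction."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

maths = [52, 63, 45, 36, 72, 65, 47, 25, 59, 69]
stats = [48, 55, 40, 30, 60, 58, 43, 20, 50, 60]
print("Rank (X):", average_ranks(maths))
print("Rank (Y):", average_ranks(stats))
print("r_s =", round(spearman_simple(maths, stats), 3))
```

Printing the two rank lists before the coefficient makes it easy to compare them against a hand-built rank table.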

Question 7. Calculate the Karl Pearson's Coefficient of Correlation between the advertisement cost (in $\textsf{₹}$ thousands) and sales (in $\textsf{₹}$ lakhs) from the following data:

| Advertisement Cost (X) | 10 | 12 | 15 | 18 | 20 |
|---|---|---|---|---|---|
| Sales (Y) | 50 | 55 | 60 | 65 | 70 |

Interpret the result.

Answer:

Step 1: Prepare the table with deviations from mean

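Here $\bar{X} = \dfrac{75}{5} = 15$ and $\bar{Y} = \dfrac{300}{5} = 60$.

| X | Y | $x = X - \bar{X}$ | $y = Y - \bar{Y}$ | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|---|---|
| 10 | 50 | -5 | -10 | 50 | 25 | 100 |
| 12 | 55 | -3 | -5 | 15 | 9 | 25 |
| 15 | 60 | 0 | 0 | 0 | 0 | 0 |
| 18 | 65 | 3 | 5 | 15 | 9 | 25 |
| 20 | 70 | 5 | 10 | 50 | 25 | 100 |
| Σ | | | | 130 | 68 | 250 |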

Step 2: Apply Karl Pearson's Correlation Coefficient formula

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$

$r = \dfrac{130}{\sqrt{68 \times 250}} = \dfrac{130}{\sqrt{17000}} = \dfrac{130}{130.38}$

$r \approx 0.997$


Interpretation:

The Karl Pearson’s Correlation Coefficient is 0.997, which is very close to +1. This indicates a very strong positive linear relationship between advertisement cost and sales: as the advertisement cost increases, sales tend to increase almost linearly.
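The deviation table and formula above translate directly into a few lines of Python. This is a sketch only; `pearson_r` is an illustrative name, and the function uses deviations from the means exactly as in Step 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

cost = [10, 12, 15, 18, 20]   # advertisement cost, in thousands of rupees
sales = [50, 55, 60, 65, 70]  # sales, in lakhs of rupees
r = pearson_r(cost, sales)
print("r =", round(r, 3))
```

Because $r$ is unit-free, rescaling either list (for example, expressing cost in rupees instead of thousands) leaves the result unchanged.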

Question 8. For the following distribution of weights:

Weight (kg) Number of Persons
40-50 10
50-60 15
60-70 25
70-80 10
80-90 5
Calculate the first quartile ($Q_1$), third quartile ($Q_3$), and the 80th percentile ($P_{80}$).

Answer:

Step 1: Prepare cumulative frequency table

| Class | Frequency (f) | Cumulative Frequency (CF) |
|---|---|---|
| 40-50 | 10 | 10 |
| 50-60 | 15 | 25 |
| 60-70 | 25 | 50 |
| 70-80 | 10 | 60 |
| 80-90 | 5 | 65 |

Total frequency (N) = 65


Step 2: Locate $Q_1$ (First Quartile)

Position of $Q_1$: $\dfrac{N}{4} = \dfrac{65}{4} = 16.25$

16.25 lies in the class 50–60

$Q_1 = L + \dfrac{\frac{N}{4} - F}{f} \cdot h = 50 + \dfrac{16.25 - 10}{15} \cdot 10 = 50 + \dfrac{6.25}{15} \cdot 10 = 50 + 4.17 = 54.17$


Step 3: Locate $Q_3$ (Third Quartile)

Position of $Q_3$: $\dfrac{3N}{4} = \dfrac{3 \times 65}{4} = 48.75$

48.75 lies in the class 60–70

$Q_3 = L + \dfrac{\frac{3N}{4} - F}{f} \cdot h = 60 + \dfrac{48.75 - 25}{25} \cdot 10 = 60 + \dfrac{23.75}{25} \cdot 10 = 60 + 9.5 = 69.5$


Step 4: Locate $P_{80}$ (80th Percentile)

Position of $P_{80}$: $\dfrac{80N}{100} = \dfrac{80 \times 65}{100} = 52$

52 lies in the class 70–80

$P_{80} = L + \dfrac{0.80N - F}{f} \cdot h = 70 + \dfrac{52 - 50}{10} \cdot 10 = 70 + \dfrac{2}{10} \cdot 10 = 70 + 2 = 72$


Final Answers:

  • First Quartile ($Q_1$) = 54.17 kg
  • Third Quartile ($Q_3$) = 69.5 kg
  • 80th Percentile ($P_{80}$) = 72 kg
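All three results come from the same interpolation pattern: find the first class whose cumulative frequency reaches the target position, then interpolate inside it. A minimal sketch follows; `grouped_percentile` is a hypothetical helper, and it assumes continuous classes given as (lower, upper) pairs.

```python
def grouped_percentile(classes, freqs, k):
    """k-th percentile of grouped data by linear interpolation.

    classes: list of (lower, upper) boundaries; freqs: class frequencies;
    k: percentile between 0 (exclusive) and 100 (inclusive).
    """
    total = sum(freqs)
    target = k * total / 100           # position kN/100
    cum = 0                            # cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= target:          # first class whose CF reaches the target
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("k must lie in (0, 100]")

weights = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]
persons = [10, 15, 25, 10, 5]
for label, k in (("Q1", 25), ("Q3", 75), ("P80", 80)):
    print(label, "=", round(grouped_percentile(weights, persons, k), 2))
```

Passing `k` as a whole-number percentage, rather than a fraction such as 0.8, keeps the target position exact in floating-point arithmetic for data like these.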

Question 9. Compare the variability of two datasets using Coefficient of Variation. Dataset A has a mean of 150 and a standard deviation of 15. Dataset B has a mean of 200 and a standard deviation of 25. Which dataset is more consistent?

Answer:

Step 1: Use the formula for Coefficient of Variation (CV)

$CV = \dfrac{\text{Standard Deviation}}{\text{Mean}} \times 100$


For Dataset A:

Mean = 150, Standard Deviation = 15

$CV_A = \dfrac{15}{150} \times 100 = 10\%$

For Dataset B:

Mean = 200, Standard Deviation = 25

$CV_B = \dfrac{25}{200} \times 100 = 12.5\%$


Conclusion:

Since Dataset A has a lower coefficient of variation (10%) than Dataset B (12.5%), Dataset A is more consistent.

Question 10. Discuss the properties of a good measure of dispersion. Evaluate Range, Quartile Deviation, Mean Deviation, and Standard Deviation against these properties.

Answer:

Properties of a Good Measure of Dispersion:

  • It should be based on all observations.
  • It should be rigidly defined.
  • It should be easy to understand and compute.
  • It should be amenable to further algebraic treatment.
  • It should be less affected by extreme values (outliers).

Evaluation of Different Measures:

1. Range

  • Only depends on the extreme values (maximum and minimum).
  • Easy to compute and understand.
  • Not based on all observations.
  • Highly affected by outliers.
  • Not suitable for further mathematical treatment.

2. Quartile Deviation (Q.D.)

  • Based on middle 50% of data (interquartile range).
  • More resistant to outliers than range.
  • Not based on all observations.
  • Simple to calculate and interpret.
  • Limited use in algebraic treatment.

3. Mean Deviation (M.D.)

  • Based on all values (absolute deviations from mean or median).
  • Less affected by outliers compared to standard deviation.
  • Relatively easy to compute.
  • Not suitable for algebraic treatment due to absolute values.

4. Standard Deviation (S.D.)

  • Based on all observations.
  • Rigorously defined and widely accepted.
  • Suitable for further statistical and algebraic analysis.
  • However, it is sensitive to extreme values.

Conclusion:

Among the given measures, Standard Deviation is considered the best overall measure of dispersion because it is rigidly defined, based on all observations, and amenable to further algebraic treatment, although it is sensitive to outliers. Quartile Deviation and, to a lesser extent, Mean Deviation are useful in contexts requiring resistance to extreme values.

Question 11. Explain the concepts of skewness and kurtosis with suitable diagrams. Describe the different types of distributions based on these measures.

Answer:

Concept of Skewness:

Skewness refers to the degree of asymmetry in the distribution of data. A perfectly symmetrical distribution has zero skewness.

Types of Skewness:

  • Symmetrical Distribution: Mean = Median = Mode
  • Positively Skewed: Tail on the right side is longer. Mean > Median > Mode
  • Negatively Skewed: Tail on the left side is longer. Mean < Median < Mode

Diagrams Representing Skewness:

[Figures: frequency curves for a symmetrical, a positively skewed, and a negatively skewed distribution]

Concept of Kurtosis:

Kurtosis refers to the degree of peakedness or flatness of a distribution curve relative to the normal distribution.

Types of Kurtosis:

  • Mesokurtic: Normal distribution with moderate peak — benchmark for comparison.
  • Leptokurtic: More peaked than normal distribution — heavy tails.
  • Platykurtic: Flatter than normal distribution — light tails.

Diagrams Representing Kurtosis:

[Figures: frequency curves for a mesokurtic, a leptokurtic, and a platykurtic distribution]

Conclusion:

Both skewness and kurtosis are important to understand the shape and nature of the data distribution. Skewness measures asymmetry while kurtosis measures the height and tails of the distribution curve. Together, they help in assessing normality and making informed statistical decisions.

Question 12. Calculate the Mean Deviation about the Median for the following data:

Marks Number of Students
0-10 4
10-20 6
20-30 10
30-40 8
40-50 2

Answer:

Step 1: Prepare the frequency table with midpoints and cumulative frequency

| Class Interval | f | Midpoint (x) | Cumulative Frequency (C.F.) |
|---|---|---|---|
| 0–10 | 4 | 5 | 4 |
| 10–20 | 6 | 15 | 10 |
| 20–30 | 10 | 25 | 20 |
| 30–40 | 8 | 35 | 28 |
| 40–50 | 2 | 45 | 30 |

Total frequency (N) = 30

Step 2: Find the Median Class

N = 30 ⇒ N/2 = 15

The 15th value lies in the 20–30 class (C.F. just before is 10).

Median formula:

Median = \( L + \left( \dfrac{\frac{N}{2} - CF}{f} \right) \times h \)

  • L = 20
  • CF = 10
  • f = 10
  • h = 10

Median = \( 20 + \left( \dfrac{15 - 10}{10} \right) \times 10 = 20 + 5 = 25 \)

Step 3: Calculate Mean Deviation about the Median

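| Class Interval | f | x (Midpoint) | $\lvert x - \text{Median} \rvert$ | $f \times \lvert x - \text{Median} \rvert$ |
|---|---|---|---|---|
| 0–10 | 4 | 5 | $\lvert 5 - 25 \rvert = 20$ | 80 |
| 10–20 | 6 | 15 | 10 | 60 |
| 20–30 | 10 | 25 | 0 | 0 |
| 30–40 | 8 | 35 | 10 | 80 |
| 40–50 | 2 | 45 | 20 | 40 |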

∑f × |x − Median| = 80 + 60 + 0 + 80 + 40 = 260

Mean Deviation (M.D.) about Median = \( \dfrac{\sum f \lvert x - \text{Median} \rvert}{\sum f} = \dfrac{260}{30} \approx 8.67 \)

Final Answer: Mean Deviation about the Median = 8.67
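The two stages above, interpolating the median and then averaging the absolute deviations of the class midpoints from it, can be sketched in Python as follows. The variable names are illustrative, and the class boundaries are taken directly from the question.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 6, 10, 8, 2]
n = sum(freqs)

# Stage 1: median by linear interpolation inside the median class.
cum = 0  # cumulative frequency before the current class
for (lower, upper), f in zip(classes, freqs):
    if cum + f >= n / 2:
        median = lower + (n / 2 - cum) / f * (upper - lower)
        break
    cum += f

# Stage 2: frequency-weighted mean of |midpoint - median|.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
md = sum(f * abs(x - median) for x, f in zip(midpoints, freqs)) / n

print("median =", median)
print("mean deviation about median =", round(md, 2))
```

Replacing `median` with the mean in Stage 2 would give the mean deviation about the mean instead.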

Question 13. Calculate the Quartile Deviation and its Coefficient for the following data:

Class Interval Frequency
5-10 5
10-15 12
15-20 18
20-25 10
25-30 5

Answer:

Step 1: Prepare a cumulative frequency table

| Class Interval | Frequency (f) | Cumulative Frequency (C.F.) |
|---|---|---|
| 5–10 | 5 | 5 |
| 10–15 | 12 | 17 |
| 15–20 | 18 | 35 |
| 20–25 | 10 | 45 |
| 25–30 | 5 | 50 |

Total frequency (N) = 50


Step 2: Find $Q_1$

$\frac{N}{4} = \frac{50}{4} = 12.5$

12.5 lies in the class 10–15

  • L = 10
  • CF = 5 (before 10–15)
  • f = 12
  • h = 5

$Q_1 = L + \left( \dfrac{\frac{N}{4} - CF}{f} \right) \times h$

$Q_1 = 10 + \left( \dfrac{12.5 - 5}{12} \right) \times 5$

$Q_1 = 10 + \left( \dfrac{7.5}{12} \right) \times 5 = 10 + 3.125 = 13.125$


Step 3: Find $Q_3$

$\frac{3N}{4} = \frac{3 \times 50}{4} = 37.5$

37.5 lies in the class 20–25

  • L = 20
  • CF = 35
  • f = 10
  • h = 5

$Q_3 = L + \left( \dfrac{\frac{3N}{4} - CF}{f} \right) \times h$

$Q_3 = 20 + \left( \dfrac{37.5 - 35}{10} \right) \times 5$

$Q_3 = 20 + \left( \dfrac{2.5}{10} \right) \times 5 = 20 + 1.25 = 21.25$


Step 4: Calculate Quartile Deviation (Q.D.)

$Q.D. = \dfrac{Q_3 - Q_1}{2} = \dfrac{21.25 - 13.125}{2} = \dfrac{8.125}{2} = 4.0625$


Step 5: Coefficient of Quartile Deviation

Coefficient = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{21.25 - 13.125}{21.25 + 13.125} = \dfrac{8.125}{34.375} \approx 0.236$


Final Answers:

Quartile Deviation = 4.0625

Coefficient of Quartile Deviation ≈ 0.236

Question 14. Calculate the Karl Pearson's Coefficient of Correlation for the following data:

| X | 60 | 62 | 64 | 66 | 68 | 70 |
|---|---|---|---|---|---|---|
| Y | 55 | 58 | 60 | 62 | 65 | 67 |

Answer:

Step 1: Prepare the table for calculations

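Means: X̄ = 65, Ȳ = 61.17

| X | Y | X − X̄ | Y − Ȳ | (X − X̄)² | (Y − Ȳ)² | (X − X̄)(Y − Ȳ) |
|---|---|---|---|---|---|---|
| 60 | 55 | -5 | -6.17 | 25 | 38.07 | 30.85 |
| 62 | 58 | -3 | -3.17 | 9 | 10.05 | 9.51 |
| 64 | 60 | -1 | -1.17 | 1 | 1.37 | 1.17 |
| 66 | 62 | 1 | 0.83 | 1 | 0.69 | 0.83 |
| 68 | 65 | 3 | 3.83 | 9 | 14.66 | 11.49 |
| 70 | 67 | 5 | 5.83 | 25 | 33.99 | 29.15 |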

Step 2: Calculate totals

  • Σ(X − X̄)² = 70
  • Σ(Y − Ȳ)² = 98.83
  • Σ(X − X̄)(Y − Ȳ) = 83.00

Step 3: Apply Karl Pearson’s formula

\( r = \dfrac{ \sum (X - \bar{X})(Y - \bar{Y}) }{ \sqrt{ \sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2 } } \)

\( r = \dfrac{83.00}{\sqrt{70 \times 98.83}} = \dfrac{83.00}{\sqrt{6918.1}} = \dfrac{83.00}{83.17} \approx 0.998 \)

Final Answer: Karl Pearson’s Coefficient of Correlation \( r \approx 0.998 \)

Question 15. The prices (in $\textsf{₹}$) of shares of Company A and Company B over 7 trading days are given below:

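| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Price A | 105 | 110 | 108 | 112 | 115 | 111 | 114 |
| Price B | 210 | 220 | 215 | 225 | 230 | 222 | 228 |
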
Calculate the Spearman's Rank Correlation Coefficient between the prices of the two companies' shares and comment on the relationship.

Answer:

To Find: Spearman’s Rank Correlation Coefficient ($r_s$) between prices of Company A and Company B.


Step 1: Assign ranks to the data for both Company A and Company B

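Ranks are assigned from highest price (rank 1) to lowest.

| Day | Price A ($x$) | Rank A ($R_x$) | Price B ($y$) | Rank B ($R_y$) | $d = R_x - R_y$ | $d^2$ |
|---|---|---|---|---|---|---|
| 1 | 105 | 7 | 210 | 7 | 0 | 0 |
| 2 | 110 | 5 | 220 | 5 | 0 | 0 |
| 3 | 108 | 6 | 215 | 6 | 0 | 0 |
| 4 | 112 | 3 | 225 | 3 | 0 | 0 |
| 5 | 115 | 1 | 230 | 1 | 0 | 0 |
| 6 | 111 | 4 | 222 | 4 | 0 | 0 |
| 7 | 114 | 2 | 228 | 2 | 0 | 0 |
| $\sum d^2$ | | | | | | 0 |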

Step 2: Use the formula for Spearman’s Rank Correlation Coefficient

$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

…(i)

Here, $\sum d^2 = 0$ and $n = 7$

$r_s = 1 - \dfrac{6 \times 0}{7(49 - 1)}$

[Substituting values in (i)]

$r_s = 1 - 0 = 1$


Conclusion:

The Spearman’s Rank Correlation Coefficient is +1, which indicates a perfect positive correlation between the share prices of Company A and Company B. This means as the share price of Company A increases, the share price of Company B also increases in the same ranking order.

Question 16. In a survey of 100 households, the weekly expenditure on groceries (in $\textsf{₹}$) is given below:

Expenditure Number of Households
1000-2000 15
2000-3000 25
3000-4000 30
4000-5000 20
5000-6000 10
Calculate the 25th percentile, 75th percentile, and 90th percentile of the expenditure.

Answer:

To Find: $P_{25}$, $P_{75}$ and $P_{90}$


Step 1: Construct the cumulative frequency table

| Class Interval | Frequency (f) | Cumulative Frequency (CF) |
|---|---|---|
| 1000 - 2000 | 15 | 15 |
| 2000 - 3000 | 25 | 40 |
| 3000 - 4000 | 30 | 70 |
| 4000 - 5000 | 20 | 90 |
| 5000 - 6000 | 10 | 100 |

Total number of households ($N$) = 100


Step 2: Use the percentile formula:

$P_k = L + \dfrac{\left(\dfrac{kN}{100} - F\right)}{f} \times h$

…(i)

Where:

  • $L$ = lower boundary of the class containing $P_k$
  • $N$ = total frequency
  • $F$ = cumulative frequency before the class
  • $f$ = frequency of the class
  • $h$ = class width

Step 3: Calculate $P_{25}$

$\dfrac{25 \times 100}{100} = 25$ → 25th value lies in the class 2000 - 3000

Here, $L = 2000$, $F = 15$, $f = 25$, $h = 1000$

$P_{25} = 2000 + \dfrac{25 - 15}{25} \times 1000$

[Using formula (i)]

$P_{25} = 2000 + \dfrac{10}{25} \times 1000 = 2000 + 400 = 2400$


Step 4: Calculate $P_{75}$

$\dfrac{75 \times 100}{100} = 75$ → 75th value lies in the class 4000 - 5000 (the cumulative frequency reaches only 70 by the end of the class 3000 - 4000)

Here, $L = 4000$, $F = 70$, $f = 20$, $h = 1000$

$P_{75} = 4000 + \dfrac{75 - 70}{20} \times 1000$

[Using formula (i)]

$P_{75} = 4000 + \dfrac{5}{20} \times 1000 = 4000 + 250 = 4250$


Step 5: Calculate $P_{90}$

$\dfrac{90 \times 100}{100} = 90$ → 90th value lies in the class 4000 - 5000

Here, $L = 4000$, $F = 70$, $f = 20$, $h = 1000$

$P_{90} = 4000 + \dfrac{90 - 70}{20} \times 1000$

[Using formula (i)]

$P_{90} = 4000 + \dfrac{20}{20} \times 1000 = 4000 + 1000 = 5000$


Final Answers:

  • 25th Percentile ($P_{25}$) = $\textsf{₹}$2400
  • 75th Percentile ($P_{75}$) = $\textsf{₹}$4250
  • 90th Percentile ($P_{90}$) = $\textsf{₹}$5000

Question 17. Distinguish between absolute and relative measures of dispersion. Provide examples for each and explain when relative measures are particularly useful.

Answer:

Dispersion refers to the extent to which values in a data set vary around the average. It can be measured in two ways: absolute and relative.


1. Absolute Measures of Dispersion:

  • These express the variation in the same units as the original data.
  • They provide a direct measure of the amount of dispersion.
  • Examples:
    • Range
    • Quartile Deviation
    • Mean Deviation
    • Standard Deviation
  • Use: Useful when comparing variability within the same dataset or datasets expressed in the same units.

Example: If the standard deviation of heights of students in a class is 5 cm, this is an absolute measure.


2. Relative Measures of Dispersion:

  • These express the variation as a ratio or percentage of a central value (e.g., mean or median).
  • They are unitless, allowing for comparison between datasets with different units or scales.
  • Examples:
    • Coefficient of Range
    • Coefficient of Quartile Deviation
    • Coefficient of Mean Deviation
    • Coefficient of Variation (CV)
  • Use: Useful for comparing variability across datasets with different units or widely different means.

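Example: If two firms have standard deviations of ₹10,000 and ₹20,000, and means of ₹50,000 and ₹1,00,000 respectively, the second firm looks more variable in absolute terms. Yet the Coefficient of Variation is $\dfrac{10000}{50000} \times 100 = 20\%$ for the first firm and $\dfrac{20000}{100000} \times 100 = 20\%$ for the second, so their relative variability is the same.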


Conclusion:

Absolute measures are helpful for understanding the scale of variation, while relative measures are crucial when comparing the consistency or variability between datasets of different units or magnitudes.

Question 18. Calculate the Standard Deviation and Mean Deviation about the Mean for the following series:

x f
5 8
10 12
15 15
20 10
25 5

Answer:

To Find: Mean Deviation about the Mean and Standard Deviation


Step 1: Calculate the Mean ($\bar{x}$)

| $x$ | $f$ | $fx$ |
|---|---|---|
| 5 | 8 | 40 |
| 10 | 12 | 120 |
| 15 | 15 | 225 |
| 20 | 10 | 200 |
| 25 | 5 | 125 |
| Total | 50 | 710 |

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{710}{50} = 14.2$


Step 2: Calculate Mean Deviation about the Mean

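| $x$ | $f$ | $\lvert x - \bar{x} \rvert$ | $f \lvert x - \bar{x} \rvert$ |
|---|---|---|---|
| 5 | 8 | 9.2 | 73.6 |
| 10 | 12 | 4.2 | 50.4 |
| 15 | 15 | 0.8 | 12.0 |
| 20 | 10 | 5.8 | 58.0 |
| 25 | 5 | 10.8 | 54.0 |
| $\sum f \lvert x - \bar{x} \rvert$ | | | 248.0 |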

Mean Deviation about Mean = $\dfrac{\sum f|x - \bar{x}|}{\sum f} = \dfrac{248}{50} = 4.96$


Step 3: Calculate Standard Deviation (σ)

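| $x$ | $f$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ | $f(x - \bar{x})^2$ |
|---|---|---|---|---|
| 5 | 8 | -9.2 | 84.64 | 677.12 |
| 10 | 12 | -4.2 | 17.64 | 211.68 |
| 15 | 15 | 0.8 | 0.64 | 9.60 |
| 20 | 10 | 5.8 | 33.64 | 336.40 |
| 25 | 5 | 10.8 | 116.64 | 583.20 |
| $\sum f(x - \bar{x})^2$ | | | | 1818.00 |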

Standard Deviation = $\sigma = \sqrt{\dfrac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{1818}{50}} = \sqrt{36.36} \approx 6.03$


Final Answers:

  • Mean Deviation about Mean = 4.96
  • Standard Deviation = 6.03
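Both measures use the same frequency-weighted sums, so a single pass over the $(x, f)$ pairs can produce them together. The sketch below assumes population formulas (divisor $\sum f$), as in the answer; `frequency_summary` is a hypothetical helper name.

```python
from math import sqrt

def frequency_summary(values, freqs):
    """Return (mean, mean deviation about the mean, population SD) for a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)
    return mean, md, sd

mean, md, sd = frequency_summary([5, 10, 15, 20, 25], [8, 12, 15, 10, 5])
print("mean =", round(mean, 2))
print("mean deviation about mean =", round(md, 2))
print("standard deviation =", round(sd, 2))
```

For grouped data, the same function can be called with class midpoints in place of `values`.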

Question 19. Explain the properties of the Standard Deviation. Discuss its advantages over other measures of dispersion.

Answer:

Standard Deviation (σ) is a widely used absolute measure of dispersion that quantifies the spread of data points around the mean. It is defined as the square root of the variance and is denoted by $\sigma$ for a population or $s$ for a sample.


Properties of Standard Deviation:

  • Non-negative: Standard deviation is always $\geq 0$ since it is the square root of the average squared deviation from the mean.
  • Minimum value: Standard deviation is zero if all observations are the same (i.e., there is no variability).
  • Affected by all values: It takes into account every observation in the dataset, making it more comprehensive.
  • Same unit as the data: Unlike variance, which is in squared units, standard deviation is expressed in the same unit as the original data.
  • Sensitive to outliers: A large deviation from the mean can increase the standard deviation significantly.
  • Mathematically tractable: It has well-defined mathematical properties and is useful in advanced statistical applications (e.g., in Normal Distribution).

Advantages of Standard Deviation over Other Measures of Dispersion:

  • Considers all observations: Unlike Range and Quartile Deviation, which use only a few data points, standard deviation includes all values in its calculation.
  • Mathematical Rigour: It is algebraically manipulable and forms the basis of many statistical formulas and tests.
  • Comparability: When used in its relative form (Coefficient of Variation), it facilitates comparison of variability across datasets with different units or scales.
  • Theoretical Importance: Standard deviation plays a key role in the study of probability distributions, especially the Normal Distribution where about 68%, 95%, and 99.7% of data lie within $1\sigma$, $2\sigma$, and $3\sigma$ of the mean respectively.
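The 68%, 95%, and 99.7% figures quoted for the Normal Distribution can be reproduced from the standard normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the dictionary name `coverage` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal population lying within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} sigma: {share:.2%}")
```

These proportions hold for normally distributed data; for other distributions the shares within $1\sigma$, $2\sigma$, and $3\sigma$ can differ.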

Conclusion:

Standard Deviation is a reliable and mathematically powerful measure of dispersion. It offers a more complete picture of variability compared to simpler measures like Range or Mean Deviation, and it is especially valuable in advanced statistical analysis.

Question 20. The runs scored by two batsmen A and B in 5 matches are given below:

| Batsman A | 50 | 65 | 45 | 70 | 80 |
|---|---|---|---|---|---|
| Batsman B | 60 | 70 | 55 | 80 | 65 |

Calculate the Coefficient of Variation for both batsmen and determine which batsman is more consistent.

Answer:

To Find: Coefficient of Variation (CV) for both Batsman A and B


Step 1: Calculate Mean and Standard Deviation of Batsman A

Mean of A, $\bar{x}_A = \dfrac{50 + 65 + 45 + 70 + 80}{5} = \dfrac{310}{5} = 62$

| Runs (x) | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|
| 50 | -12 | 144 |
| 65 | 3 | 9 |
| 45 | -17 | 289 |
| 70 | 8 | 64 |
| 80 | 18 | 324 |
| Total | | 830 |

Standard Deviation of A, $\sigma_A = \sqrt{\dfrac{830}{5}} = \sqrt{166} \approx 12.884$

Coefficient of Variation of A, $\text{CV}_A = \dfrac{\sigma_A}{\bar{x}_A} \times 100 = \dfrac{12.884}{62} \times 100 \approx 20.78\%$


Step 2: Calculate Mean and Standard Deviation of Batsman B

Mean of B, $\bar{x}_B = \dfrac{60 + 70 + 55 + 80 + 65}{5} = \dfrac{330}{5} = 66$

| Runs (x) | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|---|---|---|
| 60 | -6 | 36 |
| 70 | 4 | 16 |
| 55 | -11 | 121 |
| 80 | 14 | 196 |
| 65 | -1 | 1 |
| Total | | 370 |

Standard Deviation of B, $\sigma_B = \sqrt{\dfrac{370}{5}} = \sqrt{74} \approx 8.602$

Coefficient of Variation of B, $\text{CV}_B = \dfrac{\sigma_B}{\bar{x}_B} \times 100 = \dfrac{8.602}{66} \times 100 \approx 13.03\%$


Conclusion:

Batsman B has a lower Coefficient of Variation ($13.03\%$) compared to Batsman A ($20.78\%$). Hence, Batsman B is more consistent.
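The comparison can be reproduced with the standard `statistics` module. This is a sketch assuming the population standard deviation (divisor $n$), which is what the answer uses; `coefficient_of_variation` is an illustrative helper name.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV as a percentage: population SD divided by the mean, times 100."""
    return pstdev(values) / mean(values) * 100

batsman_a = [50, 65, 45, 70, 80]
batsman_b = [60, 70, 55, 80, 65]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
print("CV of A:", round(cv_a, 2), "%")
print("CV of B:", round(cv_b, 2), "%")

# The batsman with the smaller CV is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"
print("More consistent:", more_consistent)
```

Using the sample standard deviation (`statistics.stdev`) would change both percentages slightly but not the ranking for these data.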

Question 21. Define Karl Pearson's Coefficient of Correlation and explain its properties. What are the assumptions underlying its use?

Answer:

Definition:

Karl Pearson’s Coefficient of Correlation is a statistical measure that quantifies the degree and direction of the linear relationship between two variables. It is denoted by $r$ and lies between $-1$ and $+1$.

Formula: For variables $X$ and $Y$, the formula is:

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

…(i)


Properties of Karl Pearson’s Coefficient:

  • Range: The value of $r$ lies between $-1$ and $+1$.
  • Unit-free measure: It is a pure number and does not depend on the units of measurement.
  • Direction:
    • $r > 0$ indicates positive correlation
    • $r < 0$ indicates negative correlation
    • $r = 0$ indicates no linear correlation (a non-linear relationship may still exist)
  • Perfect correlation:
    • $r = +1$ implies perfect positive correlation
    • $r = -1$ implies perfect negative correlation
  • Symmetry: $r_{xy} = r_{yx}$
  • Affected by extreme values: It is sensitive to outliers.

Assumptions Underlying Its Use:

  • The relationship between the variables is linear.
  • The variables are measured on at least interval or ratio scale.
  • No significant outliers in the data set.
  • Data should be normally distributed if used for inference or testing.
  • The observations are independent.

Conclusion:

Karl Pearson's Coefficient of Correlation is a foundational tool in statistics for assessing the strength and direction of linear relationships. While powerful, its interpretation relies on adherence to the underlying assumptions.

Question 22. Calculate the Standard Deviation for the following data using the step-deviation method:

Class Interval Frequency
100-110 5
110-120 12
120-130 15
130-140 10
140-150 8

Answer:

To Find: Standard Deviation using the Step-Deviation Method


Step 1: Prepare the table with assumed mean method

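| Class Interval | Frequency (f) | Midpoint (x) | $d = \dfrac{x - A}{h}$ | $fd$ | $fd^2$ |
|---|---|---|---|---|---|
| 100 - 110 | 5 | 105 | -2 | -10 | 20 |
| 110 - 120 | 12 | 115 | -1 | -12 | 12 |
| 120 - 130 | 15 | 125 | 0 | 0 | 0 |
| 130 - 140 | 10 | 135 | 1 | 10 | 10 |
| 140 - 150 | 8 | 145 | 2 | 16 | 32 |
| Total | 50 | | | 4 | 74 |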

Here, assumed mean $A = 125$, class width $h = 10$


Step 2: Apply formula for Standard Deviation

Standard Deviation, $\sigma = h \sqrt{ \dfrac{\sum fd^2}{\sum f} - \left( \dfrac{\sum fd}{\sum f} \right)^2 }$

$\sigma = 10 \sqrt{\dfrac{74}{50} - \left(\dfrac{4}{50}\right)^2}$

…(i)

$\Rightarrow \sigma = 10 \sqrt{1.48 - 0.0064}$

$\Rightarrow \sigma = 10 \sqrt{1.4736}$

$\Rightarrow \sigma \approx 10 \times 1.2139 = 12.139$


Final Answer: Standard Deviation ≈ 12.14
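A short sketch of the step-deviation formula shows that the result does not depend on which midpoint is chosen as the assumed mean $A$. The function name `step_deviation_sd` is hypothetical; the midpoints, frequencies, and class width are those of the table above.

```python
from math import sqrt

def step_deviation_sd(midpoints, freqs, assumed_mean, width):
    """sigma = h * sqrt(sum(f d^2)/N - (sum(f d)/N)^2), with d = (x - A) / h."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

midpoints = [105, 115, 125, 135, 145]
freqs = [5, 12, 15, 10, 8]

sd_a125 = step_deviation_sd(midpoints, freqs, assumed_mean=125, width=10)
sd_a115 = step_deviation_sd(midpoints, freqs, assumed_mean=115, width=10)
print(round(sd_a125, 2), round(sd_a115, 2))  # the two values agree
```

Choosing $A$ near the middle of the table only keeps the hand arithmetic small; it has no effect on the answer.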

Question 23. For a set of 10 observations, the mean is 15 and the standard deviation is 5. If each observation is increased by 2, what will be the new mean and standard deviation? If each observation is multiplied by 2, what will be the new mean and standard deviation?

Answer:

Given:

  • Number of observations = 10
  • Original Mean ($\bar{x}$) = 15
  • Original Standard Deviation ($\sigma$) = 5

Case I: When each observation is increased by 2

Effect on Mean:

When a constant is added to each observation, the mean increases by that constant.

New Mean = $15 + 2 = 17$

…(i)

Effect on Standard Deviation:

Adding a constant to each observation does not affect the standard deviation.

New Standard Deviation = $5$

…(ii)


Case II: When each observation is multiplied by 2

Effect on Mean:

When each observation is multiplied by a constant, the mean is also multiplied by that constant.

New Mean = $15 \times 2 = 30$

…(iii)

Effect on Standard Deviation:

Multiplying each observation by a constant multiplies the standard deviation by the absolute value of that constant (here the constant is positive, so by the constant itself).

New Standard Deviation = $5 \times 2 = 10$

…(iv)


Final Answers:

  • After adding 2: Mean = 17, Standard Deviation = 5
  • After multiplying by 2: Mean = 30, Standard Deviation = 10
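These two rules can be verified numerically. The question does not list the observations, so the sketch below uses a hypothetical set of 10 values (five 10s and five 20s) chosen only because it has mean 15 and standard deviation 5.

```python
from statistics import mean, pstdev

# Hypothetical data: 10 observations with mean 15 and population SD 5.
data = [10] * 5 + [20] * 5

shifted = [x + 2 for x in data]  # each observation increased by 2
scaled = [x * 2 for x in data]   # each observation multiplied by 2

summary = {
    "original": (mean(data), pstdev(data)),
    "shifted": (mean(shifted), pstdev(shifted)),
    "scaled": (mean(scaled), pstdev(scaled)),
}
for name, (m, s) in summary.items():
    print(f"{name:9s} mean = {m}, sd = {s}")
```

Any other dataset with the same mean and standard deviation would behave the same way, since the rules follow from the definitions rather than from the particular values.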

Question 24. The sales (in $\textsf{₹}$ thousands) of a company and the advertising expenditure (in $\textsf{₹}$ hundreds) over 5 months are given:

| Sales (X) | 10 | 12 | 15 | 18 | 20 |
|---|---|---|---|---|---|
| Advertising Exp (Y) | 2 | 3 | 4 | 5 | 6 |

Calculate the Karl Pearson's Coefficient of Correlation and interpret the result.

Answer:

To Find: Karl Pearson’s Coefficient of Correlation ($r$)


Step 1: Calculate Mean of $X$ and $Y$

$\bar{X} = \dfrac{10 + 12 + 15 + 18 + 20}{5} = \dfrac{75}{5} = 15$

$\bar{Y} = \dfrac{2 + 3 + 4 + 5 + 6}{5} = \dfrac{20}{5} = 4$


Step 2: Prepare the table for $x = X - \bar{X}$, $y = Y - \bar{Y}$, $xy$, $x^2$, and $y^2$

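| $X$ | $Y$ | $x = X - \bar{X}$ | $y = Y - \bar{Y}$ | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|---|---|
| 10 | 2 | -5 | -2 | 10 | 25 | 4 |
| 12 | 3 | -3 | -1 | 3 | 9 | 1 |
| 15 | 4 | 0 | 0 | 0 | 0 | 0 |
| 18 | 5 | 3 | 1 | 3 | 9 | 1 |
| 20 | 6 | 5 | 2 | 10 | 25 | 4 |
| Total | | | | 26 | 68 | 10 |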

Step 3: Apply Karl Pearson’s Formula

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$

$r = \dfrac{26}{\sqrt{68 \times 10}}$

…(i)

$r = \dfrac{26}{\sqrt{680}} = \dfrac{26}{26.0768} \approx 0.997$


Final Answer: Karl Pearson’s Coefficient of Correlation $r \approx \mathbf{0.997}$

Interpretation: Since the value of $r$ is very close to 1, there is a very strong positive correlation between sales and advertising expenditure. This means that as advertising expenditure increases, sales also increase.

Question 25. A survey was conducted to find the relationship between hours of study (X) and marks obtained (Y) by 8 students.

| Hours of Study (X) | 3 | 5 | 2 | 8 | 6 | 4 | 7 | 5 |
|---|---|---|---|---|---|---|---|---|
| Marks (Y) | 60 | 75 | 50 | 90 | 80 | 65 | 85 | 70 |

Calculate the Spearman's Rank Correlation Coefficient.

Answer:

To Find: Spearman’s Rank Correlation Coefficient ($r_s$)


Step 1: Assign Ranks to X and Y (Highest = Rank 1)

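| $X$ (Hours) | $Y$ (Marks) | Rank($X$) | Rank($Y$) | $d = R_X - R_Y$ | $d^2$ |
|---|---|---|---|---|---|
| 3 | 60 | 7 | 7 | 0 | 0 |
| 5 | 75 | 4.5 | 4 | 0.5 | 0.25 |
| 2 | 50 | 8 | 8 | 0 | 0 |
| 8 | 90 | 1 | 1 | 0 | 0 |
| 6 | 80 | 3 | 3 | 0 | 0 |
| 4 | 65 | 6 | 6 | 0 | 0 |
| 7 | 85 | 2 | 2 | 0 | 0 |
| 5 | 70 | 4.5 | 5 | -0.5 | 0.25 |
| $\sum d^2$ | | | | | 0.5 |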

Note: For tied ranks, average ranks are used. $X=5$ occurs twice $\Rightarrow$ rank = $(4 + 5)/2 = 4.5$


Step 2: Apply Spearman’s Rank Correlation Formula

$r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

$r_s = 1 - \dfrac{6 \times 0.5}{8(8^2 - 1)}$

…(i)

$r_s = 1 - \dfrac{3}{8 \times 63} = 1 - \dfrac{3}{504} = 1 - 0.00595$

$r_s \approx 0.994$


Final Answer: Spearman’s Rank Correlation Coefficient $r_s \approx \mathbf{0.994}$

Interpretation: There is a very strong positive correlation between hours of study and marks obtained by students.

Question 26. For the following data, calculate the Mean Deviation about the Median:

| x | 10 | 12 | 15 | 18 | 20 |
|---|---|---|---|---|---|
| f | 5 | 7 | 10 | 8 | 5 |

Answer:

To Find: Mean Deviation about the Median


Step 1: Prepare the cumulative frequency table

| $x$ | $f$ | Cumulative Frequency ($cf$) |
|---|---|---|
| 10 | 5 | 5 |
| 12 | 7 | 12 |
| 15 | 10 | 22 |
| 18 | 8 | 30 |
| 20 | 5 | 35 |

Total frequency ($N$) = 35 ⇒ $\dfrac{N+1}{2} = \dfrac{36}{2} = 18$
The 18th item lies in the cumulative frequency group that ends at 22 → Median = 15


Step 2: Calculate Mean Deviation about the Median

We use the formula: $${\text{M.D.}} = \dfrac{\sum f|x - \text{Median}|}{\sum f}$$

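| $x$ | $f$ | $\lvert x - 15 \rvert$ | $f \lvert x - 15 \rvert$ |
|---|---|---|---|
| 10 | 5 | 5 | 25 |
| 12 | 7 | 3 | 21 |
| 15 | 10 | 0 | 0 |
| 18 | 8 | 3 | 24 |
| 20 | 5 | 5 | 25 |
| Total | 35 | | 95 |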

$\text{M.D.} = \dfrac{95}{35} \approx 2.714$


Final Answer: Mean Deviation about the Median = 2.714

Question 27. Explain different methods of measuring dispersion. Discuss the merits and demerits of each method.

Answer:

To Find: Explanation of different methods of measuring dispersion with their merits and demerits


Dispersion refers to the extent to which values in a data set vary from the average value. The main methods of measuring dispersion are as follows:


1. Range

Definition: The difference between the largest and the smallest observation in the data.

$\text{Range} = \text{Maximum value} - \text{Minimum value}$

Merits:

  • Simple to understand and calculate
  • Useful for quick comparisons

Demerits:

  • Depends only on extreme values
  • Does not consider all data values

2. Quartile Deviation (Semi-Interquartile Range)

Definition: Half of the difference between the third and the first quartile.

$\text{Q.D.} = \dfrac{Q_3 - Q_1}{2}$

Merits:

  • Not affected by extreme values
  • Based on middle 50% of data

Demerits:

  • Ignores 50% of the data
  • Not suitable for further mathematical treatment

3. Mean Deviation (Average Deviation)

Definition: The average of the absolute deviations from mean, median or mode.

$\text{M.D.} = \dfrac{\sum |x - A|}{n}$, where $A$ = mean/median/mode

Merits:

  • Based on all values
  • Less affected by extreme values than standard deviation

Demerits:

  • Not commonly used in advanced statistics
  • Ignores the signs of the deviations (uses the modulus), which makes further algebraic treatment difficult

4. Standard Deviation (SD)

Definition: The square root of the average of squared deviations from the mean.

$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$

Merits:

  • Most reliable and accurate measure
  • Based on all observations
  • Suitable for algebraic and statistical analysis

Demerits:

  • Time-consuming to calculate
  • More sensitive to extreme values

5. Coefficient of Variation (CV)

Definition: It is the ratio of the standard deviation to the mean, expressed as a percentage.

$\text{CV} = \dfrac{\sigma}{\bar{x}} \times 100$

Merits:

  • Useful for comparing variation between two or more distributions
  • Unit-free measure

Demerits:

  • Depends on both the standard deviation and the mean, so it becomes unreliable when the mean is close to zero
  • Cannot be used when mean is zero

Question 28. The daily wages (in $\textsf{₹}$) of 50 workers in a factory are given below:

Wages Number of Workers
200-300 5
300-400 10
400-500 15
500-600 12
600-700 8
Calculate the Mean Deviation about the Mean and the Standard Deviation for this data.

Answer:

To Find: Mean Deviation about the Mean and Standard Deviation


Step 1: Compute midpoints and necessary columns

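| Class | Frequency ($f$) | Midpoint ($x$) | $f \cdot x$ |
|---|---|---|---|
| 200–300 | 5 | 250 | 1250 |
| 300–400 | 10 | 350 | 3500 |
| 400–500 | 15 | 450 | 6750 |
| 500–600 | 12 | 550 | 6600 |
| 600–700 | 8 | 650 | 5200 |
| Total | 50 | | 23300 |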

Step 2: Calculate Mean

$\bar{x} = \dfrac{\sum f \cdot x}{\sum f} = \dfrac{23300}{50} = 466$


Step 3: Compute Mean Deviation about Mean

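| $x$ | $f$ | $\lvert x - \bar{x} \rvert$ | $f \cdot \lvert x - \bar{x} \rvert$ |
|---|---|---|---|
| 250 | 5 | 216 | 1080 |
| 350 | 10 | 116 | 1160 |
| 450 | 15 | 16 | 240 |
| 550 | 12 | 84 | 1008 |
| 650 | 8 | 184 | 1472 |
| Total | 50 | | 4960 |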

$\text{Mean Deviation about Mean} = \dfrac{4960}{50} = 99.2$


Step 4: Compute Standard Deviation

| $x$ | $f$ | $(x - \bar{x})^2$ | $f \cdot (x - \bar{x})^2$ |
|---|---|---|---|
| 250 | 5 | 46656 | 233280 |
| 350 | 10 | 13456 | 134560 |
| 450 | 15 | 256 | 3840 |
| 550 | 12 | 7056 | 84672 |
| 650 | 8 | 33856 | 270848 |
| Total | 50 | | 727200 |

$\sigma = \sqrt{\dfrac{727200}{50}} = \sqrt{14544} \approx 120.6$


Final Answers:

  • Mean Deviation about Mean = 99.2
  • Standard Deviation ≈ 120.6