Chapter 3 Organisation Of Data
Once data is collected, it exists in a highly disorganised form known as Raw Data, which is difficult to comprehend or analyse. This chapter explains the crucial process of organising this raw data to bring order and meaning to it. The primary goal of classification is to arrange data into groups or classes based on their common characteristics. This process saves time, facilitates comparison, and makes the data suitable for further statistical analysis. The chapter outlines four main types of classification: Chronological (by time), Spatial (by location), Qualitative (by attributes like gender or literacy), and Quantitative (by numerical characteristics like height or income).
For quantitative data, the most important method of organisation is the creation of a Frequency Distribution. This is a table that summarises data by showing the number of observations (frequency) that fall within different classes or intervals. The chapter defines key concepts such as class limits, class interval, and class mark, and explains the difference between Continuous and Discrete variables. It details the practical technique of using tally marks to construct a frequency distribution from raw data. While this process involves some loss of information about individual data points, the gain in clarity and comprehensibility is a significant advantage.
Introduction to Organisation of Data
After data is collected, the next crucial step is to organise it. The purpose of classifying raw data is to bring order to it, making it easier to handle and suitable for further statistical analysis. Think of a local junk dealer (kabadiwallah) who sorts his collected items (newspapers, glass bottles, plastics, and various metals) into different groups. This classification helps him manage his trade efficiently and quickly find an item a buyer needs.
Similarly, when you arrange your schoolbooks by subject, you can easily find the one you are looking for. Classification is the process of arranging or organising things into groups or classes based on some shared criteria. This saves valuable time and effort.
What is Raw Data?
Unclassified data is called raw data. Like a disorganised pile of junk, raw data is often large, cumbersome, and highly disorganised. Drawing meaningful conclusions from it is a tedious task because it doesn't easily yield to statistical methods. Therefore, raw data must be properly organised and presented in a classified form before any systematic analysis can be undertaken.
For example, a list of mathematics marks for 100 students is raw data. In its original form, it is difficult to determine the highest mark or the overall performance of the class. If the number of students were 1,000, the task would be even more daunting.
The process of classification summarises the raw data and makes it comprehensible. When facts with similar characteristics are placed in the same class, it enables one to:
- Locate information easily.
- Make comparisons.
- Draw inferences without difficulty.
The raw data from a massive exercise like the Census of India, which contacts crores of people, is unmanageable until it is classified by characteristics like gender, education, marital status, and occupation. Only then does the structure of the population become understandable.
Types of Classification
Raw data can be classified in various ways, depending on the purpose of the study. The method of classification is determined by the requirement of the analysis.
1. Chronological Classification
In this type of classification, data is grouped according to time. The data is arranged either in ascending or descending order with reference to time periods such as years, quarters, months, or weeks. Data classified this way is known as a Time Series.
Example 1. Population of India (in crores)
| Year | Population (Crores) |
|---|---|
| 1951 | 35.7 |
| 1961 | 43.8 |
| 1971 | 54.6 |
| 1981 | 68.4 |
| 1991 | 81.8 |
| 2001 | 102.7 |
| 2011 | 121.0 |
2. Spatial Classification
In spatial classification, data is classified with reference to geographical locations such as countries, states, cities, or districts.
Example 2. Yield of Wheat for Different Countries (2013)
| Country | Yield of wheat (kg/hectare) |
|---|---|
| Canada | 3594 |
| China | 5055 |
| France | 7254 |
| Germany | 7998 |
| India | 3154 |
| Pakistan | 2787 |
3. Qualitative Classification
This classification is used for characteristics that cannot be expressed quantitatively. Such characteristics are called qualities or attributes. Examples include nationality, literacy, religion, gender, and marital status. The classification is done based on the presence or absence of a particular attribute.
Example 3. Classification of Population by Gender and Marital Status
4. Quantitative Classification
This classification is used for characteristics that are quantitative in nature, such as height, weight, age, income, and marks of students. When the collected data of such characteristics are grouped into classes, it is known as a quantitative classification. This is often represented as a frequency distribution.
Variables: Continuous and Discrete
A variable is a characteristic that can take on different values. Variables are broadly classified into two types:
1. Continuous Variable
A continuous variable can take any numerical value within a given range. This includes integral values (1, 2, 3), fractional values (1/2, 2/3), and irrational values ($\sqrt{2}$, $\sqrt{3}$). The values of a continuous variable can be broken down into infinite gradations.
Examples: Height, weight, time, and distance. A student's height can be 150 cm, 150.5 cm, or 150.51 cm as they grow.
2. Discrete Variable
A discrete variable can take only certain values, and its value changes in finite "jumps." It does not take any intermediate value between two adjacent values.
Examples: The number of students in a class. It can be 25 or 26, but it cannot be 25.5. Another example is the number appearing on a die (1, 2, 3, 4, 5, 6).
What is a Frequency Distribution?
A frequency distribution is a comprehensive way to classify the raw data of a quantitative variable. It shows how different values of a variable are distributed in different classes, along with their corresponding frequencies.
Key Terms in a Frequency Distribution
- Class: A group into which the raw data is condensed (e.g., marks between 0–10).
- Class Frequency: The number of observations that fall in a particular class.
- Class Limits: The two ends of a class. The lowest value is the Lower Class Limit, and the highest value is the Upper Class Limit.
- Class Interval (or Class Width): The difference between the upper class limit and the lower class limit.
- Class Mid-Point (or Class Mark): The middle value of a class, calculated as:
$ \text{Class Mark} = \frac{\text{Upper Class Limit} + \text{Lower Class Limit}}{2} $
The class mark is used to represent the class in further statistical calculations.
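As a quick illustration of the formula, the sketch below computes the class mark for a few classes of the kind used in this chapter. The function name `class_mark` is an illustrative choice, not something taken from the textbook.

```python
def class_mark(lower_limit, upper_limit):
    """Return the mid-point of a class: (upper limit + lower limit) / 2."""
    return (upper_limit + lower_limit) / 2


# A few sample classes, written as (lower limit, upper limit).
for lower, upper in [(0, 10), (40, 50), (90, 100)]:
    print(f"{lower}-{upper}: class mark = {class_mark(lower, upper)}")
```

Each class is then represented by this single value in later calculations.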
Example 4. Frequency Distribution of Marks in Mathematics of 100 Students
| Marks (Class) | Frequency |
|---|---|
| 0–10 | 1 |
| 10–20 | 8 |
| 20–30 | 6 |
| 30–40 | 7 |
| 40–50 | 21 |
| 50–60 | 23 |
| 60–70 | 19 |
| 70–80 | 6 |
| 80–90 | 5 |
| 90–100 | 4 |
Frequency Array
When classifying the data of a discrete variable, the resulting table is known as a Frequency Array. Since a discrete variable takes only specific integral values, the frequency is shown corresponding to each of those values, rather than a class interval.
Table 3.8. Frequency Array of the Size of Households
| Size of Household | Number of Households (Frequency) |
|---|---|
| 1 | 5 |
| 2 | 15 |
| 3 | 25 |
| 4 | 35 |
| 5 | 10 |
| 6 | 5 |
| 7 | 3 |
| 8 | 2 |
How to Prepare a Frequency Distribution
Preparing a frequency distribution involves several key decisions to ensure the data is summarised effectively.
Determining Class Intervals and Limits
- Number of Classes: The number of classes is typically between 6 and 15. It can be determined by dividing the range of the data (Largest Value - Smallest Value) by the desired size of the class interval.
- Size of Class Intervals: You can choose equal or unequal class intervals. Unequal intervals are useful when the range of the data is very wide or when values are concentrated in a small part of the range. In most other cases, equal-sized intervals are used.
- Determining Class Limits: Class limits should be clear and definite. There are two types:
  - Inclusive Class Intervals: In this method, values equal to both the lower and upper limits are included in the frequency of that class (e.g., 0–10, 11–20). This is suitable for discrete variables.
  - Exclusive Class Intervals: In this method, an observation equal to the upper class limit is excluded from that class and included in the next class (e.g., 0–10, 10–20). In this case, a value of 10 would be included in the 10–20 class, not the 0–10 class. This method is often used for continuous variables.
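The two methods differ only in how they treat values that sit exactly on a class limit. The following is a minimal sketch of that difference; the helper names `exclusive_class` and `inclusive_class` are hypothetical and chosen only for this illustration.

```python
def exclusive_class(value, lower_start, width):
    """Exclusive method: the lower limit belongs to the class, the upper limit does not."""
    index = (value - lower_start) // width
    lower = lower_start + index * width
    return (lower, lower + width)


def inclusive_class(value, classes):
    """Inclusive method: both the lower and the upper limit belong to the class."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # the value lies outside every class


# A value of 10 under each method.
print(exclusive_class(10, lower_start=0, width=10))   # classes 0-10, 10-20, ...
print(inclusive_class(10, [(0, 10), (11, 20)]))       # classes 0-10, 11-20
```

Under the exclusive method the value 10 lands in the 10–20 class, while under the inclusive method it stays in the 0–10 class.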
Finding Class Frequency using Tally Marking
The frequency for each class is determined by counting the number of observations that fall within its limits. This is commonly done using the tally marking method.
- Go through the raw data one observation at a time.
- For each observation, place a tally mark (/) against the class in which it falls.
- To make counting easier, tallies are grouped in fives. Four marks are drawn vertically (////), and the fifth is drawn diagonally across them.
- After tallying all observations, the total number of tallies for each class gives its frequency.
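The tally steps above can be sketched in a few lines of Python. The marks below are invented purely for illustration, the classes are assumed to be exclusive with a width of 10, and a completed group of five is printed as `/////` because plain text cannot draw the diagonal stroke.

```python
def tally_marks(count):
    """Render a count as tally marks grouped in fives."""
    groups = ["/////"] * (count // 5)
    if count % 5:
        groups.append("/" * (count % 5))
    return " ".join(groups)


def tally(observations, width=10, lowest=0):
    """Count observations per exclusive class, keyed by the class's lower limit."""
    counts = {}
    for value in observations:                      # one observation at a time
        lower = lowest + ((value - lowest) // width) * width
        counts[lower] = counts.get(lower, 0) + 1    # add one tally mark
    return dict(sorted(counts.items()))


# Hypothetical marks of 15 students (not the textbook's data).
marks = [47, 52, 12, 55, 49, 63, 41, 58, 45, 18, 50, 66, 44, 59, 43]
for lower, frequency in tally(marks).items():
    print(f"{lower}-{lower + 10}: {tally_marks(frequency):<12} {frequency}")
```

The frequency of each class is simply the number of tally marks recorded against it.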
Table 3.6. Tally Marking of Marks of 100 Students in Mathematics
| Class | Tally Marks | Frequency |
|---|---|---|
| 0–10 | / | 1 |
| 10–20 | | 8 |
| 20–30 | | 6 |
| 30–40 | | 7 |
| 40–50 | | 21 |
| ...and so on | ... | ... |
Loss of Information
A significant shortcoming of classifying data into a frequency distribution is the loss of information. While the summary becomes concise and comprehensible, the details of the individual observations within each class are lost. For further calculations, all values in a class are assumed to be equal to the class mark. This is an approximation, but the clarity and ease of analysis gained by summarising the data often outweigh this loss.
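A small numerical sketch may make this approximation concrete. The six observations below are hypothetical values assumed to fall in the class 40–50. Once they are classified, each one is treated as the class mark.

```python
from statistics import mean

# Hypothetical observations that all fall in the class 40-50.
observations = [41, 43, 44, 45, 47, 49]
class_mark = (40 + 50) / 2    # every observation is treated as this value

actual_mean = mean(observations)   # uses the individual values
grouped_mean = class_mark          # all six values replaced by the class mark

print(f"Mean of actual values: {actual_mean:.2f}")
print(f"Mean using class mark: {grouped_mean:.2f}")
```

The two results differ slightly, which is the information lost by grouping. The individual values can no longer be recovered from the class frequency alone.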
Bivariate Frequency Distribution
Often, when we collect data, we gather information on more than one variable from each element of the sample. For example, from a sample of 20 companies, we might collect data on both their annual sales and their expenditure on advertisement. This is known as bivariate data.
A Bivariate Frequency Distribution is a way to summarise bivariate data. It is defined as the frequency distribution of two variables. It is presented in a two-way table where the values of one variable are classed in rows and the values of the other variable are classed in columns.
Table 3.9. Bivariate Frequency Distribution of Sales and Advertisement Expenditure of 20 Firms
| Advertisement Expenditure (in Thousand ₹) ↓ / Sales (in Lakh ₹) → | 115–125 | 125–135 | 135–145 | 145–155 | 155–165 | 165–175 | Total |
|---|---|---|---|---|---|---|---|
| 62–64 | 2 | 1 | | | | | 3 |
| 64–66 | 1 | | 3 | | | | 4 |
| 66–68 | 1 | 1 | 2 | 1 | | | 5 |
| 68–70 | | 2 | | 2 | | | 4 |
| 70–72 | | 1 | 1 | | 1 | 1 | 4 |
| Total | 4 | 5 | 6 | 3 | 1 | 1 | 20 |
Each cell in the table shows the frequency of observations that fall into the corresponding row and column categories. For example, the table shows there are 3 firms whose sales are between ₹135 lakh and ₹145 lakh and whose advertisement expenditure is between ₹64 thousand and ₹66 thousand. This type of distribution helps in understanding the relationship between two variables, a topic known as correlation.
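To show how the cells of such a two-way table are filled, the sketch below classifies a handful of hypothetical firms on both variables and counts how many fall into each (row class, column class) pair. The data and the helper `class_of` are assumptions made for illustration. They are not the 20 firms of the table.

```python
from collections import Counter


def class_of(value, lowest, width):
    """Return the exclusive class (lower, upper) that contains value."""
    lower = lowest + ((value - lowest) // width) * width
    return (lower, lower + width)


# Hypothetical (advertisement expenditure in thousand Rs, sales in lakh Rs) pairs.
firms = [(63, 118), (65, 137), (65, 141), (67, 138), (69, 150), (71, 168)]

# Each key is one cell of the two-way table; each value is that cell's frequency.
cells = Counter(
    (class_of(adv, 62, 2), class_of(sales, 115, 10)) for adv, sales in firms
)

for (adv_class, sales_class), frequency in sorted(cells.items()):
    print(f"Advertisement {adv_class}, Sales {sales_class}: {frequency}")
```

Row and column totals follow by summing the cell frequencies along each row or column.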
NCERT Questions Solution
Question 1. Which of the following alternatives is true?
(i) The class midpoint is equal to:
(a) The average of the upper class limit and the lower class limit.
(b) The product of upper class limit and the lower class limit.
(c) The ratio of the upper class limit and the lower class limit.
(d) None of the above.
(ii) The frequency distribution of two variables is known as
(a) Univariate Distribution
(b) Bivariate Distribution
(c) Multivariate Distribution
(d) None of the above
(iii) Statistical calculations in classified data are based on
(a) the actual values of observations
(b) the upper class limits
(c) the lower class limits
(d) the class midpoints
(iv) Range is the
(a) difference between the largest and the smallest observations
(b) difference between the smallest and the largest observations
(c) average of the largest and the smallest observations
(d) ratio of the largest to the smallest observation
Answer:
Question 2. Can there be any advantage in classifying things? Explain with an example from your daily life.
Answer:
Question 3. What is a variable? Distinguish between a discrete and a continuous variable.
Answer:
Question 4. Explain the ‘exclusive’ and ‘inclusive’ methods used in classification of data.
Answer:
Question 5. Use the data in Table 3.2 that relate to monthly household expenditure (in Rs) on food of 50 households and
(i) Obtain the range of monthly household expenditure on food.
(ii) Divide the range into appropriate number of class intervals and obtain the frequency distribution of expenditure.
(iii) Find the number of households whose monthly expenditure on food is
(a) less than Rs 2000
(b) more than Rs 3000
(c) between Rs 1500 and Rs 2500
Answer:
Question 6. In a city 45 families were surveyed for the number of Cell phones they used. Prepare a frequency array based on their replies as recorded below.
| 1 | 3 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 |
| 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 3 | 2 |
| 2 | 6 | 1 | 6 | 2 | 1 | 5 | 1 | 5 | 3 |
| 2 | 4 | 2 | 7 | 4 | 2 | 4 | 3 | 4 | 2 |
| 0 | 3 | 1 | 4 | 3 |
Answer:
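One way to check a hand-made frequency array for this question is to count the replies programmatically. The sketch below uses Python's `collections.Counter` on the 45 replies copied from the table above. The variable names are illustrative.

```python
from collections import Counter

# Replies of the 45 families, copied row by row from the question.
replies = [
    1, 3, 2, 2, 2, 2, 1, 2, 1, 2,
    2, 3, 3, 3, 3, 3, 3, 2, 3, 2,
    2, 6, 1, 6, 2, 1, 5, 1, 5, 3,
    2, 4, 2, 7, 4, 2, 4, 3, 4, 2,
    0, 3, 1, 4, 3,
]

frequency_array = Counter(replies)   # number of cell phones -> number of families

print("Cell phones | Families")
for value in sorted(frequency_array):
    print(f"{value:>11} | {frequency_array[value]}")
print("Total families:", sum(frequency_array.values()))
```

Because the variable is discrete, each distinct value gets its own row and no class intervals are needed.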
Question 7. What is ‘loss of information’ in classified data?
Answer:
Question 8. Do you agree that classified data is better than raw data? Why?
Answer:
Question 9. Distinguish between univariate and bivariate frequency distribution.
Answer:
Question 10. Prepare a frequency distribution by inclusive method taking class interval of 7 from the following data.
| 28 | 17 | 15 | 22 | 29 | 21 | 23 | 27 | 18 | 12 |
| 7 | 2 | 9 | 4 | 1 | 8 | 3 | 10 | 5 | 20 |
| 16 | 12 | 8 | 4 | 33 | 27 | 21 | 15 | 3 | 36 |
| 27 | 18 | 9 | 2 | 4 | 6 | 32 | 31 | 29 | 18 |
| 14 | 13 | 15 | 11 | 9 | 7 | 1 | 5 | 37 | 32 |
| 28 | 26 | 24 | 20 | 19 | 25 | 19 | 20 | 6 | 9 |
Answer:
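A possible way to work through this question is sketched below. It assumes the classes start at the smallest observation and that each class covers seven consecutive integer values (1–7, 8–14, and so on). That reading of "class interval of 7" is an assumption, as is the helper name `inclusive_classes`.

```python
# The 60 observations, copied row by row from the question.
data = [
    28, 17, 15, 22, 29, 21, 23, 27, 18, 12,
    7, 2, 9, 4, 1, 8, 3, 10, 5, 20,
    16, 12, 8, 4, 33, 27, 21, 15, 3, 36,
    27, 18, 9, 2, 4, 6, 32, 31, 29, 18,
    14, 13, 15, 11, 9, 7, 1, 5, 37, 32,
    28, 26, 24, 20, 19, 25, 19, 20, 6, 9,
]


def inclusive_classes(lowest, highest, size):
    """Build inclusive classes, each covering `size` integer values, up to `highest`."""
    classes = []
    lower = lowest
    while lower <= highest:
        classes.append((lower, lower + size - 1))
        lower += size
    return classes


classes = inclusive_classes(min(data), max(data), 7)

# Inclusive method: a value equal to either limit is counted in that class.
frequencies = {
    (lower, upper): sum(1 for value in data if lower <= value <= upper)
    for lower, upper in classes
}

for (lower, upper), frequency in frequencies.items():
    print(f"{lower}-{upper}: {frequency}")
print("Total:", sum(frequencies.values()))
```

A different starting point, such as 0, would shift every class and change the frequencies, so the choice of the first lower limit should be stated along with the table.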
Question 11. “The quick brown fox jumps over the lazy dog”
Examine the above sentence carefully and note the numbers of letters in each word. Treating the number of letters as a variable, prepare a frequency array for this data.
Answer:
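The counting for this question can be sketched as follows, treating the number of letters in each word as a discrete variable. The variable names are illustrative.

```python
from collections import Counter

sentence = "The quick brown fox jumps over the lazy dog"

# Number of letters in each word, in the order the words appear.
word_lengths = [len(word) for word in sentence.split()]

frequency_array = Counter(word_lengths)   # number of letters -> number of words

print("Letters | Words")
for length in sorted(frequency_array):
    print(f"{length:>7} | {frequency_array[length]}")
```

The sentence has nine words, so the frequencies in the array should add up to nine.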