| Content On This Page | ||
|---|---|---|
| Data & its Related Terms | Bar Graphs & Double Bar Graphs | Representative Values |
| Mean of Grouped and Ungrouped Data | Median and Mode of the Data | Probability and its Related Terms |
Chapter 3 Data Handling (Concepts)
Welcome to this fascinating chapter that introduces you to the world of Data Handling and Probability! In our everyday lives, we are surrounded by information – weather reports, sports scores, survey results, election polls. The science of Statistics provides us with the tools to make sense of this information, known as data. This chapter will guide you through the essential first steps: how to collect information, how to organize it neatly so it's easy to understand, how to represent it visually using graphs, and how to calculate simple summary values that tell us about the 'typical' value in the data. We will also take our first look at Probability, exploring the mathematical way of thinking about chance and likelihood. These skills are incredibly important for understanding the world around us, interpreting news, and making informed decisions based on information.
We begin with the journey of data. Often, when information is first collected, it's in a disorganized form called raw data. Making sense of a jumbled list of numbers or observations can be difficult. Therefore, the first crucial step is organizing data. A common and effective method is creating a frequency distribution table. This involves listing each distinct observation or category and then counting how many times each occurs (its frequency), often using tally marks ($||||$, $\bcancel{||||}$) for efficient counting. This organized table makes patterns much clearer.
While tables organize data, graphs help us visualize it. We might briefly revisit pictographs, but our main focus will be on Bar Graphs. A single bar graph uses rectangular bars of uniform width to represent frequencies or values for different categories, making comparisons easy. We then introduce Double Bar Graphs. These are extremely useful when we want to compare two related sets of data for the same categories side-by-side (e.g., comparing marks obtained by students in Math vs. Science, or comparing the number of cars sold by two different companies over several months). We will learn how to accurately construct both single and double bar graphs, paying attention to choosing appropriate scales and labeling axes clearly, and how to interpret the information presented in them – comparing bar heights, identifying trends, and drawing simple conclusions.
Often, we want a single number that represents the 'center' or 'typical' value of a dataset. These are called Measures of Central Tendency. We will learn about three important measures:
- Mean: This is the most common type of average, calculated by summing all the data values and dividing by the total number of values ($\text{Mean} = \frac{\text{Sum of observations}}{\text{Number of observations}}$).
- Median: This is the middle value when the data is arranged in order (ascending or descending). If there's an even number of observations, the median is the average of the two middle values. The median is useful because it's not heavily affected by extremely high or low values (outliers).
- Mode: This is simply the value that appears most frequently in the dataset. A dataset can have one mode, more than one mode, or no mode at all.
Finally, we take our first steps into Probability, the mathematics of chance. We explore simple random experiments, like tossing a fair coin or rolling a standard die. We learn to identify the outcomes (possible results) and calculate the basic probability of an event happening using the formula: $P(\text{Event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$. For example, the probability of getting a 'heads' when tossing a fair coin is $\frac{1}{2}$. This introduction lays the foundation for understanding likelihood and randomness in a more structured way.
Data & its Related Terms
In our daily lives, we are constantly exposed to information, whether it's from newspapers, television, the internet, or even conversations. This information often comes in the form of facts, figures, or observations related to various aspects of the world around us. In mathematics, specifically in the field of Statistics, we deal with collecting, organising, analysing, and interpreting this type of information to make sense of it and draw conclusions. This raw information is what we call Data.
Data Handling is the process that covers all these steps: collecting the data, organising it so it's easy to understand, representing it visually (like with graphs), interpreting what the representation tells us, and analysing it to find patterns or draw conclusions.
What is Data?
Data is a collection of facts, which can be numbers, words, measurements, observations, or even simple descriptions, gathered for a specific purpose. When data is collected directly from the source and is in its original, unorganised form, it is called Raw Data or Ungrouped Data.
Examples of situations where data is collected:
- A teacher recording the marks obtained by students in a test to see how the class performed.
- A weather reporter collecting temperature readings from different cities over a week.
- A company collecting feedback scores from customers about their service.
- You counting the number of different coloured cars that pass your house in an hour.
Each of these collections of information is a data set, and the information in its initial form is raw data.
Key Terms Related to Data
When working with data, certain terms are commonly used:
1. Observation
Each individual numerical fact or entry in a collection of data is called an Observation or a Variate. It is the specific value recorded for one item or instance in the data set.
Example: If the marks obtained by 5 students in a mathematics test are 75, 80, 65, 90, and 75, then the individual marks 75, 80, 65, 90, and 75 are the observations.
2. Raw Data
As mentioned earlier, Raw Data is the data collected in its original form, without any sorting, grouping, or processing. It is exactly as it was recorded during collection.
Example: If you noted the daily attendance in your class for 10 days as 35, 34, 36, 35, 37, 34, 35, 36, 35, 34, this list is the raw data.
3. Array or Arrayed Data
Arranging the numerical observations of a data set in either ascending order (from smallest to largest) or descending order (from largest to smallest) is called creating an Array. The data after arranging is called Arrayed Data.
Example: Arranging the attendance data (35, 34, 36, 35, 37, 34, 35, 36, 35, 34) in ascending order gives: 34, 34, 34, 35, 35, 35, 35, 36, 36, 37. This is the arrayed data.
Arranging data in an array makes it easier to find the minimum and maximum values and to see how the data is spread.
4. Frequency
The Frequency of a particular observation is the number of times that specific observation or value appears in the given data set. It tells us how often a certain value occurs.
Example: In the arrayed attendance data (34, 34, 34, 35, 35, 35, 35, 36, 36, 37):
- The observation 34 appears 3 times, so its frequency is 3.
- The observation 35 appears 4 times, so its frequency is 4.
- The observation 36 appears 2 times, so its frequency is 2.
- The observation 37 appears 1 time, so its frequency is 1.
5. Frequency Distribution Table
A Frequency Distribution Table is a table that organises raw data by listing each distinct observation or category and its corresponding frequency (how many times it occurs). This provides a clear summary of the data set.
To count frequencies systematically when dealing with raw data, especially for discrete values, we often use Tally Marks. Tally marks are drawn in groups of five to make counting easier: $|$ for 1, $||$ for 2, $|||$ for 3, $||||$ for 4, and $\bcancel{||||}$ for 5. For example, 7 is $\bcancel{||||} ||$.
A frequency distribution table typically has columns for the Observation/Category, Tally Marks, and Frequency.
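The grouping-by-five rule for tally marks can be sketched in a few lines of Python. This is only an illustration: the function name `tally` is made up for this sketch, and because plain text cannot draw the diagonal stroke, a complete group of five is written as `||||/`, where the slash stands in for the stroke (an assumption of this sketch, not a standard notation).

```python
def tally(count):
    """Return tally marks for a non-negative whole number, in groups of five."""
    # divmod splits the count into complete groups of five and the leftover marks.
    full_groups, remainder = divmod(count, 5)
    # "||||/" is this sketch's plain-text stand-in for a struck-through group of five.
    groups = ["||||/"] * full_groups
    if remainder:
        groups.append("|" * remainder)
    return " ".join(groups)


for number in (1, 4, 5, 7):
    print(number, tally(number))
```

For 7, the function produces one complete group followed by two single marks, matching the pattern described above.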
Example: Let's create a frequency distribution table for the science marks data: 25, 30, 15, 40, 30, 25, 20, 30, 35, 15.
The distinct marks obtained are 15, 20, 25, 30, 35, 40.
Count the occurrences of each mark and use tally marks:
- 15: Appears twice $\implies ||$
- 20: Appears once $\implies |$
- 25: Appears twice $\implies ||$
- 30: Appears three times $\implies |||$
- 35: Appears once $\implies |$
- 40: Appears once $\implies |$
Now, present this in a table:
| Marks (Observation) | Tally Marks | Frequency (Number of Students) |
|---|---|---|
| 15 | \|\| | 2 |
| 20 | \| | 1 |
| 25 | \|\| | 2 |
| 30 | \|\|\| | 3 |
| 35 | \| | 1 |
| 40 | \| | 1 |
| Total | | $2+1+2+3+1+1 = \textbf{10}$ |
The total of the Frequency column should always equal the total number of observations in the raw data. If it does not, an observation has been missed or counted twice.
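The same counting can be checked with a short Python sketch. It uses `Counter` from Python's standard library to count how often each mark occurs; the variable names are chosen only for this illustration.

```python
from collections import Counter

# Science marks from the example above (raw data).
marks = [25, 30, 15, 40, 30, 25, 20, 30, 35, 15]

# Counter maps each distinct observation to its frequency.
frequency = Counter(marks)

print("Marks | Frequency")
for value in sorted(frequency):
    print(f"{value:>5} | {frequency[value]}")

# The frequencies must add up to the number of observations.
total = sum(frequency.values())
print("Total |", total)
```

Comparing `total` with `len(marks)` is the same check as comparing the total frequency with the number of observations.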
6. Range
The Range of a data set is the difference between the highest value (maximum observation) and the lowest value (minimum observation) in the data. It gives a simple measure of the spread or variability of the data.
Range = Highest Observation $-$ Lowest Observation
Example: For the science marks data (15, 15, 20, 25, 25, 30, 30, 30, 35, 40):
The highest observation is 40.
The lowest observation is 15.
Range = $40 - 15 = 25$.
The range tells us that the marks obtained by the students are spread over a difference of 25 marks.
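As a quick check of the formula, here is a minimal Python sketch. The function name `data_range` is hypothetical and is used to avoid clashing with Python's built-in `range`.

```python
def data_range(observations):
    """Range = highest observation - lowest observation."""
    return max(observations) - min(observations)


science_marks = [15, 15, 20, 25, 25, 30, 30, 30, 35, 40]
print(data_range(science_marks))
```

The data does not need to be sorted first, because `max` and `min` scan the whole list.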
Organisation of Data
Organising data is a crucial step after collection. Raw data, especially in large quantities, can be difficult to read and interpret. Organising it makes it systematic and easier to understand and analyse. Two methods of organising data suitable for Class 7 are:
- Creating an Array: Arranging the data values in ascending or descending order.
- Constructing a Frequency Distribution Table: Using tally marks to count the occurrences of each observation and presenting it in a table.
Well-organised data helps in:
- Quickly finding the minimum and maximum values.
- Seeing how the data values are distributed.
- Identifying the most frequent and least frequent observations.
- Calculating statistical measures like mean, median, and mode more easily.
Example 1. The ages (in years) of 15 teachers in a school are given as follows: 32, 41, 28, 54, 35, 26, 23, 33, 38, 40, 26, 43, 30, 35, 47.
(a) What is the range of the ages of the teachers?
(b) Arrange the data in ascending order.
(c) Prepare a frequency distribution table for this data.
Answer:
The given data set is: 32, 41, 28, 54, 35, 26, 23, 33, 38, 40, 26, 43, 30, 35, 47.
The total number of observations is 15.
(a) Find the range of the ages:
To find the range, we need the highest and lowest observations in the data.
Scanning through the list of ages:
Lowest age = 23 years.
Highest age = 54 years.
Range = Highest Observation $-$ Lowest Observation
Range = $54 - 23 = 31$ years.
The range of the ages of the teachers is 31 years.
(b) Arrange the data in ascending order (Arrayed Data):
List the ages from the smallest to the largest:
23, 26, 26, 28, 30, 32, 33, 35, 35, 38, 40, 41, 43, 47, 54.
(c) Prepare a frequency distribution table:
List the distinct ages from the arrayed data (23, 26, 28, 30, 32, 33, 35, 38, 40, 41, 43, 47, 54). Then, count the frequency of each age using tally marks.
Let's count and make tally marks:
- 23: | (1)
- 26: || (2)
- 28: | (1)
- 30: | (1)
- 32: | (1)
- 33: | (1)
- 35: || (2)
- 38: | (1)
- 40: | (1)
- 41: | (1)
- 43: | (1)
- 47: | (1)
- 54: | (1)
Now, present this in a frequency distribution table:
| Age (Years) | Tally Marks | Frequency (Number of Teachers) |
|---|---|---|
| 23 | \| | 1 |
| 26 | \|\| | 2 |
| 28 | \| | 1 |
| 30 | \| | 1 |
| 32 | \| | 1 |
| 33 | \| | 1 |
| 35 | \|\| | 2 |
| 38 | \| | 1 |
| 40 | \| | 1 |
| 41 | \| | 1 |
| 43 | \| | 1 |
| 47 | \| | 1 |
| 54 | \| | 1 |
| Total | | $1+2+1+1+1+1+2+1+1+1+1+1+1 = \textbf{15}$ |
The total frequency (15) matches the total number of teachers, confirming the counts are correct.
Bar Graphs & Double Bar Graphs
In the previous section, we learned about collecting and organising raw data and presenting it in frequency distribution tables. While tables are great for summarising data, visual representations like graphs can often make the data much easier to understand, compare, and interpret quickly. This section introduces two such powerful visual tools: Bar Graphs and Double Bar Graphs.
Bar Graph (or Bar Chart)
A Bar Graph is a graphical way of representing data using rectangular bars of uniform width. These bars are drawn either vertically (standing up) or horizontally (lying on their side) with equal spacing between consecutive bars. The key feature of a bar graph is that the length or height of each bar is proportional to the value or frequency of the category it represents.
Bar graphs are very effective for comparing quantities across different categories.
Features of a Bar Graph:
A complete and informative bar graph includes several key components:
- Title: A clear title that describes the data being displayed in the graph.
- Axes: The graph has two perpendicular lines, usually a horizontal axis (often called the X-axis) and a vertical axis (often called the Y-axis).
- One axis represents the categories or items for which the data is being shown (e.g., names of subjects, months, cities). These are typically labelled with names or descriptions.
- The other axis represents the numerical values or frequencies corresponding to each category. This axis is marked with a scale.
- Scale: A consistent scale is marked on the axis representing the numerical values. The scale indicates how much quantity or frequency each unit of length on that axis represents (e.g., 1 unit length = 10 students, or 1 cm = $\textsf{₹}50$). Choosing an appropriate scale is vital so that the bars fit well on the graph paper and are easy to read. The scale usually starts from zero.
- Bars: These are the rectangular blocks drawn for each category.
- All bars must have the same width.
- There must be equal spacing between consecutive bars.
- The height (for vertical bars) or length (for horizontal bars) of each bar is drawn according to the chosen scale, reaching up to the mark that corresponds to the value of that category.
- Labels: Both the axes should be clearly labelled to indicate what they represent (e.g., "Subjects", "Number of Students", "Years").
Constructing a Bar Graph:
To draw a bar graph from a given set of data, follow these steps:
- Draw two perpendicular lines, one horizontal and one vertical, starting from a common point (usually the origin, representing zero). These are your axes.
- Decide which axis will represent the categories and which will represent the numerical values/frequencies. Label the axes accordingly.
- Choose a suitable Scale for the axis representing the numerical values. Look at the minimum and maximum values in your data and choose a scale that makes the graph easy to draw and read. Mark equal intervals along this axis based on your scale (e.g., 0, 5, 10, 15, ... or 0, 10, 20, 30, ...). Write down the scale clearly (e.g., "Scale: 1 unit = 5 marks").
- Along the other axis (representing categories), mark points at equal intervals where the bars will be drawn. Label these points with the names of the categories. Decide on a uniform width for all bars and ensure the space between consecutive bars is also uniform and equal to the bar width or some other consistent spacing.
- For each category, draw a rectangular bar starting from the category axis. The height of the bar (if vertical) or length (if horizontal) should be drawn up to the value corresponding to that category on the scale of the numerical axis.
- Give a clear Title to the bar graph, usually placed at the top.
Interpreting a Bar Graph:
Once a bar graph is drawn, it helps in quickly understanding the data. We can easily:
- Compare the values of different categories by looking at the heights/lengths of the bars.
- Identify the category with the highest value (tallest/longest bar) and the lowest value (shortest bar).
- See the overall distribution and variation in the data.
Example 1. The marks obtained by a student in an examination in different subjects are given below. Draw a bar graph to represent this data.
| Subject | Marks Obtained |
|---|---|
| English | 75 |
| Hindi | 60 |
| Mathematics | 90 |
| Science | 80 |
| Social Science | 70 |
Answer:
To draw the bar graph representing the student's marks:
Steps for Construction:
- Draw the horizontal (X-axis) and vertical (Y-axis) lines.
- Let the horizontal axis represent the 'Subjects' and the vertical axis represent the 'Marks Obtained'. Label the axes clearly.
- Choose a suitable scale for the Y-axis (Marks). The marks are 75, 60, 90, 80, 70. The minimum mark is 60 and the maximum is 90. A scale of "1 unit length = 10 marks" is appropriate. Mark the Y-axis starting from 0, then 10, 20, 30, ..., up to 100.
- On the X-axis, mark points for each subject (English, Hindi, Mathematics, Science, Social Science) at equal intervals. Decide on the bar width and spacing (e.g., width = 1 unit, spacing = 1 unit).
- Draw vertical bars for each subject. The height of each bar will correspond to the marks obtained for that subject, according to the scale.
- English: 75 marks. Bar height is 7.5 units (halfway between 70 and 80 marks on the scale).
- Hindi: 60 marks. Bar height is 6 units.
- Mathematics: 90 marks. Bar height is 9 units.
- Science: 80 marks. Bar height is 8 units.
- Social Science: 70 marks. Bar height is 7 units.
- Ensure all bars have uniform width and equal spacing.
- Add the title: Marks Obtained by a Student in Different Subjects.
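The bar heights used in the construction steps above come from dividing each value by the scale. The following Python sketch shows that arithmetic and prints a rough horizontal text version of the graph. The names `MARKS_PER_UNIT` and `bar_height` are made up for this illustration, and the text bars (one `#` for every 5 marks) are only a stand-in for a drawing on graph paper.

```python
# Data from Example 1.
marks = {
    "English": 75,
    "Hindi": 60,
    "Mathematics": 90,
    "Science": 80,
    "Social Science": 70,
}

MARKS_PER_UNIT = 10  # Scale: 1 unit length = 10 marks


def bar_height(value, marks_per_unit):
    """Height of a bar in units of length for the chosen scale."""
    return value / marks_per_unit


heights = {subject: bar_height(m, MARKS_PER_UNIT) for subject, m in marks.items()}

print("Marks Obtained by a Student in Different Subjects")
for subject, m in marks.items():
    # One '#' per 5 marks gives a rough picture of the relative bar lengths.
    print(f"{subject:<15} {'#' * (m // 5)} ({heights[subject]} units)")
```

Changing `MARKS_PER_UNIT` to 5 would double every height, which is why the scale should be chosen before any bar is drawn.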
Double Bar Graph
Sometimes, we need to compare two related sets of data for the same categories. For example, comparing a student's performance in two different terms, or comparing the sales of a product in two different years across multiple cities. A Double Bar Graph is a useful tool for such comparisons.
A double bar graph shows two sets of bars side-by-side for each category, allowing for a direct visual comparison between the two data sets for each category.
Features of a Double Bar Graph:
- It has all the features of a single bar graph (Title, Axes, Scale, Labels).
- Paired Bars: For each category (e.g., a subject), two bars are drawn right next to each other, representing the two different data sets (e.g., marks in Term 1 and Term 2).
- Key or Legend: A key or legend is essential to identify which bar in each pair represents which data set (e.g., one colour for Term 1 marks, another colour for Term 2 marks).
- Double bar graphs are excellent for easily comparing two specific data points for every item in the category.
Constructing a Double Bar Graph:
The steps are similar to constructing a single bar graph, with adjustments for drawing two bars per category:
- Draw the perpendicular axes (X-axis and Y-axis). Label them.
- Choose and write the title for the graph.
- Choose a suitable scale for the numerical axis (Y-axis for a vertical graph).
- Mark the categories (e.g., subjects) on the category axis (X-axis).
- For each category, draw two bars side-by-side, one for each data set. The height of each bar should correspond to its value according to the scale. Ensure the bars within a pair touch or are very close, and there is equal spacing between the pairs of bars for different categories.
- Use different colors or patterns for the bars representing the two different data sets.
- Include a clear Key or Legend to explain what each color or pattern represents.
Example 2. The marks obtained by a student in Term 1 and Term 2 examinations are given below. Draw a double bar graph to compare the performance.
| Subject | Term 1 Marks (out of 100) | Term 2 Marks (out of 100) |
|---|---|---|
| English | 67 | 70 |
| Hindi | 72 | 65 |
| Mathematics | 88 | 95 |
| Science | 81 | 85 |
| Social Science | 73 | 75 |
Answer:
To draw the double bar graph to compare Term 1 and Term 2 performance:
Steps for Construction:
- Draw the X-axis (horizontal) and Y-axis (vertical). Label them 'Subjects' and 'Marks Obtained'.
- Choose the Title: Comparison of Student's Marks in Term 1 and Term 2.
- Choose a suitable scale for the Y-axis (Marks). The marks range from 65 to 95. A scale of "1 unit length = 10 marks" is appropriate. Mark the Y-axis starting from 0, then 10, 20, ..., up to 100.
- On the X-axis, mark points for each subject (English, Hindi, Mathematics, Science, Social Science) at equal intervals.
- Create a Key to distinguish the two sets of marks. For example, use blue bars for Term 1 and red bars for Term 2. Write the key clearly on the graph.
- For each subject, draw two vertical bars side-by-side. The height of the blue bar will represent the Term 1 marks, and the height of the red bar will represent the Term 2 marks, according to the scale. Ensure uniform width for all bars and equal spacing between the pairs of bars for different subjects.
- English: Blue bar height for 67, Red bar height for 70.
- Hindi: Blue bar height for 72, Red bar height for 65.
- Mathematics: Blue bar height for 88, Red bar height for 95.
- Science: Blue bar height for 81, Red bar height for 85.
- Social Science: Blue bar height for 73, Red bar height for 75.
Interpretation from the graph:
By looking at the pairs of bars for each subject, we can quickly compare the student's performance between the two terms.
- In English, the red bar (Term 2) is slightly taller than the blue bar (Term 1), showing improvement (from 67 to 70).
- In Hindi, the red bar (Term 2) is shorter than the blue bar (Term 1), showing a decrease in marks (from 72 to 65).
- In Mathematics, the red bar (Term 2) is significantly taller than the blue bar (Term 1), showing good improvement (from 88 to 95).
- Science and Social Science also show slight improvements (81 to 85 and 73 to 75 respectively).
- Overall, the student's performance improved in every subject except Hindi. Mathematics had the highest score in both terms and also the largest improvement (7 marks).
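The comparisons read off the graph can be confirmed by subtracting the Term 1 marks from the Term 2 marks for each subject. The sketch below is one way to do this in Python; the variable names are chosen only for this illustration.

```python
# Data from Example 2.
term1 = {"English": 67, "Hindi": 72, "Mathematics": 88, "Science": 81, "Social Science": 73}
term2 = {"English": 70, "Hindi": 65, "Mathematics": 95, "Science": 85, "Social Science": 75}

# Positive change: the Term 2 bar is taller; negative change: it is shorter.
change = {subject: term2[subject] - term1[subject] for subject in term1}

improved = [subject for subject, diff in change.items() if diff > 0]
declined = [subject for subject, diff in change.items() if diff < 0]
largest_improvement = max(change, key=change.get)

for subject, diff in change.items():
    print(f"{subject:<15} {term1[subject]} -> {term2[subject]} ({diff:+d})")
print("Improved:", improved)
print("Declined:", declined)
print("Largest improvement:", largest_improvement)
```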
Representative Values
When we collect a set of data, especially if it contains many observations, it's difficult to grasp its overall characteristics just by looking at the raw list of numbers. We need ways to summarise the data. One important way to summarise data is to find a single value that represents the entire set. Such a value is called a Representative Value or a Measure of Central Tendency. These values give us an idea of the typical or central performance or characteristic of the dataset.
The most commonly used measures of central tendency are the Arithmetic Mean (or Average), the Median, and the Mode. Each of these measures describes the 'centre' of the data in a different way and is useful in different situations.
What are Representative Values?
A representative value is a single number that provides a summary of a set of data points by identifying the central position within the data. It is a value around which the other data points tend to cluster. It helps in understanding the typical behaviour of the data without having to look at every single observation.
Choosing the most appropriate representative value depends on the nature of the data and what aspect of the data's 'centre' you want to describe.
1. Arithmetic Mean (or Average)
The Arithmetic Mean is the most widely used measure of central tendency. It is calculated by adding up all the values in a data set and then dividing the sum by the total number of values. It represents the value that each observation would have if the total quantity were distributed equally among all observations.
The arithmetic mean is often simply referred to as the Mean or the Average.
Formula for Calculating Mean (for Ungrouped Data):
Mean $(\bar{x}) = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$
If you have a data set with $n$ observations, denoted as $x_1, x_2, x_3, ..., x_n$, the formula can be written as:
$\bar{x} = \frac{x_1 + x_2 + x_3 + ... + x_n}{n}$
Using summation notation, this is written as:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
($\sum$ means "sum of", $x_i$ is the i-th observation)
Characteristics of the Mean:
- It is calculated using every observation in the data set.
- It is affected by the value of every observation.
- It can be significantly influenced by extreme values (outliers). For example, the average height in a class with one very tall person might be misleadingly high.
- For a given set of data, there is only one arithmetic mean (it is unique).
- A key property: The sum of the deviations of each observation from the mean is always zero. $\sum (x_i - \bar{x}) = 0$.
Example: Find the mean of the marks obtained by 5 students: 10, 15, 20, 25, 30.
Sum of observations = $10 + 15 + 20 + 25 + 30 = 100$.
Total number of observations = 5.
Mean = $\frac{\text{Sum of observations}}{\text{Number of observations}} = \frac{100}{5} = 20$.
The average mark is 20.
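The calculation above, and the property that the deviations from the mean add up to zero, can be checked with a short Python sketch. The function name `mean` is defined here for illustration only.

```python
def mean(observations):
    """Arithmetic mean: sum of all observations / number of observations."""
    return sum(observations) / len(observations)


marks = [10, 15, 20, 25, 30]
average = mean(marks)

# Deviation of each observation from the mean: x_i - mean.
deviations = [x - average for x in marks]

print("Mean:", average)
print("Deviations:", deviations)
print("Sum of deviations:", sum(deviations))
```

Here the deviations are $-10, -5, 0, 5, 10$, so the negative and positive deviations cancel exactly. With data whose mean is not a simple number, a computer may show a tiny rounding error instead of an exact zero.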
2. Median
The Median is the middle value of a data set when the data has been arranged in order (either ascending or descending). It divides the data into two halves: half the observations are below the median, and half are above it.
The median is a positional average because its value depends on its position in the ordered list, not directly on the value of every single observation (unlike the mean).
Calculating the Median for Ungrouped Data:
- Arrange the data set in either ascending order (smallest to largest) or descending order (largest to smallest). The result will be the same. Ascending order is more common.
- Determine the total number of observations, $n$, in the data set.
- Based on whether $n$ is odd or even, calculate the median:
- If $n$ is an odd number, there is a single middle value. The median is the value of the observation located at the position $\frac{n+1}{2}$ in the ordered list.
- If $n$ is an even number, there are two middle values. The median is the arithmetic mean (average) of these two middle observations. These two observations are located at positions $\frac{n}{2}$ and $(\frac{n}{2} + 1)$ in the ordered list.
Median $= \frac{\text{Value of } (\frac{n}{2})^{th} \text{ observation} + \text{Value of } (\frac{n}{2} + 1)^{th} \text{ observation}}{2}$ (for even $n$)
Characteristics of the Median:
- It is not heavily affected by extreme values (outliers) in the dataset. For example, adding a very large number to a list might drastically change the mean, but the median will shift only slightly or not at all, making it a robust measure for skewed data.
- It is a positional measure; its calculation depends on the rank of observations after ordering.
- It is unique for a given set of data.
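The effect of an outlier on the mean and the median can be seen in a small experiment. The sketch below uses the `mean` and `median` functions from Python's standard `statistics` module; the data values, including the outlier 1000, are invented for this illustration.

```python
from statistics import mean, median

values = [10, 15, 20, 25, 30]
with_outlier = values + [1000]  # one extremely large value added

print("Without outlier: mean =", mean(values), ", median =", median(values))
print("With outlier:    mean =", round(mean(with_outlier), 2), ", median =", median(with_outlier))
```

The mean jumps from 20 to $\frac{1100}{6} \approx 183.33$, while the median only moves from 20 to 22.5.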
Example (n is odd): Find the median of the marks: 10, 25, 15, 30, 20.
Step 1: Arrange the data in ascending order: 10, 15, 20, 25, 30.
Step 2: Count the number of observations, $n = 5$. Since 5 is odd, the median is the value at position $\frac{5+1}{2} = \frac{6}{2} = 3^{rd}$.
Step 3: Find the $3^{rd}$ observation in the ordered list. The $3^{rd}$ observation is 20.
So, Median = 20.
Example (n is even): Find the median of the marks: 10, 25, 15, 30, 20, 35.
Step 1: Arrange the data in ascending order: 10, 15, 20, 25, 30, 35.
Step 2: Count the number of observations, $n = 6$. Since 6 is even, the median is the average of the two middle observations at positions $\frac{6}{2} = 3^{rd}$ and $\frac{6}{2} + 1 = 4^{th}$.
Step 3: Identify the $3^{rd}$ and $4^{th}$ observations in the ordered list. The $3^{rd}$ observation is 20, and the $4^{th}$ observation is 25.
Step 4: Calculate the average of these two values:
Median $= \frac{20 + 25}{2} = \frac{45}{2} = 22.5$.
So, Median = 22.5.
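The two cases (odd $n$ and even $n$) can be written as one small Python function. This is a sketch with a hand-written function named `median`, assuming the list is not empty. Note that Python counts positions from 0, so the $3^{rd}$ observation is at index 2.

```python
def median(observations):
    """Median of a non-empty list, following the odd/even rule."""
    ordered = sorted(observations)      # Step 1: arrange in ascending order
    n = len(ordered)                    # Step 2: count the observations
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average of positions n / 2 and n / 2 + 1 (indices middle - 1 and middle).
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([10, 25, 15, 30, 20]))
print(median([10, 25, 15, 30, 20, 35]))
```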
3. Mode
The Mode is the observation or value that occurs most frequently in a data set. It represents the most common value in the data.
A data set can have one mode (unimodal), two modes (bimodal), three modes (trimodal), or even more modes (multimodal) if multiple values share the highest frequency. If all observations in a data set have the same frequency (e.g., each occurs only once), then the data set has no mode.
Calculating the Mode for Ungrouped Data:
- Examine the data set and count the frequency of each distinct observation. Using a frequency distribution table is very helpful for this.
- Identify the observation(s) that have the highest frequency. That value (or those values) is/are the mode(s).
Arranging the data in order (creating an array) is not strictly necessary to find the mode, but it can make it easier to see repeated values and count their frequencies.
Characteristics of the Mode:
- It is not affected by extreme values (outliers).
- It is easy to find by inspection or simple counting.
- It is the only measure of central tendency that can be used for both numerical data (like ages, marks) and categorical data (like favourite colours, types of cars).
- It may not exist for a given data set, or there might be more than one mode.
Example 1 (Unimodal): Find the mode of the data: 2, 3, 4, 3, 5, 3, 6, 3, 7.
Let's count the frequency of each number:
- 2 appears 1 time.
- 3 appears 4 times.
- 4 appears 1 time.
- 5 appears 1 time.
- 6 appears 1 time.
- 7 appears 1 time.
The highest frequency is 4, which corresponds to the value 3. Since only one value has the highest frequency, the mode is unique.
So, Mode = 3.
Example 2 (Bimodal): Find the mode of the data: 2, 3, 4, 3, 5, 4, 6, 4, 3, 7.
Let's count the frequency of each number:
- 2 appears 1 time.
- 3 appears 3 times.
- 4 appears 3 times.
- 5 appears 1 time.
- 6 appears 1 time.
- 7 appears 1 time.
The highest frequency is 3. Both the values 3 and 4 appear with this highest frequency. Since there are two values with the highest frequency, the data is bimodal, and both are modes.
So, Modes = 3 and 4.
Example 3 (No Mode): Find the mode of the data: 1, 2, 3, 4, 5, 6, 7.
Let's count the frequency of each number:
- 1 appears 1 time.
- 2 appears 1 time.
- 3 appears 1 time.
- 4 appears 1 time.
- 5 appears 1 time.
- 6 appears 1 time.
- 7 appears 1 time.
All values have the same frequency (1). In such a case, there is no value that occurs 'most' frequently.
So, there is no mode for this dataset.
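The three cases above (one mode, two modes, no mode) can be handled by a single Python function. This is a minimal sketch: the function name `modes` is hypothetical, the list is assumed to be non-empty, and the rule "all values equally frequent means no mode" follows the convention used in this chapter.

```python
from collections import Counter


def modes(observations):
    """Return the list of modes; an empty list means there is no mode."""
    frequency = Counter(observations)
    highest = max(frequency.values())
    # Chapter convention: if several distinct values all share the same
    # frequency, no value occurs 'most' often, so there is no mode.
    if len(frequency) > 1 and all(count == highest for count in frequency.values()):
        return []
    return sorted(value for value, count in frequency.items() if count == highest)


print(modes([2, 3, 4, 3, 5, 3, 6, 3, 7]))     # unimodal
print(modes([2, 3, 4, 3, 5, 4, 6, 4, 3, 7]))  # bimodal
print(modes([1, 2, 3, 4, 5, 6, 7]))           # no mode
```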
Choosing the Appropriate Representative Value
The choice of which measure of central tendency (mean, median, or mode) is the 'best' representative value depends on the type of data and the specific purpose of the analysis:
- Mean: It is the most commonly used measure and is suitable for numerical data that is fairly symmetrical and does not contain extreme outliers. It is based on all observations.
- Median: It is a good measure for numerical data that has extreme outliers or is skewed (data is heavily clustered towards one end). Since it's the middle value, it's not affected by the magnitude of extreme values. It's often used for data like incomes or property prices.
- Mode: It is most useful when you want to know the most popular or common category or value. It is the only measure of central tendency appropriate for categorical data (like favourite colours, types of cars). For numerical data, it shows which specific value occurs most often.
Example 1. The scores in mathematics test (out of 25) of 15 students are as follows: 19, 25, 23, 20, 9, 20, 15, 10, 5, 16, 25, 20, 24, 12, 20.
Find the mean, median and mode of this data. Are they the same?
Answer:
The given scores are: 19, 25, 23, 20, 9, 20, 15, 10, 5, 16, 25, 20, 24, 12, 20.
The total number of observations (students) is $n = 15$.
1. Calculate the Mean:
Mean $= \frac{\text{Sum of all scores}}{\text{Number of scores}}$.
Sum of scores $= 19 + 25 + 23 + 20 + 9 + 20 + 15 + 10 + 5 + 16 + 25 + 20 + 24 + 12 + 20 = 263$.
Number of scores = 15.
Mean $= \frac{263}{15}$.
Performing the division gives $263 \div 15 = 17.5333\ldots$, a non-terminating repeating decimal. Rounding to two decimal places, Mean $\approx 17.53$.
2. Calculate the Median:
First, arrange the scores in ascending order:
5, 9, 10, 12, 15, 16, 19, 20, 20, 20, 20, 23, 24, 25, 25.
Number of observations, $n = 15$. Since 15 is an odd number, the median is the value of the observation at position $\frac{n+1}{2} = \frac{15+1}{2} = \frac{16}{2} = 8^{th}$.
Find the $8^{th}$ score in the ordered list. The $8^{th}$ score is 20.
So, Median $= 20$.
3. Calculate the Mode:
Identify the score that appears most frequently in the data. Looking at the ordered data or counting from the raw data:
- 5 appears 1 time.
- 9 appears 1 time.
- 10 appears 1 time.
- 12 appears 1 time.
- 15 appears 1 time.
- 16 appears 1 time.
- 19 appears 1 time.
- 20 appears 4 times.
- 23 appears 1 time.
- 24 appears 1 time.
- 25 appears 2 times.
The score '20' occurs with the highest frequency (4 times).
So, Mode $= 20$.
Comparison of Mean, Median, and Mode:
Mean $\approx 17.53$
Median $= 20$
Mode $= 20$
In this dataset, the Median and the Mode are equal to 20, but the Mean (approximately 17.53) is different from the median and mode.
The mean, median, and mode are not all the same for this data.
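The three results above can be cross-checked with a short Python sketch. It assumes Python's standard `statistics` module is available; the variable names are illustrative and not part of the chapter.

```python
from statistics import mean, median, mode

# Scores of the 15 students from Example 1
scores = [19, 25, 23, 20, 9, 20, 15, 10, 5, 16, 25, 20, 24, 12, 20]

mean_score = mean(scores)      # sum of scores divided by the count: 263 / 15
median_score = median(scores)  # middle (8th) value of the sorted scores
mode_score = mode(scores)      # most frequently occurring score

print(round(mean_score, 2), median_score, mode_score)  # 17.53 20 20
```

The output agrees with the hand calculation: the median and mode coincide at 20, while the mean is lower.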
Mean of Grouped and Ungrouped Data
In the previous section, we introduced the concept of representative values or measures of central tendency, which are single values used to summarise a dataset. The first such measure we will explore in detail is the Arithmetic Mean, often simply called the mean or average. We will learn how to calculate the mean for data that is not grouped (ungrouped data), whether it's presented as a simple list or in a frequency table.
1. Mean of Ungrouped Data (Raw Data List)
When data is available as a simple list of individual observations (raw data), the mean is calculated by summing all these observations and dividing by the total count of observations.
The Arithmetic Mean of a set of observations is the sum of the values of all observations divided by the total number of observations.
Formula for Mean of Ungrouped Data:
Mean $(\bar{x}) = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$
If you have $n$ individual observations, denoted as $x_1, x_2, ..., x_n$, the mean is:
$\bar{x} = \frac{x_1 + x_2 + ... + x_n}{n}$
Using summation notation (where $\sum$ means "sum of"):
$\bar{x} = \frac{\sum x_i}{n}$
Here, $\sum x_i$ represents the sum of all the individual observations, and $n$ is the total count of how many observations there are.
Example: Find the mean of the numbers: 5, 10, 15, 20.
Sum of observations = $5 + 10 + 15 + 20 = 50$.
Total number of observations, $n = 4$.
Mean $(\bar{x}) = \frac{\text{Sum of observations}}{\text{Number of observations}} = \frac{50}{4}$.
Perform the division: $50 \div 4 = 12.5$.
Mean $= 12.5$.
Example 1. The heights of 5 plants in a garden are 65 cm, 72 cm, 58 cm, 75 cm, and 60 cm. Find the mean height of the plants.
Answer:
The given data set is a list of individual heights: 65 cm, 72 cm, 58 cm, 75 cm, 60 cm.
This is ungrouped data. The number of observations is the number of plants, $n = 5$.
To find the mean height, we first find the sum of all the heights:
Sum of heights = $65 + 72 + 58 + 75 + 60$ cm.
Sum of heights = 330 cm.
Now, use the formula for the mean:
Mean height $= \frac{\text{Sum of heights}}{\text{Number of plants}} = \frac{330 \text{ cm}}{5}$.
Perform the division: $330 \div 5 = 66$.
Mean height $= 66$ cm.
Therefore, the mean height of the plants is 66 cm.
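As a quick check, the formula for the mean of a raw data list can be written as a small Python function. This is only a sketch; the function name `mean_of_list` is a hypothetical helper chosen for illustration.

```python
def mean_of_list(values):
    """Arithmetic mean: sum of all observations / number of observations."""
    return sum(values) / len(values)

# Plant heights (in cm) from the example above
heights_cm = [65, 72, 58, 75, 60]
print(mean_of_list(heights_cm))        # 330 / 5 = 66.0

# The earlier list 5, 10, 15, 20
print(mean_of_list([5, 10, 15, 20]))   # 50 / 4 = 12.5
```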
2. Mean of Ungrouped Data Presented in a Frequency Distribution Table
Sometimes, even though the data is ungrouped (individual distinct values), it is organised and presented in a frequency distribution table. This table lists each distinct observation ($x_i$) and how many times it occurs (its frequency, $f_i$).
When data is given in this format, calculating the sum of all observations by manually listing each one and adding them can be tedious, especially if the frequencies are large. Instead, we use the frequencies to simplify the calculation of the sum.
The sum of all observations is found by multiplying each distinct observation by its frequency and then adding all these products together. The total number of observations ($N$) is the sum of all the frequencies.
Formula for Mean from a Frequency Distribution Table:
If there are $k$ distinct observations $x_1, x_2, ..., x_k$ with corresponding frequencies $f_1, f_2, ..., f_k$, then the mean is:
Mean $(\bar{x}) = \frac{(x_1 \times f_1) + (x_2 \times f_2) + ... + (x_k \times f_k)}{\text{Sum of frequencies}}$
This can be written using summation notation as:
$\bar{x} = \frac{\sum (x_i f_i)}{\sum f_i}$ ... (i)
Where:
- $x_i$ represents the $i^{th}$ distinct observation (e.g., a specific mark value).
- $f_i$ represents the frequency of the $i^{th}$ observation (e.g., the number of students who got that mark).
- $x_i f_i$ (or $f_i x_i$) is the product of the observation and its frequency. This product represents the sum of all values for that specific observation (e.g., if 7 students got 20 marks, their total contribution to the sum is $7 \times 20 = 140$).
- $\sum (x_i f_i)$ is the sum of all these products, which is the total sum of all original observations.
- $\sum f_i$ is the sum of all frequencies, which is the total number of observations in the data set ($N$).
Steps to Calculate Mean from a Frequency Table:
- Create a table with at least three columns: 'Observation ($x_i$)', 'Frequency ($f_i$)', and 'Product ($x_i f_i$)'.
- List the distinct observations and their frequencies in the first two columns based on the given data.
- In the third column, for each row, calculate the product of the observation value and its frequency ($x_i \times f_i$).
- Find the sum of all frequencies ($\sum f_i$) by adding all the numbers in the frequency column. This gives the total number of observations ($N$).
- Find the sum of all the products ($\sum (x_i f_i)$) by adding all the numbers in the third column. This gives the sum of all observations.
- Use the formula $\bar{x} = \frac{\sum (x_i f_i)}{\sum f_i}$ to calculate the mean.
Example 2. The marks obtained by 20 students in a class test are given below in a frequency distribution table. Find the mean marks.
| Marks ($x_i$) | Number of Students ($f_i$) |
|---|---|
| 10 | 3 |
| 15 | 5 |
| 20 | 7 |
| 25 | 4 |
| 30 | 1 |
Answer:
The data is given in a frequency distribution table. We use the formula $\bar{x} = \frac{\sum (x_i f_i)}{\sum f_i}$.
We create an additional column for the product $x_i f_i$ to help calculate the sums needed for the formula.
| Marks ($x_i$) | Number of Students ($f_i$) | Product ($x_i f_i$) |
|---|---|---|
| 10 | 3 | $10 \times 3 = 30$ |
| 15 | 5 | $15 \times 5 = 75$ |
| 20 | 7 | $20 \times 7 = 140$ |
| 25 | 4 | $25 \times 4 = 100$ |
| 30 | 1 | $30 \times 1 = 30$ |
| Total | $\sum f_i = 3+5+7+4+1 = 20$ | $\sum (x_i f_i) = 30+75+140+100+30 = 375$ |
From the table, we have:
Sum of frequencies, $\sum f_i = 20$. This is the total number of students ($N$).
Sum of the products ($x_i f_i$), $\sum (x_i f_i) = 375$. This is the sum of all the marks obtained by the 20 students.
Now, calculate the mean using the formula:
Mean $(\bar{x}) = \frac{\sum (x_i f_i)}{\sum f_i} = \frac{375}{20}$.
Perform the division: $375 \div 20 = 18.75$.
Therefore, the mean marks of the students are 18.75.
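The same table calculation can be sketched in Python. The function name `mean_from_frequency_table` and the choice to store the table as a list of (observation, frequency) pairs are assumptions made for this illustration.

```python
def mean_from_frequency_table(table):
    """table is a list of (observation, frequency) pairs."""
    total_of_products = sum(x * f for x, f in table)  # sum of x_i * f_i
    total_frequency = sum(f for _, f in table)        # sum of f_i, i.e. N
    return total_of_products / total_frequency

# Marks and number of students from Example 2
marks_table = [(10, 3), (15, 5), (20, 7), (25, 4), (30, 1)]
print(mean_from_frequency_table(marks_table))  # 375 / 20 = 18.75
```

Multiplying each mark by its frequency avoids writing out all 20 marks one by one, which is exactly the shortcut the third column of the table provides.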
3. Mean of Grouped Data (Higher Classes)
For very large data sets, it becomes impractical to list every single distinct observation and its frequency. In such cases, data is often grouped into class intervals (e.g., age groups such as 0-10 and 10-20, or marks such as 0-10 and 10-20). When data is presented in this format (grouped data), the individual values are lost, and we only know how many observations fall within each interval.
Calculating the exact mean from grouped data is not possible. Instead, we calculate an estimated mean.
To estimate the mean for grouped data:
- For each class interval, calculate the mid-point (or class mark). The mid-point is taken as a stand-in for the observations within that class, since their individual values are not known.
Class Mark ($x_i$) = $\frac{\text{Upper class limit} + \text{Lower class limit}}{2}$
- Assume that all observations within a class interval are concentrated at its mid-point. So, the mid-point ($x_i$) is used as the representative value for all frequencies ($f_i$) in that class.
- Multiply the mid-point ($x_i$) of each class by the frequency ($f_i$) of that class to get $x_i f_i$.
- Use the same formula as for ungrouped data with frequencies: $\bar{x} = \frac{\sum (x_i f_i)}{\sum f_i}$, where $x_i$ are the class marks and $f_i$ are the class frequencies.
This method is one way to calculate the mean of grouped data (called the Direct Method) and is a topic typically covered in detail in higher classes. For your current level (Class 7), the focus remains on understanding and calculating the mean for ungrouped data presented either as a list or in a simple frequency table listing individual distinct observations.
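For readers who are curious, the Direct Method described above can be sketched in Python. The class intervals and frequencies below are hypothetical, invented purely for illustration (they do not come from this chapter), and the function name `estimated_mean_grouped` is likewise an assumption.

```python
def estimated_mean_grouped(classes):
    """classes is a list of (lower_limit, upper_limit, frequency) tuples."""
    total_of_products = 0
    total_frequency = 0
    for lower, upper, freq in classes:
        class_mark = (lower + upper) / 2        # mid-point of the interval
        total_of_products += class_mark * freq  # x_i * f_i
        total_frequency += freq                 # running total of f_i
    return total_of_products / total_frequency

# Hypothetical grouped data: intervals 0-10, 10-20, 20-30
hypothetical_classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
print(estimated_mean_grouped(hypothetical_classes))  # (5*2 + 15*5 + 25*3) / 10 = 16.0
```

The result is only an estimate, because every observation in a class is treated as if it were equal to the class mark.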
Characteristics of the Mean: (Revisited for Clarity)
Regardless of how it's calculated, the mean has these general characteristics:
- It uses all data values in its computation.
- Its value can be significantly affected by extreme values (outliers).
- It is a unique value for any given dataset.
- The sum of the differences between each observation and the mean is zero.
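Two of these characteristics can be seen numerically with the plant heights used earlier. The sketch below is illustrative only; the extra 200 cm value is a hypothetical outlier added to show the effect and is not part of the original data.

```python
heights_cm = [65, 72, 58, 75, 60]
mean_height = sum(heights_cm) / len(heights_cm)  # 66.0

# Differences between each observation and the mean
deviations = [h - mean_height for h in heights_cm]
print(deviations)       # [-1.0, 6.0, -8.0, 9.0, -6.0]
print(sum(deviations))  # 0.0

# Hypothetical outlier: one unusually large value pulls the mean upwards
with_outlier = heights_cm + [200]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(mean_with_outlier > mean_height)  # True
```

When the mean is not an exact decimal, a computer may report a sum of differences that is extremely close to zero rather than exactly zero, because of rounding in decimal arithmetic.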
Median and Mode of the Data
In addition to the mean, the Median and the Mode are two other important measures of central tendency or representative values that help us summarise and understand a dataset. While the mean gives the average value, the median tells us the middle value, and the mode indicates the most frequent value. Each measure provides a different kind of information about where the data is centered or clustered.
Median
The Median is the middle value of a dataset when the observations are arranged in order of magnitude (either from smallest to largest - ascending order, or from largest to smallest - descending order). The median is a useful measure because it is not affected by extremely high or low values (outliers) in the data, making it a robust indicator of the 'typical' value, especially in data that is skewed or contains unusual values.
When the data is ordered, the median is the value that splits the dataset into two equal halves: roughly 50% of the observations are less than or equal to the median, and roughly 50% are greater than or equal to the median.
Steps to Find the Median of Ungrouped Data:
To calculate the median for a set of ungrouped data:
- Arrange the data: The first and most important step is to arrange all the observations in the dataset in either ascending order (from smallest to largest) or descending order.
- Count the observations: Determine the total number of observations in the data set and denote it by 'n'.
- Determine the median based on 'n':
- If 'n' is an odd number, there is a single middle observation. The median is the value of the observation located at the position given by the formula $\frac{n+1}{2}$ in the ordered list.
- If 'n' is an even number, there are two middle observations. The median is the arithmetic mean (average) of these two middle observations. These two observations are located at positions $\frac{n}{2}$ and $(\frac{n}{2} + 1)$ in the ordered list.
Median = Value of the $(\frac{n+1}{2})^{th}$ observation in the ordered data (for odd n).
Median $= \frac{\text{Value of the } (\frac{n}{2})^{th} \text{ obs} + \text{Value of the } (\frac{n}{2} + 1)^{th} \text{ obs}}{2}$ (for even n).
Advantages of Median:
- It is not affected by extreme values (outliers). This is a significant advantage over the mean when dealing with data that includes unusually high or low values.
- It provides a good sense of the 'middle' of the data, especially for skewed distributions.
- It is relatively easy to calculate once the data is ordered.
- It is unique for any given dataset.
Example 1. The runs scored by 11 players in a cricket match are as follows: 6, 15, 120, 50, 100, 80, 10, 15, 8, 10, 15. Find the median of this data.
Answer:
The given scores are: 6, 15, 120, 50, 100, 80, 10, 15, 8, 10, 15.
Step 1: Arrange the data in ascending order.
Count the scores to ensure all are included: There are 11 scores.
Ordered data: 6, 8, 10, 10, 15, 15, 15, 50, 80, 100, 120.
Step 2: Count the number of observations.
Number of observations, $n = 11$. Since 11 is an odd number, we use the formula for odd 'n'.
Step 3: Determine the median.
The median is the value of the $(\frac{n+1}{2})^{th}$ observation in the ordered list.
Position of Median $= (\frac{11+1}{2})^{th} \text{ observation} = (\frac{12}{2})^{th} \text{ observation} = 6^{th} \text{ observation}$.
Now, find the value of the $6^{th}$ observation in the sorted data (6, 8, 10, 10, 15, 15, 15, 50, 80, 100, 120).
The $6^{th}$ observation is 15.
Therefore, the median score is 15.
Example 2. Find the median of the data: 25, 34, 31, 23, 22, 26, 35, 28, 20, 32.
Answer:
The given data is: 25, 34, 31, 23, 22, 26, 35, 28, 20, 32.
Step 1: Arrange the data in ascending order.
Count the observations: There are 10 observations.
Ordered data: 20, 22, 23, 25, 26, 28, 31, 32, 34, 35.
Step 2: Count the number of observations.
Number of observations, $n = 10$. Since 10 is an even number, we use the formula for even 'n'.
Step 3: Determine the median.
The median is the average of the two middle observations, which are at positions $\frac{n}{2}$ and $(\frac{n}{2} + 1)$.
Position of the first middle observation
$= (\frac{10}{2})^{th} \text{ observation} = 5^{th} \text{ observation}$.
Position of the second middle observation
$= (\frac{10}{2} + 1)^{th} \text{ observation} = (5+1)^{th} \text{ observation} = 6^{th} \text{ observation}$.
From the sorted data (20, 22, 23, 25, 26, 28, 31, 32, 34, 35):
The $5^{th}$ observation is 26.
The $6^{th}$ observation is 28.
Calculate the median by finding the average of these two values:
Median $= \frac{\text{Value of } 5^{th} \text{ obs} + \text{Value of } 6^{th} \text{ obs}}{2} = \frac{26 + 28}{2}$.
Median $= \frac{54}{2} = 27$.
Therefore, the median of the data is 27.
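The steps for odd and even 'n' can be sketched as a small Python function. The name `median_of_list` is a hypothetical helper; note that Python counts positions from 0, so the $k^{th}$ observation sits at index $k - 1$.

```python
def median_of_list(values):
    ordered = sorted(values)  # Step 1: arrange in ascending order
    n = len(ordered)          # Step 2: count the observations
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th observations
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2

runs = [6, 15, 120, 50, 100, 80, 10, 15, 8, 10, 15]  # Example 1, n = 11
data = [25, 34, 31, 23, 22, 26, 35, 28, 20, 32]      # Example 2, n = 10
print(median_of_list(runs))  # 15
print(median_of_list(data))  # 27.0
```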
Mode
The Mode is the observation or value that appears most frequently in a data set. It is the value that has the highest frequency. The mode is the most easily understandable measure of central tendency and can be found by simply counting the occurrences of each distinct value in the data.
Characteristics of Mode:
- It represents the most typical or common value in the dataset.
- It is not affected by extreme values (outliers), similar to the median.
- It can be calculated for both numerical data (like test scores, ages) and categorical data (like favourite colours, types of cars, sizes of shirts).
- A data set can have one mode (unimodal), more than one mode (bimodal for two modes, trimodal for three modes, etc.), or no mode at all if all observations appear with the same frequency.
- It is often the easiest measure to determine by simple inspection, especially from a frequency distribution table or a bar graph (the category with the tallest bar).
Steps to Find the Mode of Ungrouped Data:
- Examine the data set and list all the distinct values present.
- Count the frequency (number of times each distinct value appears) for every distinct value. Creating a frequency distribution table is an efficient way to do this.
- Identify the value(s) that have the highest frequency among all distinct values.
- That value (or those values) is/are the mode(s) of the dataset.
Arranging the data in ascending order (creating an array) before finding frequencies can help ensure you don't miss any values and make counting easier, but it is not strictly required for finding the mode.
Example 3. Find the mode of the following set of numbers: 1, 1, 2, 4, 3, 2, 1, 2, 2, 4.
Answer:
The given data is: 1, 1, 2, 4, 3, 2, 1, 2, 2, 4.
Let's count the frequency of each distinct number:
- Number 1 appears 3 times.
- Number 2 appears 4 times.
- Number 3 appears 1 time.
- Number 4 appears 2 times.
Compare the frequencies: 3, 4, 1, 2. The highest frequency is 4, which corresponds to the number 2.
Therefore, the mode of this data is 2.
Example 4. The heights (in cm) of 10 students are: 150, 152, 155, 152, 150, 151, 155, 153, 152, 155. Find the mode(s) of the heights.
Answer:
The given heights are: 150, 152, 155, 152, 150, 151, 155, 153, 152, 155.
Let's find the frequency of each distinct height. It's helpful to list the distinct heights first: 150, 151, 152, 153, 155.
Count the occurrences of each height:
- 150 appears 2 times.
- 151 appears 1 time.
- 152 appears 3 times.
- 153 appears 1 time.
- 155 appears 3 times.
Compare the frequencies: 2, 1, 3, 1, 3. The highest frequency is 3.
Both the heights 152 cm and 155 cm appear with the highest frequency (3 times each). Since there are two values with the same highest frequency, the data set has two modes.
Therefore, the modes of this data are 152 cm and 155 cm.
This data set is bimodal.
Example 5. Find the mode of the sizes of shirts sold by a shopkeeper on a particular day, which are: M, L, XL, L, M, L, S, M, L, XL, L, M, XXL, L.
Answer:
The given shirt sizes are: M, L, XL, L, M, L, S, M, L, XL, L, M, XXL, L.
This is categorical data (categories are shirt sizes). We can find the mode for categorical data.
Let's count the frequency of each distinct size:
- S appears 1 time.
- M appears 4 times.
- L appears 6 times.
- XL appears 2 times.
- XXL appears 1 time.
Compare the frequencies: 1, 4, 6, 2, 1. The highest frequency is 6, which corresponds to the size 'L'.
Therefore, the mode of the shirt sizes is L. This means that 'L' is the most frequently sold shirt size on that day.
Note: For categorical data like this, the mean cannot be calculated because the categories are not numbers, and the median is not normally used at this level. The mode, however, is a meaningful representative value.
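The counting procedure used in Examples 3 to 5 can be sketched in Python with `collections.Counter` from the standard library. The function name `modes_of` is a hypothetical helper, and the sketch follows this chapter's convention that a dataset has no mode when every distinct value occurs equally often. It assumes the list is not empty.

```python
from collections import Counter

def modes_of(values):
    """Return a list of all modes; an empty list means 'no mode'."""
    counts = Counter(values)        # frequency of each distinct value
    highest = max(counts.values())  # the highest frequency
    # If several distinct values all share the same frequency, there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]

numbers = [1, 1, 2, 4, 3, 2, 1, 2, 2, 4]
heights = [150, 152, 155, 152, 150, 151, 155, 153, 152, 155]
shirts = ["M", "L", "XL", "L", "M", "L", "S", "M", "L", "XL", "L", "M", "XXL", "L"]

print(modes_of(numbers))                # [2]
print(modes_of(heights))                # [152, 155]
print(modes_of(shirts))                 # ['L']
print(modes_of([1, 2, 3, 4, 5, 6, 7]))  # []
```

The same function works for numbers and for categories, which reflects the fact that the mode only needs counting, not arithmetic.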
Probability and its Related Terms
In our daily lives, we constantly face situations where the outcome is uncertain. We often talk about the 'chance' or 'likelihood' of something happening. For example, we might wonder about the chance of rain tomorrow, the likelihood of winning a game, or the possibility of a student getting a distinction in an exam. Probability is the branch of mathematics that provides a way to measure and quantify this uncertainty or chance. It helps us to express the likelihood of an event happening using numbers.
Basic Concepts in Probability
To understand probability, we need to be familiar with some basic terms:
1. Experiment and Random Experiment
An Experiment is any operation or procedure that is carried out under certain conditions to produce a result. When we talk about probability, we are usually concerned with a special type of experiment called a random experiment.
A Random Experiment (or probabilistic experiment) is an experiment or a process that satisfies the following conditions:
- All the possible results or outcomes of the experiment are known in advance.
- The exact outcome of any single trial (repetition) of the experiment cannot be predicted with certainty beforehand. There is an element of chance involved.
- The experiment can be repeated under identical conditions.
Examples of Random Experiments:
- Tossing a fair coin: We know the possible outcomes are Head or Tail, but we cannot predict whether the next toss will be a Head or a Tail.
- Rolling a standard six-sided die: We know the possible outcomes are the numbers 1, 2, 3, 4, 5, or 6, but we cannot say for sure which number will appear on a specific roll.
- Drawing a card from a well-shuffled deck of 52 playing cards: We know all 52 possible cards that can be drawn, but we don't know which specific card will be drawn in a single attempt.
2. Outcome
An Outcome is a single possible result of a random experiment.
Examples:
- In the experiment of tossing a coin, "getting a Head" is one outcome, and "getting a Tail" is another outcome.
- In the experiment of rolling a standard die, "getting the number 4" is an outcome. "Getting the number 1" is another outcome, and so on.
- In the experiment of drawing a card from a deck, "drawing the Ace of Spades" is a specific outcome.
3. Sample Space (S)
The Sample Space of a random experiment is the set (or collection) of all possible outcomes of that experiment. It is usually denoted by the letter 'S'. The number of elements in the sample space is denoted by $n(S)$.
Examples:
- For the experiment of tossing a single coin: The possible outcomes are Head (H) and Tail (T). The sample space $S = \{\text{H, T}\}$. The number of outcomes in the sample space is $n(S) = 2$.
- For the experiment of rolling a standard die: The possible outcomes are the numbers 1, 2, 3, 4, 5, 6. The sample space $S = \{1, 2, 3, 4, 5, 6\}$. The number of outcomes is $n(S) = 6$.
- For the experiment of tossing two coins simultaneously: Writing the result of the first coin followed by the result of the second coin, the possible outcomes are Head and Head (HH), Head and Tail (HT), Tail and Head (TH), and Tail and Tail (TT). The sample space $S = \{\text{HH, HT, TH, TT}\}$. The number of outcomes is $n(S) = 4$.
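The sample spaces listed above can also be generated by a short Python sketch. It uses `itertools.product` from the standard library to form every ordered pair of coin results; the variable names are illustrative.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Two coins: pair every result of the first coin with every result of the second
two_coins = ["".join(pair) for pair in product(coin, repeat=2)]

print(len(coin))       # 2
print(len(die))        # 6
print(two_coins)       # ['HH', 'HT', 'TH', 'TT']
print(len(two_coins))  # 4
```

Each coin has 2 possible results, so two coins give $2 \times 2 = 4$ outcomes.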
4. Event (E)
An Event is any subset of the sample space. In simpler terms, an event is a collection of one or more specific outcomes of a random experiment that we are interested in. An event is said to occur if the result of the experiment is one of the outcomes included in that event.
Examples (for the experiment of rolling a standard die, where $S = \{1, 2, 3, 4, 5, 6\}$):
- Let E be the event of "getting an even number". The outcomes favourable to this event are those numbers in S that are even. So, $E = \{2, 4, 6\}$. The number of outcomes favourable to event E is $n(E) = 3$.
- Let P be the event of "getting a prime number". Prime numbers in S are 2, 3, 5. So, $P = \{2, 3, 5\}$. The number of outcomes favourable to event P is $n(P) = 3$.
- Let G be the event of "getting a number greater than 4". Numbers in S greater than 4 are 5, 6. So, $G = \{5, 6\}$. The number of outcomes favourable to event G is $n(G) = 2$.
- Let Z be the event of "getting a number less than 7". All outcomes in S are less than 7. So, $Z = \{1, 2, 3, 4, 5, 6\} = S$. This is called a sure event. $n(Z) = 6$.
- Let I be the event of "getting the number 7". There is no outcome in S that is 7. So, $I = \{\}$ or $\emptyset$ (the empty set). This is called an impossible event. $n(I) = 0$.
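Because an event is a subset of the sample space, the examples above map naturally onto Python sets. This is a sketch only; the variable names and the helper `is_prime` are hypothetical choices for illustration.

```python
def is_prime(k):
    """True if k is a prime number (simple check, fine for small k)."""
    return k > 1 and all(k % d != 0 for d in range(2, k))

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a die

even = {x for x in S if x % 2 == 0}    # {2, 4, 6}
prime = {x for x in S if is_prime(x)}  # {2, 3, 5}
greater_than_4 = {x for x in S if x > 4}  # {5, 6}
less_than_7 = {x for x in S if x < 7}  # equal to S: a sure event
seven = {x for x in S if x == 7}       # empty set: an impossible event

print(len(even), len(prime), len(greater_than_4))  # 3 3 2
print(less_than_7 == S)  # True
print(len(seven))        # 0
```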
5. Equally Likely Outcomes
The outcomes of a random experiment are said to be Equally Likely if each outcome has the same chance or probability of occurring as any other outcome. In a fair or unbiased experiment, the outcomes are usually equally likely.
Examples:
- When tossing a fair coin, the outcomes Head and Tail are equally likely. The chance of getting a Head is the same as the chance of getting a Tail.
- When rolling a fair standard die, each face (representing numbers 1, 2, 3, 4, 5, 6) has the same chance of landing face up. So, all 6 outcomes are equally likely.
- If a bag contains 3 red balls and 5 blue balls of the same size and weight, and you draw one ball at random, drawing any specific ball is equally likely. However, the *event* of drawing a red ball (3 possibilities) and the *event* of drawing a blue ball (5 possibilities) are NOT equally likely. Drawing a blue ball is more likely than drawing a red ball.
The concept of equally likely outcomes is fundamental to the basic definition of probability for simple events.
Probability of an Event
The Probability of an event E, denoted by $P(E)$, is a numerical value that measures the likelihood or chance of that event occurring. When all outcomes in the sample space are equally likely (which is assumed for simple probability calculations in Class 7), the probability of an event E is defined as the ratio of the number of outcomes favourable to event E to the total number of possible outcomes in the sample space S.
Formula for Probability of an Event (with Equally Likely Outcomes):
P(E) = $\frac{\text{Number of outcomes favourable to Event E}}{\text{Total number of possible outcomes in the Sample Space S}}$
Using the notation for the number of elements in a set:
P(E) = $\frac{n(E)}{n(S)}$
Where:
- $n(E)$ is the number of outcomes that are in the event E (favourable outcomes).
- $n(S)$ is the total number of outcomes in the sample space S.
The probability of an event is always a value between 0 and 1, inclusive. It can be expressed as a fraction, a decimal, or a percentage.
Properties of Probability:
The probability of any event E has the following properties:
- The probability of any event E is a number between 0 and 1, including 0 and 1.
$0 \le P(E) \le 1$
- Impossible Event: If an event E cannot possibly occur, it is called an impossible event. The number of favourable outcomes is 0, so its probability is 0.
For an impossible event $E_i$, $n(E_i) = 0$, so $P(E_i) = \frac{0}{n(S)} = 0 $.
Example: Getting a 7 when rolling a standard die. $n(E_i)=0$, $n(S)=6$. $P(\text{getting a 7}) = \frac{0}{6} = 0$.
- Sure (or Certain) Event: If an event E is certain to occur (meaning all possible outcomes are favourable to the event), it is called a sure event. The number of favourable outcomes is equal to the total number of outcomes in the sample space, so its probability is 1.
For a sure event $E_s$, $n(E_s) = n(S)$, so $P(E_s) = \frac{n(S)}{n(S)} = 1 $.
Example: Getting a number less than 7 when rolling a standard die. $n(E_s)=6$, $n(S)=6$. $P(\text{getting < 7}) = \frac{6}{6} = 1$.
- The sum of the probabilities of all elementary events (outcomes that consist of only one single result from the sample space) of an experiment is always 1.
Probability can be expressed as a fraction (e.g., $\frac{1}{2}$), a decimal (e.g., 0.5), or a percentage (e.g., 50%).
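The formula $P(E) = \frac{n(E)}{n(S)}$ and the properties above can be checked with a short Python sketch. It assumes equally likely outcomes and uses `fractions.Fraction` from the standard library so that answers stay as exact fractions; the function name `probability` is a hypothetical helper.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n(E) / n(S), assuming all outcomes are equally likely."""
    favourable = event & sample_space  # only outcomes that are in S count
    return Fraction(len(favourable), len(sample_space))

S = {1, 2, 3, 4, 5, 6}

print(probability({7}, S))        # 0   (impossible event)
print(probability(S, S))          # 1   (sure event)
print(probability({2, 4, 6}, S))  # 1/2

# Probabilities of the six elementary events add up to 1
total = sum(probability({x}, S) for x in S)
print(total)  # 1
```

The fraction $\frac{1}{2}$ could equally be written as 0.5 or 50%.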
Calculating Probability - Examples
Example 1. A fair coin is tossed once. What is the probability of getting (a) a head? (b) a tail?
Answer:
When a fair coin is tossed once, the possible outcomes are Head (H) and Tail (T). The sample space is $S = \{\text{H, T}\}$.
The total number of possible outcomes in the sample space is $n(S) = 2$. Since the coin is fair, the outcomes (Head and Tail) are equally likely.
(a) Find the probability of getting a head.
Let E be the event of getting a head. The outcome favorable to this event is just {H}.
The number of outcomes favorable to event E is $n(E) = 1$.
Using the probability formula $P(E) = \frac{n(E)}{n(S)}$:
P(Getting a Head) $= \frac{1}{2}$.
(b) Find the probability of getting a tail.
Let F be the event of getting a tail. The outcome favorable to this event is just {T}.
The number of outcomes favorable to event F is $n(F) = 1$.
Using the probability formula $P(F) = \frac{n(F)}{n(S)}$:
P(Getting a Tail) $= \frac{1}{2}$.
Notice that $P(\text{Head}) + P(\text{Tail}) = \frac{1}{2} + \frac{1}{2} = 1$. This is expected because getting a head and getting a tail are the only two possible outcomes, and one of them must occur.
Example 2. A standard six-sided die is rolled once. What is the probability of getting:
(a) an even number?
(b) a number greater than 4?
(c) the number 5?
Answer:
When a standard six-sided die is rolled once, the possible outcomes are the numbers on its faces: 1, 2, 3, 4, 5, 6.
The sample space is $S = \{1, 2, 3, 4, 5, 6\}$.
The total number of possible outcomes is $n(S) = 6$. Assuming the die is fair, all outcomes are equally likely.
(a) Find the probability of getting an even number.
Let A be the event of getting an even number. The outcomes in the sample space that are even are {2, 4, 6}.
The number of outcomes favorable to event A is $n(A) = 3$.
Using the probability formula $P(A) = \frac{n(A)}{n(S)}$:
P(Getting an even number) $= \frac{3}{6} = \frac{1}{2}$.
(b) Find the probability of getting a number greater than 4.
Let B be the event of getting a number greater than 4. The outcomes in the sample space that are greater than 4 are {5, 6}.
The number of outcomes favorable to event B is $n(B) = 2$.
Using the probability formula $P(B) = \frac{n(B)}{n(S)}$:
P(Getting a number greater than 4) $= \frac{2}{6} = \frac{1}{3}$.
(c) Find the probability of getting the number 5.
Let C be the event of getting the number 5. The outcome favorable to this event is just {5}.
The number of outcomes favorable to event C is $n(C) = 1$.
Using the probability formula $P(C) = \frac{n(C)}{n(S)}$:
P(Getting the number 5) $= \frac{1}{6}$.
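The three answers can be reproduced by counting favourable outcomes in Python. This is a minimal sketch assuming a fair die; the helper `prob` and the idea of passing each condition as a small function are illustrative choices, not part of the chapter.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

def prob(condition):
    """Count the outcomes that satisfy the condition, then divide by n(S)."""
    favourable = [x for x in sample_space if condition(x)]
    return Fraction(len(favourable), len(sample_space))

p_even = prob(lambda x: x % 2 == 0)  # favourable outcomes: 2, 4, 6
p_greater_than_4 = prob(lambda x: x > 4)  # favourable outcomes: 5, 6
p_five = prob(lambda x: x == 5)      # favourable outcome: 5

print(p_even, p_greater_than_4, p_five)  # 1/2 1/3 1/6
```

`Fraction` reduces each answer to its lowest terms automatically, matching the simplification from $\frac{3}{6}$ to $\frac{1}{2}$ done by hand.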
Example 3. A bag contains 3 red balls and 5 blue balls. If one ball is drawn at random from the bag, what is the probability that the ball drawn is (a) red? (b) blue?
Answer:
Given:
Number of red balls = 3.
Number of blue balls = 5.
Total number of balls in the bag = Number of red balls + Number of blue balls $= 3 + 5 = 8$.
When one ball is drawn at random, each ball has an equal chance of being selected. The total number of possible outcomes is the total number of balls in the bag.
Total number of possible outcomes, $n(S) = 8$.
(a) Find the probability that the ball drawn is red.
Let R be the event that the ball drawn is red. The outcomes favorable to event R are drawing any one of the 3 red balls.
Number of outcomes favorable to event R, $n(R) = 3$ (since there are 3 red balls).
Using the probability formula $P(R) = \frac{n(R)}{n(S)}$:
P(Drawing a red ball) $= \frac{3}{8}$.
(b) Find the probability that the ball drawn is blue.
Let B be the event that the ball drawn is blue. The outcomes favorable to event B are drawing any one of the 5 blue balls.
Number of outcomes favorable to event B, $n(B) = 5$ (since there are 5 blue balls).
Using the probability formula $P(B) = \frac{n(B)}{n(S)}$:
P(Drawing a blue ball) $= \frac{5}{8}$.
Notice that $P(\text{Red}) + P(\text{Blue}) = \frac{3}{8} + \frac{5}{8} = \frac{8}{8} = 1$. This makes sense because drawing a red ball and drawing a blue ball are the only two possibilities in this experiment, and these two events are mutually exclusive (you cannot draw a ball that is both red and blue at the same time).
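The bag can be modelled in Python as a list with one entry per ball, so that each entry is one equally likely outcome. This is a sketch under that assumption; the variable names are illustrative.

```python
from fractions import Fraction

# One list entry per ball: 3 red and 5 blue
bag = ["red"] * 3 + ["blue"] * 5

p_red = Fraction(bag.count("red"), len(bag))    # 3 favourable out of 8
p_blue = Fraction(bag.count("blue"), len(bag))  # 5 favourable out of 8

print(p_red, p_blue)   # 3/8 5/8
print(p_red + p_blue)  # 1
```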
Chance and Probability in Real Life
Understanding basic probability helps us make sense of uncertainty in various real-world situations. Here are some applications of the concept of chance and probability:
- Weather Forecasting: Meteorologists use complex models based on probability to predict the weather. A "40% chance of rain" can be read, roughly, as meaning that rain occurred on about 40% of past occasions with similar weather conditions.
- Games of Chance: Card games, dice games, coin tosses, and lotteries are designed around the principles of probability. Knowing the probability of different outcomes helps players understand their chances (though it doesn't guarantee winning!).
- Insurance: Insurance companies use probability to calculate the likelihood of events like accidents, illness, or property damage occurring. This helps them determine the insurance premiums people need to pay to cover potential losses.
- Medical Decisions: Doctors use probability to assess the likelihood of a patient having a certain disease based on symptoms or test results. The accuracy of diagnostic tests is also described in terms of probability.
- Quality Control: In manufacturing, companies use probability to estimate the chance of a defective product occurring in a batch. This helps them decide how much testing is needed.
- Financial Markets: Probability is used to model the likelihood of stock prices moving up or down.
Probability provides a framework for thinking about uncertainty and risk, helping us make more informed decisions in situations where we don't have complete information.