Data handling: Calculate, represent and interpret measures of central tendency and dispersion in univariate numerical grouped data

# Unit 2: Construct the ogive

Natashia Bearam-Edmunds

### Unit outcomes

By the end of this unit you will be able to:

• Work out class intervals.
• Plot the ogive.
• Interpret the ogive.

## What you should know

Before you start this unit, make sure you can:

## Introduction

In Unit 1 of this subject outcome we learnt that when consecutive frequencies are added together we get cumulative frequencies. Cumulative frequencies can be plotted on a graph called the ogive (pronounced ‘o-jive’) or the cumulative frequency graph. We use the class intervals and frequencies from a grouped frequency distribution to plot the ogive. As you will see, it is easier to get information by looking at an ogive than by working through the data in a frequency table.

The ogive is used to estimate the number of observations that are less than or equal to a given value, so it is useful when you want to know how many data values lie up to a particular point. The following activity takes you through the process of constructing an ogive.

### Activity 2.1: Use a frequency distribution to draw an ogive

Time required: 20 minutes

What you need:

• a pen and paper

What to do:

Below is the cumulative frequency table of the ages of people attending a public reading by an author at a library.

| Class intervals | Class midpoint | Frequency | Cumulative frequency |
|---|---|---|---|
| $\scriptsize 9 \lt x\le 12$ | $\scriptsize 10.5$ | $\scriptsize 5$ | |
| $\scriptsize 12 \lt x\le 15$ | | $\scriptsize 8$ | $\scriptsize 13$ |
| $\scriptsize 15 \lt x\le 18$ | | $\scriptsize 13$ | |
| $\scriptsize 18 \lt x\le 21$ | | $\scriptsize 15$ | $\scriptsize 41$ |
| $\scriptsize 21 \lt x\le 24$ | $\scriptsize 22.5$ | $\scriptsize 24$ | |
| $\scriptsize 24 \lt x\le 27$ | | $\scriptsize 17$ | $\scriptsize 82$ |
| $\scriptsize 27 \lt x\le 30$ | | $\scriptsize 18$ | |

1. Rewrite and complete the table.
2. If you were to plot a graph of the data, what values would go on the horizontal axis and on the vertical axis?
3. How can you make sure the graph starts from the x-axis?
4. On a set of labelled axes, plot the cumulative frequency against the upper limit of each class interval and name your graph.
5. What percentage of attendees was less than $\scriptsize \displaystyle 21$ years old?

What did you find?

1. Find the class midpoint by adding the lower and upper class limits and dividing by $\scriptsize 2$. Your completed table should have the following values. You will not use the class midpoints to construct the ogive, but they will be useful when we calculate measures of central tendency and dispersion in later units.

   | Class intervals | Class midpoint | Frequency | Cumulative frequency |
   |---|---|---|---|
   | $\scriptsize 9 \lt x\le 12$ | $\scriptsize 10.5$ | $\scriptsize 5$ | $\scriptsize 5$ |
   | $\scriptsize 12 \lt x\le 15$ | $\scriptsize 13.5$ | $\scriptsize 8$ | $\scriptsize 13$ |
   | $\scriptsize 15 \lt x\le 18$ | $\scriptsize 16.5$ | $\scriptsize 13$ | $\scriptsize \mathbf{26}$ |
   | $\scriptsize 18 \lt x\le 21$ | $\scriptsize 19.5$ | $\scriptsize 15$ | $\scriptsize 41$ |
   | $\scriptsize 21 \lt x\le 24$ | $\scriptsize 22.5$ | $\scriptsize 24$ | $\scriptsize \mathbf{65}$ |
   | $\scriptsize 24 \lt x\le 27$ | $\scriptsize 25.5$ | $\scriptsize 17$ | $\scriptsize 82$ |
   | $\scriptsize 27 \lt x\le 30$ | $\scriptsize 28.5$ | $\scriptsize 18$ | $\scriptsize 100$ |

To determine the cumulative frequency, we add up the frequencies going down the table. The first cumulative frequency is the same as the frequency, because we are adding it to zero. The final cumulative frequency is always equal to the sum of all the frequencies.

2. The class intervals (age in years) go on the horizontal axis (x-axis) and the cumulative frequency goes on the vertical axis (y-axis).
3. For the graph to start on the x-axis, plot the lower limit of the first class interval, $\scriptsize 9$, with a cumulative frequency of zero. No attendee is $\scriptsize 9$ years old or younger, so the graph must show a cumulative frequency of zero at $\scriptsize 9$.
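4. Starting from $\scriptsize (9;0)$, plot a point at the upper limit of each class interval against its cumulative frequency: $\scriptsize (12;5)$, $\scriptsize (15;13)$, $\scriptsize (18;26)$, $\scriptsize (21;41)$, $\scriptsize (24;65)$, $\scriptsize (27;82)$ and $\scriptsize (30;100)$. Join the points using straight line segments.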
5. Read the answer off the graph. Find $\scriptsize 21$ on the x-axis and draw a vertical line up to the ogive, then draw a horizontal line from that point across to the y-axis. The value where this line meets the y-axis is the number of attendees who were less than $\scriptsize 21$ years old.

We see that $\scriptsize \displaystyle \frac{41}{100}=41\%$ of attendees were less than $\scriptsize 21$ years old.
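The table arithmetic in this activity can be checked with a short Python sketch. The variable names are our own, chosen for illustration; each class is stored as a pair `(lower, upper)`, meaning lower < x <= upper.

```python
from itertools import accumulate

# Activity 2.1: ages of people at the library reading.
# Each class (lower, upper) stands for lower < x <= upper.
intervals = [(9, 12), (12, 15), (15, 18), (18, 21), (21, 24), (24, 27), (27, 30)]
frequencies = [5, 8, 13, 15, 24, 17, 18]

# Class midpoint = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Cumulative frequency = running total of the frequencies going down the table
cumulative = list(accumulate(frequencies))

for (lower, upper), mid, f, cf in zip(intervals, midpoints, frequencies, cumulative):
    print(f"{lower} < x <= {upper}: midpoint {mid}, frequency {f}, cumulative {cf}")

# 21 is the upper limit of a class, so its cumulative frequency is in the table
uppers = [upper for _, upper in intervals]
total = cumulative[-1]
up_to_21 = cumulative[uppers.index(21)]
print(f"{up_to_21}/{total} = {up_to_21 / total:.0%} of attendees")
```

Because $\scriptsize 21$ is a class endpoint, the reading here is exact; for a value between endpoints you would still read it off the ogive.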

### How to construct a cumulative frequency graph/ogive

In Activity 2.1 we saw that these are the steps to construct an ogive:

1. The class intervals are labelled on the horizontal axis (x-axis) and the cumulative frequencies along the vertical axis (y-axis).
2. Every ogive starts at the lower boundary of the first class interval on the horizontal axis, with a cumulative frequency of zero.
3. Plot a point at the upper limit (endpoint) of each class interval against the cumulative frequency for that interval. The point at the end of the final interval will therefore always have a cumulative frequency equal to the total number of data values.
4. Join the dots using straight lines.

Note: the ogive is made up of straight line segments, so do not draw a smooth curve through the points.
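The construction steps above can be sketched in Python as a function that lists the points to plot. The function name `ogive_points` is hypothetical, and the sketch assumes the classes are written as `(lower, upper)` pairs in increasing order with no gaps between them.

```python
from itertools import accumulate

def ogive_points(intervals, frequencies):
    """Return the (x, cumulative frequency) points to plot for an ogive.

    Assumes each class is a pair (lower, upper) meaning lower < x <= upper,
    listed in increasing order with no gaps between classes.
    """
    # Step 2: start at the lower boundary of the first class with a cumulative frequency of zero
    points = [(intervals[0][0], 0)]
    # Step 3: pair the upper limit of each class with its cumulative frequency
    for (_, upper), cf in zip(intervals, accumulate(frequencies)):
        points.append((upper, cf))
    return points

# Data from Example 2.1: physics test marks out of 60
marks_intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
marks_freqs = [5, 7, 12, 10, 6]
print(ogive_points(marks_intervals, marks_freqs))
# [(10, 0), (20, 5), (30, 12), (40, 24), (50, 34), (60, 40)]
```

To draw the graph (step 4), the x-values and y-values could be passed to a plotting library such as matplotlib (`matplotlib.pyplot.plot`), which joins consecutive points with straight lines by default.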

### Example 2.1

1. The table shows the marks out of $\scriptsize 60$ for a physics test.
Complete the table.

   | Intervals | Frequency | Cumulative frequency |
   |---|---|---|
   | $\scriptsize 10 \lt n\le 20$ | $\scriptsize 5$ | |
   | $\scriptsize 20 \lt n\le 30$ | $\scriptsize 7$ | |
   | $\scriptsize 30 \lt n\le 40$ | $\scriptsize 12$ | |
   | $\scriptsize 40 \lt n\le 50$ | $\scriptsize 10$ | |
   | $\scriptsize 50 \lt n\le 60$ | $\scriptsize 6$ | |

2. Draw the ogive for the data.
3. What percentage of learners got a mark higher than $\scriptsize 40$?

Solution

1. We complete the cumulative frequency column by adding frequencies as we go down the table. The completed table looks like this:

   | Intervals | Frequency | Cumulative frequency |
   |---|---|---|
   | $\scriptsize 10 \lt n\le 20$ | $\scriptsize 5$ | $\scriptsize 5$ |
   | $\scriptsize 20 \lt n\le 30$ | $\scriptsize 7$ | $\scriptsize 12$ |
   | $\scriptsize 30 \lt n\le 40$ | $\scriptsize 12$ | $\scriptsize 24$ |
   | $\scriptsize 40 \lt n\le 50$ | $\scriptsize 10$ | $\scriptsize 34$ |
   | $\scriptsize 50 \lt n\le 60$ | $\scriptsize 6$ | $\scriptsize 40$ |

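2. Now we use the completed table to draw the ogive. Start at $\scriptsize (10;0)$, then plot the endpoint of each interval against the cumulative frequency for that interval: $\scriptsize (20;5)$, $\scriptsize (30;12)$, $\scriptsize (40;24)$, $\scriptsize (50;34)$ and $\scriptsize (60;40)$. Join the points with straight lines.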
3. Find $\scriptsize 40$ on the x-axis and draw a vertical line up to the ogive. The ogive is at a cumulative frequency of $\scriptsize 24$ there, so $\scriptsize 24$ learners scored $\scriptsize 40$ or less. All the marks to the right of this point are higher than $\scriptsize 40$, so $\scriptsize 40-24=16$ learners scored more than $\scriptsize 40$. So $\scriptsize \displaystyle \frac{16}{40}\times 100=40\%$ of learners got more than $\scriptsize 40$ on the test.
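Reading up from the x-axis to a straight-line ogive and across to the y-axis is the same as linear interpolation along the segment between two plotted points. The sketch below assumes the ogive is stored as a list of `(x, cumulative frequency)` points in increasing order; the function name `cumulative_at` is hypothetical.

```python
def cumulative_at(points, x):
    """Estimate the cumulative frequency at x from a straight-line ogive.

    points: (x, cumulative frequency) pairs in increasing order of x.
    """
    if x <= points[0][0]:
        return 0  # nothing lies at or below the start of the ogive
    if x >= points[-1][0]:
        return points[-1][1]  # every value lies at or below the end of the ogive
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Read up to the straight line joining (x0, y0) and (x1, y1)
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Ogive points for the physics test in Example 2.1
points = [(10, 0), (20, 5), (30, 12), (40, 24), (50, 34), (60, 40)]
total = points[-1][1]                  # 40 learners in total
at_most_40 = cumulative_at(points, 40)  # learners who scored 40 or less
above_40 = total - at_most_40
percentage = above_40 * 100 / total
print(above_40, percentage)
```

Since $\scriptsize 40$ is an interval endpoint the answer is exact here; for a mark such as $\scriptsize 25$ the function returns the value you would read off a carefully drawn graph.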

## Interpreting the ogive

Ogives are useful for determining the quartiles and the five-number summary of a data set. Remember that the median is the middle value when the data is arranged in order. The first and third quartiles are the values one quarter and three quarters of the way through the ordered data set. Because an ogive already shows how many data values lie below any given point, it is easy to find the middle of the data set, or a point a quarter of the way through it. The quartiles you find from an ogive are approximations, because you are working with grouped data.
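For ungrouped data, the median can be found exactly by ordering the values. The sketch below uses the raw ages from Exercise 2.1 and compares a manual calculation (the helper name `median_of` is our own) with Python's standard `statistics.median`.

```python
import statistics

def median_of(values):
    """Median of raw (ungrouped) data: order the values and take the middle one(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

# Raw ages from Exercise 2.1
ages = [2, 5, 1, 76, 34, 23, 65, 22, 63, 45, 53, 38,
        4, 28, 5, 73, 79, 17, 15, 5, 34, 37, 45, 56]

print(median_of(ages), statistics.median(ages))  # 34.0 34.0
```

Once data is grouped into classes, the individual values are no longer available, which is why a median or quartile read from an ogive is only an estimate.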

### Example 2.2

Use the following ogive to compute the five-number summary:

Solution

Remember that the five-number summary consists of the minimum, the first quartile, the median (which is the second quartile), the third quartile and the maximum.

Step 1: Find the minimum and maximum
The minimum value in the data set is $\scriptsize 10$ as this is where the ogive starts on the horizontal axis. The maximum value in the data set is $\scriptsize 60$ as this is where the ogive stops on the horizontal axis.

Step 2: Find the quartiles
The quartiles are the values that are $\scriptsize \displaystyle \frac{1}{4}$ , $\scriptsize \displaystyle \frac{1}{2}$ and $\scriptsize \displaystyle \frac{3}{4}$ of the way into the ordered data set. Here the cumulative frequency goes up to $\scriptsize 40$, so we can find the quartiles by looking at the values corresponding to counts of $\scriptsize 10\text{ }(\displaystyle \frac{1}{4}\times 40)$, $\scriptsize 20\text{ }(\displaystyle \frac{1}{2}\times 40)$ and $\scriptsize 30\text{ }(\displaystyle \frac{3}{4}\times 40)$. On the ogive, a count of:

• $\scriptsize 10$ corresponds to a value of approximately $\scriptsize 26$ (first quartile)
• $\scriptsize 20$ corresponds to a value of approximately $\scriptsize 36$ (median)
• $\scriptsize 30$ corresponds to a value of approximately $\scriptsize 45$ (third quartile).

Step 3: Write down the five-number summary
$\scriptsize \displaystyle \text{Minimum}=10$
$\scriptsize {{\text{Q}}_{1}}\approx 26$
$\scriptsize {{\text{Q}}_{2}}\approx 36$
$\scriptsize {{\text{Q}}_{3}}\approx 45$
$\scriptsize \displaystyle \text{Maximum}=60$
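Reading across from a count of n/4, n/2 or 3n/4 to the ogive and down to the x-axis can also be done by linear interpolation, assuming the ogive is made of straight line segments. The sketch below applies this to the ogive points from Activity 2.1. The helper names are hypothetical, and values read by eye from a drawn graph will differ slightly from the interpolated ones.

```python
def value_at_count(points, count):
    """Read across from a cumulative frequency to a straight-line ogive and down to the x-axis."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= count <= y1 and y1 > y0:  # skip flat segments (classes with frequency 0)
            return x0 + (count - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("count is outside the range of the ogive")

def five_number_summary(points):
    """Minimum, quartiles and maximum estimated from ogive points (x, cumulative frequency)."""
    n = points[-1][1]  # the last cumulative frequency is the total number of values
    return {
        "Minimum": points[0][0],  # where the ogive starts
        "Q1": value_at_count(points, n / 4),
        "Q2": value_at_count(points, n / 2),
        "Q3": value_at_count(points, 3 * n / 4),
        "Maximum": points[-1][0],  # where the ogive stops
    }

# Ogive points for the library reading in Activity 2.1
library = [(9, 0), (12, 5), (15, 13), (18, 26), (21, 41), (24, 65), (27, 82), (30, 100)]
for name, value in five_number_summary(library).items():
    print(name, round(value, 1))
```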

### Exercise 2.1

1. The following data set lists the ages of $\scriptsize 24$ people.

   $\scriptsize 2;\ 5;\ 1;\ 76;\ 34;\ 23;\ 65;\ 22;\ 63;\ 45;\ 53;\ 38;\ 4;\ 28;\ 5;\ 73;\ 79;\ 17;\ 15;\ 5;\ 34;\ 37;\ 45;\ 56$

   Use the data to answer the following questions:
   1. Using an interval width of $\scriptsize 8$, construct a cumulative frequency plot.
   2. Below what value do the bottom $\scriptsize 50\%$ of the ages fall? Give a reason for your answer.
   3. Below what value do the bottom $\scriptsize 40\%$ fall?
2. Use the ogive to answer the questions below. Note that marks are given as a percentage.
   1. How many learners got between $\scriptsize 50\%$ and $\scriptsize 70\%$?
   2. How many learners got at least $\scriptsize 70\%$?
   3. Find the median mark for this class, rounded to the nearest integer.

The full solutions are at the end of the unit.

## Summary

In this unit you have learnt the following:

• How to construct the ogive from a grouped frequency distribution.
• How to interpret information from an ogive.

# Unit 2: Assessment

#### Suggested time to complete: 25 minutes

1. The marks (as a percentage) obtained in an NCV Maths level 3 examination are shown in the table below:

   | Intervals | Frequency | Cumulative frequency |
   |---|---|---|
   | $\scriptsize 0 \lt x\le 10$ | $\scriptsize 0$ | $\scriptsize 0$ |
   | $\scriptsize 10 \lt x\le 20$ | $\scriptsize 2$ | |
   | $\scriptsize 20 \lt x\le 30$ | $\scriptsize 6$ | $\scriptsize 8$ |
   | $\scriptsize 30 \lt x\le 40$ | $\scriptsize 7$ | |
   | $\scriptsize 40 \lt x\le 50$ | $\scriptsize 14$ | $\scriptsize 29$ |
   | $\scriptsize 50 \lt x\le 60$ | $\scriptsize 20$ | |
   | $\scriptsize 60 \lt x\le 70$ | $\scriptsize 35$ | $\scriptsize 84$ |
   | $\scriptsize 70 \lt x\le 80$ | $\scriptsize 29$ | |
   | $\scriptsize 80 \lt x\le 90$ | $\scriptsize 6$ | |
   | $\scriptsize 90 \lt x\le 100$ | $\scriptsize 1$ | $\scriptsize 120$ |

   1. How many observations are there in total?
   2. Copy and complete the table.
   3. Draw the ogive for the data.
   4. In which class interval will the median be located?
   5. Estimate the median.
2. Use the ogive below to draw a box-and-whisker plot:
3. The ages of 28 people whose birthday coincides with that of one of their children are shown below.

   $\scriptsize 78;\ 53;\ 70;\ 97;\ 37;\ 68;\ 48;\ 35;\ 71;\ 63;\ 47;\ 60;\ 63;\ 58;\ 74;\ 39;\ 67;\ 64;\ 42;\ 52;\ 38;\ 54;\ 60;\ 75;\ 69;\ 77;\ 65;\ 72$

   1. Construct the frequency distribution showing cumulative frequency.
   2. Draw the ogive for the data.
   3. Estimate the median.

The full solutions are at the end of the unit.

# Unit 2: Solutions

### Exercise 2.1

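1. Construct a frequency distribution before drawing the cumulative frequency graph.
   1. The frequency distribution is:

      | Age intervals | Frequency | Cumulative frequency |
      |---|---|---|
      | $\scriptsize 0 \lt x\le 8$ | $\scriptsize 6$ | $\scriptsize 6$ |
      | $\scriptsize 8 \lt x\le 16$ | $\scriptsize 1$ | $\scriptsize 7$ |
      | $\scriptsize 16 \lt x\le 24$ | $\scriptsize 3$ | $\scriptsize 10$ |
      | $\scriptsize 24 \lt x\le 32$ | $\scriptsize 1$ | $\scriptsize 11$ |
      | $\scriptsize 32 \lt x\le 40$ | $\scriptsize 4$ | $\scriptsize 15$ |
      | $\scriptsize 40 \lt x\le 48$ | $\scriptsize 2$ | $\scriptsize 17$ |
      | $\scriptsize 48 \lt x\le 56$ | $\scriptsize 2$ | $\scriptsize 19$ |
      | $\scriptsize 56 \lt x\le 64$ | $\scriptsize 1$ | $\scriptsize 20$ |
      | $\scriptsize 64 \lt x\le 72$ | $\scriptsize 1$ | $\scriptsize 21$ |
      | $\scriptsize 72 \lt x\le 80$ | $\scriptsize 3$ | $\scriptsize 24$ |

      To draw the ogive, plot the points $\scriptsize (0;0)$, $\scriptsize (8;6)$, $\scriptsize (16;7)$, $\scriptsize (24;10)$, $\scriptsize (32;11)$, $\scriptsize (40;15)$, $\scriptsize (48;17)$, $\scriptsize (56;19)$, $\scriptsize (64;20)$, $\scriptsize (72;21)$ and $\scriptsize (80;24)$ and join them with straight lines.
   2. Below $\scriptsize 34$. There are $\scriptsize 24$ values, so the median lies between the $\scriptsize 12$th and $\scriptsize 13$th values. Read across from a cumulative frequency of $\scriptsize 12$ to the ogive and down to the x-axis: this gives approximately $\scriptsize 34$.
   3. Below approximately $\scriptsize 23$. $\scriptsize 40\%$ of $\scriptsize 24$ is $\scriptsize 9.6$, and reading across from a cumulative frequency of $\scriptsize 9.6$ to the ogive and down to the x-axis gives approximately $\scriptsize 23$.
2.
   1. $\scriptsize 20$ learners. Read the cumulative frequencies at $\scriptsize 50\%$ and $\scriptsize 70\%$ off the graph and subtract: $\scriptsize 35-15=20$.
   2. $\scriptsize 15$ learners. "At least $\scriptsize 70\%$" means $\scriptsize 70\%$ or more, so subtract the cumulative frequency at $\scriptsize 70\%$ from the cumulative frequency at $\scriptsize 100\%$: $\scriptsize 50-35=15$.
   3. The median mark is approximately $\scriptsize 60\%$.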
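Grouping raw data into classes of the form lower < x <= upper can be sketched in Python. The helper name `grouped_frequencies` is hypothetical; the sketch uses the ages from Exercise 2.1 with classes of width 8 starting at 0.

```python
from itertools import accumulate

def grouped_frequencies(data, start, width, num_classes):
    """Count the values in the classes start < x <= start + width, and so on."""
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        count = sum(1 for value in data if lower < value <= upper)
        table.append(((lower, upper), count))
    return table

ages = [2, 5, 1, 76, 34, 23, 65, 22, 63, 45, 53, 38,
        4, 28, 5, 73, 79, 17, 15, 5, 34, 37, 45, 56]

table = grouped_frequencies(ages, start=0, width=8, num_classes=10)
cumulative = list(accumulate(count for _, count in table))
for ((lower, upper), count), cf in zip(table, cumulative):
    print(f"{lower} < x <= {upper}: frequency {count}, cumulative {cf}")
```

Counting each class directly like this is a quick way to check a hand-made tally, since values that fall exactly on a class limit (such as $\scriptsize 56$, which belongs to $\scriptsize 48 \lt x\le 56$) are easy to put in the wrong class.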


### Unit 2: Assessment

1.
   1. $\scriptsize 120$ observations in total.
   2. The completed table (calculated values in bold):

      | Intervals | Frequency | Cumulative frequency |
      |---|---|---|
      | $\scriptsize 0 \lt x\le 10$ | $\scriptsize 0$ | $\scriptsize 0$ |
      | $\scriptsize 10 \lt x\le 20$ | $\scriptsize 2$ | $\scriptsize \mathbf{2}$ |
      | $\scriptsize 20 \lt x\le 30$ | $\scriptsize 6$ | $\scriptsize 8$ |
      | $\scriptsize 30 \lt x\le 40$ | $\scriptsize 7$ | $\scriptsize \mathbf{15}$ |
      | $\scriptsize 40 \lt x\le 50$ | $\scriptsize 14$ | $\scriptsize 29$ |
      | $\scriptsize 50 \lt x\le 60$ | $\scriptsize 20$ | $\scriptsize \mathbf{49}$ |
      | $\scriptsize 60 \lt x\le 70$ | $\scriptsize 35$ | $\scriptsize 84$ |
      | $\scriptsize 70 \lt x\le 80$ | $\scriptsize 29$ | $\scriptsize \mathbf{113}$ |
      | $\scriptsize 80 \lt x\le 90$ | $\scriptsize 6$ | $\scriptsize \mathbf{119}$ |
      | $\scriptsize 90 \lt x\le 100$ | $\scriptsize 1$ | $\scriptsize 120$ |

   3. Plot the points $\scriptsize (0;0)$, $\scriptsize (10;0)$, $\scriptsize (20;2)$, $\scriptsize (30;8)$, $\scriptsize (40;15)$, $\scriptsize (50;29)$, $\scriptsize (60;49)$, $\scriptsize (70;84)$, $\scriptsize (80;113)$, $\scriptsize (90;119)$ and $\scriptsize (100;120)$ and join them with straight lines.
   4. There are $\scriptsize 120$ data values, so the median lies between positions $\scriptsize 60$ and $\scriptsize 61$. The cumulative frequency is $\scriptsize 49$ at $\scriptsize 60$ and $\scriptsize 84$ at $\scriptsize 70$, so the median lies in the interval $\scriptsize 60 \lt x\le 70$.
   5. Reading off from the ogive, the median is approximately $\scriptsize 63$.
2. The five-number summary needed for the box-and-whisker plot is:

   $$\scriptsize \begin{aligned}\text{Min}&=20\\ \text{Q}_{1}&\approx 45\\ \text{Q}_{2}&\approx 60\\ \text{Q}_{3}&\approx 74\\ \text{Max}&=100\end{aligned}$$
3.
   1. The frequency distribution is:

      | Age intervals | Frequency | Cumulative frequency |
      |---|---|---|
      | $\scriptsize 34 \lt x\le 42$ | $\scriptsize 5$ | $\scriptsize 5$ |
      | $\scriptsize 42 \lt x\le 50$ | $\scriptsize 2$ | $\scriptsize 7$ |
      | $\scriptsize 50 \lt x\le 58$ | $\scriptsize 4$ | $\scriptsize 11$ |
      | $\scriptsize 58 \lt x\le 66$ | $\scriptsize 6$ | $\scriptsize 17$ |
      | $\scriptsize 66 \lt x\le 74$ | $\scriptsize 7$ | $\scriptsize 24$ |
      | $\scriptsize 74 \lt x\le 82$ | $\scriptsize 3$ | $\scriptsize 27$ |
      | $\scriptsize 82 \lt x\le 90$ | $\scriptsize 0$ | $\scriptsize 27$ |
      | $\scriptsize 90 \lt x\le 98$ | $\scriptsize 1$ | $\scriptsize 28$ |

   2. Plot the points $\scriptsize (34;0)$, $\scriptsize (42;5)$, $\scriptsize (50;7)$, $\scriptsize (58;11)$, $\scriptsize (66;17)$, $\scriptsize (74;24)$, $\scriptsize (82;27)$, $\scriptsize (90;27)$ and $\scriptsize (98;28)$ and join them with straight lines.
   3. The median age is approximately $\scriptsize 63$.
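Finding the median class, as in Assessment question 1.4, amounts to finding the first class whose cumulative frequency reaches half of the total. A minimal Python sketch, with a hypothetical helper named `median_class`:

```python
from itertools import accumulate

def median_class(intervals, frequencies):
    """Return the first class (lower, upper) whose cumulative frequency reaches n / 2."""
    total = sum(frequencies)
    for interval, cf in zip(intervals, accumulate(frequencies)):
        if cf >= total / 2:
            return interval
    raise ValueError("intervals must not be empty")

# Assessment question 1: classes 0 < x <= 10, 10 < x <= 20, ..., 90 < x <= 100
intervals = [(lower, lower + 10) for lower in range(0, 100, 10)]
frequencies = [0, 2, 6, 7, 14, 20, 35, 29, 6, 1]
print(median_class(intervals, frequencies))  # (60, 70)
```

The cumulative frequency first reaches $\scriptsize 60$ (half of $\scriptsize 120$) in the class $\scriptsize 60 \lt x\le 70$, which agrees with the reasoning in the solution above.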
