Data handling: Calculate, represent and interpret measures of central tendency and dispersion in univariate numerical grouped data

# Unit 1: Grouping data using frequency distributions

Natashia Bearam-Edmunds

### Unit outcomes

By the end of this unit you will be able to:

• Construct a frequency distribution by grouping data into classes.

## What you should know

Before you start this unit, make sure you can:

• Work with data and represent data effectively. You can revise the following subject outcomes in Maths level 2:

## Introduction

To make sense of data, we need to condense it or organise it in a way that makes interpretation easier. We have worked with ungrouped data, which is unorganised or raw data. Now we will use frequencies to organise and group data. There are different methods for calculating statistics from ungrouped and grouped data. However, before you summarise data, it is important to understand how to organise it using frequency distributions. We make a frequency distribution by counting how often each data value (or each group of values) appears.

Frequency is how often something occurs. A frequency distribution, also called a frequency table, is used to organise qualitative and quantitative raw data.

Work through Activity 1.1 to learn how to organise data using frequency tables.

### Activity 1.1: Organising data using a frequency distribution

Time required: 25 minutes

What you need:

• a pen and paper

What to do:

Answer the questions based on the given data. Show all the necessary steps.

The following data on the heights (in cm) of netball players has been collected.

\scriptsize \displaystyle \begin{align*} {165\text{ }148\text{ }158\text{ }150\text{ }160\text{ }165\text{ }150\text{ }156\text{ }155\text{ }164\text{ }162\text{ }160\text{ }158\text{ }148\text{ }158} \\ {140\text{ }146\text{ }160\text{ }148\text{ }152\text{ }139\text{ }165\text{ }148\text{ }160\text{ }156\text{ }158\text{ }170\text{ }155\text{ }160\text{ }148} \\ {155\text{ }158\text{ }179\text{ }170\text{ }158\text{ }161\text{ }155\text{ }160\text{ }163\text{ }178\text{ }138\text{ }172\text{ }170\text{ }156\text{ }160} \\ {160\text{ }171\text{ }140\text{ }160\text{ }170\text{ }175\text{ }148\text{ }170\text{ }177\text{ }155\text{ }167\text{ }154\text{ }160\text{ }170\text{ }155} \\ {136\text{ }179\text{ }150\text{ }167\text{ }148\text{ }160\text{ }164\text{ }167\text{ }157\text{ }165\text{ }163\text{ }140\text{ }162\text{ }178\text{ }160}\\{170\text{ }163\text{ }162\text{ }165\text{ }175\text{ }165\text{ }152\text{ }147\text{ }180\text{ }148\text{ }170\text{ }165\text{ }167\text{ }165~\text{ }165} \end{align*}

1. How many netball players were surveyed?
2. Is there a simple way to organise the data?
3. Rewrite the data values from smallest to largest.
4. How many netball players are between $\scriptsize 146-155\text{ cm}$ in height?
5. Group the heights into intervals of $\scriptsize 10\text{ cm}$, starting with a first interval of $\scriptsize 136-145\text{ cm}$. Draw a table to show the number of netball players that fall within each interval. Label the first column ‘height’ and the second column ‘frequency’.
6. Can the intervals overlap? Explain your answer.
7. What is the sum of the frequencies equal to?
8. From the table you drew for question 5, how many players were shorter than $\scriptsize 166\text{ cm}$?

What did you find?

1. There were $\scriptsize 90$ players surveyed.
2. With ungrouped data, even answering a simple question such as ‘how many netball players were surveyed?’ takes a long time and can easily lead to mistakes. We must organise the data to make it easier to work with, for example by sorting the heights and grouping them into intervals.
3. Ordering a large set of raw data by hand is very time consuming; in practice, computer software can do this work for us.
\scriptsize \displaystyle \begin{align*} 136;\text{ }138;\text{ }139;\text{ }140;\text{ }140;\text{ }140;\text{ }146;\text{ }147;\text{ }148;\text{ }148;\\148;\text{ }148;\text{ }148;\text{ }148;\text{ }148;\text{ }148;\text{ }150;\text{ }150;\text{ }150;\text{ }152;\\152;\text{ }154;\text{ }155;\text{ }155;\text{ }155;\text{ }155;\text{ }155;\text{ }155;\text{ }156;\text{ }156;\\156;\text{ }157;\text{ }158;\text{ }158;\text{ }158;\text{ }158;\text{ }158;\text{ }158;\text{ }160;\text{ }160;\\160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\text{ }160;\\161;\text{ }162;\text{ }162;\text{ }162;\text{ }163;\text{ }163;\text{ }163;\text{ }164;\text{ }164;\text{ }165;\\165;\text{ }165;\text{ }165;\text{ }165;\text{ }165;\text{ }165;\text{ }165;\text{ }165;\text{ }167;\text{ }167;\\167;\text{ }167;\text{ }170;\text{ }170;\text{ }170;\text{ }170;\text{ }170;\text{ }170;\text{ }170;\text{ }170;\\171;\text{ }172;\text{ }175;\text{ }175;\text{ }177;\text{ }178;\text{ }178;\text{ }179;\text{ }179;\text{ }180\end{align*}
4. There are 22 players between $\scriptsize 146-155\text{ cm}$ in height.
5. The grouped frequency table is:

   | Height (cm) | Frequency |
   |---|---|
   | $\scriptsize 136-145$ | $\scriptsize 6$ |
   | $\scriptsize 146-155$ | $\scriptsize 22$ |
   | $\scriptsize 156-165$ | $\scriptsize 40$ |
   | $\scriptsize 166-175$ | $\scriptsize 16$ |
   | $\scriptsize 176-185$ | $\scriptsize 6$ |

6. No, the intervals cannot overlap. If they did, a height could fall into two different intervals and would be counted twice.
7. The sum of the frequencies is the same as the total number of observations: $\scriptsize \text{Sum of frequencies}=90$.
8. We must add the frequencies for the intervals $\scriptsize 136-145$; $\scriptsize 146-155$ and $\scriptsize 156-165$. There are $\scriptsize 68$ players shorter than $\scriptsize 166\text{ cm}$.
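
If you have access to a computer, the ordering and counting in Activity 1.1 can be checked with a few lines of code. The Python sketch below is one possible way to do it, not a required method: `grouped_frequencies` is a hypothetical helper name chosen for this example, and it assumes whole-number heights, so that a class such as $\scriptsize 136-145$ contains every value from $\scriptsize 136$ up to and including $\scriptsize 145$.

```python
# Heights (in cm) of the 90 netball players in Activity 1.1
heights = [
    165, 148, 158, 150, 160, 165, 150, 156, 155, 164, 162, 160, 158, 148, 158,
    140, 146, 160, 148, 152, 139, 165, 148, 160, 156, 158, 170, 155, 160, 148,
    155, 158, 179, 170, 158, 161, 155, 160, 163, 178, 138, 172, 170, 156, 160,
    160, 171, 140, 160, 170, 175, 148, 170, 177, 155, 167, 154, 160, 170, 155,
    136, 179, 150, 167, 148, 160, 164, 167, 157, 165, 163, 140, 162, 178, 160,
    170, 163, 162, 165, 175, 165, 152, 147, 180, 148, 170, 165, 167, 165, 165,
]


def grouped_frequencies(data, first_lower, width, num_classes):
    """Count whole-number values in equal classes lower-upper (both limits included)."""
    table = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1  # e.g. 136-145 holds 10 whole centimetres
        count = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, count))
    return table


ordered = sorted(heights)  # question 3: smallest to largest
table = grouped_frequencies(heights, first_lower=136, width=10, num_classes=5)

for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")

total = sum(count for _, _, count in table)  # question 7
shorter_than_166 = sum(count for _, upper, count in table if upper < 166)  # question 8
print("Players surveyed:", len(heights))  # question 1
print("Sum of frequencies:", total)
print("Shorter than 166 cm:", shorter_than_166)
```

Running it should print the same frequencies as the table in answer 5 (6, 22, 40, 16 and 6), a total of 90 and 68 players shorter than 166 cm.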

Activity 1.1 showed how to construct a frequency distribution. We also saw that it is much easier to answer questions about data from a frequency table than from ungrouped data. As a data set grows, it becomes even more difficult to handle in its raw form, so we need to organise data before analysing it.

### Take note!

Summarising the data into class intervals leads to a loss of detail: we cannot recover the original data set from the grouped intervals. In Activity 1.1, for example, the grouped frequency table does not tell you how many players had a height of exactly $\scriptsize 160\text{ cm}$.
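
To see this loss of detail concretely, the Python sketch below (an illustration only, using the Activity 1.1 heights) counts every exact height with the standard library's `collections.Counter` and compares that with the single number that a grouped class keeps.

```python
from collections import Counter

# Heights (in cm) of the 90 netball players in Activity 1.1
heights = [
    165, 148, 158, 150, 160, 165, 150, 156, 155, 164, 162, 160, 158, 148, 158,
    140, 146, 160, 148, 152, 139, 165, 148, 160, 156, 158, 170, 155, 160, 148,
    155, 158, 179, 170, 158, 161, 155, 160, 163, 178, 138, 172, 170, 156, 160,
    160, 171, 140, 160, 170, 175, 148, 170, 177, 155, 167, 154, 160, 170, 155,
    136, 179, 150, 167, 148, 160, 164, 167, 157, 165, 163, 140, 162, 178, 160,
    170, 163, 162, 165, 175, 165, 152, 147, 180, 148, 170, 165, 167, 165, 165,
]

# Ungrouped: a count for every exact height
exact_counts = Counter(heights)

# Grouped: only the total for the whole class survives
in_class = sum(1 for h in heights if 156 <= h <= 165)

print("Exactly 160 cm:", exact_counts[160])  # needs the raw data
print("In the 156-165 cm class:", in_class)  # all the grouped table records
```

The raw data shows 12 players at exactly 160 cm, while the grouped table only records that 40 players fall somewhere in the $\scriptsize 156-165\text{ cm}$ class.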

### Constructing a frequency distribution

Things to note when creating class intervals for a frequency distribution:

• Create equal intervals. First, decide how many class intervals you would like to use, then calculate the range of the data values (largest value minus smallest value). Next, divide the range by the number of classes to get an approximate interval width, and round it up to a convenient number.
• The first class should start at or just below the smallest observation, so that the smallest observation falls inside it.
• Classes must not overlap and must cover the entire range of the data set.
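
The width calculation in the first point can be written as a small function. This Python sketch assumes the width is rounded *up* to a whole number, so that the classes together still cover the whole range; `class_width` is a hypothetical helper name, not a standard function.

```python
import math


def class_width(smallest, largest, num_classes):
    """Approximate equal class width: the range divided by the number of classes.

    Assumption: the result is rounded up to a whole number so that the
    classes together cover the entire range of the data.
    """
    data_range = largest - smallest
    return math.ceil(data_range / num_classes)


# Ages in Example 1.1: smallest 10, largest 30, and 7 classes wanted
print(class_width(10, 30, 7))  # 20 / 7 is about 2.86, rounded up to 3
```

With the ages in Example 1.1 and 7 classes, it gives a width of 3, the width used in that example.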

In the next example you will learn how to create class intervals.

### Example 1.1

The following data shows the ages of people attending a public reading by an author at a library.

\scriptsize \begin{align*}&10;\text{ }10;\text{ }11;\text{ }12;\text{ }12;\text{ }13;\text{ }13;\text{ }14;\text{ }14;\text{ }14;\text{ }15;\text{ }15;\text{ }15;\text{ }16;\text{ }16;\text{ }16;\text{ }17;\text{ }17;\text{ }17;\text{ }17;\\&18;\text{ }18;\text{ }18;\text{ }18;\text{ }18;\text{ }19;\text{ }19;\text{ }19;\text{ }19;\text{ }19;\text{ }19;\text{ }20;\text{ }20;\text{ }21;\text{ }22;\text{ }22;\text{ }22;\text{ }22;\text{ }22;\text{ }22;\\&23;\text{ }25;\text{ }25;\text{ }25;\text{ }26;\text{ }26;\text{ }27;\text{ }30\end{align*}

Construct a frequency distribution for the data.

Solution

There are $\scriptsize 48$ observations. The minimum age is $\scriptsize 10$ and the maximum age is $\scriptsize 30$, so the range of the data values is $\scriptsize 30-10=20$. If we would like to create $\scriptsize 7$ class intervals, the approximate interval width is $\scriptsize 20\div 7\approx 2.86$, which we round up to an equal interval width of $\scriptsize 3$.

We start the first class interval below the smallest observation, so we can start at $\scriptsize 9$ and go up in intervals of $\scriptsize 3$. You can use a variable to represent age; in this example we have used $\scriptsize x$.

| Age intervals | Frequency |
|---|---|
| $\scriptsize 9 \lt x\le 12$ | $\scriptsize 5$ |
| $\scriptsize 12 \lt x\le 15$ | $\scriptsize 8$ |
| $\scriptsize 15 \lt x\le 18$ | $\scriptsize 12$ |
| $\scriptsize 18 \lt x\le 21$ | $\scriptsize 9$ |
| $\scriptsize 21 \lt x\le 24$ | $\scriptsize 7$ |
| $\scriptsize 24 \lt x\le 27$ | $\scriptsize 6$ |
| $\scriptsize 27 \lt x\le 30$ | $\scriptsize 1$ |
| Total | $\scriptsize 48$ |

You will notice that the upper limit of each interval is included in that interval and excluded from the next one. For example, $\scriptsize 12$ is the upper limit of the first interval $\scriptsize (9 \lt x\le 12)$, so it is counted there; it is then excluded from the next interval $\scriptsize (12 \lt x\le 15)$, where it is the lower limit.

In the frequency column we count the number of observations that fall into each interval and list it. Make sure that the sum of the frequency column is equal to the total number of observations.
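
The counting step in Example 1.1 can be sketched in Python as follows. The helper `frequency_table` is a hypothetical name used only for this illustration; it builds classes of the form `lower < x <= upper`, so each upper limit is counted in its own class and not in the next one, and it finishes by checking that the frequencies add up to the number of observations.

```python
# Ages of the 48 people at the library reading in Example 1.1
ages = [
    10, 10, 11, 12, 12, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17,
    18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 21, 22, 22, 22, 22, 22, 22,
    23, 25, 25, 25, 26, 26, 27, 30,
]


def frequency_table(data, start, width, num_classes):
    """Count values in classes of the form lower < x <= upper."""
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        count = sum(1 for x in data if lower < x <= upper)
        rows.append((lower, upper, count))
    return rows


rows = frequency_table(ages, start=9, width=3, num_classes=7)
for lower, upper, count in rows:
    print(f"{lower} < x <= {upper}: {count}")

# The frequencies must add up to the total number of observations
assert sum(count for _, _, count in rows) == len(ages)
```

The printed counts (5, 8, 12, 9, 7, 6 and 1) match the frequency column in the table above.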

### Exercise 1.1

The following data shows a sample of the number of customers entering a printing shop each day, recorded over a period of time.

\scriptsize \begin{align*}&14;\text{ }14;\text{ }15;\text{ }24;\text{ }22;\text{ }26;\text{ }20;\text{ }22;\text{ }14;\text{ }11;\text{ }13;\text{ }24;\text{ }23;\text{ }11;\text{ }12;\\&12;\text{ }23;\text{ }16;\text{ }12;\text{ }18;\text{ }20;\text{ }21;\text{ }20;\text{ }15;\text{ }21;\text{ }16;\text{ }12;\text{ }23;\text{ }22;\text{ }14\end{align*}

1. Construct a frequency table for the data given.
2. For how many days were customers observed?
3. The shop owner is deciding whether he should close the shop down. He will only stay open for business if he has $\scriptsize 16$ or more customers on at least $\scriptsize 50\%$ of the days. Based on the frequency table, should he stay open or close down?

The full solutions are at the end of the unit.

### Note

When you have an internet connection, watch the video called “Frequency tables and dot plots” to learn more about frequency distributions.

### Cumulative frequencies

Some graphs that represent data from a grouped frequency distribution need cumulative frequencies. Cumulative means to add up or collect values together.

You can think of cumulative frequency as the running total of frequencies in a frequency distribution. This means you are adding a value to all of the values that came before it.

To find the cumulative frequency for a row, add the frequency for that row to the sum of all the previous frequencies, as shown in Table 1.1.

Table 1.1 Cumulative frequency of ages

| Age intervals | Frequency | Cumulative frequency |
|---|---|---|
| $\scriptsize 9 \lt x\le 12$ | $\scriptsize 5$ | $\scriptsize 5$ |
| $\scriptsize 12 \lt x\le 15$ | $\scriptsize 8$ | $\scriptsize 5+8=13$ |
| $\scriptsize 15 \lt x\le 18$ | $\scriptsize 12$ | $\scriptsize 13+12=25$ |
| $\scriptsize 18 \lt x\le 21$ | $\scriptsize 9$ | $\scriptsize 25+9=34$ |
| $\scriptsize 21 \lt x\le 24$ | $\scriptsize 7$ | $\scriptsize 34+7=41$ |
| $\scriptsize 24 \lt x\le 27$ | $\scriptsize 6$ | $\scriptsize 41+6=47$ |
| $\scriptsize 27 \lt x\le 30$ | $\scriptsize 1$ | $\scriptsize 47+1=48$ |
| Total | $\scriptsize 48$ | |

The last value in the cumulative frequency column will always be equal to the total number of observations (the sum of all the frequencies), because by the last row every frequency has been added to the running total.
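
A running total is exactly what Python's `itertools.accumulate` produces by default, so the cumulative frequency column of Table 1.1 can be checked with the short sketch below (an illustration only).

```python
from itertools import accumulate

# Frequencies from Table 1.1, from 9 < x <= 12 down to 27 < x <= 30
frequencies = [5, 8, 12, 9, 7, 6, 1]

# Each entry is the current frequency plus all the frequencies before it
cumulative = list(accumulate(frequencies))
print(cumulative)  # [5, 13, 25, 34, 41, 47, 48]

# The last cumulative frequency equals the total number of observations
assert cumulative[-1] == sum(frequencies) == 48
```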

Cumulative frequency graphs are often used to represent and compare data. You will see more of this in Unit 2 of this subject outcome.

## Summary

In this unit you have learnt the following:

• How to group data using frequencies.
• How to construct a frequency distribution.
• How to create equal class intervals for a frequency distribution.

# Unit 1: Assessment

#### Suggested time to complete: 20 minutes

1. The maths marks, out of $\scriptsize \displaystyle 50$, for $\scriptsize \displaystyle 35$ learners are given below:
\scriptsize \displaystyle \begin{align*}&46;\text{ }40;\text{ }12;\text{ }10;\text{ }47;\text{ }23;\text{ }26;\text{ }8;\text{ }29;\text{ }34;\text{ }37;\text{ }17;\text{ }40;\text{ }50;\text{ }18;\text{ }23;\text{ }33;\text{ }\\&23;\text{ }24;\text{ }15;\text{ }35;\text{ }23;\text{ }19;\text{ }22;\text{ }28;\text{ }35;\text{ }27;\text{ }42;\text{ }29;\text{ }26;\text{ }46;\text{ }33;\text{ }27;\text{ }19;\text{ }28\end{align*}
Organise the data by using a frequency table.
2. The frequency table shows the test marks for a statistics course. Find the missing values A, B and C.

   | Test score intervals | Frequency | Cumulative frequency |
   |---|---|---|
   | $\scriptsize 49.5 \lt x\le 59.5$ | $\scriptsize 5$ | A |
   | $\scriptsize 59.5 \lt x\le 69.5$ | $\scriptsize 10$ | $\scriptsize 15$ |
   | $\scriptsize 69.5 \lt x\le 79.5$ | B | $\scriptsize 45$ |
   | $\scriptsize 79.5 \lt x\le 89.5$ | $\scriptsize 40$ | $\scriptsize 85$ |
   | $\scriptsize 89.5 \lt x\le 99.5$ | $\scriptsize 15$ | C |

The full solutions are at the end of the unit.

# Unit 1: Solutions

### Exercise 1.1

1. The frequency table for the number of customers is:

   | Number of customers | Frequency |
   |---|---|
   | $\scriptsize 10-15$ | $\scriptsize 13$ |
   | $\scriptsize 16-21$ | $\scriptsize 8$ |
   | $\scriptsize 22-27$ | $\scriptsize 9$ |

2. $\scriptsize 30\text{ days}$
3. From the frequency table, he has $\scriptsize 16$ or more customers on $\scriptsize 8+9=17$ days. This is $\scriptsize 17\div 30\approx 57\%$ of the days observed, which is more than $\scriptsize 50\%$, so he should not close his shop down.
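
If you want to check these answers on a computer, the Python sketch below reproduces the frequency table and the decision rule. It assumes that “at least $\scriptsize 50\%$ of the days” means a share of $\scriptsize 0.5$ or more; the variable names are our own.

```python
# Daily customer counts from Exercise 1.1
customers = [
    14, 14, 15, 24, 22, 26, 20, 22, 14, 11, 13, 24, 23, 11, 12,
    12, 23, 16, 12, 18, 20, 21, 20, 15, 21, 16, 12, 23, 22, 14,
]

# Classes used in the solution: 10-15, 16-21 and 22-27 customers
classes = [(10, 15), (16, 21), (22, 27)]
freq = {(low, high): sum(1 for c in customers if low <= c <= high)
        for low, high in classes}

days = len(customers)                        # question 2
busy_days = freq[(16, 21)] + freq[(22, 27)]  # days with 16 or more customers
decision = "Stay open" if busy_days / days >= 0.5 else "Close down"

print(freq)
print(days, busy_days, decision)
```

Working from the frequency table rather than the raw list mirrors the method in the written solution: the two classes that start at 16 or more are simply added.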


### Unit 1: Assessment

1. The frequency table for the maths marks is:

   | Score intervals | Frequency |
   |---|---|
   | $\scriptsize 0-10$ | $\scriptsize 2$ |
   | $\scriptsize 11-20$ | $\scriptsize 6$ |
   | $\scriptsize 21-30$ | $\scriptsize 14$ |
   | $\scriptsize 31-40$ | $\scriptsize 8$ |
   | $\scriptsize 41-50$ | $\scriptsize 5$ |

2. The completed table is:

   | Test score intervals | Frequency | Cumulative frequency |
   |---|---|---|
   | $\scriptsize 49.5 \lt x\le 59.5$ | $\scriptsize 5$ | $\scriptsize \text{A}=5$ |
   | $\scriptsize 59.5 \lt x\le 69.5$ | $\scriptsize 10$ | $\scriptsize 15$ |
   | $\scriptsize 69.5 \lt x\le 79.5$ | $\scriptsize \text{B}=45-15=30$ | $\scriptsize 45$ |
   | $\scriptsize 79.5 \lt x\le 89.5$ | $\scriptsize 40$ | $\scriptsize 85$ |
   | $\scriptsize 89.5 \lt x\le 99.5$ | $\scriptsize 15$ | $\scriptsize \text{C}=85+15=100$ |
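
Finding A, B and C uses two facts: a cumulative frequency is the previous cumulative frequency plus the current frequency, and a frequency is the difference between consecutive cumulative frequencies. The Python sketch below applies these row by row. It is an illustration under the assumption that each row has at most one missing value and that the row above is already complete when it is reached; `None` marks a missing value.

```python
# Known values from the assessment table; None marks the missing A, B and C
frequencies = [5, 10, None, 40, 15]
cumulative = [None, 15, 45, 85, None]

for i in range(len(frequencies)):
    previous = cumulative[i - 1] if i > 0 else 0  # running total so far
    if cumulative[i] is None:
        # cumulative frequency = previous cumulative frequency + frequency
        cumulative[i] = previous + frequencies[i]
    elif frequencies[i] is None:
        # frequency = cumulative frequency - previous cumulative frequency
        frequencies[i] = cumulative[i] - previous

print(frequencies)  # [5, 10, 30, 40, 15]
print(cumulative)   # [5, 15, 45, 85, 100]
```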
