Data handling: Calculate, represent and interpret measures of central tendency and dispersion in univariate numerical ungrouped data

# Unit 2: The five-number summary and box-and-whisker diagram

Natashia Bearam-Edmunds

### Unit outcomes

By the end of this unit you will be able to:

• Combine measures of central tendency and dispersion to work out the five-number summary.
• Construct the box-and-whisker plot.
• Interpret the box-and-whisker plot.

## What you should know

Before you start this unit, make sure you can:

## Introduction

To understand data well and get a better overall picture of what it is telling us, we must combine the measures of central tendency with the measures of dispersion. The five-number summary combines a measure of central tendency (the median) with two measures of dispersion (the range and the inter-quartile range). A box-and-whisker plot, or box plot, is a visual display of the five-number summary that helps us see the trends in data.

Box plots show how the quartiles divide the data into sections where each section contains approximately $\scriptsize \displaystyle 25\%$ of the data in that set. The five-number summary is listed in the following order: minimum, first quartile, median, third quartile and maximum.

Minimum value: The lowest value in a data set.

Lower quartile: $\scriptsize \displaystyle (~{{\text{Q}}_{1}})$: The median of the lower half of an ordered data set.

Median: $\scriptsize \displaystyle (~{{\text{Q}}_{2}})$: The median is the middle value of an ordered data set.

Upper quartile: $\scriptsize \displaystyle (~{{\text{Q}}_{3}})$: The median of the upper half of an ordered data set.

Maximum value: The highest value in a data set.
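The definitions above can be sketched in a few lines of Python. This is a minimal sketch, not a standard library routine: the function name `five_number_summary` is made up for illustration, and it assumes the median-of-halves method described above (when the number of values is odd, the middle value is left out of both halves). It needs at least two data values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)          # the data set must be ordered first
    n = len(data)
    half = n // 2                  # number of values in each half
    lower = data[:half]            # lower half (excludes the middle value if n is odd)
    upper = data[n - half:]        # upper half (excludes the middle value if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

# The maths marks of eight learners used in Activity 2.1
marks = [62, 56, 71, 78, 89, 92, 86, 74]
print(five_number_summary(marks))  # (56, 66.5, 76.0, 87.5, 92)
```

Other software may use a different quartile method and give slightly different values for the first and third quartiles.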

Work through Activity 2.1 to get a better understanding of how to represent the five-number summary using a box-and-whisker diagram.

### Activity 2.1: Constructing a box plot

Time required: 15 minutes

What you need:

• a pen and paper

What to do:

The following data set shows the maths marks of eight learners:

$\scriptsize \displaystyle 62;\text{ }56;\text{ }71;\text{ }78;\text{ }89;\text{ }92;\text{ }86;\text{ }74$

1. Rewrite the data set in ascending order.
2. Find the maximum and minimum marks.
3. Calculate the median mark.
4. Calculate the lower and upper quartiles.
5. Write down the values for the five-number summary.
6. Use a number line to show the information you have calculated. Indicate where the answers to question 5 are located. Draw a box around the IQR and join it to the minimum and maximum values.
7. Indicate if there are any outliers in this data set.

What did you find?

1. The values arranged from lowest to highest: $\scriptsize \displaystyle 56;\text{ }62;\text{ }71;\text{ }74;\text{ }78;\text{ }86;\text{ }89;\text{ }92$
2. The maximum mark is $\scriptsize \displaystyle 92$ and the minimum mark is $\scriptsize \displaystyle 56$.
3. The median, also called the second quartile, will lie between the $\scriptsize \displaystyle 4\text{th}$ and $\scriptsize \displaystyle 5\text{th}$ values. $\scriptsize \text{M}=\displaystyle \frac{{74+78}}{2}=76$.
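4. The lower quartile is the median of the lower half of the ordered data and the upper quartile is the median of the upper half.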
\scriptsize \begin{align*}{{\text{Q}}_{1}}&=\displaystyle \frac{{62+71}}{2}\\&=66.5\\{{\text{Q}}_{3}}&=\displaystyle \frac{{86+89}}{2}\\&=87.5\end{align*}
5. Minimum $\scriptsize \displaystyle =56$
First quartile $\scriptsize \displaystyle =66.5$
Median $\scriptsize \displaystyle =76$
Third quartile $\scriptsize \displaystyle =87.5$
Maximum $\scriptsize \displaystyle =92$
6. Mark off the five values above the number line. Draw a box from $\scriptsize {{\text{Q}}_{1}}$ to $\scriptsize {{\text{Q}}_{3}}$ so that it spans the IQR, then join the box to the minimum and maximum values to get the box-and-whisker plot. The box shows the distance between $\scriptsize {{\text{Q}}_{1}}$ and $\scriptsize {{\text{Q}}_{3}}$ (IQR). A line inside the box shows the median. The lines extending outside the box (the whiskers) show where the minimum and maximum values are located. Note: this graph can be drawn vertically too.
7. To check for outliers use the $\scriptsize \text{1}\text{.5}\times \text{IQR}$ rule.
\scriptsize \begin{align*}\text{IQR}&=87.5-66.5\\&=21\\1.5\times \text{IQR}&=1.5(21)\\&=31.5\\{{\text{Q}}_{1}}-31.5&=35\\{{\text{Q}}_{3}}+31.5&=119\end{align*}
There are no outliers in this data set.

### Take note!

To identify outliers, upper and lower fences are used to set limits of data values. The formula for the upper fence is $\scriptsize {{\text{Q}}_{3}}+1.5\times \text{IQR}$ and the formula for the lower fence is $\scriptsize {{\text{Q}}_{1}}-1.5\times \text{IQR}$ where IQR is the interquartile range. The lower fence is the lower limit and the upper fence is the upper limit. Any value outside of these fences is considered an outlier.
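The fence formulas can be checked with a short Python sketch. The function names `fences` and `outliers` are made up for illustration, and the quartiles are assumed to have been calculated already.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Return the values that lie outside the fences."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Marks and quartiles from Activity 2.1
marks = [56, 62, 71, 74, 78, 86, 89, 92]
print(fences(66.5, 87.5))           # (35.0, 119.0)
print(outliers(marks, 66.5, 87.5))  # []
```

A value that lies exactly on a fence is not counted as an outlier here, because only values outside the fences are outliers.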

The diagram below shows the main parts of a box-and-whisker plot.

### Take note!

It is important to start a box plot with a scaled number line. Otherwise the box plot may not be useful.

### Example 2.1

Douglas works as a telesales person. He keeps a record of the number of sales he makes each month. The data below show his sales for twelve months.

$\scriptsize \displaystyle \{49;\text{ 70};\text{ }22;\text{ }35;\text{ 68};\text{ }45;\text{ }60;\text{ }48;\text{ }19;\text{ 120};\text{ }43;\text{ }12\}$

1. Calculate his median number of sales.
2. Give the five-number summary.
3. Draw a box-and-whisker plot of the sales. Find the upper and lower fences and show any outliers on your box-and-whisker plot.

Solution

1. Order the data: $\scriptsize \displaystyle \{12;\text{ }19;\text{ }22;\text{ }35;\text{ }43;\text{ }45;\text{ }48;\text{ }49;\text{ }60;\text{ 68};\text{ 70};\text{ 120}\}\text{ }$
\scriptsize \begin{align*}\text{M}&=\displaystyle \frac{{45+48}}{2}\\&=46.5\end{align*}
2. The five-number summary is:
\scriptsize \begin{align*}\text{Min}&=12\\{{\text{Q}}_{1}}&=28.5\\{{\text{Q}}_{2}}(\text{median})&=46.5\\{{\text{Q}}_{3}}&=64\\\text{Max}&=120\end{align*}
3. Calculate the IQR, then the lower and upper fences.
\scriptsize \begin{align*}\text{IQR}&=64-28.5\\&=35.5\\1.5\times \text{IQR}&=1.5(35.5)\\&=53.25\\\text{Lower fence }= {{\text{Q}}_{1}}-53.25&=-24.75\\\text{Upper fence }= {{\text{Q}}_{3}}+53.25&=117.25\end{align*}
$\scriptsize 120$ is an outlier because it lies above the upper fence. Outliers are not included in the whiskers of the box-and-whisker plot; they are shown as separately plotted points. The whiskers are then drawn to the minimum and maximum values that are not outliers.
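To see where the whiskers end when there is an outlier, the following Python sketch uses Douglas's sales data and the quartiles from the solution. The function name `whiskers_and_outliers` is made up for illustration.

```python
def whiskers_and_outliers(values, q1, q3):
    """Return (whisker low end, whisker high end, list of outliers)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values inside the fences decide where the whiskers end
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outside = [v for v in values if v < lower_fence or v > upper_fence]
    return min(inside), max(inside), outside

sales = [49, 70, 22, 35, 68, 45, 60, 48, 19, 120, 43, 12]
print(whiskers_and_outliers(sales, 28.5, 64))  # (12, 70, [120])
```

The upper whisker is drawn to $\scriptsize 70$, the largest value that is not an outlier, and $\scriptsize 120$ is plotted as a separate point.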

### Note

When you have access to the internet, watch this video, which explains how to draw a box-and-whisker plot for a data set.

### Example 2.2

Determine the five-number summary and inter-quartile range from the box-and-whisker plot below.

Solution

We can read off the values for the five-number summary from the box plot.

\scriptsize \begin{align*}\text{Min}=25\\{{\text{Q}}_{1}}=30\\{{\text{Q}}_{2}}(\text{median})=35\\{{\text{Q}}_{3}}=45\\\text{Max}=60\end{align*}

The upper quartile minus the lower quartile gives the inter-quartile range.

\scriptsize \begin{align*}\text{IQR}&=45-30\\&=15\end{align*}

### Exercise 2.1

1. The stem-and-leaf diagram below shows the pulse rate per minute of ten learners.

| Stem | Leaf |
|---|---|
| $\scriptsize \displaystyle 7$ | $\scriptsize \displaystyle 8$ |
| $\scriptsize \displaystyle 8$ | $\scriptsize \displaystyle 1$  $\scriptsize \displaystyle 3$  $\scriptsize \displaystyle 5$  $\scriptsize \displaystyle 5$ |
| $\scriptsize \displaystyle 9$ | $\scriptsize \displaystyle 0$  $\scriptsize \displaystyle 1$  $\scriptsize \displaystyle 1$ |
| $\scriptsize \displaystyle 10$ | $\scriptsize \displaystyle 3$  $\scriptsize \displaystyle 5$ |

1. Determine the mean and the range of the data.
2. Give the five-number summary and create a box plot for the data.
2. The following is a list of data: $\scriptsize \displaystyle 3;\text{ }8;\text{ }8;\text{ }5;\text{ }9;\text{ }1;\text{ }4;\text{ }x$
In each case, determine the value of $\scriptsize x$ if the:
1. range $\scriptsize \displaystyle =16$
2. mode $\scriptsize =8$
3. median$\scriptsize =6$
4. mean$\scriptsize \displaystyle =6$
5. box plot looks as follows:
3. Given the following data set:
$\scriptsize \displaystyle 1.25\text{ };\text{ }1.5\text{ };\text{ }2.5\text{ };\text{ }2.5\text{ };\text{ }3.1\text{ };\text{ }3.2\text{ };\text{ }4.1\text{ };\text{ }4.25\text{ };\text{ }4.75\text{ };\text{ }4.8\text{ };\text{ }4.95\text{ };\text{ }5.1$
1. Draw the box plot.
2. Find the upper and lower fences of the data.

The full solutions are at the end of the unit.

### Activity 2.2: Interpreting the box-and-whisker plot

Time required: 20 minutes

What you need:

• a pen and paper

What to do:

In Activity 2.1 you saw that the marks of eight learners gave the following box plot:

1. What percentage of learners scored below $\scriptsize \displaystyle 66.5$?
2. How many learners scored below $\scriptsize \displaystyle 66.5$?
3. Between what two marks does the middle $\scriptsize \displaystyle 50\%$ of the data lie?
4. What mark did $\scriptsize \displaystyle 50\%$ of learners score less than?
5. Complete this sentence: Twenty-five percent of learners got a mark greater than________.
6. What percentage of learners scored below $\scriptsize \displaystyle 87.5$?
7. Comment on the symmetry of the data values.
8. Comment on the variability of the data.

What did you find?

1. Twenty-five percent of scores fall below the lower quartile value.
2. $\scriptsize 0.25\times 8=2$, so $\scriptsize 2$ learners got less than $\scriptsize \displaystyle 66.5$.
3. The IQR shows the middle $\scriptsize \displaystyle 50\%$ of scores, therefore $\scriptsize \displaystyle 50\%$ of the data lies between $\scriptsize \displaystyle 66.5$ and $\scriptsize \displaystyle 87.5$.
4. Half the scores are greater than or equal to the median and half are less than the median. $\scriptsize \displaystyle 50\%$ of learners got less than $\scriptsize \displaystyle 76$.
5. Twenty-five percent of learners got a mark greater than $\scriptsize 87.5$.
6. Seventy-five percent of learners scored less than $\scriptsize \displaystyle 87.5$.
7. The data appears to be more or less symmetric as the median lies close to the middle of the box.
8. Consider the range and IQR. The range is easily influenced by extreme values and outliers so it is less reliable than the IQR, which takes into account only the middle $\scriptsize \displaystyle 50\%$ of the data.
\scriptsize \begin{align*}\text{IQR}&=87.5-66.5\\&=21\\\text{Range}&=92-56\\&=36\end{align*}
Low variability is ideal because it means that you can make better predictions about the population based on sample data. High variability means that the values are less consistent, so it’s more difficult to make predictions.
We can say that the marks are moderately variable.
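The percentages in the answers above can be checked by counting. The sketch below is only an illustration: the helper name `percent_below` is made up, and the quartile values are taken from Activity 2.1.

```python
marks = [56, 62, 71, 74, 78, 86, 89, 92]
summary = {"Q1": 66.5, "median": 76, "Q3": 87.5}

def percent_below(values, cutoff):
    """Percentage of values that are strictly less than the cutoff."""
    return 100 * sum(v < cutoff for v in values) / len(values)

for name, cutoff in summary.items():
    print(name, percent_below(marks, cutoff))
# Q1 25.0
# median 50.0
# Q3 75.0
```

For this small data set the percentages are exactly $\scriptsize 25\%$, $\scriptsize 50\%$ and $\scriptsize 75\%$. With other data sets, especially ones with repeated values, they are only approximately equal to these figures.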

When data are skewed, the majority of the data values are located either on the high or low ends. The shape of a box-and-whisker plot shows whether a data set is roughly symmetric or skewed.

When the median is in the middle of the box, and the whiskers are about the same length on either side of the box, then the distribution is symmetric. A normal distribution is one example of a symmetric distribution.

When the median is pulled toward the upper quartile, and the whisker is shorter on the upper end of the box, then the distribution is negatively skewed (skewed left).

When the median is pulled toward the lower quartile, and the whisker is shorter on the lower end of the box, then the distribution is positively skewed (skewed right).
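The three cases above can be turned into a rough rule of thumb in Python. This is only a sketch and not a formal statistical test: the function name `describe_skew` and the `tolerance` cut-off (how much shorter one side must be before we call it "shorter") are assumptions made for this illustration.

```python
def describe_skew(minimum, q1, med, q3, maximum, tolerance=0.25):
    """Give a rough shape label from a five-number summary."""
    box_low, box_high = med - q1, q3 - med                  # two parts of the box
    whisker_low, whisker_high = q1 - minimum, maximum - q3  # two whiskers

    def shorter(a, b):
        # True when a is noticeably shorter than b (assumed cut-off)
        return a < (1 - tolerance) * b

    if shorter(box_high, box_low) and shorter(whisker_high, whisker_low):
        return "negatively skewed (skewed left)"
    if shorter(box_low, box_high) and shorter(whisker_low, whisker_high):
        return "positively skewed (skewed right)"
    pairs = [(box_low, box_high), (box_high, box_low),
             (whisker_low, whisker_high), (whisker_high, whisker_low)]
    if not any(shorter(a, b) for a, b in pairs):
        return "roughly symmetric"
    return "no clear pattern"

# Five-number summary read off the box plot in Example 2.2
print(describe_skew(25, 30, 35, 45, 60))  # positively skewed (skewed right)
```

In Example 2.2 the median is closer to the lower quartile and the lower whisker is the shorter one, so the sketch labels the data as positively skewed.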

### Note

For more on interpreting box plots watch this video called “Interpreting box plots”.

## Summary

In this unit you have learnt the following:

• How to define the five-number summary.
• How to represent the five-number summary using a box-and-whisker plot.
• How to interpret a box plot.
• How the median is affected by negatively and positively skewed data.

# Unit 2: Assessment

#### Suggested time to complete: 30 minutes

1. Two mathematics classes, A and B, are in competition to see which class performed best in the June examination. The marks of the learners in class A are given below and the box-and-whisker plot for class B illustrates the results of class B. Both classes have 25 learners. Marks are given as percentages.
Marks of class A:
$\scriptsize \displaystyle 9;\text{ }14;\text{ }14;\text{ }19;\text{ }21;\text{ }23;\text{ }33;\text{ }35;\text{ }37;\text{ }37;\text{ }42;\text{ }45;\text{ }55;\text{ }56;\text{ }57;\text{ }59;\text{ }68;\text{ }75;\text{ }75;\text{ }75;\text{ }77;\text{ }78;\text{ }80;\text{ }81;\text{ }92$
The box-and-whisker diagram for the learners in class B is:
1. Write down the five-number summary for class A.
2. Are there any outliers in the data for class A? Explain.
3. Draw the box-and-whisker diagram (box plot) for class A. Show all relevant values.
4. Determine which class did better in the June Examination and give reasons for your conclusion.
2. The ages of 28 people whose birthday coincides with that of one of their children are shown below.
\scriptsize \displaystyle \begin{align*}&78;\text{ }53;\text{ }70;\text{ }97;\text{ }37;\text{ }68;\text{ }48;\text{ }35;\text{ }71;\text{ }63;\text{ }47;\text{ }60;\text{ }63;\text{ }58;\\&74;\text{ }39;\text{ }67;\text{ }64;\text{ }42;\text{ }52;\text{ }38;\text{ }54;\text{ }60;\text{ }75;\text{ }69;\text{ }77;\text{ }65;\text{ }72\end{align*}
1. Determine the median age of the group by first determining its position.
2. Determine the upper and lower quartiles of the data by first determining their positions.
3. Determine the upper and lower fence values.
4. Construct a box-and-whisker diagram for the above information showing any outliers.
3. Test scores for a college statistics class held during the day are:
\scriptsize \displaystyle \begin{align*}99;\text{ }56;\text{ }78;\text{ }55.5;\text{ }32;\text{ }90;\text{ }80;\text{ }81;\text{ }56;\text{ }59;\text{ }45;\text{ }77;\text{ }84.5;\text{ }84;\text{ }70;\text{ }72;\text{ }68;\text{ }32;\text{ }79;\text{ }90\end{align*}
Test scores for a college statistics class held during the evening are:
\scriptsize \displaystyle \begin{align*}98;\text{ }78;\text{ }68;\text{ }83;\text{ }81;\text{ }89;\text{ }88;\text{ }76;\text{ }65;\text{ }45;\text{ }98;\text{ }90;\text{ }80;\text{ }84.5;\text{ }85;\text{ }79;\text{ }78;\text{ }98;\text{ }90;\text{ }79;\text{ }81;\text{ }25.5\end{align*}
1. For each data set, what percentage of the data is between:
• the smallest value and the first quartile?
• the first quartile and the median?
• the median and the third quartile?
• the third quartile and the largest value?
• the first quartile and the largest value?
2. Create a box plot for each set of data. Use one number line for both box plots.
3. Which box plot has the widest spread for the middle $\scriptsize \displaystyle 50\%$ of the data? What does this mean for that set of data in comparison to the other set of data?

The full solutions are at the end of the unit.

# Unit 2: Solutions

### Exercise 2.1

1. .
1. .
\scriptsize \begin{align*}\bar{x}&=\displaystyle \frac{{78+81+83+2(85)+90+(2)91+103+105}}{{10}}\\&=\displaystyle \frac{{892}}{{10}}\\&=89.2\end{align*}
\scriptsize \begin{align*}\text{Range}&=105-78\\&=27\end{align*}
2. Five-number summary:
$\scriptsize \displaystyle \{78;\text{ }81;\text{ }83;\text{ }85;\text{ }85;\text{ }90;\text{ }91;\text{ }91;\text{ }103;\text{ }105\}$
\scriptsize \begin{align*}\text{Min}&=78\\{{\text{Q}}_{1}}&=83\\\text{Median}&=87.5\\{{\text{Q}}_{3}}&=91\\\text{Max}&=105\end{align*}
There is an outlier in this data set:
\scriptsize \begin{align*}\text{IQR}&=91-83\\&=8\\\text{Upper fence}&={{\text{Q}}_{3}}+1.5(\text{IQR})\\&=91+1.5(8)\\&=103\end{align*}
$\scriptsize 105$ is an outlier because it is greater than the upper fence.
2. .
1. .
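\scriptsize \begin{align*}x-1&=16\\x&=17\end{align*}
This takes $\scriptsize x$ as the largest value. If $\scriptsize x$ is the smallest value instead, then $\scriptsize 9-x=16$, which gives $\scriptsize x=-7$.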
2. $\scriptsize x\in \mathbb{Z},\text{ }x\ne 1;3;4;5;9$
$\scriptsize x$ can be any integer value that will not change the mode of $\scriptsize 8$ so we must exclude all other values in the data set.
3. The median will be between positions $\scriptsize 4$ and $\scriptsize 5$
\scriptsize \begin{align*}\displaystyle \frac{{5+x}}{2}&=\text{M}\\x&=6\times 2-5\\&=7\end{align*}
4. .
\scriptsize \displaystyle \begin{align*}\text{Mean}&=\displaystyle \frac{{1+3+4+5+8+8+9+x}}{8}\\&=6\\\displaystyle \frac{{x+38}}{8}&=6\\x&=10\end{align*}
5. .
\scriptsize \begin{align*}\text{Median}&=4.5\\\displaystyle \frac{{x+5}}{2}&=4.5\\x&=4\end{align*}
3. .
1. .
\scriptsize \displaystyle \begin{align*}\text{Min}&=1.25\\{{\text{Q}}_{1}}&=2.5\\{{\text{Q}}_{2}}&=3.65\\{{\text{Q}}_{3}}&=4.78\\\text{Max}&=5.1\end{align*}
2. .
\scriptsize \begin{align*}\text{IQR}&=4.78-2.5\\&=2.28\\\text{Lower fence}&={{\text{Q}}_{1}}-1.5(2.28)\\&=2.5-1.5(2.28)\\&=-0.92\\\text{Upper fence}&={{\text{Q}}_{3}}+1.5(2.28)\\&=4.78+1.5(2.28)\\&=8.2\end{align*}


### Unit 2: Assessment

1. .
1. $\scriptsize \displaystyle \text{min}=9;\text{ first quartile}=28;\text{ median}=55;\text{ third quartile}=75;\text{ max}=92$
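2. The IQR is $\scriptsize 75-28=47$, so the lower fence is $\scriptsize 28-1.5(47)=-42.5$ and the upper fence is $\scriptsize 75+1.5(47)=145.5$. All values fall within the lower and upper fences so there are no outliers.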
3. .
4. Compare the box-and-whisker plots for the classes. Class B did marginally better than class A. Its median is $\scriptsize \displaystyle 60$ while class A’s is $\scriptsize \displaystyle 55$. Furthermore, the first quartile for class B is higher than that of class A.
2. .
1. First sort the data then calculate the position of the median age.
\scriptsize \displaystyle \begin{align*}&35;\text{ }37;\text{ }38;\text{ }39;\text{ }42;\text{ }47;\text{ }48;\text{ }52;\text{ }53;\text{ }54;\text{ }58;\text{ }60;\text{ }60;\text{ }63\\&63;\text{ }64;\text{ }65;\text{ }67;\text{ }68;\text{ }69;\text{ }70;\text{ }71;\text{ }72;\text{ }74;\text{ }75;\text{ }77;\text{ }78;\text{ }97\end{align*}
The median will be between positions $\scriptsize 14$ and $\scriptsize 15$.
$\scriptsize \text{Median age }=63$
2. The lower quartile lies between positions $\scriptsize 7$ and $\scriptsize 8$.
\scriptsize \begin{align*}{{\text{Q}}_{1}}&=\displaystyle \frac{{48+52}}{2}\\&=50\end{align*}
The upper quartile lies between positions $\scriptsize 21$ and $\scriptsize 22$.
\scriptsize \begin{align*}{{\text{Q}}_{3}}&=\displaystyle \frac{{70+71}}{2}\\&=70.5\end{align*}
3. .
\scriptsize \begin{align*}\text{IQR}&=70.5-50\\&=20.5\\\text{Upper fence}&={{\text{Q}}_{3}}+1.5(\text{IQR})\\&=70.5+1.5(20.5)\\&=101.25\\\text{Lower fence}&={{\text{Q}}_{1}}-1.5(\text{IQR})\\&=50-1.5(20.5)\\&=19.25\end{align*}
4. There are no outliers.
3. .
1. Order the data for each group.
Day class:
$\scriptsize \displaystyle 32;\text{ }32;\text{ }45;\text{ }55.5;\text{ }56;\text{ }56;\text{ }59;\text{ }68;\text{ }70;\text{ }72;\text{ }77;\text{ }78;\text{ }79;\text{ }80;\text{ }81;\text{ }84;\text{ }84.5;\text{ }90;\text{ }90;\text{ }99$
There are twenty data values for the day class and the five-number summary is:
\scriptsize \displaystyle \begin{align*}\text{Min}&=32\\{{\text{Q}}_{\text{1}}}&=56\\{{\text{Q}}_{2}}&=74.5\\{{\text{Q}}_{\text{3}}}&=82.5\\\text{Max}&=99\end{align*}
• There are six data values ranging from the minimum to the first quartile, so $\scriptsize 30\%$ of data values lie between $\scriptsize 32$ and $\scriptsize 56$.
• There are six data values ranging from $\scriptsize 56$ to $\scriptsize 74.5$ (counting the two values equal to $\scriptsize 56$ again), so $\scriptsize 30\%$ of the data values lie between the first quartile and the median.
• There are five data values ranging from $\scriptsize 74.5$ to $\scriptsize 82.5$, so $\scriptsize 25\%$ of data values lie between the median and the third quartile.
• There are five data values ranging from $\scriptsize 82.5$ to $\scriptsize 99$, which is $\scriptsize 25\%$ of the data.
• The first quartile, $\scriptsize 56$, and the largest value, $\scriptsize 99$, enclose the upper three sections of the box plot, which hold approximately $\scriptsize 75\%$ of the data.

Because the two values equal to the first quartile are counted in both of the first two intervals, these percentages add up to more than $\scriptsize 100\%$. In general, each section of a box plot contains approximately $\scriptsize 25\%$ of the data.

Evening class:
$\scriptsize \displaystyle 25.5;\text{ }45;\text{ }65;\text{ }68;\text{ }76;\text{ }78;\text{ }78;\text{ }79;\text{ }79;\text{ }80;\text{ }81;\text{ }81;\text{ }83;\text{ }84.5;\text{ }85;\text{ }88;\text{ }89;\text{ }90;\text{ }90;\text{ }98;\text{ }98;\text{ }98$
There are twenty-two data values for the evening class and the five-number summary is:
\scriptsize \displaystyle \begin{align*}\text{Min}&=25.5\\{{\text{Q}}_{\text{1}}}&=78\\{{\text{Q}}_{2}}&=81\\{{\text{Q}}_{\text{3}}}&=89\\\text{Max}&=98\end{align*}

• There are six data values ranging from the minimum to the first quartile so approximately $\scriptsize 27\%$ of data values.
• There are five data values ranging from the first quartile to the median so approximately $\scriptsize 23\%$ of data values.
• There are six data values ranging from the median to the third quartile so approximately $\scriptsize 27\%$ of data values.
• There are five data values ranging from the third quartile to the maximum so approximately $\scriptsize 23\%$ of data values.
• There are sixteen data values ranging from the first quartile to the maximum so approximately $\scriptsize 73\%$ of data values.
2. .
3. The day class has the wider spread for the middle $\scriptsize 50\%$ of the data. The IQR for the day class data ($\scriptsize 82.5-56=26.5$) is greater than the IQR for the evening class ($\scriptsize 89-78=11$). This means that there is more variability in the day class data.
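
The comparison of the two IQRs can be checked with a short Python sketch. The helper name `iqr` is made up for illustration, and it assumes the same median-of-halves method for the quartiles that is used in this unit.

```python
from statistics import median

def iqr(values):
    """Inter-quartile range using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])               # median of the lower half
    q3 = median(data[len(data) - half:])   # median of the upper half
    return q3 - q1

day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]
evening = [98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
           90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5]

print(iqr(day), iqr(evening))  # 26.5 11
```

The larger IQR for the day class confirms that the middle $\scriptsize 50\%$ of its scores is more spread out.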
