Data handling: Calculate, represent and interpret measures of central tendency and dispersion in univariate numerical grouped data

# Unit 3: Constructing histograms

Natashia Bearam-Edmunds

### Unit outcomes

By the end of this unit you will be able to:

• Construct a histogram.
• Interpret a histogram.

## What you should know

Before you start this unit, make sure you can:

## Introduction

Histograms are often used to display grouped data. One advantage of a histogram is that it can easily be used for large data sets. In ‘real life’ applications we generally use a histogram when the data set consists of $\scriptsize \displaystyle 100$ values or more.

A histogram consists of rectangles drawn next to each other so that they touch. The horizontal axis shows what the data represent, for example, distance. The vertical axis shows either frequency or relative frequency. In this unit you will only need to plot frequency, not relative frequency. The histogram shows the shape, the centre and the spread of the data.

## Constructing a histogram

If you are not given a frequency distribution you can construct a histogram as shown in the following example.

### Example 3.1

Create a histogram for the number of books bought by $\scriptsize 50$ part-time college learners.

\scriptsize \displaystyle \begin{align*}&1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1;\text{ }1\\&2;\text{ }2;\text{ }2;\text{ }2;\text{ }2;\text{ }2;\text{ }2;\text{ }2;\text{ }2;\text{ }2\\&3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3;\text{ }3\\&4;\text{ }4;\text{ }4;\text{ }4;\text{ }4;\text{ }4\\&5;\text{ }5;\text{ }5;\text{ }5;\text{ }5\\&6;\text{ }6\end{align*}

Solution

Note that ‘number of books’ is discrete data, since books are counted.

To draw the histogram you must decide how many bars or intervals, also called classes, represent the data.

Calculate the number of bars as follows: $\scriptsize \displaystyle \displaystyle \frac{{\text{largest value}-\text{smallest value}}}{{\text{number of bars}}}=\text{width of bar/interval}$.

Before you can calculate the number of bars, choose a starting point for the first interval that is less than the smallest data value.

In this example the data are discrete and all the values are integers. The smallest value is $\scriptsize 1$, so a convenient starting point is $\scriptsize 1-0.5=0.5$. The largest value is $\scriptsize 6$, so we add $\scriptsize 0.5$ to $\scriptsize 6$ to get the end point of $\scriptsize 6.5$.

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too many different values, a width that places the data values in the middle of the bar is the most convenient.

Since the data consist of the numbers $\scriptsize \displaystyle 1;\text{ }2;\text{ }3;\text{ }4;\text{ }5;\text{ }6$ and the starting point is $\scriptsize 0.5$, a width of one places the $\scriptsize 1$ in the middle of the interval from $\scriptsize 0.5$ to $\scriptsize 1.5$, the $\scriptsize 2$ in the middle of the interval from $\scriptsize 1.5$ to $\scriptsize 2.5$, and so on.

Therefore, $\scriptsize \displaystyle \frac{{6.5-0.5}}{\text{number of bars}}=1$ so the number of bars is $\scriptsize 6$.

| Intervals | Frequency |
| --- | --- |
| $\scriptsize 0.5 \lt x\le 1.5$ | $\scriptsize 11$ |
| $\scriptsize 1.5 \lt x\le 2.5$ | $\scriptsize 10$ |
| $\scriptsize 2.5 \lt x\le 3.5$ | $\scriptsize 16$ |
| $\scriptsize 3.5 \lt x\le 4.5$ | $\scriptsize 6$ |
| $\scriptsize 4.5 \lt x\le 5.5$ | $\scriptsize 5$ |
| $\scriptsize 5.5 \lt x\le 6.5$ | $\scriptsize 2$ |
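
The steps of Example 3.1 can also be checked with a short program. The following is only a sketch in Python (the unit itself does not use code); the variable names `books`, `start`, `end`, `width` and `table` are chosen here for illustration.

```python
# Number of books bought by the 50 learners in Example 3.1.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

start = min(books) - 0.5  # starting point below the smallest value
end = max(books) + 0.5    # end point above the largest value
width = 1                 # places each whole number in the middle of its bar

# (end point - starting point) / number of bars = width, so:
number_of_bars = int((end - start) / width)

# Count the data values that fall in each interval lower < x <= upper.
table = []
for i in range(number_of_bars):
    lower = start + i * width
    upper = lower + width
    frequency = sum(1 for x in books if lower < x <= upper)
    table.append((lower, upper, frequency))

for lower, upper, frequency in table:
    print(f"{lower} < x <= {upper}: {frequency}")
```

The frequencies should add up to the number of data values, which is a quick way to check a frequency table before drawing the histogram.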

The histogram displays the number of books on the x-axis and the frequency on the y-axis.

The height of the bars gives the frequency of each interval and the intervals are listed on the x-axis. Now that you’ve drawn the histogram, you get a better overall picture of the data because you can visualise it. We can also see at a glance some important things that the data reveals.

The tallest bar shows that $\scriptsize 3$ books was the most common value, with a frequency of $\scriptsize 16$.

### Take note!

At this maths level, the interval width will be given in exam questions.

Histograms are often used to display continuous data. You will recall that this is data that can be measured. The next example uses continuous data to construct a histogram.

### Example 3.2

The following data represent the heights of $\scriptsize \displaystyle 16$ people, in centimetres.

$\scriptsize \displaystyle 162\text{ };\text{ }168\text{ };\text{ }177\text{ };\text{ }147\text{ };\text{ }189\text{ };\text{ }171\text{ };\text{ }173\text{ };\text{ }168;\text{ }178\text{ };\text{ }184\text{ };\text{ }165\text{ };\text{ }173\text{ };\text{ }179\text{ };\text{ }166\text{ };\text{ }168\text{ };\text{ }165$

Use class intervals that start at $\scriptsize \displaystyle 140\text{ cm}$ and end at $\scriptsize \displaystyle 190\text{ cm}$ to draw a histogram.

Solution

Step 1: Determine intervals

We need intervals of the same length between $\scriptsize \displaystyle 140\text{ cm}$ and $\scriptsize \displaystyle 190\text{ cm}$ so we could go up by $\scriptsize \displaystyle 10$ centimetres each time. Calculate the number of intervals, which corresponds to the number of bars, using:

$\scriptsize \displaystyle \displaystyle \frac{{\text{largest value}-\text{smallest value}}}{{\text{number of bars}}}=\text{width of bar/interval}$

\scriptsize \displaystyle \begin{align*}\displaystyle \frac{{190-140}}{{\text{number of bars}}}&=10\\10\times \text{number of bars}&=50\\\therefore \text{number of bars}&=5\end{align*}

Note: If you know how many bars you’d like to use you can use the method above to calculate the bar width.

We can use round and square brackets to show which values are excluded and included in the intervals. Round brackets are used to show values that are not included and square brackets are used to show included values. Interval $\scriptsize (140;150]$ means all values greater than $\scriptsize 140\text{ cm}$ and less than or equal to $\scriptsize 150\text{ cm}$.

| Intervals |
| --- |
| $\scriptsize (140;150]$ |
| $\scriptsize (150;160]$ |
| $\scriptsize (160;170]$ |
| $\scriptsize (170;180]$ |
| $\scriptsize (180;190]$ |

Step 2: Include the frequency in each interval

The following frequency table summarises the number of data values in each of the intervals.

| Intervals | Frequency |
| --- | --- |
| $\scriptsize (140;150]$ | $\scriptsize 1$ |
| $\scriptsize (150;160]$ | $\scriptsize 0$ |
| $\scriptsize (160;170]$ | $\scriptsize 7$ |
| $\scriptsize (170;180]$ | $\scriptsize 6$ |
| $\scriptsize (180;190]$ | $\scriptsize 2$ |
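
Steps 1 and 2 can be sketched in Python as a check on the counting. The function name `frequency_table` and its parameters are assumptions made for this illustration. It treats every interval as open on the left and closed on the right, matching the bracket notation $\scriptsize (140;150]$ used above.

```python
# Heights in centimetres from Example 3.2.
heights = [162, 168, 177, 147, 189, 171, 173, 168,
           178, 184, 165, 173, 179, 166, 168, 165]


def frequency_table(values, start, end, width):
    """Count the values in each interval (lower; upper]."""
    # (end - start) / number of bars = width
    number_of_bars = (end - start) // width
    table = []
    for i in range(number_of_bars):
        lower = start + i * width
        upper = lower + width
        # lower is excluded (round bracket), upper is included (square bracket)
        frequency = sum(1 for v in values if lower < v <= upper)
        table.append(((lower, upper), frequency))
    return table


height_table = frequency_table(heights, 140, 190, 10)
for (lower, upper), frequency in height_table:
    print(f"({lower};{upper}]  {frequency}")
```

Note that a value that lies exactly on a boundary, such as a height of $\scriptsize 170\text{ cm}$, would be counted in $\scriptsize (160;170]$ and not in $\scriptsize (170;180]$.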

Step 3: Draw the histogram

The interval $\scriptsize (150;160]$ has no entries so we do not draw a bar in that interval.
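
As a rough illustration of Step 3, the frequency table can be turned into a plain-text picture. This is only a sketch: the function name `text_histogram` is made up for this example, and the bars run sideways here, whereas a drawn histogram has vertical bars that touch.

```python
# Frequency table from Example 3.2: (interval, frequency).
height_frequencies = [
    ("(140;150]", 1),
    ("(150;160]", 0),
    ("(160;170]", 7),
    ("(170;180]", 6),
    ("(180;190]", 2),
]


def text_histogram(table, symbol="#"):
    """Return one line per interval with a bar of `frequency` symbols."""
    lines = []
    for label, frequency in table:
        bar = symbol * frequency  # an empty interval gives an empty bar
        lines.append(f"{label} | {bar}".rstrip())
    return lines


for line in text_histogram(height_frequencies):
    print(line)
```

The line for $\scriptsize (150;160]$ has no bar, but the interval still keeps its place on the axis so that the gap in the data is visible.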

### Exercise 3.1

1. Below is the cumulative frequency table of the ages of people attending a public reading by an author at a library.

   | Class intervals | Frequency | Cumulative frequency |
   | --- | --- | --- |
   | $\scriptsize 9 \lt x\le 12$ | $\scriptsize \displaystyle 5$ | $\scriptsize \displaystyle 5$ |
   | $\scriptsize 12 \lt x\le 15$ | $\scriptsize \displaystyle 8$ | $\scriptsize \displaystyle 13$ |
   | $\scriptsize 15 \lt x\le 18$ | $\scriptsize \displaystyle 13$ | $\scriptsize \displaystyle 26$ |
   | $\scriptsize 18 \lt x\le 21$ | $\scriptsize \displaystyle 15$ | $\scriptsize \displaystyle 41$ |
   | $\scriptsize 21 \lt x\le 24$ | $\scriptsize \displaystyle 24$ | $\scriptsize \displaystyle 65$ |
   | $\scriptsize 24 \lt x\le 27$ | $\scriptsize \displaystyle 17$ | $\scriptsize \displaystyle 82$ |
   | $\scriptsize 27 \lt x\le 30$ | $\scriptsize \displaystyle 18$ | $\scriptsize \displaystyle 100$ |

   Use the table to construct:

   1. a histogram.
   2. an ogive.
2. The following data represent the number of employees at various restaurants. Use this data to create a histogram.

   $\scriptsize \displaystyle 22;\text{ }35;\text{ }15;\text{ }26;\text{ }40;\text{ }28;\text{ }18;\text{ }20;\text{ }25;\text{ }34;\text{ }39;\text{ }42;\text{ }24;\text{ }22;\text{ }19;\text{ }27;\text{ }22;\text{ }34;\text{ }40;\text{ }20;\text{ }38;\text{ }28$

   Use $\scriptsize 10 \lt x\le 19$ as the first interval.

The full solutions are at the end of the unit.

## Interpreting histograms

The shape of a histogram can tell you a lot about the distribution of the data and gives information about the mean, median and mode of the data set. The most common shapes of histograms are symmetric and skewed.

You will recall that the mean is the measure of central tendency that is highly influenced by extreme values. So, by looking at the position of the mean in relation to the median you can tell if a graph is skewed or symmetric.

A symmetric histogram is more or less identical on both sides of the mean.

For symmetric distributions, the mean is approximately equal to the median and the left and right tails are equally balanced, meaning that they have about the same length.

In a skewed histogram the data extend further to one side than the other: the tail on one side is longer than the tail on the other side. There are two types of skewness:

1. Skewed right (positive)
   A histogram skewed to the right has a longer tail on the right side.

   For positively skewed data the mean $\scriptsize \gt$ median. If there are extreme values towards the positive end of a distribution, these will increase the value of the mean.
2. Skewed left (negative)
   A histogram skewed to the left has a longer tail on the left side.

   For negatively skewed data, you will notice that mean $\scriptsize \lt$ median. This is due to the presence of extreme lower values to the left (or negative) side of the distribution that will decrease the value of the mean.
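
The comparison of mean and median can be tried out on small data sets. The sketch below uses Python's `statistics` module; the three data sets are made up for illustration, and the function `describe_skew` with its `tolerance` cut-off is an assumption of this example, not a standard rule. It is a rule of thumb only and does not replace looking at the histogram.

```python
from statistics import mean, median


def describe_skew(values, tolerance=0.05):
    """Label a data set by comparing its mean with its median.

    `tolerance` is a hypothetical cut-off: a difference smaller than this
    fraction of the range counts as 'approximately equal'.
    """
    spread = max(values) - min(values)
    difference = mean(values) - median(values)
    if spread == 0 or abs(difference) <= tolerance * spread:
        return "approximately symmetric"
    if difference > 0:
        return "skewed right (positive)"  # mean > median
    return "skewed left (negative)"       # mean < median


# Made-up data sets: one extreme high value, one extreme low value, none.
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 20]
left_tail = [1, 17, 17, 18, 18, 18, 19, 19, 20]
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(round(mean(data), 2), median(data), describe_skew(data))
```

In `right_tail` the single value of 20 pulls the mean up to about 4.67 while the median stays at 3, which is the effect of extreme values described above.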

### Note

Watch this video called “Classifying shapes of distributions” for a summary of types of histograms.

### Example 3.3

Which box-and-whisker plot represents the shape of the histogram below?

Solution

The histogram has a longer tail on the right so it is positively skewed, which shows that the mean is greater than the median.

Box plot A: The median is in the middle of the box, and the whiskers are about the same length on either side of the box so the distribution is symmetric.

Box plot B: The median is pulled towards the upper quartile, and the whisker is shorter on the upper end of the box so the distribution is negatively skewed (skewed left).

Box plot C: The median is pulled towards the lower quartile, and the whisker is shorter on the lower end of the box so the distribution is positively skewed (skewed right).

Therefore, box plot C is a possible representation of the histogram in this example.

### Exercise 3.2

1. State whether the following data sets are symmetric, skewed right or skewed left and comment on the position of the mean in relation to the median.
   1. A data set with this box-and-whisker plot:
   2. A data set with this histogram:
2. If the mean $\scriptsize \gt$ median $\scriptsize \gt$ mode, will the graph skew to the right or left?
3. The heights of learners in a college are recorded and are shown in the histogram below:
   1. How many heights were recorded?
   2. What was the most common interval of recorded heights?
   3. Comment on the shape of the distribution. What is the cause of this?

The full solutions are at the end of the unit.

## Summary

In this unit you have learnt the following:

• How to construct a histogram from discrete data.
• How to construct a histogram from a frequency distribution.
• How to interpret the shape of a histogram.

# Unit 3: Assessment

#### Suggested time to complete: 25 minutes

1. In a traffic survey a random sample of $\scriptsize \displaystyle 50$ motorists were asked what distance they drove to work daily. The results of the survey are shown in kilometres in the table below. Draw a histogram to represent the data.

   | Distance, $\scriptsize d$ (km) | $\scriptsize 0 \lt d\le 10$ | $\scriptsize 10 \lt d\le 20$ | $\scriptsize 20 \lt d\le 30$ | $\scriptsize 30 \lt d\le 40$ | $\scriptsize 40 \lt d\le 50$ |
   | --- | --- | --- | --- | --- | --- |
   | Frequency, $\scriptsize f$ | $\scriptsize 9$ | $\scriptsize 17$ | $\scriptsize 15$ | $\scriptsize 5$ | $\scriptsize 4$ |
2. The histogram below shows the number of hours learners in a class spent playing video games over a weekend.
   1. How many learners were in the survey?
   2. What was the maximum number of hours spent playing video games?
   3. How many learners spent $\scriptsize 10$ hours or fewer playing video games?
   4. How many learners spent more than $\scriptsize 15$ hours playing video games?
   5. How many hours did most learners spend playing video games?
   6. Comment on the shape of the histogram.

The full solutions are at the end of the unit.

# Unit 3: Solutions

### Exercise 3.1

1. .
   1. The histogram
   2. The ogive
2. Draw a frequency table first to make it easier to draw the histogram.

   | Intervals | Frequency |
   | --- | --- |
   | $\scriptsize 10 \lt x\le 19$ | $\scriptsize 3$ |
   | $\scriptsize 19 \lt x\le 28$ | $\scriptsize 11$ |
   | $\scriptsize 28 \lt x\le 37$ | $\scriptsize 3$ |
   | $\scriptsize 37 \lt x\le 46$ | $\scriptsize 5$ |


### Exercise 3.2

1. .
   1. Symmetric box-and-whisker plot with mean approximately equal to the median.
   2. Negatively skewed histogram with mean less than the median.
2. Positively skewed data (skewed to the right) since the mean $\scriptsize \gt$ median $\scriptsize \gt$ mode.
3. .
   1. Adding the heights of the bars gives $\scriptsize \displaystyle 160$ observations.
   2. Most heights were in the $\scriptsize \displaystyle 1.5\text{ m}$ to $\scriptsize \displaystyle 1.7\text{ m}$ interval as this is shown as the tallest bar.
   3. The shape of the histogram indicates that the data are skewed to the right. This is due to a few very tall people with a height of over $\scriptsize 2.1\text{ m}$.


### Unit 3: Assessment

1. .
2. .
   1. Adding the heights of the bars together shows that $\scriptsize 25$ learners took part in the survey.
   2. $\scriptsize 25$ hours
   3. $\scriptsize 5$ learners
   4. $\scriptsize 16$ learners
   5. Most learners spent $\scriptsize 20$ to $\scriptsize 25$ hours playing video games.
   6. The histogram is negatively skewed (it has a longer tail on the left).
