Data handling: Calculate, represent and interpret measures of central tendency and dispersion in univariate numerical ungrouped data

# Unit 1: Revise measures of central tendency and dispersion for ungrouped data

Natashia Bearam-Edmunds

### Unit outcomes

By the end of this unit you will be able to:

• Find the mean for ungrouped data.
• Find the median for ungrouped data.
• Find the mode for ungrouped data.
• Find the range and interquartile range.

## What you should know

Before you start this unit, make sure you can:

## Introduction

You will come across many formulas to calculate statistics. The purpose of statistics is not to perform numerous calculations using formulas, but to gain an understanding of your data. The calculations are often done using a calculator or a computer, but basic calculations can be done by hand too. The understanding must come from you. If you thoroughly understand the basics of statistics, you can be more confident in the decisions you make in life.

In this unit, we revise how to calculate statistics that give us information about the central values and spread of data.

Figure 1: Statistics are all around us from many different sources, such as the weather forecast

Let’s revise the terms that you should know by doing Activity 1.1.

### Activity 1.1: Define the key terms used in data handling

Time required: 10 minutes

What you need:

• a pen and paper

What to do:

We want to study the average amount of money first year learners spend at ABC College on supplies that do not include books. We randomly surveyed 100 first year learners at the college. Three of those learners spent $\scriptsize \text{R }1000,\text{ R }850$ and $\scriptsize \text{R }500$.

1. What is the population in the study?
2. What is the sample in the study?
3. State the parameter.
4. State the statistic.
5. What are the possible variables being studied?
6. Write another word for ‘average’.
7. What are the data values?

What did you find?

1. A population is a collection of persons, things or objects under study. The population in this study is all first year learners attending ABC College.
2. To study the population, we select a sample. The sample could be all learners enrolled in a beginning statistics course at ABC College (although this sample may not represent the entire population). In this study, the $\scriptsize 100$ randomly surveyed learners make up the sample. In general, a larger random sample tends to represent the population more closely.
3. A parameter is a number used to represent a population characteristic and cannot be easily found hence it is estimated by a statistic. The parameter in this study is the average amount of money spent (excluding books) by all first year learners at ABC College.
4. The statistic is the average amount of money spent (excluding books) by first year college learners in the sample.
5. The variable could be the amount of money spent (excluding books) by one first year learner. For example, let $\scriptsize \displaystyle X=$ the amount of money spent (excluding books) by one first year learner at the college.
6. The ‘average’ is also called the ‘mean’; a number that describes the central tendency of data.
7. Data are the actual values of the variable. The data are the rand amounts spent by the first year learners, for example $\scriptsize \text{R }1000,\text{ R }850$ and $\scriptsize \text{R }500$.

Data are either qualitative or quantitative. Qualitative data describe features. Car colour, race and blood type are examples of qualitative data.

Quantitative data are numbers and the result of counting or measuring. Weight, pulse rate and number of people are examples of quantitative data. Quantitative data is further divided into:

• discrete data – this data takes on only certain numerical values and can be counted, for example the number of days.
• continuous data – this data can be measured and includes fractions, decimals or irrational numbers, for example time.

Data in its raw form is called ungrouped data. Once the data has been organised using frequencies we call this grouped data. We use different methods to calculate measures of central tendency for ungrouped and grouped data.

## Measures of central tendency for ungrouped data

We can summarise data by calculating measures of central tendency and dispersion. In this unit we will focus on the measures of central tendency, which are the mean, median and mode.

### The mean

The mean (average) and the median are the two measures of central tendency that are used most widely. Of the three measures of central tendency, the mean is most heavily influenced by any outliers or skewness in data. When there are outliers in the data, the median is often the preferred measure of central tendency because the median is more resistant to outliers than the mean.

To calculate the mean we add all the data values and divide by the total number of values. For example, to calculate your average mark for three maths tests you would add the three marks together and divide by three. We can write this as a formula as shown below.

Mean for ungrouped data:

$\scriptsize \bar{x}=\displaystyle \frac{{\sum\limits_{{i=1}}^{n}{{{{x}_{i}}}}}}{n}$

$\scriptsize \displaystyle \bar{x}$, called ‘$\scriptsize \displaystyle x$ bar’, indicates the mean

$\scriptsize n$ is the total number of data values

$\scriptsize \Sigma$ is the sum of the data

$\scriptsize \displaystyle {{x}_{i}}$ are the data values
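The formula can also be written as a few lines of code. The following is a minimal Python sketch, not part of the original unit: the function name `mean` is chosen for illustration, and the data are the three rand amounts mentioned in Activity 1.1.

```python
def mean(values):
    """Return the mean: the sum of the data values divided by n."""
    n = len(values)  # n is the total number of data values
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n  # sum(values) plays the role of the sigma


# Rand amounts spent by the three learners mentioned in Activity 1.1
amounts = [1000, 850, 500]
print(round(mean(amounts), 2))  # 783.33
```

The sum is $\scriptsize 2350$ and $\scriptsize n=3$, so the mean is about $\scriptsize \text{R }783.33$.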

Let’s look at an example to better understand these measures of central tendency.

### Example 1.1

Exam scores for $\scriptsize 11$ learners are as follows:

$\scriptsize \displaystyle 50;\text{ }58;\text{ 6}9;\text{ 42};\text{ }63;\text{ 72};\text{ 5}2;\text{ 80};\text{ 65};\text{ }72;\text{ 91}$

1. Calculate the mean mark.
2. How many learners received better than average marks?
3. Adam got $\scriptsize 63\%$. He believes that his mark is better than the average because it is greater than $\scriptsize 50\%$. Is he correct? Explain.

Solution

1. Using the formula for the mean, we find the sum of the exam scores and then divide by the number of values.
\scriptsize \displaystyle \begin{align*}\bar{x}&=\displaystyle \frac{{\sum\limits_{{i=1}}^{n}{{{{x}_{i}}}}}}{n}\\&=\displaystyle \frac{{50+58+69+42+63+72+52+80+65+72+91}}{{11}}\\&=\displaystyle \frac{{714}}{{11}}\\&\approx 64.9\%\end{align*}
How to use the Casio fx-82ZA Plus to find the mean for ungrouped data:
Step 1: Start with the calculator turned off. Turn the calculator on then set the calculator to statistics (MODE 2) and press $\scriptsize 1$ for $\scriptsize 1$ variable statistics. If the calculator has a frequency column showing turn the frequency off by pressing SHIFT, MODE (SET UP), REPLAY down, $\scriptsize 3$ for statistics then $\scriptsize 2$ for ‘off’. For ungrouped data make sure frequency is turned off.

Step 2: Enter the data. Press $\scriptsize 50$ then $\scriptsize =$ $\scriptsize 58=$ $\scriptsize 69=$ and so on until you enter $\scriptsize 91=$. If you make a mistake, don’t worry! Simply scroll up to the wrong data value and type the correct value over it.

Step 3: After you have finished entering the scores, press AC to indicate the completion of the data entering stage. Don’t panic when the scores disappear! The data entering screen will disappear but can be brought back if needed.

Step 4: To bring up the value for the mean press SHIFT $\scriptsize 1$ then $\scriptsize 4:\text{VAR}$ next press $\scriptsize 2:\overline{x}=$ and the value of $\scriptsize 64.9$ is displayed.

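2. Six learners scored more than $\scriptsize 64.9\%$ on the test (the marks $\scriptsize 65$, $\scriptsize 69$, $\scriptsize 72$, $\scriptsize 72$, $\scriptsize 80$ and $\scriptsize 91$).
3. No, he is not correct. The average test mark is $\scriptsize 64.9\%$, not $\scriptsize 50\%$, and with a mark of $\scriptsize 63\%$ he scored below the average.

### Exercise 1.1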

1. AIDS data indicating the number of months a patient with AIDS lives after taking a new antibody drug are as follows:
\scriptsize \displaystyle \begin{align*}&\text{3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; }\\ &\text{29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47}\end{align*}

Calculate the mean and explain what the statistic you have found represents.

2. Statistics can be used to compare anything, in this case authors. The following shows a simple random sample that compares the letter counts for the first words used by three authors in ten articles they wrote.

Terry: $\scriptsize \displaystyle \text{7; 9; 3; 3; 3; 4; 1; 3; 2; 2}$
Davis: 3; 3; 3; 4; 1; 4; 3; 2; 3; 1
Maris: 2; 3; 4; 4; 4; 6; 6; 6; 8; 3

Calculate the mean letter count for each author.

The full solutions are at the end of the unit.

### The median

The median is the middle value of an ordered data set. To find the median, we first sort the data in increasing order and then pick out the value in the middle of the sorted list. The median is often a better measure of the centre than the mean when a data set contains outliers or extreme values.

### Example 1.2

We will once again use the exam scores for $\scriptsize 11$ learners, which are as follows:

$\scriptsize \displaystyle 50;\text{ }58;\text{ 6}9;\text{ 42};\text{ }63;\text{ 72};\text{ 5}2;\text{ 80};\text{ 65};\text{ }72;\text{ 91}$

Find the median and state how many learners scored marks below the median value and how many scored marks above the median.

Solution

To find the median, we must first order the data.

$\scriptsize \displaystyle 42;\text{ }50;\text{ 52; }58;\text{ 63; 65; 6}9;\text{ 72};\text{ 72; 80};\text{ 91}$

You can quickly find the location of the median by using: $\scriptsize \displaystyle \frac{{n+1}}{2}$

If $\scriptsize n$ is an odd number, the median is the middle value of the ordered data (ordered smallest to largest). If $\scriptsize n$ is an even number, the median is equal to the two middle values added together and divided by two after the data has been ordered.

Since there are an odd number of values the median will be the middle value after ordering the data.

Position of the median is $\scriptsize \displaystyle \frac{{11+1}}{2}=6$

$\scriptsize \displaystyle 42;\text{ }50;\text{ 52; }58;\text{ 63; }\underline{{\text{65}}}\text{; 6}9;\text{ 72};\text{ 72; 80};\text{ 91}$

Count six places from the beginning of the ordered list and you will find that the sixth value is $\scriptsize 65$ therefore the median mark is $\scriptsize 65$.
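The rule for odd and even $\scriptsize n$ can be sketched in Python as a check on this example. This is an illustrative sketch only; the function name `median` is an assumption and not part of the original unit. Note that Python counts positions from $\scriptsize 0$, so the sixth value sits at index $\scriptsize 5$.

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # odd n: position (n + 1) / 2 in 1-based counting is index n // 2
        return ordered[middle]
    # even n: add the two middle values and divide by two
    return (ordered[middle - 1] + ordered[middle]) / 2


scores = [50, 58, 69, 42, 63, 72, 52, 80, 65, 72, 91]
print(median(scores))  # 65
```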

The median divides the data into two equal parts so $\scriptsize 50\%$ of data values lie below the median and $\scriptsize 50\%$ of values lie above it. There are five marks below $\scriptsize 65$ and five marks above $\scriptsize 65$.

### Exercise 1.2

1. Look again at the AIDS data, which indicate the number of months a patient with AIDS lives after taking a new antibody drug. The data are as follows:
\scriptsize \displaystyle \begin{align*}&\text{3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; }\\ &\text{29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47}\end{align*}
Calculate the median.
2. Suppose that in a small town of $\scriptsize \displaystyle 50$ people, one person earns $\scriptsize \displaystyle \text{R}5\text{ }000\text{ }000$ per year and the other $\scriptsize \displaystyle 49$ each earn $\scriptsize \displaystyle \text{R}30\text{ }000$ a year. Which is the better measure of the ‘centre’; the mean or the median? Explain your reasoning.
3. In a sample of $\scriptsize \displaystyle 60$ houses, one house is worth $\scriptsize \displaystyle \text{R}2\text{ }500\text{ }000$. Half of the rest are worth $\scriptsize \displaystyle \text{R}280\text{ }000$, and all the others are worth $\scriptsize \displaystyle \text{R350 }000$. Without doing any calculations, which is the better measure of the “centre” house value; the mean or the median?

The full solutions are at the end of the unit.

### The mode

The mode is the most frequent value. There can be more than one mode in a data set. A data set with two modes is called bimodal.

### Example 1.3

The number of books checked out from the library by $\scriptsize \displaystyle 25$ learners are as follows:

$\scriptsize \displaystyle \text{0; 0; 0; 1; 2; 3; 3; 4; 4; 5; 5; 7; 7; 7; 7; 8; 8; 8; 9; 10; 10; 11; 11; 12; 12}$

Find the mode.

Solution

It is easier to pick out the mode if the data are ordered. The mode is $\scriptsize 7$ as it is repeated the most often ($\scriptsize 4$ times).

### Exercise 1.3

1. Look at the following set of numbers:
   $\scriptsize \displaystyle 6;\text{ }7;\text{ }11;\text{ }10;\text{ }15;\text{ }13;\text{ }14;\text{ }8$
   1. Notice in this data set there are an even number of values. What should you do first to find the average of these numbers?
   2. Write down the steps you would use to find the median.
   3. Find the mean of the data.
   4. Is it possible to calculate the mode of the data? Will it be a useful measure in this case?
2. What word describes a distribution that has two modes?

The full solutions are at the end of the unit.
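As a recap of this section, counting how often each value occurs can be sketched in Python with the data from Example 1.3. This is an illustrative sketch: the function name `modes` is an assumption, and returning an empty list when no value repeats is one common convention rather than a universal rule.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)  # maps each value to its frequency
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so we report no mode
    return sorted(value for value, count in counts.items() if count == highest)


books = [0, 0, 0, 1, 2, 3, 3, 4, 4, 5, 5, 7, 7, 7, 7,
         8, 8, 8, 9, 10, 10, 11, 11, 12, 12]
print(modes(books))  # [7]
```

A bimodal data set such as `[1, 1, 2, 2, 3]` would return both modes, `[1, 2]`.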

### Note

To review measures of central tendency and the shape of distributions you can watch this video when you have an internet connection.

## Measures of dispersion for ungrouped data

Measures of dispersion tell us how spread out a data set is. If a measure of dispersion is small, the data are clustered in a small region. If a measure of dispersion is large, the data are spread out over a large region. In this section we revise the measures of dispersion, the range and interquartile range.

The simplest measure of dispersion is the range. The range is the difference between the maximum and minimum values of the data set.

Quartiles divide data into four equal parts. The quartiles are computed in a similar way to the median. The median is halfway into the ordered data set and is also called the second quartile.

The first quartile $\scriptsize ({{\text{Q}}_{1}})$ is one quarter of the way into the ordered data set; whereas the third quartile $\scriptsize \displaystyle ({{\text{Q}}_{3}})$ is three quarters of the way into the ordered data set. The data must be ordered before you can calculate the quartiles.

The interquartile range is the difference between the first and third quartiles of the data set.

### Activity 1.2: Calculate the range and interquartile range (IQR)

Time required: 25 minutes

What you need:

• a pen and paper

What to do:

Answer these questions and show all the necessary steps.

The data below gives the heights (in mm) of seedlings $\scriptsize 4$ weeks after germinating:
$\scriptsize \displaystyle 47;\text{ }52;\text{ }56;\text{ }62;\text{ }71;\text{ }74;\text{ }78;\text{ }86;\text{ }89;\text{ }92;\text{ }93;\text{ }95$

1. Calculate the median.
2. Divide the data into an upper half and lower half.
3. Find the median of the lower half of the data. What have you found?
4. Find the median of the upper half of the data. What have you found?
5. How many parts has the data been divided into?
6. Find the difference between the values in 3 and 4 above. What have you found?
7. Between what heights do $\scriptsize 50\%$ of the seedlings lie?
8. Find the difference between the tallest and shortest seedling. What do we call this?

What did you find?

1. The data is already arranged from lowest to highest and has $\scriptsize \displaystyle 12$ entries. To find the position of the median: $\scriptsize \displaystyle \frac{{n+1}}{2}=\displaystyle \frac{{12+1}}{2}=6.5$.
To find the median we must add the $\scriptsize \displaystyle 6\text{th}$ and $\scriptsize \displaystyle 7\text{th}$ data values and divide by $\scriptsize 2$:
$\scriptsize \text{M}=\displaystyle \frac{{74+78}}{2}=76\text{ mm}$
2. The median divides the data into an upper and lower half and is located between $\scriptsize \displaystyle 74$ and $\scriptsize \displaystyle 78$.
\scriptsize \displaystyle \begin{align*}\text{Lower half: }47;\text{ }52;\text{ }56;\text{ }62;\text{ }71;\text{ }74\\\text{Upper half: }78;\text{ }86;\text{ }89;\text{ }92;\text{ }93;\text{ }95\end{align*}
3. The median of the lower half of the data is called the first quartile and is represented by $\scriptsize {{\text{Q}}_{\text{1}}}$. The first quartile (also called the lower quartile) has $\scriptsize \displaystyle 25\%$ of the data (scores) below it; it is the median of the lower $\scriptsize \displaystyle 50\%$ of the data. It is found between the $\scriptsize \displaystyle 3\text{rd}$ and $\scriptsize \displaystyle 4\text{th}$ values.
\scriptsize \begin{align*}{{\text{Q}}_{\text{1}}}&=\text{first quartile}\\&=\displaystyle \frac{{56+62}}{2}\\&=59\text{ mm}\end{align*}
4. The median of the upper half of the data is called the upper or third quartile and is represented by $\scriptsize {{\text{Q}}_{3}}$. The third quartile has $\scriptsize \displaystyle 75\%$ of the data (scores) below it. It is located between the $\scriptsize \displaystyle 9\text{th}$ and $\scriptsize \displaystyle 10\text{th}$ values of the original data set.
\scriptsize \begin{align*}{{\text{Q}}_{3}}&=\text{third quartile}\\&=\displaystyle \frac{{89+92}}{2}\\&=90.5\text{ mm}\end{align*}
5. The data has been divided into four parts.
6. The difference between the third quartile and the first quartile is the interquartile range (IQR).
\scriptsize \begin{align*}\text{IQR}&={{\text{Q}}_{3}}-{{\text{Q}}_{1}}\\&=90.5-59\\&=31.5\text{ mm}\end{align*}
7. The middle $\scriptsize 50\%$ of the seedlings are between $\scriptsize 59\text{ mm}$ and $\scriptsize 90.5\text{ mm}$ in height.
8. $\scriptsize \text{Tallest}-\text{shortest}=95-47=48\text{ mm}$. The difference between the largest and smallest values in a data set is called the range.
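The steps in this activity can be sketched in Python. This is a minimal sketch under one assumption: the quartiles are taken as the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when $\scriptsize n$ is odd. Statistical software often uses other quartile rules and may give slightly different values. The function names are chosen for illustration.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = ordered[:half]  # lower half of the data
    # for odd n, skip the middle value so both halves have the same size
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


heights = [47, 52, 56, 62, 71, 74, 78, 86, 89, 92, 93, 95]
q1, q2, q3 = quartiles(heights)
print(q1, q2, q3)                   # 59.0 76.0 90.5
print(q3 - q1)                      # IQR: 31.5
print(max(heights) - min(heights))  # range: 48
```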

An outlier is a data point that is significantly different from the other data points. The IQR can help to determine potential outliers.

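A value is suspected to be a potential outlier if it is more than $\scriptsize \displaystyle 1.5\times \text{IQR}$ below the first quartile or more than $\scriptsize \displaystyle 1.5\times \text{IQR}$ above the third quartile. In other words, a potential outlier is less than $\scriptsize \displaystyle {{\text{Q}}_{1}}-1.5\times \text{IQR}$ or greater than $\scriptsize \displaystyle {{\text{Q}}_{3}}+1.5\times \text{IQR}$. Potential outliers always require further investigation, for example, investigating the effect that they have on the data set as a whole.

### Exercise 1.4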

1. For the following $\scriptsize \displaystyle 13$ house prices, calculate the IQR and determine if any prices are potential outliers.
\scriptsize \displaystyle \begin{align*}&389\text{ }950;\text{ }230\text{ }500;\text{ }158\text{ }000;\text{ }479\text{ }000;\text{ }639\text{ }000;\text{ }114\text{ }950;\text{ }5\text{ }500\text{ }000;\text{ }387\text{ }000;\text{ }\\&659\text{ }000;\text{ }529\text{ }000;\text{ }575\text{ }000;\text{ }488\text{ }800;\text{ }1\text{ }095\text{ }000\end{align*}
2. The table below summarises two data sets of test scores for an evening statistics class and a daytime statistics class.

   | | Min | $\scriptsize {{\text{Q}}_{1}}$ | Median | $\scriptsize {{\text{Q}}_{3}}$ | Maximum |
   |---|---|---|---|---|---|
   | Day | $\scriptsize 32$ | $\scriptsize 56$ | $\scriptsize 74.5$ | $\scriptsize 82.5$ | $\scriptsize 99$ |
   | Night | $\scriptsize 25.5$ | $\scriptsize 78$ | $\scriptsize 81$ | $\scriptsize 89$ | $\scriptsize 98$ |

   1. Find the IQR for both classes. Compare and comment on the two IQRs.
   2. Are there outliers in either of the classes? Provide reasons for your answer.

The full solutions are at the end of the unit.

## Summary

In this unit you have learnt the following:

• How to find the mean for ungrouped data by hand and by using a calculator.
• How to find the median and mode for ungrouped data.
• How to find the range and interquartile range for ungrouped data.

# Unit 1: Assessment

### Suggested time to complete: 30 minutes

1. Sixty-five randomly selected car salespeople were asked the number of cars they generally sell in one week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve generally sell five cars; nine generally sell six cars; eleven generally sell seven cars.
   1. Calculate the sample mean.
   2. Find the median.
   3. Find the mode.
2. Of the three measures, which tends to reflect skewing the most; the mean, the mode, or the median? Why?
3. Sharpe School is applying for a grant that will be used to add fitness equipment to the school gym. The principal surveyed $\scriptsize \displaystyle 15$ anonymous learners to determine how many minutes a day the learners spend exercising. The results from the learners are shown: \scriptsize \displaystyle \begin{align*}&{0\text{ minutes; }40\text{ minutes; }60\text{ minutes; }30\text{ minutes; 60 minutes;}} \\&{10\text{ minutes; }45\text{ minutes; }30\text{ minutes; }300\text{ minutes; }90\text{ minutes;}} \\&{30\text{ minutes; }120\text{ minutes; }60\text{ minutes; }0\text{ minutes; }20\text{ minutes}} \end{align*}
   1. Determine the following five values of the data: minimum, first quartile, median, third quartile and maximum.
   2. If you were the principal, would you be justified in purchasing new fitness equipment? Calculate the IQR and any potential outliers and note any pitfalls the principal should be aware of as part of your answer.

The full solutions are at the end of the unit.

# Unit 1: Solutions

### Exercise 1.1

1. Add the $\scriptsize 40$ data values and divide by $\scriptsize 40$:
\scriptsize \displaystyle \begin{align*}\bar{x}&=\displaystyle \frac{{\text{3+4+2(8)+10+11+}...\text{+2(44)+47}}}{{40}}\\&=\displaystyle \frac{{943}}{{40}}\\&\approx 23.6\end{align*}
On average, the patients in this sample lived $\scriptsize 23.6$ months after taking the new antibody drug.
2. The mean letter count for each author is as follows:
Terry’s mean:
$\scriptsize \displaystyle \frac{{\text{7+9+4(3)+4+1+2(2)}}}{{10}}=3.7$
Davis’ mean:
$\scriptsize \displaystyle \frac{{\text{5(3)+2(4)+2(1)+2}}}{{10}}=2.7$
Maris’ mean:
$\scriptsize \displaystyle \frac{{\text{2+2(3)+3(4)+3(6)+8}}}{{10}}=4.6$


### Exercise 1.2

1. To find the median, M, first use the formula for the location. The location is: $\scriptsize \displaystyle \frac{{n+1}}{2}=\displaystyle \frac{{41}}{2}=20.5$. Starting at the smallest value, the median is located between the $\scriptsize \displaystyle 20\text{th}$ and $\scriptsize \displaystyle 21\text{st}$ values (the two $\scriptsize \displaystyle 24\text{s}$):
\scriptsize \displaystyle \begin{align*}&\text{3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; }\underline{{\text{24; 24}}}\text{; 25; 26; 26; 27; 27; }\\&\text{29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47}\end{align*}
\scriptsize \displaystyle \begin{align*}\text{M}&=\displaystyle \frac{{24+24}}{2}\\&=24\end{align*}
2. Calculate the mean then the median and compare.
\scriptsize \begin{align*}\bar{x}&=\displaystyle \frac{{5\text{ }000\text{ }000+49(30\text{ }000)}}{{50}}\\&=129\text{ }400\end{align*}
The mean is $\scriptsize \text{R}129\text{ }400$.
$\scriptsize \text{M}=30\text{ }000$. The median is $\scriptsize \text{R}30\text{ }000$ (since there are $\scriptsize \displaystyle 49$ people who earn $\scriptsize \text{R}30\text{ }000$ and one person who earns $\scriptsize \text{R}5\text{ 00}0\text{ }000$ and the median is in position $\scriptsize \displaystyle 25.5$).
The median is a better measure of the centre of the data than the mean because $\scriptsize \displaystyle 49$ of the values are $\scriptsize \text{R}30\text{ }000$ and only one is $\scriptsize \text{R}5\text{ 00}0\text{ }000$. The $\scriptsize \text{R}5\text{ 00}0\text{ }000$ is an outlier. The $\scriptsize \text{R}30\text{ }000$ gives us a better sense of the middle of the data.
3. The median will be a better indication of the middle of the data as most of the data values lie between $\scriptsize \displaystyle \text{R}280\text{ }000$ and $\scriptsize \displaystyle \text{R35}0\text{ }000$. $\scriptsize \displaystyle \text{R}2\text{ }500\text{ }000$ is an outlier and will cause the mean value to be increased and give an incorrect indication of the centre of the data.
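The comparison in question 2 can be checked with a short Python sketch. It is illustrative only; the variable names are assumptions and the income figures are those given in the question.

```python
# One person earns R5 000 000 and the other 49 each earn R30 000
incomes = [5_000_000] + [30_000] * 49

mean_income = sum(incomes) / len(incomes)

ordered = sorted(incomes)
n = len(ordered)  # 50 values, so the median lies between the 25th and 26th
median_income = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(mean_income)    # 129400.0
print(median_income)  # 30000.0
```

The single very large income pulls the mean far above what $\scriptsize \displaystyle 49$ of the $\scriptsize \displaystyle 50$ people earn, while the median is unaffected.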


### Exercise 1.3

1.
   1. Find the sum of the data values.
   2. Step 1: Order the data set from least to greatest.
      Step 2: Find the position of the median using the formula $\scriptsize \displaystyle \frac{{n+1}}{2}$.
      Since there is an even number of items in the data set, the median is found by taking the mean (average) of the two middlemost numbers.
      Step 3: Write down the median.
   3. The mean is:
      \scriptsize \begin{align*}\bar{x} &=\displaystyle \frac{{6+7+11+10+15+13+14+8}}{8}\\&=10.5\end{align*}
   4. There is no mode as no number occurs more than once, so the mode is not a useful measure in this case. The mean and the median are equal (both are $\scriptsize 10.5$), which suggests that the distribution is roughly symmetric.
2. Bimodal.


### Exercise 1.4

1. Order the data from smallest to largest value.
\scriptsize \displaystyle \begin{align*}&114\text{ }950;\text{ }158\text{ }000;\text{ }230\text{ }500;\text{ }387\text{ }000;\text{ }389\text{ }950;\text{ }479\text{ }000;\text{ }488\text{ }800;\\&529\text{ }000;\text{ }575\text{ }000;\text{ }639\text{ }000;\text{ }659\text{ }000;\text{ }1\text{ }095\text{ }000;\text{ }5\text{ }500\text{ }000;\text{ }\end{align*}
Find the median.
$\scriptsize \text{M}=488\text{ }800$
\scriptsize \begin{align*}{{\text{Q}}_{1}}&=\displaystyle \frac{{230\text{ }500+387\text{ }000}}{2}\\&=308\text{ }750\end{align*}
\scriptsize \begin{align*}{{\text{Q}}_{3}}&=\displaystyle \frac{{\text{639 0}00+659\text{ }000}}{2}\\&=649\text{ 00}0\end{align*}
\scriptsize \displaystyle \begin{align*}\text{IQR}&=649\text{ }000-308\text{ }750\\&=340\text{ }250\\1.5\times \text{IQR}&=1.5\times 340\text{ }250\\&=510\text{ }375\\{{\text{Q}}_{1}}-1.5\times \text{IQR}&=308\text{ }750-510\text{ }375\\&=-201\text{ }625\\{{\text{Q}}_{3}}+1.5\times \text{IQR}&=649\text{ }000+510\text{ }375\text{ }\\&=1\text{ }159\text{ }375\end{align*}
No house price is less than $\scriptsize \displaystyle -\text{R}201\text{ }625$. However, $\scriptsize \displaystyle \text{R}5\text{ }500\text{ }000$ is more than $\scriptsize \displaystyle \text{R}1\text{ }159\text{ }375$. Therefore, $\scriptsize \displaystyle \text{R}5\text{ }500\text{ }000$ is a potential outlier.
2.
   1. The IQR for the day group is:
      \scriptsize \begin{align*}{{\text{Q}}_{3}}-{{\text{Q}}_{1}}&=82.5-56\\&=26.5\end{align*}
      The IQR for the evening group is:
      \scriptsize \begin{align*}{{\text{Q}}_{3}}-{{\text{Q}}_{1}}&=89-78\\&=11\end{align*}
      The interquartile range (the spread or variability) for the day class is larger than the evening class IQR. This suggests more variation will be found in the day class’s test scores.
   2. Day class outliers are found using:
      \scriptsize \displaystyle \begin{align*}{{\text{Q}}_{1}}-(1.5)\text{IQR}&=56-1.5(26.5)\\&=16.25\\{{\text{Q}}_{3}}+(1.5)\text{IQR}&=82.5+1.5(26.5)\\&=122.25\end{align*}
      Since the minimum and maximum values for the day class are greater than $\scriptsize \displaystyle 16.25$ and less than $\scriptsize \displaystyle 122.25$, there are no outliers in the day class.
      Evening class outliers are found using:
      \scriptsize \displaystyle \begin{align*}{{\text{Q}}_{1}}-(1.5)\text{IQR}&=78-1.5(11)\\&=61.5\\{{\text{Q}}_{3}}+(1.5)\text{IQR}&=89+1.5(11)\\&=105.5\end{align*}
      For the evening class, any test score less than $\scriptsize \displaystyle 61.5$ is a potential outlier: this includes the minimum score of $\scriptsize \displaystyle 25.5$. Since no test score is greater than $\scriptsize \displaystyle 105.5$, there is no upper end outlier.
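The outlier check for the house prices in question 1 can be sketched in Python. This is a minimal sketch that assumes the median-of-halves method for the quartiles (the middle value is left out of both halves when the number of values is odd). The function names are chosen for illustration.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def outlier_fences(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) using median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half:] if n % 2 == 0 else ordered[half + 1:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488800, 1095000]
low, high = outlier_fences(prices)
flagged = [p for p in prices if p < low or p > high]

print(low, high)  # -201625.0 1159375.0
print(flagged)    # [5500000]
```

A flagged value is only a potential outlier and still needs further investigation.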


### Unit 1: Assessment

1.
   1. The sample mean is:
      \scriptsize \begin{align*}\bar{x}&=\displaystyle \frac{{14(3)+19(4)+12(5)+9(6)+11(7)}}{{65}}\\&=\displaystyle \frac{{309}}{{65}}\\&\approx 4.75\end{align*}
   2. The median is in the $\scriptsize 33\text{rd}$ position. Therefore, the median number of cars sold is $\scriptsize 4$.
   3. The mode is $\scriptsize 4$ as it appears nineteen times.
2. The mean reflects skewness the most because it is calculated using every value in a data set, so extreme values pull it towards the tail of the distribution.
3. Order the data first: $\scriptsize \displaystyle \text{0; 0; 10; 20; 30; 30; 30; 40; 45; 60; 60; 60; 90; 120; 300}$
   1. The five values are:
      \scriptsize \begin{align*}\text{Min}&=0\text{ minutes}\\{{\text{Q}}_{1}}&=20\\\text{Median}&=40\\{{\text{Q}}_{3}}&=60\\\text{Max}&=300\end{align*}
   2. Because $\scriptsize \displaystyle 75\%$ of the learners exercise for $\scriptsize \displaystyle 60$ minutes or less daily, and the IQR is $\scriptsize \displaystyle 40$ minutes, we know that half of the learners surveyed exercise between $\scriptsize \displaystyle 20$ minutes and $\scriptsize \displaystyle 60$ minutes daily. This seems a reasonable amount of time spent exercising, so without looking at potential outliers it seems the principal would be justified in purchasing the new equipment.
      Next, let’s look for outliers. The value $\scriptsize \displaystyle 300$ appears to be a potential outlier.
      \scriptsize \displaystyle \begin{align*}{{\text{Q}}_{3}}+1.5(\text{IQR})&=60+(1.5)(40)\\&=120\end{align*}
      The value $\scriptsize \displaystyle 300$ is greater than $\scriptsize \displaystyle 120$ so it is a potential outlier. If we delete it and calculate the five values, we get the following values:
      \scriptsize \displaystyle \begin{align*}\text{Min}&=0\\{{\text{Q}}_{1}}&=20\\\text{Median}&=35\\{{\text{Q}}_{3}}&=60\\\text{Max}&=120\end{align*}
      We still have $\scriptsize \displaystyle 75\%$ of the learners exercising for $\scriptsize \displaystyle 60$ minutes or less daily and half of the learners exercising between $\scriptsize \displaystyle 20$ and $\scriptsize \displaystyle 60$ minutes a day. However, $\scriptsize \displaystyle 15$ learners is a small sample and the principal should survey more learners to be more confident in the survey results.
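The five values in question 3, with and without the potential outlier, can be reproduced with a short Python sketch. It assumes the median-of-halves method for the quartiles (the middle value is left out of both halves when the number of values is odd); the function names are chosen for illustration.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


minutes = [0, 40, 60, 30, 60, 10, 45, 30, 300, 90, 30, 120, 60, 0, 20]
print(five_number_summary(minutes))  # (0, 20, 40, 60, 300)

# Repeat the calculation without the potential outlier of 300 minutes
trimmed = [m for m in minutes if m != 300]
print(five_number_summary(trimmed))  # (0, 20, 35.0, 60, 120)
```

Removing the value of $\scriptsize \displaystyle 300$ changes the maximum and the median but leaves both quartiles unchanged.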
