Unit 4 sec 4.2
Quartiles and interquartile range
12 December 2015
22:47
Fifteen dots have been placed above the number line; these
correspond to the weekly earnings of the 15 members of staff. Where values
coincide (for example, there are three values of £280), the dots are placed
vertically, one above another.
When you have a small number of values in the dataset, as is the case here, it
is quick and easy to create a simple dotplot of the data like this. Usually it
provides a useful, intuitive picture of where the values lie, whether there is
some bunching of the data to one side or whether they are symmetrical, and
whether there are outliers.
How then can the problem of the range being unduly affected by this outlier be
solved? You might simply decide to ignore this particular untypical value, but
that is a somewhat arbitrary decision and not one that can be called a general
method, although it is sometimes done.
Alternatively, you might choose to omit, say, the largest
and smallest values and find the range of the remaining 13 values. This is a
better solution, and one that works well in this particular instance, but if
there were several outliers at either end, the problem would not be solved. In
order to be confident that you have dealt with the outlier problem, you really
need to exclude a greater number of values at either end.
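To see how a single outlier inflates the range, and how omitting the largest and smallest values helps, here is a minimal Python sketch. The earnings figures are hypothetical, made up purely for illustration (the actual dotplot values are not reproduced in these notes), and the helper names `value_range` and `trimmed_range` are assumptions of this sketch.

```python
# Hypothetical weekly earnings (in pounds) for 15 staff; illustrative only.
earnings = [150, 180, 200, 210, 220, 240, 250, 260,
            280, 280, 280, 300, 320, 350, 1100]


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def trimmed_range(values, k):
    """Range after omitting the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this dataset")
    return value_range(ordered[k:len(ordered) - k])


full = value_range(earnings)          # dominated by the single outlier, 1100
trimmed = trimmed_range(earnings, 1)  # range of the remaining 13 values
print(full, trimmed)
```

With only one outlier, omitting one value at each end is enough; with several outliers at one end, a larger `k` would be needed, which is the motivation for the quartiles.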
Introducing the quartiles
The conventional solution, and the one described now, is to
exclude the top quarter and bottom quarter of the values and create a new
measure of spread that measures the “range” of the middle 50% of the values.
The values that mark the ends of this middle 50% are known as the quartiles:
in particular, the lower quartile (Q1) and the upper quartile (Q3).
You will probably find that this description of a quartile is not totally
convincing, as it rather depends on how we choose to interpret “a quarter of
the way through the dataset”. Incidentally, the median, the value that lies
halfway through the data, is sometimes referred to as Q2, as it is the second
quartile.
The convention when defining which quartile is Q1 and which is Q3 is that the
data are presented in increasing order of size. Then, even and odd sample sizes
need slightly different approaches, and there are various ways of coping with
this. The method for finding the quartiles described in the following two
examples is used on some graphical calculators; it is straightforward and quite
easy to perform. These examples also show you how to find the measure of spread
known as the interquartile range, or IQR. The interquartile range is the
difference between the upper and lower quartiles; that is, it is the value
Q3 – Q1.
Example 3 Finding the lower and upper quartiles: even sample size
Find the lower quartile (Q1), the median and the upper quartile (Q3) of the
following dataset.
8 3 2 6 4 1 5 7
Then find the interquartile range.
Solution
First, sort the data into increasing order and find the median.
In increasing order, the dataset is
1 2 3 4 5 6 7 8.
The median is the mean of the two middle data values, 4 and 5.
Median = 4.5
To find the lower quartile, focus on the lower half of the dataset and find
the median of this smaller dataset.
The lower half of the dataset is 1 2 3 4.
Its median is 2.5.
So Q1 = 2.5.
To find the upper quartile, focus on the upper half of the dataset and find
the median of this smaller dataset.
The upper half of the dataset is 5 6 7 8.
Its median is 6.5.
So Q3 = 6.5.
The interquartile range is the difference between the upper and lower
quartiles.
The interquartile range is thus 6.5 – 2.5 = 4.
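The steps of Example 3 can be checked with a short Python sketch. It uses `statistics.median` from the standard library; the variable names are choices made for this sketch only.

```python
from statistics import median

data = [8, 3, 2, 6, 4, 1, 5, 7]

# Step 1: sort into increasing order.
ordered = sorted(data)

# Step 2: an even sample size splits neatly into two halves.
half = len(ordered) // 2
lower_half = ordered[:half]   # 1 2 3 4
upper_half = ordered[half:]   # 5 6 7 8

q2 = median(ordered)          # mean of the two middle values, 4 and 5
q1 = median(lower_half)       # median of the lower half
q3 = median(upper_half)       # median of the upper half
iqr = q3 - q1

print(q1, q2, q3, iqr)        # 2.5 4.5 6.5 4.0
```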
Example 4 Finding the lower and upper quartiles: odd sample size
Find the lower quartile (Q1), the median and the upper quartile (Q3) of the
following dataset.
1 2 3 4 5 6 7
Then find the interquartile range.
Solution
First, find the median.
The median is the middle value of the ordered dataset.
Median = 4
To find the lower quartile, ignore the middle data value and find the median
of the lower “half” of the dataset.
The lower half of the dataset is 1 2 3.
Its median is 2.
So Q1 = 2.
To find the upper quartile, ignore the middle data value and find the median
of the upper “half” of the dataset.
The upper half of the dataset is 5 6 7.
Its median is 6.
So Q3 = 6.
The interquartile range is the difference between the upper and lower
quartiles.
The interquartile range is thus 6 – 2 = 4.
These examples lead to the following strategy for finding the
quartiles and interquartile range.
Strategy: to find the quartiles and interquartile range of a dataset
1. Arrange the dataset in increasing order.
2. Next:
(A) If there is an even number of data values, then the lower quartile (Q1) is
the median of the lower half of the dataset, and the upper quartile (Q3) is the
median of the upper half of the dataset.
(B) If there is an odd number of data values, throw out the middle data point
(which of course has the median value of the dataset). Then the lower quartile
(Q1) is the median of the lower half of the new dataset, and the upper quartile
(Q3) is the median of the upper half of the new dataset.
3. The interquartile range (IQR) is Q3 – Q1.
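The strategy above can be sketched as a small Python function. This is one possible implementation of the method in these notes, not a standard library routine; the function names `quartiles` and `interquartile_range` are assumptions of this sketch.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the halves method described above."""
    ordered = sorted(values)          # step 1: increasing order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower_half = ordered[:half]
    if n % 2 == 0:
        upper_half = ordered[half:]       # (A) even: split neatly in half
    else:
        upper_half = ordered[half + 1:]   # (B) odd: throw out the middle value
    return median(lower_half), median(ordered), median(upper_half)


def interquartile_range(values):
    """Step 3: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(quartiles([8, 3, 2, 6, 4, 1, 5, 7]))   # Example 3: (2.5, 4.5, 6.5)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # Example 4: (2, 4, 6)
```

Both worked examples give an interquartile range of 4 with this function.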
As you have seen, when there is an even number of data
values, the dataset breaks neatly in half and the quartiles are simply the
medians of these two half sets. The procedure is slightly more complicated when
the original dataset contains an odd number of values, as a decision needs to
be made about what constitutes these half sets. However, the choice of whether
or not to include the middle data value is quite arbitrary: some authors
include it and others, as we have done here, exclude it. Indeed, there are yet
other methods of calculation that are different again, and all of these may
give slightly different answers for the values of the quartiles. With very
small datasets like the ones you have been using, these differences may be
noticeable, but in a real investigation, where the sample sizes would be
larger, these small differences tend to disappear.
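To illustrate how the two conventions can disagree on a very small dataset, the following Python sketch applies both to the data from Example 4. The function name `quartiles_odd` and its `include_median` flag are hypothetical, introduced only for this comparison.

```python
from statistics import median


def quartiles_odd(values, include_median):
    """Q1 and Q3 for an odd-sized dataset, with or without the middle value
    counted in each half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch expects an odd sample size of at least 3")
    mid = n // 2
    if include_median:
        lower, upper = ordered[:mid + 1], ordered[mid:]
    else:
        lower, upper = ordered[:mid], ordered[mid + 1:]
    return median(lower), median(upper)


data = [1, 2, 3, 4, 5, 6, 7]
excluded = quartiles_odd(data, include_median=False)  # method used in these notes
included = quartiles_odd(data, include_median=True)   # alternative convention

print(excluded)  # (2, 6), so IQR = 4
print(included)  # (2.5, 5.5), so IQR = 3.0
```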