Saturday, 12 December 2015

Unit 4 sec 4.2 Quartiles and interquartile range

22:47

15 dots have been placed above the number line; these correspond to the weekly earnings of the 15 members of staff. Where values coincide (for example, there are three values of £280), the dots are placed vertically, one above another.
   When you have a small number of values in the dataset, as is the case here, it is quick and easy to create a simple dotplot of the data like this. Usually it provides a useful, intuitive picture of where the values lie, whether there is some bunching of the data to one side or whether they are symmetrical, and whether there are outliers.
  How then can the problem of the range being unduly affected by this outlier be solved? You might simply decide to ignore this particular untypical value, but that is a somewhat arbitrary decision and not one that can be called a general method, although it is sometimes done.
Alternatively, you might choose to omit, say, the largest and smallest values and take the range of the remaining 13 values. This is a better solution, and one that works well in this particular instance, but if there were several outliers at either end, the problem would not be solved. In order to be confident that you have dealt with the outlier problem, you really need to exclude a greater number of values at either end.
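
The effect of dropping the largest and smallest values can be sketched in a few lines of Python. The earnings figures below are made up purely for illustration (they are not the data behind the dotplot), and the helper names `value_range` and `trimmed_range` are hypothetical.

```python
# Hypothetical weekly earnings (in pounds) for 15 staff. These values are
# invented for illustration and are NOT the data shown in the dotplot.
earnings = [250, 260, 270, 280, 280, 280, 290, 300,
            310, 320, 330, 340, 350, 360, 900]


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def trimmed_range(values, k=1):
    """Range after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    kept = ordered[k:len(ordered) - k]
    return kept[-1] - kept[0]


print(value_range(earnings))    # 900 - 250 = 650
print(trimmed_range(earnings))  # 360 - 260 = 100
```

With a single outlier, trimming one value from each end is enough to bring the range down sharply. If two or more outliers sat at the same end, `k=1` would leave one of them in place, which is the weakness described above.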

Introducing the quartiles
The conventional solution, and the one described now, is to exclude the top quarter and bottom quarter of the values and create a new measure of spread that measures the “range” of the middle 50% of the values. The values that mark the ends of this middle 50% are known as the quartiles - in particular the lower quartile (Q1) and upper quartile (Q3).
  You will probably find that this description of a quartile is not totally convincing, as it rather depends on how we choose to interpret “a quarter of the way through the dataset”. Incidentally, the median, the value that lies halfway through the data, is sometimes referred to as Q2, as it is the second quartile.
  The convention when defining which quartile is Q1 and which is Q3 is that the data are arranged in increasing order of size. Then, even and odd sample sizes need slightly different approaches, and there are various ways of coping with this. The method for finding the quartiles described in the following two examples is used on some graphical calculators - it is straightforward and quite easy to perform. These examples also show you how to find the measure of spread known as the interquartile range, or IQR. The interquartile range is the difference between the upper and lower quartiles; that is, it is the value Q3 – Q1.

Example 3 Finding the lower and upper quartiles: even sample size
Find the lower quartile (Q1), the median and the upper quartile (Q3) of the following dataset.
8 3 2 6 4 1 5 7
Then find the interquartile range.

Solution
Sort the data into increasing order and find the median.
In increasing order, the dataset is
1 2 3 4 5 6 7 8.
The median is the mean of the two middle data values, 4 and 5.
Median = 4.5
To find the lower quartile, focus on the lower half of the dataset and find the median of this smaller dataset.
The lower half of the dataset is 1 2 3 4.
Its median is 2.5.
So Q1 = 2.5.
To find the upper quartile, focus on the upper half of the dataset and find the median of this smaller dataset.
The upper half of the dataset is 5 6 7 8.
Its median is 6.5.
So Q3 = 6.5.
The interquartile range is the difference between the upper and lower quartiles.
The interquartile range is thus
6.5 – 2.5 = 4.
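
The arithmetic in Example 3 can be checked with a short Python sketch. It assumes nothing beyond the standard library's `statistics.median`; the variable names are only illustrative.

```python
from statistics import median

# Dataset from Example 3, sorted into increasing order.
data = sorted([8, 3, 2, 6, 4, 1, 5, 7])

# With an even number of values the dataset splits into two equal halves.
half = len(data) // 2
lower_half = data[:half]   # 1 2 3 4
upper_half = data[half:]   # 5 6 7 8

q1 = median(lower_half)    # mean of 2 and 3
q2 = median(data)          # mean of 4 and 5
q3 = median(upper_half)    # mean of 6 and 7
iqr = q3 - q1

print(q1, q2, q3, iqr)     # 2.5 4.5 6.5 4.0
```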

Example 4 Finding the lower and upper quartiles: odd sample size
Find the lower quartile (Q1), the median and the upper quartile (Q3) of the following dataset:
1 2 3 4 5 6 7
Then find the interquartile range.
Solution
First, find the median.
The median is the middle value of the ordered dataset.
Median = 4
To find the lower quartile, ignore the middle data value and find the median of the lower “half” of the dataset.
The lower half of the dataset is 1 2 3.
Its median is 2.
So Q1 = 2.
To find the upper quartile, ignore the middle data value and find the median of the upper “half” of the dataset.
The upper half of the dataset is 5 6 7.
Its median is 6.
So Q3 = 6.
The interquartile range is the difference between the upper and lower quartiles.
The interquartile range is thus
6 – 2 = 4.
The examples lead to the following strategy for finding the quartiles and interquartile range.

Strategy to find the quartiles and interquartile range of a dataset
1.    Arrange the dataset in increasing order.
2.    Next:
(A) If there is an even number of data values, then the lower quartile (Q1) is the median of the lower half of the dataset, and the upper quartile (Q3) is the median of the upper half of the dataset.
(B) If there is an odd number of data values, throw out the middle data point (which of course has the median value of the dataset). Then the lower quartile (Q1) is the median of the lower half of the new dataset, and the upper quartile (Q3) is the median of the upper half of the new dataset.
3.    The interquartile range (IQR) is Q3 – Q1.
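
The strategy translates fairly directly into code. The following is a minimal Python sketch of the steps above, not a standard library routine; the function name `quartiles` is hypothetical, and it follows this section's convention of discarding the middle value when the sample size is odd.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3, IQR) using the strategy in this section."""
    # Step 1: arrange the dataset in increasing order.
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data values")

    # Step 2: split into a lower and an upper half.
    half = n // 2
    lower = ordered[:half]
    if n % 2 == 0:
        upper = ordered[half:]        # (A) even: two equal halves
    else:
        upper = ordered[half + 1:]    # (B) odd: throw out the middle value

    q1 = median(lower)
    q3 = median(upper)

    # Step 3: the interquartile range is Q3 - Q1.
    return q1, median(ordered), q3, q3 - q1


print(quartiles([8, 3, 2, 6, 4, 1, 5, 7]))  # Example 3
print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # Example 4
```

Applied to the datasets of Examples 3 and 4, the function should reproduce the quartiles and interquartile ranges found by hand.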


As you have seen, when there is an even number of data values, the dataset breaks neatly in half and the quartiles are simply the medians of these two half sets. The procedure is slightly more complicated when the original dataset contains an odd number of values, as a decision needs to be made about what constitutes these half sets. However, the choice of whether or not to include the middle data value is quite arbitrary - some authors include it and others, as we have done here, exclude it. Indeed, there are yet other methods of calculation that are different again, and all of these may give slightly different answers for the values of the quartiles. With very small datasets like the ones you have been using, these differences may be noticeable, but in a real investigation, where the sample sizes would be larger, these small differences tend to disappear.
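
To see how much the choice matters for a small dataset, the sketch below compares the two conventions on the data from Example 4. The function `quartiles_odd` and its `include_middle` flag are hypothetical names introduced only for this illustration.

```python
from statistics import median


def quartiles_odd(values, include_middle):
    """Return (Q1, Q3) for an odd-sized dataset under either convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch expects an odd number (>= 3) of values")
    mid = n // 2
    if include_middle:
        # The middle value is counted in both halves.
        lower, upper = ordered[:mid + 1], ordered[mid:]
    else:
        # The middle value is thrown out, as in this section.
        lower, upper = ordered[:mid], ordered[mid + 1:]
    return median(lower), median(upper)


data = [1, 2, 3, 4, 5, 6, 7]
excluded = quartiles_odd(data, include_middle=False)
included = quartiles_odd(data, include_middle=True)

print(excluded)  # (2, 6), so IQR = 4
print(included)  # (2.5, 5.5), so IQR = 3.0
```

For these seven values the interquartile range is 4 under one convention and 3 under the other, which is the kind of noticeable difference that small datasets can produce.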
