Unit 4 sec 4.3
standard deviation
12 December 2015
23:32
The best known measure of spread is the standard deviation,
or SD. The bad news is that, using pencil and paper, it is hard work to calculate
the standard deviation, particularly with large datasets. The good news is
that, these days, once the data have been keyed in, a calculator or computer
can work out the standard deviation in a flash. But before becoming totally
reliant on a machine, it is a good idea to perform one or two pencil-and-paper
calculations of the standard deviation using very simple datasets.
An alternative name
for the standard deviation is the RMS
deviation - in full, the root mean squared deviation. Literally, it is the
(square) root of the mean of the squared deviations. This complicated name will
make more sense when you follow through the steps involved in the calculation.
Strategy to find the standard deviation of a dataset
1. Find the mean of the dataset.
2. Find the difference of each value from the mean - these are the deviations, often labelled as the d values.
3. Square each deviation - this gives the d² values.
4. Find the mean of these squared deviations - this number is the mean squared deviation, better known as the variance.
5. Take the square root of the variance to get the root mean squared deviation - that is, the standard deviation.
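The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `standard_deviation` and the sample dataset are made up for this sketch and do not come from the module materials.

```python
from math import sqrt


def standard_deviation(data):
    """Standard deviation found by following the five steps of the strategy."""
    n = len(data)
    mean = sum(data) / n                      # step 1: the mean
    deviations = [x - mean for x in data]     # step 2: the d values
    squared = [d ** 2 for d in deviations]    # step 3: the d² values
    variance = sum(squared) / n               # step 4: mean squared deviation
    return sqrt(variance)                     # step 5: root mean squared deviation


# Hypothetical dataset chosen so that the arithmetic comes out exactly:
# mean = 5, squared deviations sum to 32, variance = 32/8 = 4.
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # prints 2.0
```

Note that step 4 divides by the number of values, exactly as the strategy describes.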
Example 5 Finding a standard deviation
Find the standard deviation of the following dataset.
1, 2, 4, 6, 7
Solution
Find the mean.
Mean = (1+2+4+6+7)/5 = 20/5 = 4.
Subtract the mean from each data value to find the deviations.
The deviations are -3, -2, 0, 2, 3.
Square the deviations.
The squared deviations are 9, 4, 0, 4, 9.
Calculate the mean of the squared deviations to find the variance.
The variance is (9+4+0+4+9)/5 = 26/5 = 5.2.
The standard deviation is the square root of the variance.
So the standard deviation is √5.2 = 2.3 (to 1 d.p.).
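If you want to check the answer to Example 5 on a computer, one option (an assumption of this sketch, not something the module asks for) is Python's standard `statistics` module. Its `pvariance` and `pstdev` functions divide by the number of values, which matches step 4 of the strategy.

```python
from statistics import pstdev, pvariance

data = [1, 2, 4, 6, 7]

# pvariance is the mean of the squared deviations (divides by n).
print(pvariance(data))          # prints 5.2

# pstdev is the square root of that variance.
print(round(pstdev(data), 1))   # prints 2.3

# Caution: statistics.stdev and statistics.variance divide by n - 1 instead,
# so they would give a different answer from the pencil-and-paper method here.
```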
You may find the calculation of the standard deviation rather
complicated, and the steps hard to remember. It can be helpful to think about
some of the ideas visually.
You may have wondered
why it is necessary to square the deviations in step 3 of the calculation. In
order to see the point of this, consider what would have happened if you had
not squared them.
For the dataset in Example 5, the mean of the unsquared deviations would be
(-3 + -2 + 0 + 2 + 3)/5 = 0/5 = 0.
As you can see, the
positive and negative deviations have cancelled each other out and we are left
with a numerator of 0. So the value of the mean deviation is 0. This will
always be true of the mean deviation; the positive and negative deviations will
always cancel each other out, leaving an answer of 0 for the mean deviation.
You may like to try this yourself with some other examples.
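One way to try other examples quickly is the short sketch below. The helper name `mean_deviation` and the second and third datasets are invented for illustration. Exact fractions are used so that rounding in decimal arithmetic cannot hide the cancellation.

```python
from fractions import Fraction


def mean_deviation(data):
    """Mean of the (unsquared) deviations from the mean, computed exactly."""
    values = [Fraction(x) for x in data]
    mean = sum(values) / len(values)
    return sum(v - mean for v in values) / len(values)


# The first dataset is from Example 5; the others are made-up examples.
for dataset in ([1, 2, 4, 6, 7], [3, 8, 10], [5, 5, 6, 9, 12, 20]):
    print(dataset, mean_deviation(dataset))  # the second value is 0 every time
```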
It is to avoid this
problem that the deviations are squared in step 3 (so that none of them is negative), and
this is then undone by taking the square root in step 5.
Although calculating the
standard deviation by pencil and paper is quite hard work, rest assured that,
these days, it is normally done on a calculator or computer; as you will see
later, the module resource dataplotter calculates and displays it and other statistical
summaries automatically. There are a number of reasons why the standard
deviation is a useful measure of spread, and here are two of the main ones.
Reasons for using the standard deviation as a measure of spread
The standard deviation is the best known and most commonly used measure of spread.
All the values in the dataset are included in its calculation.
(However, unlike the interquartile range, its value can be to some extent distorted by outliers.)
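The effect of an outlier can be seen in a small experiment. In this sketch the second dataset is hypothetical: it is the Example 5 dataset with its largest value replaced by 70. The helper `iqr` is also made up for illustration, and it uses the "inclusive" quartile rule from Python's `statistics` module, which may not be the same quartile convention as the one used in the module.

```python
from statistics import pstdev, quantiles


def iqr(data):
    """Interquartile range using the 'inclusive' quartile convention (an assumption)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


original = [1, 2, 4, 6, 7]        # dataset from Example 5
with_outlier = [1, 2, 4, 6, 70]   # hypothetical: largest value replaced by an outlier

for name, data in (("original", original), ("with outlier", with_outlier)):
    print(name, "SD:", round(pstdev(data), 1), "IQR:", iqr(data))
```

With these two datasets the interquartile range is unchanged, while the standard deviation becomes many times larger.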