
# Measures of Dispersion

## Overview

Measures of central tendency, such as the mean, median, and mode, show where the central value of a data set lies, but they do not reveal how the data are distributed or spread.

The example below shows the marks two students obtained in five subjects.

Andrew’s marks: {50, 55, 60, 65, 70}
Simon’s marks: {35, 40, 60, 70, 95}

You will notice that the mean and median marks for Andrew and Simon are the same: 60. However, the spread of Andrew’s marks, which varies from 50 to 70, is significantly different from Simon’s marks, which vary from 35 to 95.
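A minimal sketch in Python confirms these figures using only the standard `statistics` module. The helper name `summarize` is illustrative, and "range" here means largest mark minus smallest mark:

```python
import statistics


def summarize(marks):
    """Return (mean, median, range) for a list of marks."""
    return (
        statistics.mean(marks),
        statistics.median(marks),
        max(marks) - min(marks),  # range: largest minus smallest
    )


andrew = [50, 55, 60, 65, 70]
simon = [35, 40, 60, 70, 95]

print("Andrew:", summarize(andrew))  # same centre, range 20
print("Simon:", summarize(simon))    # same centre, range 60
```

The first two numbers agree for both students, while the third (the spread) does not.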

Other measures are therefore needed to describe how widely the items are spread around these averages. These are called ‘measures of dispersion’, also known as ‘measures of variation’.

## Definition of Dispersion

The dispersion of data is the degree to which numerical data tends to spread around an average value.

## Measures of Dispersion

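Measures of dispersion are the statistical methods used to measure the degree to which numerical data tend to spread around an average value. There are two types: absolute measures, which are expressed in the units of the data (or, for the variance, in squared units), and relative measures, which are unit-free and are used to compare the spread of different data sets.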

The standard deviation is the most commonly used absolute measure of dispersion. The range, semi-inter-quartile range, mean absolute deviation, and variance are also regularly used absolute measures, while the coefficient of variation is a frequently used relative measure.

### Absolute Measures of Dispersion

#### Range

The range is the simplest and easiest method to measure the dispersion of a data set. It is the difference between the largest value (maximum value) and the smallest value (minimum value) in a data set. In other words, it shows the total spread between the two ends of the data set.

Range = Largest value – Smallest value

The range does not use all the values in the data set, so two data sets with the same range can vary in very different ways. In the example below, both shares have a range of \$100 – \$50 = \$50, yet Share A’s prices rise steadily while Share B’s prices cluster near the two ends.

Share A’s prices in week 1: {\$50, \$60, \$70, \$80, \$90, \$100}
Share B’s prices in week 1: {\$50, \$55, \$60, \$95, \$100, \$100}

Because the range uses only the two extreme values, it is easily distorted. In particular, it is highly sensitive to outliers, as the example below shows.

Data set A: {1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5}
Data set B: {1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 75}

The range of data set A is 5 – 1 = 4. Data sets A and B differ in only one value, but because the largest value in data set B (75) is an extreme value, the range of data set B is 75 – 1 = 74, far larger than the range of data set A.

Example:

For each of the following data sets, determine the range.

1. 24, 26, 29, 40, 44, 46, 54, 56, 64, 69
2. –16, –15, –5, 3, 8, 12, 18, 22, 24, 29
3. 0, 4, 6, 9, 12, 15, 22, 23, 44, 45

Solution:

Using the formula

Range = Largest value – Smallest value

1. Range = 69 – 24 = 45
2. Range = 29 – (–16) = 45
3. Range = 45 – 0 = 45
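The same calculation takes one line of code. In this sketch the helper is called `data_range`, a hypothetical name chosen so that it does not shadow Python's built-in `range`:

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


exercises = [
    [24, 26, 29, 40, 44, 46, 54, 56, 64, 69],
    [-16, -15, -5, 3, 8, 12, 18, 22, 24, 29],
    [0, 4, 6, 9, 12, 15, 22, 23, 44, 45],
]

for values in exercises:
    print(data_range(values))  # 45 for each data set
```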

#### Semi Inter-Quartile Range

The semi-inter-quartile range is another simple and easy method to measure the dispersion in a data set. The semi-inter-quartile range is half of the difference between the upper quartile and the lower quartile. In other words, the semi-inter-quartile range is one-half of the difference between the value of the third quartile in a data set and the value of the first quartile in a data set. This measure is also known as quartile deviation.

Semi Inter-Quartile Range=$\frac{(Q_3-Q_1)}{2}$

The difference between the value of the third quartile and the value of the first quartile in a data set is known as the inter-quartile range. Therefore, the semi-inter-quartile range can also be calculated as follows.

Semi Inter-Quartile Range $= \frac{\text{Inter-quartile range}}{2}$

Since the first quartile equals the 25th percentile and the third quartile equals the 75th percentile, the above formula could be expressed in the following form.

Semi Inter-Quartile Range $= \frac{75^{\text{th}}\text{ percentile} - 25^{\text{th}}\text{ percentile}}{2}$

The semi-inter-quartile range overcomes the main disadvantage of the range: it ignores the lowest 25% and the highest 25% of the values in a data series, so it is not affected by extreme values.

The example below shows that the semi-inter-quartile range is not affected by an extreme value.

Data set A: {1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5}
Data set B: {1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 75}

In both data sets, Q1 (the 3rd item) is 3 and Q3 (the 9th item) is 5.

Data set A’s range = 5 – 1 = 4
Data set A’s semi inter-quartile range = (5 – 3) / 2 = 1

Data set B’s range = 75 – 1 = 74
Data set B’s semi inter-quartile range = (5 – 3) / 2 = 1
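The comparison can be reproduced with Python's standard library, as a sketch assuming Python 3.8 or later. `statistics.quantiles(..., n=4, method="exclusive")` places Q1 and Q3 at the (N + 1)/4 and 3(N + 1)/4 positions of the sorted data (interpolating when a position is not a whole number), which matches the rule used in this article for these data sets. `semi_interquartile_range` is an illustrative helper name:

```python
import statistics


def semi_interquartile_range(values):
    """(Q3 - Q1) / 2, with quartiles at the (N+1)/4 and 3(N+1)/4 positions."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / 2


data_set_a = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5]
data_set_b = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 75]

for name, values in [("A", data_set_a), ("B", data_set_b)]:
    spread = max(values) - min(values)  # the range reacts to the outlier...
    siqr = semi_interquartile_range(values)  # ...the semi-IQR does not
    print(name, "range:", spread, "semi-IQR:", siqr)
```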

The semi-inter-quartile range can also be used to measure dispersion when the extreme values are not known precisely, as with open-ended class intervals in a frequency distribution.

However, this measure depends on only two values in a data set: the first quartile and the third quartile. It ignores all the other values, so it cannot be regarded as representative of all the data, and it does not lend itself to further algebraic treatment. It is also not a stable measure, because it is strongly affected by sampling fluctuations. For normally distributed data it varies more from sample to sample than the standard deviation does, so it is rarely used for data that are approximately normally distributed.

Example:

Determine the semi inter-quartile range for the data set below.

290, 340, 380, 400, 520, 530, 630, 710, 820, 860, 880

Solution:

Step 1: Find the first quartile, Q1.

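The first quartile is the $\left(\frac{N+1}{4}\right)^{\text{th}}$ item of the ordered data, where N is the number of observations.

Q1 is the $\left(\frac{11+1}{4}\right)^{\text{th}} = 3^{\text{rd}}$ item.

So, Q1 = 380.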

Step 2: Find the third quartile, Q3.

The third quartile is the $\left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item.

Q3 is the $\left(\frac{3(11+1)}{4}\right)^{\text{th}} = 9^{\text{th}}$ item.

So, Q3 = 820.

Step 3: Find the inter-quartile range by subtracting Q1 from Q3.

Q3 – Q1 = 820 – 380 = 440

Step 4: Divide the inter-quartile range by 2.

$\frac{Q_3 - Q_1}{2} = \frac{440}{2} = 220$

The quartile deviation for this set of data is 220.
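The position rule from Steps 1 and 2 can be written out directly, as a sketch. The helper `quartile` is hypothetical, and the linear interpolation for fractional positions is an assumption: this example only needs whole-number positions.

```python
def quartile(values, k):
    """Value at position k*(N+1)/4 (1-based) of the sorted data, k = 1 or 3.

    Assumption: if the position is not a whole number, interpolate linearly
    between the two neighbouring items.
    """
    data = sorted(values)
    n = len(data)
    position = k * (n + 1) / 4
    whole = int(position)  # item at or just below the position
    fraction = position - whole
    if whole < 1:
        return data[0]
    if whole >= n:
        return data[-1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


values = [290, 340, 380, 400, 520, 530, 630, 710, 820, 860, 880]
q1 = quartile(values, 1)  # 3rd item
q3 = quartile(values, 3)  # 9th item
print(q1, q3, (q3 - q1) / 2)
```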

#### Mean Absolute Deviation

The mean of the absolute differences between each value and the mean is known as the mean absolute deviation (MAD). Absolute values are needed because the sum of the signed deviations from the mean is always zero.

Without absolute values, the positive and negative deviations cancel out. For example, for the data set 25, 30, 45, 60, 75 (mean 47), the deviations are –22, –17, –2, 13, and 28, which sum to 0.

The absolute deviation of a value is the size of its difference from the mean, regardless of whether that difference is positive or negative. It is calculated as follows.

Absolute deviation from the mean $= |x-\bar{x}|$

Averaging the absolute deviations over all n values gives the formula for the mean absolute deviation.

Mean absolute deviation from the mean $= \frac{\sum |x-\bar{x}|}{n}$

The mean absolute deviation uses every value in a data set, so it is representative of all the data. This removes the main disadvantage of the range and the semi-inter-quartile range, which each depend on only two values, and makes the mean absolute deviation a better measure of dispersion than either of them.

Example:

Find the mean absolute deviation for the data set 25, 30, 45, 60, 75.

Solution:

First, we have to find the mean of the data set.

Mean $\bar{x}=\frac{\sum{x}}{n}=\frac{25+30+45+60+75}{5}=47$

Then, find the difference between each value and the mean.

Next, find the absolute value of each difference by ignoring its sign.

After that, take the total of the calculated absolute values.

Finally, divide it by the number of observations to find the mean absolute deviation.

Mean absolute deviation from the mean $= \frac{\sum |x-\bar{x}|}{n} = \frac{22+17+2+13+28}{5} = \frac{82}{5} = 16.4$
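The procedure above can be sketched in Python. The function name is illustrative. The sketch also shows why the absolute value matters: the signed deviations from the mean add up to zero.

```python
def mean_absolute_deviation(values):
    """Average of |x - mean| over all values."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


values = [25, 30, 45, 60, 75]
mean = sum(values) / len(values)

print(sum(x - mean for x in values))    # signed deviations cancel out
print(mean_absolute_deviation(values))  # 82 / 5
```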

#### Variance

Instead of taking absolute values, we can remove the signs of the deviations from the mean by squaring them. The variance is the average of the squared deviations from the mean. It is also known as the ‘mean square deviation’.

The variance of a population is denoted by $\sigma^2$ (the square of the lowercase Greek letter sigma). The variance of a sample is denoted by $s^2$.

The variance avoids the mean absolute deviation’s drawback of simply ignoring algebraic signs: because every deviation is squared, every term is non-negative.

Formulas to find the variance for individual data in a population

$\sigma^2 = \frac{\sum X^2}{N} - \mu^2$

$\sigma^2 = \frac{\sum (X-\mu)^2}{N}$

Formula to find the variance for grouped data in a population

$\sigma^2 = \frac{\sum fX^2}{\sum f} - \mu^2$

Formula to find the variance for individual data in a sample

$s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$

Formula to find the variance for grouped data in a sample

$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1}$, where $n = \sum f$
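As a sketch (the function names are illustrative, not from any library), the individual-data formulas translate directly into Python. The two population formulas are algebraically identical, and the data set 25, 30, 45, 60, 75 used in the examples gives the same result from both:

```python
def population_variance(xs):
    """sigma^2 = sum((X - mu)^2) / N  (definitional form)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def population_variance_shortcut(xs):
    """sigma^2 = sum(X^2) / N - mu^2  (computational form)."""
    mu = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) - mu ** 2


def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [25, 30, 45, 60, 75]
print(population_variance(data))           # 346.0
print(population_variance_shortcut(data))  # 346.0
print(sample_variance(data))               # 432.5
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large relative to their spread, so the definitional form is usually the safer choice in code.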

Example:

Find the variance of the population data set 25, 30, 45, 60, 75.

Solution:

First, we must find the population mean of the data set.

Mean $\mu = \frac{\sum X}{N} = \frac{25+30+45+60+75}{5} = 47$

Then, find the difference between each value and the mean.

Next, square each of these differences.

After that, add up the squared differences.

Finally, divide this total by the number of observations (N) to find the variance.

Population variance $\sigma^2 = \frac{\sum (X-\mu)^2}{N} = \frac{(-22)^2 + (-17)^2 + (-2)^2 + 13^2 + 28^2}{5} = \frac{1{,}730}{5} = 346$

Example:

Find the variance of the sample data set 25, 30, 45, 60, 75.

Solution:

First, we must find the sample mean of the data set.

Mean ($\bar{x}$)=$\frac{\sum{x}}{n}=\frac{25+30+45+60+75}{5}$=47

Then, find the difference between each value and the mean.

Next, square each of these differences.

After that, add up the squared differences.

Finally, divide this total by one less than the number of observations ($n - 1$) to find the variance.

Sample variance $s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{1{,}730}{5-1} = 432.5$

Example:

Find the variance for the data set of a population given below.

Solution:

First, we must find the population mean of the data set.

Mean (μ)=$\frac{\sum{fX}}{\sum{f}}$

Then, square each value of the data set.

Next, multiply each squared value by its frequency.

After that, add up these products.

Finally, divide this total by the total frequency ($\sum f$) and subtract the square of the mean.

Mean $\mu = \frac{\sum fX}{\sum f} = \frac{1{,}000}{20} = 50$

Population variance $\sigma^2 = \frac{\sum fX^2}{\sum f} - \mu^2 = \frac{57{,}800}{20} - 50^2 = 2{,}890 - 2{,}500 = 390$

Example:

Find the variance for the data set of a sample given below.

Solution:

First, we must find the sample mean of the data set.

Mean $\bar{x} = \frac{\sum fx}{\sum f}$

Then, find the difference between each value and the mean.

Next, square each of these differences.

Then, multiply each squared difference by its frequency.

After that, add up these products.

Finally, divide this total by one less than the total frequency ($n - 1$, where $n = \sum f$) to find the variance.

Mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1{,}000}{20} = 50$

Sample variance $s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{7{,}800}{20-1} \approx 410.53$
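The frequency tables for the two grouped-data examples are not reproduced here, but their quoted totals can still be checked. This sketch assumes both examples use the same table, which the matching totals suggest (Σf = 20, ΣfX = 1,000, ΣfX² = 57,800, and Σf(x − x̄)² = 7,800). The helper `grouped_variances` is hypothetical:

```python
def grouped_variances(sum_f, sum_fx, sum_fx2):
    """Mean, population variance and sample variance from frequency-table totals.

    sum_f = Σf, sum_fx = ΣfX, sum_fx2 = ΣfX² (hypothetical helper).
    """
    mean = sum_fx / sum_f
    population_var = sum_fx2 / sum_f - mean ** 2
    # Σf(x - x̄)² = ΣfX² - Σf·x̄², then divide by n - 1 with n = Σf
    sum_sq_dev = sum_fx2 - sum_f * mean ** 2
    sample_var = sum_sq_dev / (sum_f - 1)
    return mean, population_var, sample_var


mean, pop_var, samp_var = grouped_variances(20, 1_000, 57_800)
print(mean, pop_var, round(samp_var, 2))  # 50.0 390.0 410.53
```

Under that assumption, the identity in the comment reproduces the 7,800 used in the sample calculation: 57,800 – 20 × 50² = 7,800.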

#### Standard Deviation

The variance is expressed in squared units, because it averages squared deviations from the mean rather than absolute deviations. Taking the square root returns the measure to the original units: the standard deviation is the positive square root of the variance.

It is also called the root-mean-square deviation or the mean error. The standard deviation of a population is denoted by the lowercase Greek letter σ (sigma), and the standard deviation of a sample by the lowercase letter ‘s’.

The standard deviation has many advantages over the other absolute measures of dispersion. One is that, unlike the variance, it is measured in the same units as the data. For example, if the data are lengths in km, the variance is measured in km², whereas the standard deviation, being the square root of the variance, is measured in km.

When data follows a normal distribution, the standard deviation is a dependable measure. It is widely used in further statistical calculations because it has most of the characteristics of an ideal measure of dispersion.

Formulas to find the standard deviation for individual data in a population

$\sigma = \sqrt{\frac{\sum X^2}{N} - \mu^2}$

$\sigma = \sqrt{\frac{\sum (X-\mu)^2}{N}}$

Formula to find the standard deviation for grouped data in a population

$\sigma = \sqrt{\frac{\sum fX^2}{\sum f} - \mu^2}$

Formula to find the standard deviation for individual data in a sample

$s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$

Formula to find the standard deviation for grouped data in a sample

$s = \sqrt{\frac{\sum f(x-\bar{x})^2}{n-1}}$, where $n = \sum f$

Example:

Find the standard deviation of the population data set 25, 30, 45, 60, 75.

Solution:

First, we must find the population mean of the data set.

Mean $\mu = \frac{\sum X}{N} = \frac{25+30+45+60+75}{5} = 47$

Then, find the difference between each value and the mean.

Next, square each of these differences.

After that, add up the squared differences.

Then, divide this total by the number of observations (N) to find the variance.

Finally, find the square root of the variance to calculate the standard deviation.

Population standard deviation $\sigma = \sqrt{\frac{\sum (X-\mu)^2}{N}} = \sqrt{\frac{1{,}730}{5}} = \sqrt{346} \approx 18.6$

Example:

Find the standard deviation of the sample data set 25, 30, 45, 60, 75.

Solution:

First, we must find the sample mean of the data set.

Mean ($\bar{x}$)=$\frac{Σx}{n}=\frac{25+30+45+60+75}{5}=47$

Then, find the difference between each value and the mean.

Next, square each of these differences.

After that, add up the squared differences.

Then, divide this total by one less than the number of observations ($n - 1$) to find the variance.

Finally, find the square root of the variance to calculate the standard deviation.

Sample standard deviation $s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} = \sqrt{\frac{1{,}730}{5-1}} = \sqrt{432.5} \approx 20.8$
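Python's standard `statistics` module provides both versions directly: `pstdev` divides by N (population) and `stdev` by n − 1 (sample). A quick check of the two worked examples above:

```python
import statistics

data = [25, 30, 45, 60, 75]

sigma = statistics.pstdev(data)  # population: sqrt(Σ(X - μ)² / N)
s = statistics.stdev(data)       # sample: sqrt(Σ(x - x̄)² / (n - 1))

print(round(sigma, 1), round(s, 1))  # 18.6 20.8
```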

Example:

Find the standard deviation for the data set of a population given below.

Solution:

First, we must find the population mean of the data set.

Mean (μ)=$\frac{ΣfX}{Σf}$

Then, square each value of the data set.

Next, multiply each squared value by its frequency.

After that, add up these products.

Then, divide this total by the total frequency ($\sum f$) and subtract the square of the mean to find the variance.

Finally, find the square root of the variance to calculate the standard deviation.

Mean $\mu = \frac{\sum fX}{\sum f} = \frac{1{,}000}{20} = 50$

Population standard deviation $\sigma = \sqrt{\frac{\sum fX^2}{\sum f} - \mu^2} = \sqrt{\frac{57{,}800}{20} - 50^2} = \sqrt{2{,}890 - 2{,}500} = \sqrt{390} \approx 19.75$

Example:

Find the standard deviation for the data set of a sample given below.

Solution:

First, we must find the sample mean of the data set.

Mean $\bar{x} = \frac{\sum fx}{\sum f}$

Then, find the difference between each value and the mean.

Next, square each of these differences.

Then, multiply each squared difference by its frequency.

After that, add up these products.

Then, divide this total by one less than the total frequency ($n - 1$, where $n = \sum f$) to find the variance.

Finally, find the square root of the variance to calculate the standard deviation.

Mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1{,}000}{20} = 50$

Sample standard deviation $s = \sqrt{\frac{\sum f(x-\bar{x})^2}{n-1}} = \sqrt{\frac{7{,}800}{20-1}} \approx \sqrt{410.53} \approx 20.26$
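The frequency table behind these two examples is not reproduced in this article, so the sketch below uses a small hypothetical table instead. It computes σ and s from (value, frequency) pairs and cross-checks them against `statistics.pstdev` and `statistics.stdev` applied to the same data written out one observation at a time. The helper name `grouped_std_devs` is illustrative:

```python
import math
import statistics


def grouped_std_devs(values, freqs):
    """Population sigma and sample s for values with the given frequencies."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(sum_sq_dev / n), math.sqrt(sum_sq_dev / (n - 1))


# Hypothetical frequency table (not the one used in the examples above).
values = [10, 20, 30]
freqs = [2, 3, 5]

sigma, s = grouped_std_devs(values, freqs)

# Cross-check: expanding the table into raw observations gives the same answer.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(math.isclose(sigma, statistics.pstdev(expanded)),
      math.isclose(s, statistics.stdev(expanded)))  # True True
```

Applying the same square-root step to the variances from the worked examples gives √390 ≈ 19.75 and √410.53 ≈ 20.26.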

### Relative Measures of Dispersion

#### Coefficient of Variation

Absolute measures of dispersion are expressed in the units of the data, so they are difficult to use for comparing the variability of two or more data sets, especially when the data sets are measured in different units or have very different means. Such comparisons should be made with relative measures of dispersion, and the coefficient of variation is one of the most important of these.

The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage.

Coefficient of variation of a population $= \frac{\sigma}{\mu} \times 100\%$

Coefficient of variation of a sample $= \frac{s}{\bar{x}} \times 100\%$

Example:

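The mean and standard deviation of two employees’ sales are shown in the table below. Determine which employee contributes more consistently to the company’s sales.

| Employee | Mean sales | Standard deviation |
|---|---|---|
| Thomas | 700 | 70 |
| Peter | 800 | 100 |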

Solution:

Coefficient of variation of Thomas’s sales $= \frac{70}{700} \times 100\% = 10\%$

Coefficient of variation of Peter’s sales $= \frac{100}{800} \times 100\% = 12.5\%$

A lower coefficient of variation indicates less variability relative to the mean, that is, more stable and consistent data. Therefore, Thomas is the employee who contributes more consistently to the company’s sales.
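A short sketch of the comparison in Python, using the means and standard deviations from the example (the function name is illustrative). Multiplying before dividing keeps these particular results exact in floating point:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev * 100 / mean


# (mean, standard deviation) of each employee's sales, from the example
employees = {"Thomas": (700, 70), "Peter": (800, 100)}

cvs = {name: coefficient_of_variation(sd, m) for name, (m, sd) in employees.items()}
print(cvs)                    # {'Thomas': 10.0, 'Peter': 12.5}
print(min(cvs, key=cvs.get))  # Thomas: lowest CV, most consistent
```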

## Summary

Measures of dispersion describe the degree to which numerical data tend to spread around an average value. They are either absolute or relative. The standard deviation is generally regarded as the most important absolute measure of dispersion, and the coefficient of variation as the most important relative measure.