
Box and Whisker Plots

Introduction

A box-and-whisker plot is a histogram-like method of displaying data. It was first introduced by J. Tukey in 1970. However, the arrangement of the box and whiskers in use today varies slightly from what Tukey proposed. For instance, Tukey’s original formulation did not have horizontal crossbars. He extended the whiskers all the way to the extreme data points and drew an unfilled dot at the maximum along with a hatched horizontal strip at the minimum. Tukey also considered an additional variation in which the outliers are indicated separately and the whiskers are dashed, ending with dashed crossbars at the “adjacent values”. Before we learn more about box and whisker plots, we must first recall some important terms that are relevant to understanding them.

Median – The median of a group of observations is the value of the variable which divides the group of n numbers, arranged in order, into two equal parts. If n is odd, then use the following formula –

Median = Value of $(\frac{n+1}{2})$th observation

If n is even, then use the following formula

Median = $\frac{Value\:of\:\frac{n}{2}^{th}\:observation\:+\:Value\:of\:(\frac{n}{2}+1)^{th}\:observation}{2}$
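The two formulas can be checked with a short Python sketch. The function name `median_of` is only illustrative. The observations are sorted first, because the formulas refer to positions in the ordered data.

```python
def median_of(observations):
    """Median via the odd/even position formulas (positions are 1-based)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_of([7, 1, 4]))     # odd n: the 2nd of 1, 4, 7, which is 4
print(median_of([8, 2, 6, 4]))  # even n: average of 4 and 6, which is 5.0
```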

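Quartiles – Quartiles are the values that divide an ordered data set into four equal parts. The first quartile (Q1) is the 25th percentile, the second quartile (the median) is the 50th percentile and the third quartile (Q3) is the 75th percentile. A particular quartile is a border between two neighbouring quarters of the distribution.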

Now that we have recalled what is meant by median and quartile, let us learn what we mean by box and whisker plots.

What are Box and Whisker Plots?

The box and whisker plot, also known simply as the box plot, is a type of graph that helps visualize the five-number summary. These five numbers are the median, the upper and lower quartiles, and the minimum and maximum data values, which are also known as the extremes.

In other words, Box and Whisker Plots are a standardized way of displaying the distribution of data based on a five-number summary (minimum, first quartile (Q1), median, third quartile (Q3), and “maximum”). Some important terms relevant for obtaining these five numbers are –

1. Median – The middle value of the dataset.
2. First quartile – It is the middle number between the smallest number and the median of the dataset.
3. Third quartile – It is the middle value between the median and the highest value of the dataset.
4. Interquartile range – It is the range from the 25th to the 75th percentile, that is, the difference Q3 − Q1.
5. Whiskers – These are the lines that extend from the boxes. They are used to indicate variability out of the upper and lower quartiles.
6. Outliers – If a data value is very far away from the quartiles (either much less than Q1 or much greater than Q3 ), it is termed as an outlier.
7. Maximum – It is the highest value in a given dataset.
8. Minimum – It is the lowest value in a given dataset.

Let us now understand the description of a box and a whisker plot.

Description of a Box and Whisker Plot

The following is the diagrammatic representation of a box and whisker plot.

Let us now understand each representation in this diagram of the box and whisker plot. It can be clearly seen in the above diagram that –

1. The left and right sides of the box mark the lower and upper quartiles. The box covers the interquartile interval, where the middle 50% of the data lies.
2. Secondly, the median is the vertical line inside the box that splits the box into two. Sometimes the mean is also indicated using a dot or a cross on the box plot.
3. Next, two lines can be seen outside the box. These lines are the whiskers. One whisker goes from the minimum to the lower quartile (the start of the box) and the other goes from the upper quartile (the end of the box) to the maximum.

It is important to note here that another characteristic of the box and the whisker plot is the presentation of a graph with an axis that is used for indicating the values. Moreover, there are two ways of representing the box and the whisker plot – horizontal or vertical manner.

A common variation of the box and whisker plot restricts the length of the whiskers to a maximum of 1.5 times the interquartile range. This means that each whisker reaches the data value that is furthest from the centre while still lying within a distance of 1.5 times the interquartile range from the lower or upper quartile. Data points that are outside this interval are represented as individual points on the graph and are considered potential outliers.
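The 1.5 times interquartile range rule can be sketched in Python as follows. The function name, the term "fence" for the cut-off values and the sample data are illustrative assumptions. The quartiles are passed in as given values rather than computed.

```python
def whisker_ends_and_outliers(values, q1, q3, factor=1.5):
    """Whisker ends and potential outliers under the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    # Each whisker stops at the most extreme value still inside the fences.
    return min(inside), max(inside), outliers


# Hypothetical data, with Q1 = 10.5 and Q3 = 15.5 taken as given.
data = [2, 10, 11, 12, 13, 14, 15, 16, 30]
print(whisker_ends_and_outliers(data, q1=10.5, q3=15.5))  # (10, 16, [2, 30])
```

Here the interquartile range is 5, so the whiskers may reach no further than 3 on the left and 23 on the right. The values 2 and 30 fall outside and would be drawn as separate points.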

Now, let us understand how to read a box and whisker plot diagram.

How to Read a Box and Whisker Plot

We have now learnt that a Box and a Whisker plot is a tool that is used to show the distribution of a dataset. Now, if we are given a box and a whisker plot, how do we read it? Rather, how do we interpret a box and a whisker plot? Let us find out.

Let us consider an example. Suppose we are given the following box and whisker plot and we intend to read it.

The following steps will be followed for the interpretation of the above box and whisker plot.

1. First, we will find the minimum. We have learnt that the minimum is the far left-hand side of the graph, at the tip of the left whisker. For this graph, the left whisker end is at approximately 0.75.
2. Next, we will find Q1, the first quartile. We have learnt that Q1 is represented by the far left-hand side of the box. In this case, we can see that Q1 is about 2.5.
3. In the third step, we will find the median. We have learnt that the median is represented by the vertical bar inside the box. In this box and whisker plot, we can see that the median is at about 6.5.
4. In Step 4, we will find Q3, the third quartile. We have learnt that Q3 is the far right-hand edge of the box. In this box and whisker plot, we can see that Q3 is at about 12.
5. We will find the maximum in the next step. We have learnt that the maximum is at the tip of the right whisker. In this case, we can see that the maximum is at approximately 16.

With this, we have found all the five number values that are represented in a box and whisker plot which completes our interpretation of this box plot.
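From the values read off the plot, two measures of spread follow directly. Since the readings are approximate, so are the results:

$Interquartile\:range = Q3 - Q1 \approx 12 - 2.5 = 9.5$

$Range = Maximum - Minimum \approx 16 - 0.75 = 15.25$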

Now that we have learnt how to read a box and whisker plot, we shall proceed to learn how to draw one.

How to Draw a Box and Whisker Plot?

The following steps are used to plot a box and whisker plot –

1. When you have a row of values, you have to find the lowest and highest value together with Q1, the median and Q3.
2. You draw a number line that will be the horizontal axis. It is important to note here that if your values are, for example, about time in hours this will also be the label of the axis.
3. Every number mentioned above will get a vertical line above the axis.
4. After that, you can make the ‘box’ and draw the lines, known as whiskers between Q1 and the lowest value and between Q3 and the highest value.
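The drawing steps can be imitated with a rough text sketch in Python. The function `ascii_boxplot`, its `width` parameter and the characters used for the box and whiskers are all assumptions made for illustration, not a standard plotting routine.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=40):
    """Return a one-line text sketch of a horizontal box plot."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must be greater than minimum")

    def column(value):
        # Map a data value to a character column between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    row = [" "] * width
    # Whiskers run from the extremes to the edges of the box.
    for i in range(column(minimum), column(q1)):
        row[i] = "-"
    for i in range(column(q3) + 1, column(maximum) + 1):
        row[i] = "-"
    # The box spans Q1 to Q3.
    for i in range(column(q1), column(q3) + 1):
        row[i] = "="
    # Mark the five numbers; later marks overwrite earlier ones on a shared column.
    row[column(minimum)] = "|"
    row[column(maximum)] = "|"
    row[column(q1)] = "["
    row[column(q3)] = "]"
    row[column(median)] = "M"
    return "".join(row)


# Hypothetical five-number summary: 0, 2, 5, 8, 10.
print(ascii_boxplot(0, 2, 5, 8, 10, width=11))  # |-[==M==]-|
```

The `|` marks are the minimum and maximum, `[` and `]` are Q1 and Q3, and `M` is the median.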

Let us understand this by an example.

Suppose we have been given the following data set and we intend to graph this data set using a box and whisker plot.

1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5

Let us now use the above data set and graph it using a box and whisker plot.

For plotting the graph, we will have to obtain the five number values that are integral to the box and whisker plots.

From the above data, it is clear that –

The first quartile, Q1 of this data set = 2

The median of the given data set = 7

The third quartile, Q3 of this data set = 9

The minimum or the smallest value of this data set  = 1

The maximum or the largest value of this data set  = 11.5
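These five values can be reproduced with a small Python sketch. It takes each quartile as the median of the lower or upper half of the ordered data. The function name `five_number_summary` is illustrative, and leaving the middle value out of both halves when the count is odd is an assumption, since the data set here has an even count.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
print(five_number_summary(data))
```

Other quartile conventions exist and can give slightly different values for Q1 and Q3 on the same data.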

Now that we have obtained all five numbers, we will plot these values in the box and whisker plot as shown below –

Now that we have understood how to read as well as graph box and whisker plots, let us consider another example before we learn about their uses and advantages.

Example

Luke never feels like answering the phone, so he never picks up. He writes down how many times the phone rings before the caller hangs up.

7, 3, 8, 6, 8, 5, 4, 5, 3, 6, 2, 6, 9, 1, 2, 7, 5, 8, 7, 6.

Draw a box plot for this data.

Solution: The numbers in ascending order are:

1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9

The median is 6, Q1 = 3.5 and Q3 = 7. The minimum is 1 and the maximum is 9.
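With 20 values, the median is the average of the 10th and 11th values in the ordered list, and each quartile is the median of one half of ten values. Writing $x_k$ for the $k$th ordered value (a notation introduced here only for this working), we get:

$Median = \frac{x_{10}+x_{11}}{2} = \frac{6+6}{2} = 6$

$Q1 = \frac{x_{5}+x_{6}}{2} = \frac{3+4}{2} = 3.5$

$Q3 = \frac{x_{15}+x_{16}}{2} = \frac{7+7}{2} = 7$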

Uses of Box and Whisker Plots

The following are the uses of Box and Whisker Plots –

1. Box and Whisker plots are among the most used types of graphs in business, statistics and data analysis.
2. Box and Whisker plots don’t show the distribution in as much detail as a histogram does, but they are especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations or outliers in the data set.
3. A Box and Whisker plot is ideal for comparing distributions because the centre, spread and overall range are immediately apparent.
4. Although Box and Whisker plots may seem primitive in comparison to a histogram or density plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or datasets. These plots are also widely used for comparing two data sets.

Let us now look at the advantages of using Box and Whisker plots.

Advantages of using Box and Whisker plots

The major advantages of using Box and Whisker plots include –

1. The box and whisker plots have the ability to handle and present a large amount of data.
2. The box and whisker plots are regarded as a visually effective method of viewing a summary.
3. The box and whisker plot is a great tool for graphical representation of the display of outliers.
4. The box and whisker plots are a perfect tool for the comparison of two or more datasets.

We have learnt how to graph a box and whisker plot. We have learnt in the uses of the Box and whisker plots above that they can be used for comparative analysis as well. Let us now see how we can do this.

Using Box and Whisker Plots for Comparative Analysis

One of the most important uses of the box and whisker plot is that it is an ideal means of comparing many samples at once, in a way that would be impossible to do using a histogram. Box and whisker plots of the individual samples can be lined up side by side on a common scale and the various attributes of the samples are compared at a glance. Let us understand this using an example.

Suppose a company has two stores that sell televisions. The company recorded the number of sales each store made each month. Over the past 12 months, the stores sold the following numbers of televisions –

Store 1: 350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380.

Store 2: 520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140

Now, let us graph the above two data sets using box and whisker plots.

In order to compare the two stores’ sales performance, we will make two box and whisker plots, one for Store 1 and one for Store 2.

Let us do it for Store 1 first.

First, we will place the values of Store 1 in ascending order. We will get

Store 1 – 20, 120, 160, 200, 210, 250, 290, 350, 380, 460, 510, 580

Now, we will find the median of this data. We get,

Median  = ( 250 + 290 ) / 2 = 540 / 2 = 270

Now let’s see what happens with the lower and upper quartiles in an even data set.  We can see that there are six numbers below the median, namely: 20, 120, 160, 200, 210, 250.

We know that the lower quartile is the median of these six items. Therefore

Lower Quartile = (third + fourth data point) / 2 = (160 + 200) / 2 = 180

Similarly, we can see that there are also six numbers above the median, namely: 290, 350, 380, 460, 510, 580, and we know that the upper quartile is the median of these six data points.

Therefore, Upper quartile = (third + fourth data points) / 2 = ( 380 + 460 ) / 2 = 840 / 2 = 420

So, now we have the five number values of Store 1 as –

Median = 270

Minimum = 20

Maximum = 580

Q1 = 180

Q3 = 420

Now, we perform the same calculations for Store 2.

First, we will place the values of Store 2 in ascending order. We will get

70, 80, 140, 180, 210, 260, 380, 420, 440, 500, 520, 630

Median  = ( 260 + 380 ) / 2 = 640 / 2 = 320

Now let’s see what happens with the lower and upper quartiles in an even data set.  We can see that there are six numbers below the median, namely: 70, 80, 140, 180, 210, 260

We know that the lower quartile is the median of these six items. Therefore

Lower Quartile = (third + fourth data point) / 2 = (140 + 180) / 2 = 320 / 2 = 160

Similarly, we can see that there are also six numbers above the median, namely: 380, 420, 440, 500, 520, 630 and we know that the upper quartile is the median of these six data points.

Therefore, Upper quartile = (third + fourth data points) / 2 = ( 440 + 500 ) / 2 = 940 / 2 = 470

So, now we have the five number values of Store 2 as –

Median = 320

Minimum = 70

Maximum = 630

Q1 = 160

Q3 = 470

Now that we have all the values, we will graph these values on the box and whisker plot as shown below –

Now, let us see how to interpret the results –

We can see from the box and whisker plots that Store 2’s highest and lowest monthly sales are both higher than Store 1’s. Store 2’s median sales value is also higher than Store 1’s. Store 2’s interquartile range is larger as well, which means its monthly sales vary more. Overall, these results show that Store 2 generally sells more televisions than Store 1, although less consistently.
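The comparison can also be made numerically from the two five-number summaries worked out above. The following Python sketch uses an illustrative helper named `spread` to compute the interquartile range and the overall range of each store.

```python
# Five-number summaries from the worked example: (minimum, Q1, median, Q3, maximum).
summaries = {
    "Store 1": (20, 180, 270, 420, 580),
    "Store 2": (70, 160, 320, 470, 630),
}


def spread(summary):
    """Return (interquartile range, overall range) for one summary."""
    minimum, q1, _median, q3, maximum = summary
    return q3 - q1, maximum - minimum


for store, summary in summaries.items():
    iqr, overall = spread(summary)
    print(f"{store}: median={summary[2]}, IQR={iqr}, range={overall}")
```

Both stores have the same overall range of 560. Store 2 has the higher median (320 against 270) and the wider interquartile range (310 against 240).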

Key Facts and Summary

1. The box and whisker plot, sometimes simply called the box plot, is a type of graph that helps visualize the five-number summary. These five numbers are the median, the upper and lower quartiles, and the minimum and maximum data values, which are also known as the extremes.
2. The median of a group of observations is the value of the variable which divides the group of n numbers, arranged in order, into two equal parts.
3. Quartiles are the values that divide an ordered data set into four equal parts. The first quartile (Q1) is the 25th percentile, the median is the 50th percentile and the third quartile (Q3) is the 75th percentile.