
How to Create a Boxplot in R

A boxplot (also known as a box-and-whisker plot) shows the distribution of a continuous variable. It summarizes the data with the median and the quartiles, which are robust counterparts of the mean and standard deviation because extreme values affect them far less. The boxplot uses five summary statistics (minimum, Q1, median, Q3 and maximum) to give a compact impression of the distribution. It can also show extreme values (outliers), as discussed in the last section below.
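
To make "robust" concrete, here is a small sketch in base R that compares the mean and standard deviation with the median and IQR before and after one extreme value is added. The numbers and variable names are made up purely for illustration.

```r
# Made-up values, for illustration only
x <- c(2, 4, 4, 5, 6, 7, 9)
x_extreme <- c(x, 100)   # the same data plus one extreme value

# Classical (mean, sd) and robust (median, IQR) summaries side by side
summaries <- rbind(
  original     = c(mean = mean(x), sd = sd(x),
                   median = median(x), iqr = IQR(x)),
  with_extreme = c(mean = mean(x_extreme), sd = sd(x_extreme),
                   median = median(x_extreme), iqr = IQR(x_extreme))
)
print(round(summaries, 2))
```

In this toy example the mean and standard deviation shift substantially, while the median and IQR change only slightly.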

The dataset

Let’s use the famous iris dataset to generate a few boxplots. The iris data frame is one of the many datasets that ship with base R. It contains the lengths and widths (in centimetres) of the petals and sepals of three species of the iris flower (setosa, versicolor and virginica).

The iris data frame has:

  • 5 columns (variables): Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
  • 150 rows (50 rows per iris species). Each row holds the measurements of a single flower
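
If you want to check this structure yourself, a quick sketch using only base R functions could look like this:

```r
# iris ships with base R, so nothing needs to be installed or loaded
str(iris)             # column names and types
dim(iris)             # number of rows and columns
table(iris$Species)   # number of flowers per species
```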

Generate Boxplots in R

The following example creates three boxplots of the sepal length (Sepal.Length), one for each iris species. In the R code below, the first line takes the name of the dataset (iris) and the aesthetics (aes(...)), which say that species should be plotted on the x-axis and sepal length on the y-axis. The geom_boxplot() function then draws the chosen plot type, a boxplot in this case.

library(ggplot2)  # provides ggplot() and geom_boxplot()

ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot()

With only two lines of code you get the job done. However, you may want to go a step further and use proper x- and y-axis labels. You may also want to give the graph a title and display each box in a different color.

We will now add a few more functions to our existing code to make the graph more appealing.

Boxplot generated in R (not customized)

ggplot(iris, aes(x=Species, y=Sepal.Length, fill=Species)) + 
  geom_boxplot() +
  labs(title="Plot of Sepal Length per Iris Species", x="Iris Species", y="Sepal Length (cm)") +
  scale_fill_manual(values=c("green", "red", "yellow")) +
  theme_classic()

The labs() function adds custom x- and y-axis labels, as well as the title above the graph.

scale_fill_manual() was used to manually select the color for each boxplot. If this function is left out, then default colors will be applied.
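
With an unnamed vector, the colors are matched to the species in the order of the factor levels. A possible variation, sketched here, is to pass a named vector so that each color is tied explicitly to one species. The variable names species_colors and p are my own, for illustration only.

```r
library(ggplot2)

# Named vector: each color is tied to a species by name, not by position
species_colors <- c(setosa = "green", versicolor = "red", virginica = "yellow")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_manual(values = species_colors) +
  theme_classic()

print(p)
```

This keeps the color assignment stable even if the order of the factor levels changes later.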

Finally, theme_classic() replaces the default grey background with a clean, consistent look (axis lines and no gridlines), which suits presentations and publications.

How about we use hexadecimal color values to make the graph more professional? We could also flip the x- and y-axes to show the boxplots horizontally.

Customized vertical boxplot in R

# Change the colors and show the boxplots horizontally
ggplot(iris, aes(x=Sepal.Length, y=Species, fill=Species)) +
  geom_boxplot() +
  labs(title="Plot of Sepal Length per Iris Species", x="Sepal Length (cm)", y="Iris Species") +
  scale_fill_manual(values=c("#3adcc6", "#ffb6c1", "#d6deff")) +
  theme_classic()

Horizontal boxplots generated in R

All done!

The orientation of the plots was changed from vertical to horizontal by swapping the variables mapped to the x- and y-axes (x=Sepal.Length, y=Species). Don’t forget to also swap the x- and y-axis labels.
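
An alternative that this post does not use is to keep the original vertical mapping and add ggplot2's coord_flip(), which swaps the displayed axes. In this sketch the labels stay attached to their aesthetics, so they should not need to be swapped by hand. The name p_flipped is my own.

```r
library(ggplot2)

# Same mapping as the vertical plot; coord_flip() swaps the displayed axes
p_flipped <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = "Plot of Sepal Length per Iris Species",
       x = "Iris Species", y = "Sepal Length (cm)") +
  scale_fill_manual(values = c("#3adcc6", "#ffb6c1", "#d6deff")) +
  coord_flip() +
  theme_classic()

print(p_flipped)
```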

The previous colors were replaced with the hexadecimal color values in the function scale_fill_manual().

Summary Statistics of the Boxplot Explained

The five summary statistics of a boxplot

As mentioned earlier, a boxplot shows five summary statistics (minimum, Q1, median, Q3 and maximum), which are illustrated in the figure above. The box spans Q1 to Q3, with a line at the median. The ends of the whiskers indicate the non-outlying minimum and maximum values. The minimum here is the smallest observation that lies no more than 1.5 x IQR below Q1, where IQR is the interquartile range, calculated as Q3 minus Q1 (Q3-Q1). The maximum here is the largest observation that lies no more than 1.5 x IQR above Q3. Observations beyond the whiskers are plotted individually as outliers. In the examples above, the species virginica has one outlier.
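
The whisker rule can be reproduced by hand in base R. The sketch below applies it to the virginica sepal lengths. It assumes the default quartile method of quantile(), which I believe matches what geom_boxplot() computes, and all variable names are my own.

```r
# Sepal lengths of the virginica flowers
x <- iris$Sepal.Length[iris$Species == "virginica"]

q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences: observations outside these limits count as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whisker ends: the most extreme observations still inside the fences
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Observations beyond the whiskers
outliers <- x[x < lower_fence | x > upper_fence]

print(c(min = lower_whisker, Q1 = q1, median = median(x),
        Q3 = q3, max = upper_whisker))
print(outliers)
```

For virginica this yields a single outlier, which corresponds to the lone point drawn beyond the whisker in the plots above.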
