Plot Means and Standard Deviations in R ggplot2
Mean values and standard deviations (SD) are straightforward to plot with the ggplot2 package in R. The same approach also works for means and confidence intervals: in that case, you plot the lower and upper confidence limits instead of mean – SD and mean + SD, respectively.
First, we compute the means and standard deviations. We will use the iris dataset for this example. Let’s calculate the mean and SD of the petal widths per species. Then we subtract the SD from each mean, and add it to each mean, to get the lower and upper limits of the error bars for plotting.
The script requires the dplyr and ggplot2 packages. Installing and loading the tidyverse package provides both.
#Install and load tidyverse
#install.packages("tidyverse")
library(tidyverse)
#Compute means and standard deviations
swstats <- iris %>%
  group_by(Species) %>%
  summarise(
    count = n(),
    mean = mean(Petal.Width, na.rm = TRUE),
    stddev = sd(Petal.Width, na.rm = TRUE),
    meansd_l = mean - stddev,
    meansd_u = mean + stddev
  )
The resulting output is a data frame (named swstats) with one row per species and the columns Species, count, mean, stddev, meansd_l, and meansd_u.
To plot confidence intervals instead of the SD, calculate the lower and upper confidence limits in the same summarise() call and use them in place of meansd_l and meansd_u.
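As a minimal sketch of that idea, the summary below computes a 95% confidence interval for the mean petal width per species. It assumes a t-based interval (mean ± t × standard error), which is one common choice rather than the only one. The names pwci, se, ci_l, and ci_u are illustrative and are not used elsewhere in this tutorial.

```r
library(dplyr)

# Sketch: t-based 95% confidence interval of the mean per species.
# Assumes no missing values, so n() equals the number of observations used.
pwci <- iris %>%
  group_by(Species) %>%
  summarise(
    count = n(),
    mean = mean(Petal.Width, na.rm = TRUE),
    stddev = sd(Petal.Width, na.rm = TRUE),
    se = stddev / sqrt(count),                     # standard error of the mean
    ci_l = mean - qt(0.975, df = count - 1) * se,  # lower confidence limit
    ci_u = mean + qt(0.975, df = count - 1) * se   # upper confidence limit
  )
pwci
```

The ci_l and ci_u columns could then be mapped to ymin and ymax in geom_errorbar() in place of meansd_l and meansd_u.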
The following script generates a plot with the mean petal width on the y-axis and species on the x-axis.
#Mean plot per species
a <- ggplot(swstats, aes(x = Species, y = mean)) +
  geom_point()
a
The following script adds the error bars (mean – SD and mean + SD) to the plot.
#Add the error bars
b <- a + geom_errorbar(aes(ymin = meansd_l, ymax = meansd_u), width=0.1)
b
geom_errorbar() is the function that draws the error bars. Its ymin and ymax aesthetics take the variables for the lower (mean – SD) and upper (mean + SD) ends of the bars, respectively, and width sets the width of the horizontal caps.
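Because ymin and ymax are ordinary aesthetics, they also accept expressions built from columns of the data. The sketch below assumes a summary table that holds only the mean and SD (pwstats and p are illustrative names) and computes the bar limits inside aes() rather than storing them as separate columns.

```r
library(dplyr)
library(ggplot2)

# Summary table with only the mean and SD per species (illustrative name)
pwstats <- iris %>%
  group_by(Species) %>%
  summarise(
    mean = mean(Petal.Width, na.rm = TRUE),
    stddev = sd(Petal.Width, na.rm = TRUE)
  )

# Compute the lower and upper limits directly inside aes()
p <- ggplot(pwstats, aes(x = Species, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - stddev, ymax = mean + stddev), width = 0.1)
p
```

Storing the limits as columns, as done in swstats, keeps the plotting code shorter when the same limits are reused. Computing them in aes() keeps the summary table smaller.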
Then we add the individual data points with the following:
#Add the individual data points
b + geom_point(data=iris, aes(x=Species, y=Petal.Width))
Here geom_point() is used a second time, now with data=iris, to add the individual data points.
Some points overlap or sit too close together to tell apart. This is fixed below.
Finally, we add jittering to the individual data points:
#Add jitters to the individual data points
c <- b + geom_point(data=iris, aes(x=Species, y=Petal.Width), position = position_jitter())
c
Jittering is applied through the position argument, position=position_jitter(), which adds a small amount of random noise to the point positions and so reduces the overlap between the points. However, it is still a bit difficult to tell the mean points apart from the individual points. We will fix this below with some styling.
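By default, position_jitter() adds noise in both the horizontal and vertical directions, so the plotted petal widths are shifted slightly from their measured values. If that is not wanted, one option is to jitter only along the x-axis and to fix the random seed so the plot is reproducible. The sketch below shows this on the raw data alone. The width of 0.2 and the seed of 1 are arbitrary choices, and p_jitter is an illustrative name.

```r
library(ggplot2)

# Jitter horizontally only (height = 0 keeps the measured y-values),
# with a fixed seed so the same plot is drawn on every run
p_jitter <- ggplot(iris, aes(x = Species, y = Petal.Width)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
p_jitter
```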
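Here is the complete R script with some styling added. The individual data points are now drawn first, in a light color, so that the means and error bars sit on top of them. Labels for the x- and y-axes were also added: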
#The complete script with some styling added
ggplot(swstats, aes(x = Species, y = mean)) +
  #Now plotting the individual data points before the mean values
  geom_point(data = iris, aes(x = Species, y = Petal.Width), position = position_jitter(), color = "#D0C6C4") +
  geom_point() +
  geom_errorbar(aes(ymin = meansd_l, ymax = meansd_u), width = 0.1) +
  ylim(0, 3) +
  labs(x = "Species", y = "Mean (-/+SD) Petal Width (cm)") +
  theme_bw(base_size = 14)
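As an alternative sketch, ggplot2 can compute the summary itself from the raw data with stat_summary(), which avoids building swstats first. This is not the approach used above. It assumes ggplot2 3.3.0 or later, where the fun, fun.min, and fun.max arguments are available. p_stat is an illustrative name, and the jitter width of 0.2 is an arbitrary choice.

```r
library(ggplot2)

p_stat <- ggplot(iris, aes(x = Species, y = Petal.Width)) +
  # Individual observations, jittered horizontally, in a light color
  geom_point(position = position_jitter(width = 0.2, height = 0), color = "#D0C6C4") +
  # Mean per species, computed from the raw data
  stat_summary(fun = mean, geom = "point") +
  # Error bars running from mean - SD to mean + SD
  stat_summary(
    fun = mean,
    fun.min = function(x) mean(x) - sd(x),
    fun.max = function(x) mean(x) + sd(x),
    geom = "errorbar",
    width = 0.1
  ) +
  labs(x = "Species", y = "Mean (-/+SD) Petal Width (cm)") +
  theme_bw(base_size = 14)
p_stat
```

The pre-computed table remains useful when the summary values are also needed outside the plot, for example in a report table.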