Mean Profile Plot in R
A mean profile plot is used to visualize the evolution of a variable measured over time. In a previous post, we used a dataset of blood glucose over time as an example. We created a dummy dataset consisting of two groups of 25 diabetic patients with their blood glucose measured on 10 consecutive days.
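For readers who do not have the dataset from the previous post, here is a minimal sketch of how a comparable dummy dataset could be simulated in base R. The column names (patientn, groupid, timedays, bloodglc2) follow the code used later in this post, but the starting level, the daily trends, and the noise are made-up assumptions, not the values of the original dataset.

```r
# Hypothetical simulation: all numeric settings below are assumptions
set.seed(123)
n_patients <- 25   # patients per group
n_days     <- 10   # consecutive measurement days

# One row per patient per day (50 patients x 10 days)
glcdata3 <- expand.grid(
  timedays = 1:n_days,
  patientn = 1:(2 * n_patients)
)

# The first 25 patients form Group 1, the remaining 25 form Group 2
glcdata3$groupid <- ifelse(glcdata3$patientn <= n_patients, "Group 1", "Group 2")

# Assumed trend: both groups start near 170 mg/dl, Group 1 declines faster
slope <- ifelse(glcdata3$groupid == "Group 1", -3, -1)
glcdata3$bloodglc2 <- 170 + slope * (glcdata3$timedays - 1) +
  rnorm(nrow(glcdata3), mean = 0, sd = 10)

str(glcdata3)
```

Any data frame in this long format, with one row per patient per day, can be used with the code in the rest of this post.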
To plot the mean blood glucose over time, we first have to calculate the mean value per time point. We do so using summarise() and other functions from the dplyr package. An easy way to get dplyr, ggplot2, and other packages with useful functions for generating R graphs is to install and load the tidyverse package.
#install and load the tidyverse package
install.packages("tidyverse")
library(tidyverse)
#calculate mean, SD, SE and 95% CI per time point for each group
statsBG <- glcdata3 %>%
  group_by(groupid, timedays) %>%
  summarise(
    #count non-missing values only, so that SE is consistent with na.rm=TRUE
    count = sum(!is.na(bloodglc2)),
    meanBG = mean(bloodglc2, na.rm=TRUE),
    sdBG = sd(bloodglc2, na.rm=TRUE),
    seBG = sdBG/sqrt(count),
    #normal approximation: mean +/- 1.96 * SE
    ci95lower = meanBG - seBG*1.96,
    ci95upper = meanBG + seBG*1.96
  )
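The summarise() function was used to calculate the number of data points, mean, standard deviation, standard error, and 95% confidence interval per time point for each group. The confidence limits use the normal approximation, that is, the mean plus or minus 1.96 times the standard error.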
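As a small worked example of the same arithmetic, the sketch below computes the standard error and a 95% confidence interval by hand for one made-up vector of glucose values (the numbers are illustrative only). It also shows where the multiplier 1.96 comes from and how it compares with the t-based multiplier for 25 patients, which is a common alternative for small samples and not what the code above uses.

```r
# Made-up measurements for one group on one day, including a missing value
x <- c(150, 162, 171, NA, 158)

n_obs <- sum(!is.na(x))                  # sample size, missing values excluded
m     <- mean(x, na.rm = TRUE)           # mean
se    <- sd(x, na.rm = TRUE) / sqrt(n_obs)  # standard error of the mean

# Multipliers for a two-sided 95% interval
z_mult <- qnorm(0.975)          # normal approximation, about 1.96
t_mult <- qt(0.975, df = 25 - 1) # t distribution for n = 25, about 2.06

# 95% CI for the example vector, using the normal approximation
c(lower = m - z_mult * se, upper = m + z_mult * se)
```

With 25 patients per group the two multipliers differ only slightly, so the choice makes the error bars a little wider or narrower without changing the overall picture.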
Mean Profile Plots in Separate Graphs
Let’s visualize the mean blood glucose per day for each group in a separate graph. We use the functions geom_line() and geom_point() to plot the mean values calculated above. geom_line() connects the mean values from one day to the next, and geom_point() places a symbol at each plotted point. We did not specify which symbol to use, so the default (a dot) is used. facet_wrap(~groupid, nrow = 1) creates one panel per group and places the two panels side by side on a single row.
#mean profiles in separate graphs
ggplot(statsBG, aes(x=timedays, y=meanBG)) +
  geom_line() +
  geom_point() +
  facet_wrap(~groupid, nrow = 1) +
  labs(x="Day", y = "Mean Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(1,10)) +
  ylim(120, 200) +
  theme_bw()
Mean Profile Plots in the Same Graph
Now we want both groups in the same graph. So we remove facet_wrap() and add color=groupid and shape=groupid inside aes() to assign a different color and symbol to each group. Finally, the scale_color_manual() and scale_shape_manual() functions set the specific colors and symbols we want, rather than the defaults.
#mean profiles in the same graph
mp <- ggplot(statsBG, aes(x=timedays, y=meanBG, color=groupid, shape=groupid)) +
  geom_line() +
  geom_point(size = 2.5) +
  labs(color="Group ID", x="Day", y = "Mean Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(0.5,10.5)) +
  ylim(120, 200) +
  scale_color_manual(values = c("#55acee", "#bb4444")) +
  scale_shape_manual(values=c(15,16)) +
  #"none" replaces FALSE, which is deprecated in recent ggplot2 versions
  guides(shape="none") +
  theme_bw()
mp
The guides() call suppresses the legend for the shape (symbol). Otherwise, we would get two legends, one for color and one for shape.
The function theme_bw() applies a white background with light grey grid lines and a border around the panel, which gives a cleaner figure than the default grey theme.
See below regarding the parameter limits = c(0.5,10.5).
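As an alternative to hiding the shape legend, the color and shape legends can be merged into one, because ggplot2 combines legends that share the same title and labels. The sketch below uses a small made-up summary table (toy is a hypothetical stand-in for statsBG) and gives both aesthetics the title "Group ID".

```r
library(ggplot2)

# Made-up summary data, for illustration only
toy <- data.frame(
  timedays = rep(1:3, times = 2),
  meanBG   = c(170, 165, 160, 172, 170, 169),
  groupid  = rep(c("Group 1", "Group 2"), each = 3)
)

# Same title for color and shape, so a single combined legend is drawn
p_merged <- ggplot(toy, aes(x = timedays, y = meanBG,
                            color = groupid, shape = groupid)) +
  geom_line() +
  geom_point(size = 2.5) +
  labs(color = "Group ID", shape = "Group ID",
       x = "Day", y = "Mean Fasting Blood Glucose (mg/dl)") +
  scale_color_manual(values = c("#55acee", "#bb4444")) +
  scale_shape_manual(values = c(15, 16)) +
  theme_bw()

p_merged
```

The merged legend shows each group's symbol in its own color, which can be easier to read than a color-only legend.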
Adding Errorbars to the Mean Points
It is often useful to show the variability behind the means, so we usually plot error bars in addition to the means. First, we need to choose which statistic to use for the error bars. Standard deviation, standard error, and confidence intervals can all be used; the choice depends on your objective. The standard deviation describes the spread of the data, while the standard error and the confidence interval describe the precision of the mean. In our example below, we use the 95% confidence interval as error bars, drawn with the function geom_errorbar().
#Adding errorbars (95% confidence interval) to the mean profile
mp + geom_errorbar(aes(ymin = ci95lower, ymax = ci95upper), width = 0.5)
The variable in the dataset holding the value of the lower limit is assigned to the parameter ymin=, and the variable holding the upper limit is assigned to ymax=.
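If another statistic is preferred for the error bars, ymin and ymax can also be given as expressions. The sketch below draws bars of one standard error on either side of the mean, using a small made-up table (toy and its values are hypothetical; the column names mirror meanBG and seBG from the summary above).

```r
library(ggplot2)

# Made-up means and standard errors, for illustration only
toy <- data.frame(
  timedays = 1:3,
  meanBG   = c(170, 165, 160),
  seBG     = c(2.1, 1.9, 2.4)
)

# Error bars of mean +/- 1 SE, computed inside aes()
p_se <- ggplot(toy, aes(x = timedays, y = meanBG)) +
  geom_line() +
  geom_point(size = 2.5) +
  geom_errorbar(aes(ymin = meanBG - seBG, ymax = meanBG + seBG), width = 0.5) +
  scale_x_continuous(breaks = 1:3, limits = c(0.5, 3.5)) +
  theme_bw()

p_se
```

Replacing seBG with sdBG in the same expressions would give standard deviation bars instead.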
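Missing Errorbars with geom_errorbar()
Regarding limits=c(0.5,10.5): the minimum and maximum values of the x-axis were set slightly beyond the range of the data (Day 1 to Day 10) to avoid missing error bars on Day 1 and Day 10. With width = 0.5, each error bar extends 0.25 units to either side of its day, so with limits = c(1,10) the bars on Day 1 and Day 10 would reach outside the scale limits and be dropped from the plot.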
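The arithmetic behind this can be checked in a few lines of base R. This is only an illustration of the reasoning, not ggplot2's internal code; the helper days_outside() is a hypothetical name introduced here.

```r
days  <- 1:10   # time points on the x-axis
width <- 0.5    # same width as used in geom_errorbar()

# Horizontal extent of each error bar
bar_left  <- days - width / 2
bar_right <- days + width / 2

# Days whose error bar reaches outside a given pair of x-axis limits
days_outside <- function(limits) {
  days[bar_left < limits[1] | bar_right > limits[2]]
}

days_outside(c(1, 10))       # returns 1 and 10
days_outside(c(0.5, 10.5))   # returns an empty vector
```

Any limits that leave at least half the bar width of room on each side, here 0.25, keep all the error bars inside the scale.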
Mean Profiles Overlaid on Individual Profiles
Finally, we want to view the mean profiles on top of the individual profiles (spaghetti plots). We use the first two geom_line() calls to plot the individual profiles for Group 1 and Group 2 separately. The third geom_line() call plots the mean profiles on top of the spaghetti plot. Next, we add the error bars with geom_errorbar(), before calling geom_point() to overlay the symbols for each mean value on top of everything.
#Mean profile and errorbars overlaid on the individual profiles
ggplot(statsBG, aes(x=timedays, y=meanBG, color=groupid, shape=groupid)) +
  #individual profiles (spaghetti plot), one geom_line() call per group
  geom_line(aes(x=timedays, y=bloodglc2, group=patientn, color=" Group 1 Patients"),
            data=glcdata3[glcdata3$groupid == 'Group 1',]) +
  geom_line(aes(x=timedays, y=bloodglc2, group=patientn, color=" Group 2 Patients"),
            data=glcdata3[glcdata3$groupid == 'Group 2',]) +
  #mean profiles, error bars and mean symbols drawn on top
  geom_line() +
  geom_errorbar(aes(ymin = ci95lower, ymax = ci95upper), width = 0.5) +
  geom_point(size = 2.5) +
  labs(color="Group ID", x="Day", y = "Mean Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(0.5,10.5)) +
  ylim(130, 180) +
  scale_color_manual(values = c("grey90", "grey70", "#55acee", "#bb4444")) +
  scale_shape_manual(values=c(15,16)) +
  #"none" replaces FALSE, which is deprecated in recent ggplot2 versions
  guides(shape="none") +
  theme_bw()
In the R code above, the first two geom_line() calls are made separately for Group 1 and Group 2 because we want a different shade of grey for the individual profiles of each group. The argument color=" Group 1 Patients", with a leading space (" ") in front of the text, is there to line up with the color specification in scale_color_manual(). The space ensures the sorting is right: the first two grey colors are assigned to the individual profiles of Group 1 and Group 2 respectively, and the other two colors are assigned to the two mean profiles.
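The approach above depends on how the four labels sort. A possible alternative, sketched here with made-up data, is to pass a named vector to scale_color_manual(), so that each color is matched to its label by name rather than by position. The data frame toy and the vector group_colors are hypothetical names used for illustration only.

```r
library(ggplot2)

# Made-up data with the four labels used in the plot above
toy <- data.frame(
  timedays = rep(1:2, times = 4),
  value    = c(150, 155, 165, 160, 152, 154, 163, 161),
  series   = rep(c(" Group 1 Patients", " Group 2 Patients",
                   "Group 1", "Group 2"), each = 2)
)

# Colors matched by name, so the order of this vector does not matter
group_colors <- c(
  "Group 2"           = "#bb4444",
  "Group 1"           = "#55acee",
  " Group 2 Patients" = "grey70",
  " Group 1 Patients" = "grey90"
)

p_named <- ggplot(toy, aes(x = timedays, y = value,
                           color = series, group = series)) +
  geom_line() +
  scale_color_manual(values = group_colors) +
  theme_bw()

p_named
```

With named values, the leading space is no longer needed for the color assignment, although it still affects the order in which the entries appear in the legend.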
For the last graph with multiple geom_line() calls, it was important to place the aes() argument first and to pass the data frame by name with data=, because the first argument of geom_line() is mapping, not data. A data frame passed in the first position triggers the error: “Error: `mapping` must be created by `aes()`”. See more about this error in a separate ggplot2 post.
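The error is easy to reproduce on a small made-up data frame (toy is a hypothetical example). In the sketch below the failing call is wrapped in tryCatch() so that the script keeps running; the exact wording of the message may differ between ggplot2 versions.

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(timedays = 1:3, meanBG = c(170, 165, 160))

# Data frame passed positionally: it lands in `mapping`, so the call fails
bad <- tryCatch(
  geom_line(toy, aes(x = timedays, y = meanBG)),
  error = function(e) conditionMessage(e)
)
bad  # the captured error message

# Two working forms: aes() first, or both arguments passed by name
ok1 <- geom_line(aes(x = timedays, y = meanBG), data = toy)
ok2 <- geom_line(data = toy, mapping = aes(x = timedays, y = meanBG))
```

Note that ggplot() itself takes data as its first argument, which is why the same habit works there but fails inside a geom.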