How to Plot a Line Graph in R with ggplot2
A line graph, or line plot, is a type of graph typically used to visualize longitudinal data. Suppose we had a longitudinal dataset of fasting blood glucose measurements from two groups of 25 diabetic patients, collected every morning for 10 days. One group is on a special program to improve their lifestyle and, in turn, their blood glucose. The other group, the control group, is not on any program or medication. Unfortunately, we don't have such a dataset, but we can generate a fake one in R.
Follow this link to see how we generated the dummy longitudinal data in R.
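The generation code itself lives in the linked post and is not reproduced here. For readers who want to run the plots below without it, here is a minimal stand-in sketch. The column names (patientn, groupid, timedays, bloodglc2) match those used in the plotting code, but the group labels, baseline values, trend, and noise levels are assumptions chosen for illustration, not the values from the original post.

```r
# Hypothetical stand-in for the dummy data (NOT the original generation code)
set.seed(123)  # arbitrary seed, only for reproducibility

n_patients <- 50  # two groups of 25
n_days     <- 10

# One row per patient per day
glcdata3 <- expand.grid(timedays = 1:n_days, patientn = 1:n_patients)

# Assumption: patients 1-25 are on the program, patients 26-50 are controls
glcdata3$groupid <- factor(ifelse(glcdata3$patientn <= 25, "Program", "Control"))

# Assumed model: a patient-specific baseline, a downward daily trend in the
# program group only, and some day-to-day noise
baseline <- rnorm(n_patients, mean = 170, sd = 8)
slope    <- ifelse(glcdata3$groupid == "Program", -2, 0)
glcdata3$bloodglc2 <- baseline[glcdata3$patientn] +
  slope * (glcdata3$timedays - 1) +
  rnorm(nrow(glcdata3), mean = 0, sd = 3)

# Keep values inside the 120-200 range used for the y-axis in the plots
glcdata3$bloodglc2 <- pmin(pmax(glcdata3$bloodglc2, 120), 200)

str(glcdata3)
```

The data are in long format, one row per patient per day, which is the layout geom_line() expects when drawing one line per patient.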
We will use the line graphs below to visualize the blood glucose data at the individual patient level, to pick up any trend in the two groups. The function that generates line graphs in the ggplot2 package is geom_line().
Spaghetti Plot
We can plot a line for each patient in a single graph. This results in a single graph with 50 lines, where each line shows one patient's blood glucose over time. This type of line graph is known as a spaghetti plot.
# Spaghetti plot
library(ggplot2)  # provides ggplot(), geom_line() and the other functions used here
ggplot(glcdata3, aes(x=timedays, y=bloodglc2, group=patientn, color=factor(patientn))) +
  geom_line() +
  labs(color="Patient ID", x="Day", y = "Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(1,10)) +
  ylim(120, 200) +
  theme_classic()
The R code above generates a spaghetti plot from the dummy data. The ggplot() call specifies the dataset (glcdata3) and the x-axis and y-axis variables. The arguments group=patientn and color=factor(patientn) indicate that each patient's profile should be drawn as a separate line and shown in a different color.
geom_line() sets the type of graph we want, i.e., a line graph.
The labs() line sets the legend title and the x-axis and y-axis labels. The last three lines set the x-axis breaks and range, the y-axis range, and the theme.
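To see what the group aesthetic does, the sketch below compares a grouped and an ungrouped plot on a tiny made-up data frame (toy, which is hypothetical and only reuses the column names from above). It uses ggplot2's layer_data() to count how many separate lines geom_line() would draw.

```r
library(ggplot2)

# Tiny hypothetical dataset: 4 patients, 3 days each
toy <- data.frame(
  patientn  = rep(1:4, each = 3),
  timedays  = rep(1:3, times = 4),
  bloodglc2 = c(170, 172, 171, 165, 166, 164, 175, 170, 166, 168, 163, 158)
)

# With group = patientn, each patient is drawn as a separate line
p_grouped <- ggplot(toy, aes(x = timedays, y = bloodglc2, group = patientn)) +
  geom_line()

# Without a grouping variable, all points are joined into a single line
p_ungrouped <- ggplot(toy, aes(x = timedays, y = bloodglc2)) +
  geom_line()

# layer_data() returns the data as the layer will draw it, including group ids
n_grouped   <- length(unique(layer_data(p_grouped)$group))
n_ungrouped <- length(unique(layer_data(p_ungrouped)$group))

n_grouped    # number of lines when grouped by patient
n_ungrouped  # number of lines without grouping
```

In the spaghetti plot above, color=factor(patientn) is a discrete variable and would split the lines on its own, but setting group explicitly keeps one line per patient even when the color mapping changes, as it does in the next section.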
The Same Color per Group
# A. All patients in the same group are assigned the same default color
sp <- ggplot(glcdata3, aes(x=timedays, y=bloodglc2, group=patientn, color=groupid)) +
  geom_line() +
  labs(color="Group ID", x="Day", y = "Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(1,10)) +
  ylim(120, 200) +
  theme_classic()
sp
#B. Remove color=groupid from the above R code to get all profiles in the same color.
To get all the patient lines in the same color (the default is black), remove the color aesthetic from both places in the code above: color=groupid in aes() and color="Group ID" in labs(). This also removes the legend automatically. If you want to choose your own colors, you need one or two more lines of code, as shown in the next section.
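For variant B specifically, a sketch of the plot with the color aesthetic removed is given here. It runs on a small hypothetical data frame (toy) that reuses the article's column names, so the axis limits from the full example are left out.

```r
library(ggplot2)

# Tiny hypothetical dataset reusing the article's column names
toy <- data.frame(
  patientn  = rep(1:4, each = 3),
  groupid   = factor(rep(c("Control", "Program"), each = 6)),
  timedays  = rep(1:3, times = 4),
  bloodglc2 = c(170, 172, 171, 165, 166, 164, 175, 170, 166, 168, 163, 158)
)

# B. No color aesthetic in aes() or labs(): every profile gets the same
# default color and no legend is drawn
sp_plain <- ggplot(toy, aes(x = timedays, y = bloodglc2, group = patientn)) +
  geom_line() +
  labs(x = "Day", y = "Fasting Blood Glucose (mg/dl)") +
  theme_classic()
sp_plain
```

The group=patientn mapping stays in place, so there is still one line per patient.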
Choose Your Colors
The function scale_colour_manual() (also spelled scale_color_manual()) specifies the colors of the lines in the spaghetti plot. In the first graph below, we add it to the plot from the previous section, using the hexadecimal colors "#29ab87" and "#4a4b7b". In the second graph, we show all the lines in brown and remove the legend, using scale_colour_manual() and theme(), respectively.
#C. Specify colors
sp + scale_colour_manual(values = c("#29ab87","#4a4b7b"))
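When the colors are supplied as an unnamed vector, they are matched to the groups in the order of the factor levels. One way to make the pairing explicit is to pass a named vector, as in the sketch below. The level names "Control" and "Program" and the toy data frame are assumptions for illustration; the actual labels in groupid may differ.

```r
library(ggplot2)

# Tiny hypothetical dataset; the group labels are assumed, not from the article
toy <- data.frame(
  patientn  = rep(1:4, each = 3),
  groupid   = factor(rep(c("Control", "Program"), each = 6)),
  timedays  = rep(1:3, times = 4),
  bloodglc2 = c(170, 172, 171, 165, 166, 164, 175, 170, 166, 168, 163, 158)
)

sp_toy <- ggplot(toy, aes(x = timedays, y = bloodglc2,
                          group = patientn, colour = groupid)) +
  geom_line() +
  theme_classic()

# Names in `values` are matched to the levels of groupid, so the pairing
# does not depend on the order of the levels
sp_named <- sp_toy +
  scale_colour_manual(values = c(Control = "#29ab87", Program = "#4a4b7b"))
sp_named
```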
# D. All patient profiles in the same color (brown), with the legend removed
ggplot(glcdata3, aes(x=timedays, y=bloodglc2, group=factor(patientn), colour="")) +
  geom_line() +
  labs(x="Day", y = "Fasting Blood Glucose (mg/dl)") +
  scale_x_continuous(breaks=seq(1,10,1), limits = c(1,10)) +
  ylim(120, 200) +
  scale_colour_manual(values = c("brown")) +
  theme_classic() +
  theme(legend.position = "none")
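An alternative to mapping a constant inside aes() is to set the color as a fixed argument of geom_line(). This is a different technique from the one used in D, sketched here on a hypothetical toy data frame. Because nothing is mapped to color, no legend is drawn, so neither scale_colour_manual() nor theme(legend.position = "none") is needed.

```r
library(ggplot2)

# Tiny hypothetical dataset reusing the article's column names
toy <- data.frame(
  patientn  = rep(1:4, each = 3),
  timedays  = rep(1:3, times = 4),
  bloodglc2 = c(170, 172, 171, 165, 166, 164, 175, 170, 166, 168, 163, 158)
)

# colour is set outside aes(), so it is a fixed property of the lines
# rather than a mapping from the data
p_brown <- ggplot(toy, aes(x = timedays, y = bloodglc2, group = patientn)) +
  geom_line(colour = "brown") +
  labs(x = "Day", y = "Fasting Blood Glucose (mg/dl)") +
  theme_classic()
p_brown
```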