High Quality Forest Plots in R GGPLOT2
A forest plot is a very efficient way to present the results of an analysis that compares two groups for several populations or subgroups. It shows all the important information together in a single figure.
Forest plots usually consist of multiple plots and tables. In this post, we will create each individual figure and table separately with the ggplot2 package. Then we will merge all the individual plots and tables together using the grid.arrange() function in the gridExtra package.
The structure of the datasets used to create the tables and figures in this post is as follows:


Generate the Separate Panels of the Forest Plot
Plot 1
p1 <-ggplot(data=statsrash, aes(x=factor(country), y=mean_r, fill=factor(productid))) +
#Creates a bar graph
geom_bar(stat="identity", width=0.5, position=position_dodge()) +
#Flips the axes to create horizontal bars
coord_flip() +
ggtitle("Mean per Group") +
#Creates a horizontal line on top of the graph
geom_vline(xintercept=4.6, size=2) +
labs(y = "Mean % of Rash Left (cm sq.)", x = "Country", fill = "Group") +
scale_y_continuous(breaks=seq(0,100,20), limits=c(-1,101), expand=c(0,0) ) +
scale_fill_manual(values=c("#E0EEEE", "#5F9EA0")) +
theme_classic(base_size=14) +
theme(
#Colors the y-axis line white to make it invisible and setting the length of ticks
axis.ticks.length=unit(0.3,"cm"),
axis.line.y = element_line(colour = "white"),
axis.line.x = element_line(size = 0.6),
#Use this to control the legend position
plot.title = element_text(hjust =0.5),
legend.position=c(0.8, 0.1),
legend.title=element_blank()
)
p1

Removing the y-axis line may not be necessary in your case. I personally like the look of the bars without the y-axis line.
Depending on how data appear on your graph, you may need to adjust the position of the legend, such that it is clearly visible and does not cover a section of the plot.
Plot 2
p2 <- ggplot(diffci_3, aes(x=diff, y=country, shape=country), color="black") +
#Add dot plot and error bars
geom_errorbar(aes(xmin = lowerCL, xmax = upperCL), width = 0.25) +
geom_point(size = 2.5) +
ggtitle("Group B minus Group A") +
#Add a line above graph
geom_hline(yintercept=4.6, size=2) +
#Add a reference dashed line at 20
geom_vline(xintercept = 20, linetype = "longdash") +
labs(x="Difference and 95% CI", y = "Country") +
scale_x_continuous(breaks=seq(-20,80,20), limits=c(-20,80), expand=c(0,0) ) +
scale_shape_manual(values=c(15,16,17,18)) +
theme_gray(base_size=14) +
#Remove legend
#Also remove y-axis line and ticks
theme(legend.position = "none",
plot.title = element_text(hjust =0.5),
axis.line.x = element_line(size = 0.6),
axis.ticks.length=unit(0.3,"cm"),
axis.text.y = element_blank(),
axis.line.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank()
)
p2

Plot 1 and Plot 2 will be placed side-by-side. Therefore, if a y-axis range is specified in Plot 1, the same range should be used in Plot 2, to have both plots align properly in the final forest plot below.
Table 1
t1 <- ggplot(data=diffci_3) +
geom_text(aes(y=country, x=1, label= paste0(round(diff, digits=2))), vjust=0) +
#Add a line above graph
geom_hline(yintercept=4.6, size=2) +
ggtitle("Difference") +
xlab(" ") +
theme_classic(base_size=14) +
theme(
axis.line.y = element_blank(),
axis.line.x = element_line(color = "white"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.x = element_line(color = "white"),
axis.ticks.length=unit(0.3,"cm"),
axis.title.y = element_blank(),
axis.text.x = element_text(color="white"),
plot.title = element_text(hjust =0.5)
)
t1

The x-axis is not necessary here, hence it is removed (colored white). Even though the y-axis is not shown here, it should align with the above plots in the final plot.
Table 2
t2 <- ggplot(data=diffci_3) +
geom_text(aes(y=country, x=1, label= paste0("(",round(lowerCL, digits=2),"; ", round(upperCL, digits=2),")")), vjust=0) +
#Add a line above graph
geom_hline(yintercept=4.6, size=2) +
ggtitle("90% CI") +
xlab(" ") +
theme_classic(base_size=14) +
theme(
axis.line.y = element_blank(),
axis.line.x = element_line(color = "white"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.ticks.x = element_line(color = "white"),
axis.ticks.length=unit(0.3,"cm"),
axis.title.y = element_blank(),
axis.text.x = element_text(color="white"),
plot.title = element_text(hjust =0.5)
)
t2

Similar to Table 1 above, the x-axis has been removed. The y-axis range is similar to that of the table and plots above. Everything should align properly in the final forest plot below.
Combine all Panels to Create a Single Figure
#Install and load the gridExtra package to put together the individual components
install.packages("gridExtra")
library(gridExtra)
#Put the individual components of the forest plot together
fplt <- grid.arrange(p1, p2, t1, t2, widths=c(4,4,1,2))

Your code is really helpful!! Thank you so much for sharing it. I was wondering why my p2 can’t be aligned with t1 and t2? What could be the possibility of them not aligned?
Hi. I loved the example. I was a little confused. Would you be able to help me? In the example with the dataset (diffci_3) what would be “lowerCL” and “upperCL”?
Hi Nivea,
thanks for stopping by and leaving a comment.
In the above example, “lowerCL” and “upperCL” are the confidence limits (i.e., the confidence interval [CI]), which you should calculated before creating your plot. These two variables will be used to plot the error bars on both sides of the “difference”.
Depending on what you want to present, you could also replace the CI with another statistic. For example, you can use the standard error or the standard deviation to derive the error bars.
I hope this answers you question.
Good luck.
Hi, Thanks very much for the code. really helpful. I was wondering if it is possible to add a p-value exponent (scientific) notation? For example, something like `expression(1.9%*%10^-08)`
Thanks
Similar to the way the difference and the 90% CI are added to the graph (as illustrated in the post), you also create a table with the p-values.
I am not sure why you need to express your p-values in scientific notation. Assuming the level of significance you are working with is not very small to warrant using scientific notation, I would recommend you round your p-values to about 3 or 4 decimal places. For example, you can show a small p-value as<.0001.
However, if you really need to use the scientific notation, try the following in the geom_text function:
geom_text(aes(y=country, x=1, label= expression(paste0(round(pValues_in, digits=2))) ), vjust=0)
"pValues_in" represents the variable holding the p-values.
Let us know whether or not this helps.