High Quality Forest Plots in R GGPLOT2

A forest plot is a very efficient way to present the results of an analysis that compares two groups for several populations or subgroups. It shows all the important information together in a single figure.

Forest plots usually consist of multiple plots and tables. In this post, we will create each individual figure and table separately with the ggplot2 package. Then we will merge all the individual plots and tables together using the grid.arrange() function in the gridExtra package.

The structure of the datasets used to create the tables and figures in this post is as follows:

Dataset (statsrash) used to generate Plot 1

Dataset (diffci_3) used to generate Plot 2, Table 1, and Table 2

Generate the Separate Panels of the Forest Plot

Plot 1

p1 <-ggplot(data=statsrash, aes(x=factor(country), y=mean_r, fill=factor(productid))) +
#Creates a bar graph
        geom_bar(stat="identity", width=0.5, position=position_dodge()) +
#Flips the axes to create horizontal bars
        coord_flip() +
        ggtitle("Mean per Group") +
#Creates a horizontal line on top of the graph
        geom_vline(xintercept=4.6, size=2) + 
        labs(y = "Mean % of Rash Left (cm sq.)", x = "Country", fill = "Group") +
        scale_y_continuous(breaks=seq(0,100,20), limits=c(-1,101), expand=c(0,0) ) +
        scale_fill_manual(values=c("#E0EEEE", "#5F9EA0")) + 
        theme_classic(base_size=14) +
        theme(
          #Colors the y-axis line white to make it invisible and setting the length of ticks
          axis.ticks.length=unit(0.3,"cm"),
          axis.line.y = element_line(colour = "white"),
          axis.line.x = element_line(size = 0.6),
          #Use this to control the legend position
          plot.title = element_text(hjust =0.5),
          legend.position=c(0.8, 0.1),
          legend.title=element_blank()
        )

p1

plot mean values as bar graph — Plot 1: Bar graph showing the mean value per group for the different populations

Removing the y-axis line may not be necessary in your case. I personally like the look of the bars without the y-axis line.
Depending on how data appear on your graph, you may need to adjust the position of the legend, such that it is clearly visible and does not cover a section of the plot.

Plot 2

p2 <- ggplot(diffci_3, aes(x=diff, y=country, shape=country), color="black") +
#Add dot plot and error bars
        geom_errorbar(aes(xmin = lowerCL, xmax = upperCL), width = 0.25) +
        geom_point(size = 2.5) + 
        ggtitle("Group B minus Group A") +
#Add a line above graph
        geom_hline(yintercept=4.6, size=2) + 
#Add a reference dashed line at 20
        geom_vline(xintercept = 20, linetype = "longdash") + 
        labs(x="Difference and 95% CI", y = "Country") +
        scale_x_continuous(breaks=seq(-20,80,20), limits=c(-20,80), expand=c(0,0) ) +
        scale_shape_manual(values=c(15,16,17,18)) +
        theme_gray(base_size=14) +
#Remove legend
#Also remove y-axis line and ticks
        theme(legend.position = "none",
              plot.title = element_text(hjust =0.5),
              axis.line.x = element_line(size = 0.6),
              axis.ticks.length=unit(0.3,"cm"),
              axis.text.y  = element_blank(),
              axis.line.y = element_blank(),
              axis.ticks.y = element_blank(),
              axis.title.y  = element_blank()
              )

p2

Forest plot per population — Plot 2: Forest plot showing the group difference (B-A) per population

Plot 1 and Plot 2 will be placed side-by-side. Therefore, if a y-axis range is specified in Plot 1, the same range should be used in Plot 2, to have both plots align properly in the final forest plot below.

Table 1

t1 <- ggplot(data=diffci_3) +
  geom_text(aes(y=country, x=1, label= paste0(round(diff, digits=2))), vjust=0) +
  #Add a line above graph
  geom_hline(yintercept=4.6, size=2) + 
  ggtitle("Difference") +
  xlab("  ") +
  theme_classic(base_size=14) +
  theme(
    axis.line.y = element_blank(),
    axis.line.x = element_line(color = "white"),
    axis.text.y  = element_blank(),
    axis.ticks.y  = element_blank(),
    axis.ticks.x = element_line(color = "white"),
    axis.ticks.length=unit(0.3,"cm"),
    axis.title.y  = element_blank(),
    axis.text.x = element_text(color="white"),
    plot.title = element_text(hjust =0.5)
  )

t1

The x-axis is not necessary here, hence it is removed (colored white). Even though the y-axis is not shown here, it should align with the above plots in the final plot.

Table 2

t2 <- ggplot(data=diffci_3) +
  geom_text(aes(y=country, x=1, label= paste0("(",round(lowerCL, digits=2),"; ", round(upperCL, digits=2),")")), vjust=0) +
#Add a line above graph
  geom_hline(yintercept=4.6, size=2) + 
  ggtitle("90% CI") +
  xlab("  ") +
  theme_classic(base_size=14) +
  theme(
    axis.line.y = element_blank(),
    axis.line.x = element_line(color = "white"),
    axis.text.y  = element_blank(),
    axis.ticks.y  = element_blank(),
    axis.ticks.x = element_line(color = "white"),
    axis.ticks.length=unit(0.3,"cm"),
    axis.title.y  = element_blank(),
    axis.text.x = element_text(color="white"),
    plot.title = element_text(hjust =0.5)
  )

t2

Similar to Table 1 above, the x-axis has been removed. The y-axis range is similar to that of the table and plots above. Everything should align properly in the final forest plot below.

Combine all Panels to Create a Single Figure

#Install and load the gridExtra package to put together the individual components
install.packages("gridExtra")
library(gridExtra)

#Put the individual components of the forest plot together
fplt <- grid.arrange(p1, p2, t1, t2, widths=c(4,4,1,2))

Combine Plot and Table in R GGplot2

Byauthor November 11, 2022January 8, 2023

When presenting data on a plot, it is sometimes useful to also show a table with the actual numbers next to the plot. In some cases (e.g., mean profile plots and forest plots), the x-axes or y-axes of the individual plots/tables need to align with each other. There are several methods in combine plots and…

Bar graph

Stacked Column Chart and Clustered Column Chart in R GGplot

Byauthor March 27, 2022January 8, 2023

Stacked Column Chart A stacked column chart also known as a stacked bar chart is a type of bar graph that stacks the categories of a group on top of each other. Each stack (category) is usually presented in a different color. The height of each stack is proportional to the percentages of that category….

Mean plot

Forest Plot in R

Byauthor March 13, 2022January 8, 2023

Forest plots are useful when conducting meta-analyses, to visualize the results of multiple scientific studies in a single graph. Forest plots are also commonly used in subgroup analyses to visualize the results of the different categories of a subgroup, or different categories from multiple subgroups on a single graph. The Dataset In this post, we…

Bar graph

How to Create a Bar Graph in R

Byauthor May 16, 2021January 8, 2023

Bar graphs, also known as Bar charts, are one of the most commonly used methods of visualizing grouped data. The length of each bar represents the magnitude of the data it is presenting. Bar graphs can be plotted horizontally or vertically. Bar graphs can easily be created in R using the ggplot2 package with the…

Annotate | Bar graph | General

Annotate with Geom_text in GGplot2

Byauthor October 2, 2022January 8, 2023

The function geom_text() can be used to annotate on graphs in GGplot2. Geom_text() uses data frames to annotate. Other functions such as annotate() can also be used to annotate in GGplot2. The annotate() function is described in this other post. The dataset The following dataset is used below to illustrate how to annotate on a…

Mean plot | Scatterplot

Plot Means and Standard Deviations in R ggplot2

Byauthor January 25, 2023

Mean values and standard deviations (SD) can easily be plotted using the ggplot2 package in R. The same approach can also be used to plot the means and confidence intervals. In that case, you plot the lower and upper confidence limits instead of the mean-SD and mean+SD respectively. First, we compute the means and standard…

4 Comments

Hi. I loved the example. I was a little confused. Would you be able to help me? In the example with the dataset (diffci_3) what would be “lowerCL” and “upperCL”?

Helper says:

March 25, 2024 at 9:00 am

Hi Nivea,
thanks for stopping by and leaving a comment.
In the above example, “lowerCL” and “upperCL” are the confidence limits (i.e., the confidence interval [CI]), which you should calculated before creating your plot. These two variables will be used to plot the error bars on both sides of the “difference”.

Depending on what you want to present, you could also replace the CI with another statistic. For example, you can use the standard error or the standard deviation to derive the error bars.
I hope this answers you question.
Good luck.

Reply

Hi, Thanks very much for the code. really helpful. I was wondering if it is possible to add a p-value exponent (scientific) notation? For example, something like `expression(1.9%*%10^-08)`

Thanks

Helper says:

October 11, 2023 at 6:42 pm

Similar to the way the difference and the 90% CI are added to the graph (as illustrated in the post), you also create a table with the p-values.

I am not sure why you need to express your p-values in scientific notation. Assuming the level of significance you are working with is not very small to warrant using scientific notation, I would recommend you round your p-values to about 3 or 4 decimal places. For example, you can show a small p-value as<.0001.

However, if you really need to use the scientific notation, try the following in the geom_text function:
geom_text(aes(y=country, x=1, label= expression(paste0(round(pValues_in, digits=2))) ), vjust=0)

"pValues_in" represents the variable holding the p-values.

Let us know whether or not this helps.

Reply