|

High Quality Forest Plots in R GGPLOT2

A forest plot is a very efficient way to present the results of an analysis that compares two groups for several populations or subgroups. It shows all the important information together in a single figure.

Forest plots usually consist of multiple plots and tables. In this post, we will create each individual figure and table separately with the ggplot2 package. Then we will merge all the individual plots and tables together using the grid.arrange() function in the gridExtra package.

The structure of the datasets used to create the tables and figures in this post is as follows:

Dataset (statsrash) used to generate Plot 1
Dataset (diffci_3) used to generate Plot 2, Table 1, and Table 2

Generate the Separate Panels of the Forest Plot

Plot 1

p1 <-ggplot(data=statsrash, aes(x=factor(country), y=mean_r, fill=factor(productid))) +
#Creates a bar graph
        geom_bar(stat="identity", width=0.5, position=position_dodge()) +
#Flips the axes to create horizontal bars
        coord_flip() +
        ggtitle("Mean per Group") +
#Creates a horizontal line on top of the graph
        geom_vline(xintercept=4.6, size=2) + 
        labs(y = "Mean % of Rash Left (cm sq.)", x = "Country", fill = "Group") +
        scale_y_continuous(breaks=seq(0,100,20), limits=c(-1,101), expand=c(0,0) ) +
        scale_fill_manual(values=c("#E0EEEE", "#5F9EA0")) + 
        theme_classic(base_size=14) +
        theme(
          #Colors the y-axis line white to make it invisible and setting the length of ticks
          axis.ticks.length=unit(0.3,"cm"),
          axis.line.y = element_line(colour = "white"),
          axis.line.x = element_line(size = 0.6),
          #Use this to control the legend position
          plot.title = element_text(hjust =0.5),
          legend.position=c(0.8, 0.1),
          legend.title=element_blank()
        )

p1
plot mean values as bar graph
Plot 1: Bar graph showing the mean value per group for the different populations

Removing the y-axis line may not be necessary in your case. I personally like the look of the bars without the y-axis line.
Depending on how data appear on your graph, you may need to adjust the position of the legend, such that it is clearly visible and does not cover a section of the plot.

Plot 2

p2 <- ggplot(diffci_3, aes(x=diff, y=country, shape=country), color="black") +
#Add dot plot and error bars
        geom_errorbar(aes(xmin = lowerCL, xmax = upperCL), width = 0.25) +
        geom_point(size = 2.5) + 
        ggtitle("Group B minus Group A") +
#Add a line above graph
        geom_hline(yintercept=4.6, size=2) + 
#Add a reference dashed line at 20
        geom_vline(xintercept = 20, linetype = "longdash") + 
        labs(x="Difference and 95% CI", y = "Country") +
        scale_x_continuous(breaks=seq(-20,80,20), limits=c(-20,80), expand=c(0,0) ) +
        scale_shape_manual(values=c(15,16,17,18)) +
        theme_gray(base_size=14) +
#Remove legend
#Also remove y-axis line and ticks
        theme(legend.position = "none",
              plot.title = element_text(hjust =0.5),
              axis.line.x = element_line(size = 0.6),
              axis.ticks.length=unit(0.3,"cm"),
              axis.text.y  = element_blank(),
              axis.line.y = element_blank(),
              axis.ticks.y = element_blank(),
              axis.title.y  = element_blank()
              )

p2
Forest plot per population
Plot 2: Forest plot showing the group difference (B-A) per population

Plot 1 and Plot 2 will be placed side-by-side. Therefore, if a y-axis range is specified in Plot 1, the same range should be used in Plot 2, to have both plots align properly in the final forest plot below.

Table 1

t1 <- ggplot(data=diffci_3) +
  geom_text(aes(y=country, x=1, label= paste0(round(diff, digits=2))), vjust=0) +
  #Add a line above graph
  geom_hline(yintercept=4.6, size=2) + 
  ggtitle("Difference") +
  xlab("  ") +
  theme_classic(base_size=14) +
  theme(
    axis.line.y = element_blank(),
    axis.line.x = element_line(color = "white"),
    axis.text.y  = element_blank(),
    axis.ticks.y  = element_blank(),
    axis.ticks.x = element_line(color = "white"),
    axis.ticks.length=unit(0.3,"cm"),
    axis.title.y  = element_blank(),
    axis.text.x = element_text(color="white"),
    plot.title = element_text(hjust =0.5)
  )

t1

The x-axis is not necessary here, hence it is removed (colored white). Even though the y-axis is not shown here, it should align with the above plots in the final plot.

Table 2

t2 <- ggplot(data=diffci_3) +
  geom_text(aes(y=country, x=1, label= paste0("(",round(lowerCL, digits=2),"; ", round(upperCL, digits=2),")")), vjust=0) +
#Add a line above graph
  geom_hline(yintercept=4.6, size=2) + 
  ggtitle("90% CI") +
  xlab("  ") +
  theme_classic(base_size=14) +
  theme(
    axis.line.y = element_blank(),
    axis.line.x = element_line(color = "white"),
    axis.text.y  = element_blank(),
    axis.ticks.y  = element_blank(),
    axis.ticks.x = element_line(color = "white"),
    axis.ticks.length=unit(0.3,"cm"),
    axis.title.y  = element_blank(),
    axis.text.x = element_text(color="white"),
    plot.title = element_text(hjust =0.5)
  )

t2

Similar to Table 1 above, the x-axis has been removed. The y-axis range is similar to that of the table and plots above. Everything should align properly in the final forest plot below.

Combine all Panels to Create a Single Figure

#Install and load the gridExtra package to put together the individual components
install.packages("gridExtra")
library(gridExtra)

#Put the individual components of the forest plot together
fplt <- grid.arrange(p1, p2, t1, t2, widths=c(4,4,1,2))
Forest plot with all panels together

Similar Posts

Leave a comment

4 Comments

  1. Hi. I loved the example. I was a little confused. Would you be able to help me? In the example with the dataset (diffci_3) what would be “lowerCL” and “upperCL”?

    1. Hi Nivea,
      thanks for stopping by and leaving a comment.
      In the above example, “lowerCL” and “upperCL” are the confidence limits (i.e., the confidence interval [CI]), which you should calculated before creating your plot. These two variables will be used to plot the error bars on both sides of the “difference”.

      Depending on what you want to present, you could also replace the CI with another statistic. For example, you can use the standard error or the standard deviation to derive the error bars.
      I hope this answers you question.
      Good luck.

  2. Hi, Thanks very much for the code. really helpful. I was wondering if it is possible to add a p-value exponent (scientific) notation? For example, something like `expression(1.9%*%10^-08)`

    Thanks

    1. Similar to the way the difference and the 90% CI are added to the graph (as illustrated in the post), you also create a table with the p-values.

      I am not sure why you need to express your p-values in scientific notation. Assuming the level of significance you are working with is not very small to warrant using scientific notation, I would recommend you round your p-values to about 3 or 4 decimal places. For example, you can show a small p-value as<.0001.

      However, if you really need to use the scientific notation, try the following in the geom_text function:
      geom_text(aes(y=country, x=1, label= expression(paste0(round(pValues_in, digits=2))) ), vjust=0)

      "pValues_in" represents the variable holding the p-values.

      Let us know whether or not this helps.