How to Add a Regression Line to a Scatterplot in R

In a previous post, we described how to create a scatterplot in R. In this post, we focus on adding a linear regression line to a scatterplot using ggplot2. We will first generate the scatterplot and then fit the line, and we will look at two ways to do this.

Method 1

The first method used below to add the regression line to the scatterplot makes use of the function geom_smooth().

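library(ggplot2)

#the built-in trees dataset names the diameter column Girth;
#renaming it here is an assumption about how Diameter was created for this post
names(trees)[names(trees) == "Girth"] <- "Diameter"

#Scatterplot of diameter versus volume
ggplot(trees, aes(y=Diameter, x=Volume)) + 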
  geom_point(colour = "brown") +

  #add the regression line
  geom_smooth(method = "lm", se=FALSE) + 

  #axes range
  ylim(5, 25) +
  xlim(10, 100) + 

  #cosmetics
  theme_classic()

Figure: scatterplot with regression line using geom_smooth() in R

The trees dataset is used to generate a scatterplot of diameter versus volume. As mentioned above, the function geom_smooth() is what adds the regression line to the scatterplot. The argument method = "lm" specifies the method used to fit the line, a linear regression model in this case. Other methods can be used to add a fitted line to the data. For example, if the relationship between the two variables is non-linear, a smoothing method such as loess can be used by specifying method = "loess". Check the documentation for more details.

The argument se = FALSE removes the confidence band from the graph. The band is the confidence interval around the fitted line (95% by default), not a confidence interval of the slope. To show the confidence band, specify se = TRUE or omit the se argument completely, because TRUE is the default.

The color of the regression line can be changed by adding the color argument to the function. For example, adding color = "green" will show the regression line in green:

...
geom_smooth(method = "lm", se=FALSE, color="green")
...
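
As a minimal sketch of the other options described above, the following fits a loess smoother and keeps the confidence band. It assumes ggplot2 is installed and works on a renamed copy of the built-in trees data; trees_d and p_loess are names made up for this example.

```r
library(ggplot2)

# Hypothetical local copy: the built-in trees data stores diameter as "Girth"
trees_d <- trees
names(trees_d)[names(trees_d) == "Girth"] <- "Diameter"

# loess smoother with the confidence band shown (se = TRUE is the default)
p_loess <- ggplot(trees_d, aes(y = Diameter, x = Volume)) +
  geom_point(colour = "brown") +
  geom_smooth(method = "loess", se = TRUE, level = 0.95, color = "green") +
  theme_classic()

p_loess
```

The level argument sets the width of the band; 0.95 is the default and is written out here only to make it visible.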

Method 2

Another method to add a linear regression line to a scatterplot is by using the function geom_abline(). With this method, the function requires the coefficients of the regression model, that is, the y-intercept and the slope. So the linear regression model will need to be fitted to obtain the intercept and the slope. These should then be supplied to the geom_abline() function when generating the scatterplot. Let’s illustrate with the same dataset used in method 1.

#fit the linear regression model diameter versus volume to obtain the intercept and slope
fit_lm <- lm(Diameter ~ Volume, data=trees)

#view summary of results, which were saved above in the object called fit_lm
summary(fit_lm)

ggplot(trees, aes(y=Diameter, x=Volume)) + 
  #scatterplot
  geom_point(colour = "brown") +
  #axes range
  ylim(5, 25) +
  xlim(10, 100) + 
  #cosmetics
  theme_classic()+
  
  #regression line coefficients
  geom_abline(slope = coef(fit_lm)[[2]], intercept = coef(fit_lm)[[1]])

Figure: scatterplot with regression line using geom_abline()

The coef() function extracts the model coefficients from the object that contains the results of the regression model. The intercept is in the first position and the slope is in the second position. Run coef(fit_lm) to see the position of the coefficients.
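
Because coef() returns a named vector, the coefficients can also be extracted by name instead of by position, which may be easier to read. The sketch below assumes the same model as above, fitted on a renamed copy of the built-in trees data; trees_d, intercept and slope are names made up for this example.

```r
# Hypothetical local copy: the built-in trees data stores diameter as "Girth"
trees_d <- trees
names(trees_d)[names(trees_d) == "Girth"] <- "Diameter"

fit_lm <- lm(Diameter ~ Volume, data = trees_d)

# coef() returns a named vector: "(Intercept)" first, then "Volume"
print(coef(fit_lm))

# Extract by name rather than by position
intercept <- coef(fit_lm)[["(Intercept)"]]
slope     <- coef(fit_lm)[["Volume"]]
```

These two values can then be passed to geom_abline() in place of coef(fit_lm)[[1]] and coef(fit_lm)[[2]].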

To specify a color for the line, the color argument can be added to the geom_abline() function call, like so:

...
geom_abline(slope = coef(fit_lm)[[2]], intercept = coef(fit_lm)[[1]], color="blue")
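
Both methods should draw the same line, since geom_smooth(method = "lm") fits a y ~ x linear model to the plotted data. One way to check this is sketched below: it uses ggplot2's layer_data() helper to pull out the points computed for the smooth layer and compares them with predictions from the model fitted by hand. The object names (trees_d, p, smooth_line, manual_y, same_line) are made up for this example.

```r
library(ggplot2)

# Hypothetical local copy: the built-in trees data stores diameter as "Girth"
trees_d <- trees
names(trees_d)[names(trees_d) == "Girth"] <- "Diameter"

fit_lm <- lm(Diameter ~ Volume, data = trees_d)

p <- ggplot(trees_d, aes(y = Diameter, x = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Points that geom_smooth() computed for its line (layer 2 of the plot)
smooth_line <- layer_data(p, 2)

# Predictions from the separately fitted model at the same x values
manual_y <- predict(fit_lm, newdata = data.frame(Volume = smooth_line$x))

# TRUE if both methods describe the same line
same_line <- isTRUE(all.equal(smooth_line$y, unname(manual_y)))
same_line
```

One visible difference remains: geom_smooth() draws the line only over the range of the data, while geom_abline() extends it across the whole panel.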

More Than One Regression Line on a Scatterplot

When more than two variables are plotted in the scatterplot, it might be necessary to show more than one regression line: one line for each group being plotted. Here is an example using the iris dataset and method 1 above.


#showing multiple regression lines: one per group
ggplot(iris, aes(y=Petal.Width, x=Petal.Length, shape = Species, color=Species)) + 
  geom_point(size = 2) +
  #regression line
  geom_smooth(method = "lm", se=FALSE, color="black") + 
  xlim(0, 8) +
  ylim(0, 3) + 
  theme_classic()

The important step here is to map shape and/or color to the grouping variable inside aes() in the ggplot() call. If color = "black" is omitted from the geom_smooth() call, then the group color will be used for each regression line instead of black.
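
As a sketch of the variation just described, the plot below drops the fixed color so that each regression line takes its group's color. It also adds one dashed black line fitted to all points together by setting aes(group = 1) in a second geom_smooth() layer; that extra layer is an illustration added here, not part of the example above, and p_groups is a made-up name.

```r
library(ggplot2)

p_groups <- ggplot(iris, aes(y = Petal.Width, x = Petal.Length,
                             shape = Species, color = Species)) +
  geom_point(size = 2) +
  # no fixed color here, so each line inherits its group's color
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # group = 1 overrides the grouping to fit one line through all points
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, color = "black", linetype = "dashed") +
  theme_classic()

p_groups
```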
