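library(tidyverse)  # attaches ggplot2, dplyr, and the other core tidyverse packages
library(MASS)       # provides birthwt and Pima.tr; MASS::select masks dplyr::select
library(mfp)        # provides bodyfat
data("birthwt")
data("Pima.tr")
Platelet <- read.table("data/Platelet.txt", header = TRUE, sep = "")
data(bodyfat, package = "mfp")
saltBP <- read.table(file = "data/saltBP.txt", header = TRUE, sep = "")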

Fitting a linear regression model in R is straightforward. Here, we model the relationship between percent body fat, {siri}, and abdomen circumference, {abdomen}, using a simple linear regression model.

To fit the least-squares regression model, use the lm() function:

fit<- lm(siri ~ abdomen, data=bodyfat)

The first argument of the function is a formula of the form “response \(\sim\) explanatory”. The second argument specifies the data set. Naming the data set this way lets us avoid writing the formula as “bodyfat$siri \(\sim\) bodyfat$abdomen”.
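
To see that the two ways of writing the formula describe the same model, here is a small sketch that fits both and compares the estimates. The object names fit_data and fit_dollar are ours, chosen for illustration, and the code assumes the mfp package is installed.

```r
# Load the body fat data (assumes the mfp package is installed)
data(bodyfat, package = "mfp")

# Formula with data=: variable names are looked up inside bodyfat
fit_data <- lm(siri ~ abdomen, data = bodyfat)

# The same model written out with the $ operator
fit_dollar <- lm(bodyfat$siri ~ bodyfat$abdomen)

# The estimates agree; only the coefficient labels differ
same_estimates <- all.equal(unname(coef(fit_data)), unname(coef(fit_dollar)))
same_estimates
names(coef(fit_dollar))
```

Besides saving typing, the data= form keeps the coefficient labels short, and predict() with a newdata argument looks variables up by the names used in the formula, so the first form is usually the more convenient one.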

The fit object now stores all the output from the linear regression. Type fit to get the estimates of the intercept \(\alpha\) and the slope \(\beta\).

fit
## 
## Call:
## lm(formula = siri ~ abdomen, data = bodyfat)
## 
## Coefficients:
## (Intercept)      abdomen  
##    -39.2802       0.6313

To get the corresponding confidence intervals, we can use the function confint():

confint(fit)
##                   2.5 %      97.5 %
## (Intercept) -44.5197140 -34.0406553
## abdomen       0.5750739   0.6875349
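
It can help to see where these limits come from. The sketch below rebuilds the 95% intervals by hand as estimate plus or minus a t critical value times the standard error, and also shows the level argument of confint(). The names est, se, t_crit, manual_ci, and ci_99 are ours, introduced for illustration.

```r
# Refit the model so this block runs on its own (assumes mfp is installed)
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Estimates and standard errors from the coefficient table
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# t critical value on the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Manual 95% intervals: estimate -/+ t_crit * standard error
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci

# A different confidence level, here 99%
ci_99 <- confint(fit, level = 0.99)
ci_99
```

The manual intervals should match the confint(fit) output shown above, while the 99% intervals are wider.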

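Of course, the fit object contains much more information. The summary() function reports the residual quantiles, standard errors, test statistics, and \(R^{2}\):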

summary(fit)
## 
## Call:
## lm(formula = siri ~ abdomen, data = bodyfat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.0160  -3.7557   0.0554   3.4215  12.9007 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -39.28018    2.66034  -14.77   <2e-16 ***
## abdomen       0.63130    0.02855   22.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.877 on 250 degrees of freedom
## Multiple R-squared:  0.6617, Adjusted R-squared:  0.6603 
## F-statistic: 488.9 on 1 and 250 DF,  p-value: < 2.2e-16
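
The printed summary is convenient to read, but individual numbers can also be pulled out of it for later use. The following is a minimal sketch; the names s, coef_table, and slope_p are ours.

```r
# Refit the model so this block runs on its own (assumes mfp is installed)
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Store the summary object instead of only printing it
s <- summary(fit)

# Coefficient table as a numeric matrix
coef_table <- coef(s)
coef_table

# p-value for the slope, picked out by row and column name
slope_p <- coef_table["abdomen", "Pr(>|t|)"]
slope_p

# R-squared, adjusted R-squared, and residual standard error
s$r.squared
s$adj.r.squared
s$sigma
```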

With the names() function, we can view all the information contained in the fit object.

names(fit)
##  [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"       
##  [7] "qr"            "df.residual"   "xlevels"       "call"          "terms"         "model"

Now we can use the “$” operator to access any of these components. For instance, suppose we wanted the point estimates of \(\alpha\) and \(\beta\):

fit$coefficients
## (Intercept)     abdomen 
## -39.2801847   0.6313044

Likewise, the estimated response values for all people in our sample are stored in the fitted.values component of fit. Suppose we wanted the estimates for the first 5 people:

fit$fitted.values[1:5]
##        1        2        3        4        5 
## 14.50695 13.11808 16.21147 15.26451 23.85025

The differences between the actual and estimated response values are stored in the residuals component of fit. The following command returns the residuals of the first 5 people:

fit$residuals[1:5]
##         1         2         3         4         5 
## -2.206949 -7.018079  9.088529 -4.864514  4.849746
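
As an alternative to the “$” operator, R provides the extractor functions coef(), fitted(), and resid(). The sketch below uses them and checks that each residual is the observed value minus the fitted value. The names b, yhat, e, and manual_resid are ours.

```r
# Refit the model so this block runs on its own (assumes mfp is installed)
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Extractor functions corresponding to the $ components
b    <- coef(fit)     # same as fit$coefficients
yhat <- fitted(fit)   # same as fit$fitted.values
e    <- resid(fit)    # same as fit$residuals

# Residual = observed response minus fitted value
manual_resid <- bodyfat$siri - yhat
all.equal(unname(e), unname(manual_resid))

# With an intercept in the model, the residuals sum to (numerically) zero
sum(e)
```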

Adding the least-squares line to the scatterplot is easy with the abline() function:

plot(bodyfat$abdomen, bodyfat$siri,
     main = "Scatterplot for Percent Body Fat by Abdomen",
     xlab = "Abdomen", ylab = "Percent Body Fat")
abline(fit)
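
Since ggplot2 is loaded at the top of this section, the same picture can also be sketched with it. This is one possible version, not the only way to do it; geom_smooth() with method = "lm" refits the least-squares line internally, and se = FALSE turns off the confidence band. The object name p is ours.

```r
library(ggplot2)

# Load the body fat data (assumes the mfp package is installed)
data(bodyfat, package = "mfp")

# Scatterplot with the least-squares line overlaid
p <- ggplot(bodyfat, aes(x = abdomen, y = siri)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Scatterplot for Percent Body Fat by Abdomen",
       x = "Abdomen", y = "Percent Body Fat")

print(p)
```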

Activity 1

We would like to predict a baby’s birthweight ({bwt}) before she is born using her mother’s weight at last menstrual period ({lwt}).

  • Use the {birthwt} data set to build a simple linear regression model, where {bwt} is the response variable and {lwt} is the predictor.
  • Interpret your estimate of the regression coefficient and examine its statistical significance.
  • Find the 95% confidence interval for the regression coefficient.
  • If the mother’s weight at last menstrual period is 170 pounds, what would be your estimate for the birthweight of her baby?
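
For the prediction step, predict() is one option. The sketch below uses the body fat model from earlier rather than the birthwt data, so the activity itself is left to you. The abdomen value of 100 is an arbitrary illustrative choice, and the names new_person, point_est, by_hand, and pred_int are ours.

```r
# Refit the body fat model (assumes mfp is installed)
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# New observation; the column name must match the predictor in the formula
new_person <- data.frame(abdomen = 100)

# Point estimate from predict()
point_est <- predict(fit, newdata = new_person)
point_est

# The same number by hand: intercept + slope * 100
by_hand <- sum(coef(fit) * c(1, 100))
by_hand

# 95% prediction interval for an individual with this abdomen value
pred_int <- predict(fit, newdata = new_person,
                    interval = "prediction", level = 0.95)
pred_int
```

With the coefficients reported earlier, the point estimate works out to about 23.85 percent body fat. Using interval = "confidence" instead would give the narrower interval for the mean response at that abdomen value.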

Activity 2

We want to examine the relationship between body temperature, \(Y\), and heart rate, \(X\). Further, we would like to use heart rate to predict the body temperature.

  • Use the “BodyTemperature.txt” data set to build a simple linear regression model for body temperature using heart rate as the predictor.
  • Interpret the estimate of the regression coefficient and examine its statistical significance.
  • Find the 95% confidence interval for the regression coefficient.
  • Find the value of \(R^{2}\) and show that it is equal to the square of the sample correlation coefficient.
  • Create simple diagnostic plots for your model and identify possible outliers.
  • If someone’s heart rate is 75, what would be your estimate of this person’s body temperature?
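
In simple linear regression, \(R^{2}\) equals the square of the sample correlation coefficient between the predictor and the response. The sketch below checks this on the body fat model from earlier, not on the body temperature data, and then draws R's standard diagnostic plots for an lm fit. The names r, r_squared, and op are ours.

```r
# Refit the body fat model (assumes mfp is installed)
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Sample correlation between predictor and response
r <- cor(bodyfat$abdomen, bodyfat$siri)

# R-squared reported by the model summary
r_squared <- summary(fit)$r.squared

# The two agree up to floating-point error
all.equal(r^2, r_squared)

# Standard diagnostic plots arranged in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout
```

In the diagnostic plots, R labels a few of the most extreme observations by row number, which gives a starting point when looking for possible outliers.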