library(tidyverse)
library(ggplot2)
library(MASS)
library(mfp)
data("birthwt")
data("Pima.tr")
Platelet <- read.table("data/Platelet.txt", header = TRUE, sep = "")
data(bodyfat, package="mfp")
saltBP <- read.table(file = "data/saltBP.txt", header = TRUE, sep = "")
Fitting a linear regression model in R is straightforward. Here, we model the relationship between percent body fat (siri) and abdomen circumference (abdomen) using a simple linear regression model.
To fit the least-squares regression model, use the lm() function:
fit <- lm(siri ~ abdomen, data = bodyfat)
The first argument of the function is a formula of the form “response \(\sim\) explanatory”. The second argument specifies the data set. Naming the data set this way lets us avoid writing the formula as “bodyfat$siri \(\sim\) bodyfat$abdomen”.
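As a quick check of the point about the data argument, the sketch below fits the same model both ways and compares the estimates. It assumes the mfp package is installed so that bodyfat can be loaded; the names fit_data, fit_dollar, and same_estimates are introduced here only for illustration.

```r
# Load the body fat data (assumes the mfp package is installed).
data(bodyfat, package = "mfp")

# Fit using the data argument, so the formula refers to column names.
fit_data <- lm(siri ~ abdomen, data = bodyfat)

# Fit the same model by spelling out each column with `$`.
fit_dollar <- lm(bodyfat$siri ~ bodyfat$abdomen)

# The estimates agree; only the coefficient labels differ.
same_estimates <- all.equal(unname(coef(fit_data)), unname(coef(fit_dollar)))
same_estimates
```

Writing the formula with plain column names also keeps the coefficient labels short (abdomen rather than bodyfat$abdomen), which tends to make later output easier to read.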
The fit object now stores all the output from the linear regression. Type fit to print the estimates of \(\alpha\) and \(\beta\).
fit
##
## Call:
## lm(formula = siri ~ abdomen, data = bodyfat)
##
## Coefficients:
## (Intercept)      abdomen
##    -39.2802       0.6313
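For simple linear regression the least-squares estimates have a closed form, which can be used to check the printed coefficients. The sketch below assumes that \(\alpha\) denotes the intercept and \(\beta\) the slope (the usual convention, though not stated explicitly here) and that the mfp package is installed; alpha_hat, beta_hat, by_hand, and from_lm are illustrative names.

```r
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

x <- bodyfat$abdomen
y <- bodyfat$siri

# Slope: sample covariance of x and y divided by the sample variance of x.
beta_hat <- cov(x, y) / var(x)

# Intercept: chosen so the line passes through the point of means.
alpha_hat <- mean(y) - beta_hat * mean(x)

# Compare the hand-computed values with those returned by lm().
by_hand <- c(alpha_hat, beta_hat)
from_lm <- unname(coef(fit))
all.equal(by_hand, from_lm)
```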
To get the corresponding 95% confidence intervals, use the confint() function:
confint(fit)
##                   2.5 %      97.5 %
## (Intercept) -44.5197140 -34.0406553
## abdomen       0.5750739   0.6875349
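The intervals returned by confint() can be reproduced by hand as the estimate plus or minus a t critical value times the standard error. The following is a minimal sketch under the assumption that the mfp package is installed; the names coef_table, est, se, t_crit, and ci_by_hand are made up for this illustration.

```r
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Estimates and standard errors from the coefficient table.
coef_table <- coef(summary(fit))
est <- coef_table[, "Estimate"]
se  <- coef_table[, "Std. Error"]

# Critical value from the t distribution with the residual degrees of freedom.
t_crit <- qt(0.975, df = df.residual(fit))

# 95% interval: estimate plus or minus t_crit standard errors.
ci_by_hand <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci_by_hand

# A different confidence level can be requested directly, e.g. 90%.
confint(fit, level = 0.90)
```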
The fit object contains much more information, which summary() displays:
summary(fit)
##
## Call:
## lm(formula = siri ~ abdomen, data = bodyfat)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -19.0160  -3.7557   0.0554   3.4215  12.9007
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.28018    2.66034  -14.77   <2e-16 ***
## abdomen       0.63130    0.02855   22.11   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.877 on 250 degrees of freedom
## Multiple R-squared: 0.6617, Adjusted R-squared: 0.6603
## F-statistic: 488.9 on 1 and 250 DF, p-value: < 2.2e-16
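The quantities printed by summary() can also be extracted as numbers and related to one another. The sketch below assumes the mfp package is installed; the object names (s, coef_table, t_abdomen, rse, r2, r2_check, f_check) are illustrative only.

```r
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)
s <- summary(fit)

# Coefficient table as a numeric matrix (estimates, SEs, t values, p-values).
coef_table <- coef(s)

# A t value is the estimate divided by its standard error.
t_abdomen <- coef_table["abdomen", "Estimate"] / coef_table["abdomen", "Std. Error"]

# Residual standard error and R-squared.
rse <- s$sigma
r2  <- s$r.squared

# With a single predictor, R-squared equals the squared correlation,
# and the F statistic equals the squared t value of the slope.
r2_check <- cor(bodyfat$siri, bodyfat$abdomen)^2
f_check  <- t_abdomen^2
c(rse = rse, r2 = r2, r2_check = r2_check, f_check = f_check)
```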
With the names() function, we can list all the components stored in the fit object.
names(fit)
##  [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values" "assign"
##  [7] "qr"            "df.residual"   "xlevels"       "call"          "terms"         "model"
We can use the “$” operator to access any of these components. For instance, to get the point estimates of \(\alpha\) and \(\beta\):
fit$coefficients
## (Intercept)     abdomen
## -39.2801847   0.6313044
Likewise, the estimated response values for all people in our sample are stored in the fitted.values component of fit. To get the estimates for the first 5 people:
fit$fitted.values[1:5]
##        1        2        3        4        5
## 14.50695 13.11808 16.21147 15.26451 23.85025
The differences between actual and estimated response values are stored in the residuals component of fit. The following command returns the residuals of the first 5 people:
fit$residuals[1:5]
##         1         2         3         4         5
## -2.206949 -7.018079  9.088529 -4.864514  4.849746
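The relationship between the coefficients, fitted values, and residuals can be verified directly. The sketch below also uses the extractor functions coef(), fitted(), and resid(), which return the same components as the “$” operator. It assumes the mfp package is installed; resid_by_hand and fitted_by_hand are illustrative names.

```r
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

# Extractor functions return the same components as the `$` operator.
all.equal(coef(fit), fit$coefficients)
all.equal(fitted(fit), fit$fitted.values)
all.equal(resid(fit), fit$residuals)

# A fitted value is the intercept plus the slope times the predictor.
fitted_by_hand <- coef(fit)[1] + coef(fit)[2] * bodyfat$abdomen
all.equal(unname(fitted_by_hand), unname(fitted(fit)))

# A residual is the observed response minus the fitted value.
resid_by_hand <- bodyfat$siri - fitted(fit)
all.equal(unname(resid_by_hand), unname(resid(fit)))
```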
Adding the least-squares line to the scatterplot is easy with the abline() function:
plot(bodyfat$abdomen, bodyfat$siri, main="Scatterplot for Percent Body Fat by Abdomen", xlab='Abdomen', ylab='Percent Body Fat')
abline(fit)
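When passed a fitted model, abline() takes the intercept and slope from its coefficients. As a sketch of what happens behind that call, the same line can be drawn by supplying the coefficients explicitly or by predicting over a grid of abdomen values. The example assumes the mfp package is installed; the grid data frame, its siri_hat column, and the colours are choices made here for illustration.

```r
data(bodyfat, package = "mfp")
fit <- lm(siri ~ abdomen, data = bodyfat)

plot(bodyfat$abdomen, bodyfat$siri, xlab = "Abdomen", ylab = "Percent Body Fat")

# Option 1: pass the intercept (a) and slope (b) explicitly.
abline(a = coef(fit)[1], b = coef(fit)[2], col = "blue")

# Option 2: predict over a grid of abdomen values and connect the points.
grid <- data.frame(
  abdomen = seq(min(bodyfat$abdomen), max(bodyfat$abdomen), length.out = 50)
)
grid$siri_hat <- predict(fit, newdata = grid)
lines(grid$abdomen, grid$siri_hat, col = "red", lty = 2)
```

The grid approach takes more typing for a straight line, but it restricts the drawn line to the observed range of abdomen rather than extending it across the whole plot.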
We would like to predict a baby’s birthweight (bwt) before she is born using her mother’s weight at last menstrual period (lwt).
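One possible starting point for this question is sketched below, using the birthwt data from the MASS package. The name fit_bwt and the mother's weights in new_mothers are hypothetical, chosen only to show how predict() is called; they are not taken from the data.

```r
library(MASS)   # provides the birthwt data
data("birthwt")

# Regress birth weight (bwt) on mother's weight at last menstrual period (lwt).
fit_bwt <- lm(bwt ~ lwt, data = birthwt)
coef(fit_bwt)

# Predict birth weight for hypothetical mothers; these lwt values are
# made up for illustration and use the same units as the lwt column.
new_mothers <- data.frame(lwt = c(110, 130, 150))
pred_bwt <- predict(fit_bwt, newdata = new_mothers)
pred_bwt
```

The column name in new_mothers must match the explanatory variable in the formula (lwt), otherwise predict() cannot find the values to plug in.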
We want to examine the relationship between body temperature, \(Y\), and heart rate, \(X\). Further, we would like to use heart rate to predict the body temperature.