2023-02-02 Linear Regression, Session 3
- Instituut Voor Tropische Geneeskunde - Antwerp, Belgium
- Javier Silva-Valencia
Step by Step
Import data
Import the CSV dataset under the name “cholest”, with “,” as the field separator and “.” as the decimal mark:
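A minimal sketch of such an import with base R. The file name is not given in these notes, so the example writes a tiny hypothetical CSV to a temporary path in order to run on its own; the three rows of values are made up for illustration and are not the course data.

```r
# Hypothetical stand-in for the course file: the real file name is not
# given in these notes, and the values below are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("cholesterol,activity,age",
             "3.9,5.0,45",
             "2.8,20.5,31"), csv_path)

# sep = "," is the field separator, dec = "." is the decimal mark
cholest <- read.csv(csv_path, sep = ",", dec = ".")

# Check that the numeric columns were parsed as numbers, not text
str(cholest)
```

With the real file, only `csv_path` would change; `sep` and `dec` are already the `read.csv` defaults, but writing them out documents the expected format.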
Starting the Exercise:
We want to answer the question: How do we best explain the variability of cholesterol with the data we have?
1. We explore the variables - Cleaning
1.1 Cholesterol
-Cholesterol seems ok
1.2 Activity
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    5.00    8.50   10.92   15.50   26.00
-Activity seems ok
1.3 Occupation
1.3.1 First, we check whether the values of the variable make sense
-Seems ok: Occupation should have only 4 categories, and there are 4 categories in my data
1.3.2 Second, we check whether the variable is stored as categorical or numerical, as intended. Occupation is stored as numeric, so it needs to be converted to a factor
1.3.3 Then, we make sure that the first category of the variable is the reference category
-Occupation seems ok
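The three occupation checks above can be sketched in base R as follows. The codes and the labels `occ1` to `occ4` are hypothetical placeholders, since the notes do not list the actual category names.

```r
# Illustrative values only: occupation coded 1-4, as described in the text.
# The labels occ1..occ4 are hypothetical placeholders.
occupation_num <- c(1, 3, 2, 4, 1, 2)

# Step 1: check which values occur and how often
table(occupation_num)

# Step 2: convert the numeric codes to a factor
occupation <- factor(occupation_num,
                     levels = 1:4,
                     labels = c("occ1", "occ2", "occ3", "occ4"))

# Step 3: the first level is the reference category used by lm()
levels(occupation)[1]

# relevel() moves another category to the reference position if needed
occupation_ref2 <- relevel(occupation, ref = "occ2")
levels(occupation_ref2)
```

Leaving the variable numeric would make `lm()` fit a single slope across the codes 1 to 4, which is rarely meaningful for a nominal variable.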
Doing the modeling - linear regression - Method: Change-in-estimate model
Model with only the primary exposure variable
Call:
lm(formula = cholesterol ~ activity, data = cholest)
Residuals:
     Min       1Q   Median       3Q      Max
-0.85397 -0.49306 -0.06166  0.34349  1.25118
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.05397    0.22507  18.012 1.17e-14 ***
activity    -0.06410    0.01704  -3.762  0.00108 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6205 on 22 degrees of freedom
Multiple R-squared: 0.3914, Adjusted R-squared: 0.3638
F-statistic: 14.15 on 1 and 22 DF, p-value: 0.001077
Model with two independent variables
summary(mod2)
summary(mod3)
summary(mod4)
summary(mod5)
After looking at the percentage change, we see that only “activity+age” and “activity+BMI” change the activity estimate by more than 10%
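A sketch of how this screening step could be automated. The data frame `toy` is simulated purely so the code runs; it is not the course data, and the column name `bmi` is an assumption. Dividing by the crude estimate is also an assumption, since the notes do not state which denominator was used at this step.

```r
# Simulated data so the example runs on its own; NOT the course data.
set.seed(1)
n <- 24
toy <- data.frame(age = runif(n, 25, 65),
                  bmi = rnorm(n, mean = 25, sd = 3))
toy$activity    <- 30 - 0.4 * toy$age + rnorm(n, sd = 3)
toy$cholesterol <- 1.7 + 0.05 * toy$age - 0.02 * toy$activity +
  rnorm(n, sd = 0.3)

# Crude model: cholesterol explained by the primary exposure only
b_crude <- coef(lm(cholesterol ~ activity, data = toy))[["activity"]]

# Add one candidate covariate at a time and record the percentage
# change in the activity coefficient relative to the crude estimate
candidates <- c("age", "bmi")
change <- sapply(candidates, function(v) {
  fit <- lm(reformulate(c("activity", v), response = "cholesterol"),
            data = toy)
  100 * abs((coef(fit)[["activity"]] - b_crude) / b_crude)
})

change  # covariates with a change above 10 go into the multivariate model
```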
Multivariate model
The adjusted effect of activity is: -0.016
Multivariate model without age
The adjusted effect of activity now is: -0.029
Is that a substantial change? (-0.029 - (-0.016)) / (-0.016) = 81%. Yes, it is a substantial change, so we should not take age out of the equation
Multivariate model without BMI
Call:
lm(formula = cholesterol ~ activity + age, data = cholest)
Residuals:
     Min       1Q   Median       3Q      Max
-0.65374 -0.23438 -0.04463  0.20482  0.54035
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.67711    0.32871   5.102 4.71e-05 ***
activity    -0.01687    0.01078  -1.565    0.132
age          0.04722    0.00610   7.741 1.39e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3235 on 21 degrees of freedom
Multiple R-squared: 0.8421, Adjusted R-squared: 0.827
F-statistic: 55.99 on 2 and 21 DF, p-value: 3.836e-09
The adjusted effect of activity now is: -0.017
Is that a substantial change? (-0.017 - (-0.016)) / (-0.016) = 6%. No, it is not a substantial change, so we can take BMI out of the equation
So the final model (the simplest one) is cholesterol ~ activity + age (Mod8)
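The two backward checks can be reproduced from the rounded coefficients reported above. The helper name `rel_change` is introduced here for illustration only; the 10% cut-off is the one used earlier in these notes.

```r
# Relative change in the activity coefficient when a covariate is dropped,
# as a percentage of the full-model estimate
rel_change <- function(reduced, full) 100 * (reduced - full) / full

full        <- -0.016  # activity coefficient, full multivariate model
without_age <- -0.029  # activity coefficient with age removed
without_bmi <- -0.017  # activity coefficient with BMI removed

round(rel_change(without_age, full))  # 81: above 10%, so age stays
round(rel_change(without_bmi, full))  # 6: below 10%, so BMI can go
```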