2023-01-31. Session 1
Linear Regression - Session 1
Starting the Exercise
Instructions:
An ecological study has looked into the relationship between the incidence of low birth weight (“low birth weight per 100 births”) and perinatal mortality (“perinatal mortality per 100 births”) in health districts of a certain region.
Importing data
Import the CSV file as a data frame named lbwpmor, with “,” as the separator and “.” as the decimal mark.
We assume that all variables are ready to use (we don't need to transform or create any variables).
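A minimal sketch of the import step, assuming the file has the two columns inclbw and permor. The temporary file and its three rows are invented placeholders so that the snippet runs on its own; they are not the study data, and the real file name is not given in these notes.

```r
# Placeholder CSV written to a temporary file.
# The values are invented for illustration, NOT the study data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("inclbw,permor",
             "6.5,1.30",
             "8.0,1.62",
             "9.2,1.75"), csv_path)

# Read it with "," as the field separator and "." as the decimal mark.
lbwpmor <- read.csv(csv_path, sep = ",", dec = ".")

# Check that both columns were read as numeric.
str(lbwpmor)
```

With the real data, csv_path would simply be replaced by the path to the course file.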
Step 1
Checking whether there is a correlation between inclbw and permor. (This is a first check of linearity; there are other ways, but we will start this exercise this way.)
As a result: we have an r of 0.69 (0.6861 before rounding). It is a moderate-to-strong positive correlation.
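The correlation step could look like the sketch below. The data frame toy holds invented placeholder values (not the study data) so that the calls run on their own; with the real data, lbwpmor$inclbw and lbwpmor$permor would be used instead.

```r
# Invented placeholder values (NOT the study data), only to make the calls runnable.
toy <- data.frame(inclbw = c(6.5, 7.1, 8.0, 8.8, 9.2, 10.4),
                  permor = c(1.30, 1.55, 1.48, 1.80, 1.75, 1.95))

# Pearson correlation coefficient between the two variables.
r <- cor(toy$inclbw, toy$permor, method = "pearson")

# cor.test() adds a confidence interval and a test of H0: correlation = 0.
ct <- cor.test(toy$inclbw, toy$permor)

print(r)
print(ct$conf.int)
```

cor() alone gives only the point estimate; cor.test() is useful when an interval or p value for r is also wanted.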
Step 2
Doing the Scatterplot
As a result: the graph shows a positive, roughly linear relationship between the two variables.
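A scatterplot of this kind could be drawn as sketched here. The values in toy are invented placeholders (not the study data), and the axis labels are taken from the variable descriptions in the instructions.

```r
# Invented placeholder values (NOT the study data), only to make the plot runnable.
toy <- data.frame(inclbw = c(6.5, 7.1, 8.0, 8.8, 9.2, 10.4),
                  permor = c(1.30, 1.55, 1.48, 1.80, 1.75, 1.95))

# Predictor on the x-axis, outcome on the y-axis.
plot(permor ~ inclbw, data = toy,
     xlab = "Low birth weight per 100 births",
     ylab = "Perinatal mortality per 100 births")

# Overlay the least-squares line to judge linearity by eye.
abline(lm(permor ~ inclbw, data = toy))
```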
Step 3
Fitting the linear regression with only inclbw (predictor) and permor (outcome) (bivariate)
Call:
lm(formula = permor ~ inclbw, data = lbwpmor)
Residuals:
     Min       1Q   Median       3Q      Max
-0.30030 -0.17013 -0.03691  0.21897  0.34469
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.42139    0.31880   1.322 0.202786
inclbw       0.14676    0.03668   4.001 0.000837 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.218 on 18 degrees of freedom
Multiple R-squared: 0.4708, Adjusted R-squared: 0.4414
F-statistic: 16.01 on 1 and 18 DF, p-value: 0.0008373
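Output of the form shown above comes from fitting the model with lm() and printing summary(). The sketch below shows the calls on invented placeholder values (not the study data), so its numbers will differ from the output above; with the real data the call is the one printed under "Call:".

```r
# Invented placeholder values (NOT the study data), only to make the calls runnable.
toy <- data.frame(inclbw = c(6.5, 7.1, 8.0, 8.8, 9.2, 10.4),
                  permor = c(1.30, 1.55, 1.48, 1.80, 1.75, 1.95))

# Simple (bivariate) linear regression: outcome ~ predictor.
fit <- lm(permor ~ inclbw, data = toy)
s <- summary(fit)

print(s)          # full summary: residuals, coefficients, R-squared, F-statistic
print(coef(s))    # coefficient table only: estimate, std. error, t value, p value
print(s$r.squared)
```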
-A. As a result: we can write the regression equation with the coefficients we obtained
* y = a + b(x)
* a = 0.42139 (intercept)
* b = 0.14676 (slope)
So, the formula (with Y = permor, X = inclbw, and rounded coefficients) would be:
* Y = 0.42 + 0.147(X)
-B. We also find the R2
* R2 is the coefficient of determination.
* R2 = 0.4708
Interpretation:
"47% of the variability in perinatal mortality is explained by the model"
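As a cross-check, the adjusted R2 and the F-statistic in the summary can be recomputed from R2 and the residual degrees of freedom. This is a small sketch using only numbers reported in the output; the variable names are chosen here for illustration.

```r
r2 <- 0.4708            # Multiple R-squared from the summary
df_resid <- 18          # residual degrees of freedom from the summary
p <- 1                  # number of predictors (inclbw only)
n <- df_resid + p + 1   # number of observations implied by the df

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic expressed through R-squared.
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

print(n)
print(round(adj_r2, 4))
print(round(f_stat, 2))
```

The results agree with the reported Adjusted R-squared (0.4414) and F-statistic (16.01), and the degrees of freedom imply 20 observations.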
-C. We also find the p value of the slope = 0.000837
Interpretation:
*The p value in this case says that there is a statistically significant association between mortality
and low birth weight (which is the same as saying that "the slope of the regression line
is significantly different from zero").
-D. As another result we have the “correlation coefficient”
Remember that we calculated the correlation coefficient before in Step 1.
But we can also calculate it here because, in a regression with a single predictor:
Correlation coefficient = Square root of R2 (with the sign of the slope)
So:
Correlation coefficient (r) = Sqrt(0.4708)
Correlation coefficient (r) = 0.6861
Interpretation:
"The value of 0.69 suggests a moderate-to-strong positive association"
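The same calculation as a short sketch. The square root alone is always non-negative, so the sign is taken from the slope; this shortcut holds only for a regression with a single predictor.

```r
r2 <- 0.4708     # Multiple R-squared from the summary
b <- 0.14676     # slope estimate, used only for its sign

# In simple linear regression, r has the sign of the slope and magnitude sqrt(R2).
r <- sign(b) * sqrt(r2)

print(round(r, 4))
```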
Step 4
Predict “Y” when X is “8”
We have already obtained the formula “Y = 0.42 + 0.147(X)”, so we just need to replace the “X”:
Y = 0.42 + 0.147(8) = 1.596
So, using the linear regression formula above we can say that
when "low birth weight" is 8 per 100 births,
the predicted "perinatal mortality" is about 1.596 per 100 births.
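The prediction can be written as a small function of the coefficients. The function name predict_permor is made up for this sketch; the default coefficients are the unrounded estimates from the summary.

```r
# Hypothetical helper: predicted perinatal mortality for a given
# low birth weight incidence, using the fitted line Y = a + b * X.
predict_permor <- function(inclbw, a = 0.42139, b = 0.14676) {
  a + b * inclbw
}

# Unrounded coefficients from the summary.
print(predict_permor(8))

# Rounded coefficients, as in the formula used in the text.
print(predict_permor(8, a = 0.42, b = 0.147))
```

The two results differ only in the third decimal place (about 1.595 versus 1.596). When the fitted lm object is available, predict() with newdata = data.frame(inclbw = 8) gives the unrounded prediction directly.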