2023-01-31. Session 1
- Instituut Voor Tropische Geneeskunde - Antwerp, Belgium
- Javier Silva-Valencia
Step by Step
Import data
Importing a CSV database under the name of lbwpmor, with “,” as separator, and with “.” as decimal:
lbwpmor <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2/Linear Regresion Exercise Database/Datasets/lbwpmor.csv", sep=",", dec= ".")
We assume that all variables are ready to use (we don't need to transform or create any variables).
Starting the Exercise: Low birth and perinatal mortality
An ecological study has looked into the relationship between the incidence of low birth weight (“low birth weight per 100 births”) and perinatal mortality (“perinatal mortality per 100 births”) in health districts of a certain region.
1. Finding Correlation (To assume linearity)
#Code in R:
cor(lbwpmor$inclbw, lbwpmor$permor)
[1] 0.6861169
As a result: we have an r of 0.69, which is a moderately strong positive correlation.
2. Doing the Scatterplot
#Code in R:
# Make a scatterplot
plot(lbwpmor$inclbw, lbwpmor$permor, main="Title")
# add a regression line
abline(lm(permor~inclbw, data = lbwpmor), col = "blue")
As a result: we can see a positive linear relationship between low birth weight incidence and perinatal mortality.
3. Doing the Linear regression
#Code in R:
lmHeight = lm(permor ~ inclbw, data = lbwpmor)
summary(lmHeight)
Call:
lm(formula = permor ~ inclbw, data = lbwpmor)
Residuals:
Min 1Q Median 3Q Max
-0.30030 -0.17013 -0.03691 0.21897 0.34469
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.42139 0.31880 1.322 0.202786
inclbw 0.14676 0.03668 4.001 0.000837 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.218 on 18 degrees of freedom
Multiple R-squared: 0.4708, Adjusted R-squared: 0.4414
F-statistic: 16.01 on 1 and 18 DF, p-value: 0.0008373
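The Adjusted R-squared in the output above can be recovered from the Multiple R-squared. The following is a minimal sketch using only the values printed by summary(); the sample size n = 20 is an assumption inferred from the 18 residual degrees of freedom plus the two estimated coefficients.

```r
# Values copied from the summary() output above
r2       <- 0.4708   # Multiple R-squared
df_resid <- 18       # residual degrees of freedom
p        <- 1        # number of predictors (inclbw)

# Assumption: n = residual df + number of estimated coefficients (slope + intercept)
n <- df_resid + p + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

The result should agree with the Adjusted R-squared reported in the output (0.4414).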
-A. As a result: we can recreate the formula with the data we obtained
* y = a + b(x)
* a = 0.42139 (the intercept)
* b = 0.14676 (the slope for inclbw)
So, the formula would be:
* Y = 0.42 + 0.147(X)
-B. We also find the R2
* R2 is the "coefficient of determination".
* R2 = 0.4708
Interpretation:
"47% of the variability in perinatal mortality can be explained by the model"
-C. We also find the p value = 0.000837
Interpretation:
*The p value in this case says that there is a statistically significant association between mortality
and low birth weight (which is the same as saying that "the slope of the regression line
is significantly different from zero").
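To see where this p value comes from, the sketch below recomputes the t value and the two-sided p value for the slope from the estimate and standard error printed in the output. It only uses numbers from the summary table, so small rounding differences are expected.

```r
# Values copied from the "inclbw" row of the coefficient table
estimate  <- 0.14676
std_error <- 0.03668
df_resid  <- 18        # residual degrees of freedom

# t statistic: how many standard errors the slope is away from zero
t_value <- estimate / std_error

# Two-sided p value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

t_value
p_value
```

Both numbers should be close to the reported t value (4.001) and p value (0.000837).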
-D. As another result we have the “correlation coefficient”
Remember that we calculated the correlation coefficient before in part 1 ("1. Finding
Correlation"). But we can also calculate it here because, in a simple linear regression (one predictor):
Correlation coefficient = Square root of R2 (with the same sign as the slope)
So:
Correlation coefficient (r) = Sqrt(0.4708)
Correlation coefficient (r) = 0.6861
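The same calculation can be sketched in R using the values from the regression output. Note that the square root alone is always positive, so the sign of r is taken from the slope; this shortcut only holds for a regression with a single predictor.

```r
# Values copied from the summary() output
r2    <- 0.4708    # Multiple R-squared
slope <- 0.14676   # estimate for inclbw

# r has the magnitude sqrt(R2) and the sign of the slope
r <- sign(slope) * sqrt(r2)
round(r, 4)
```

This should match the value returned by cor() in part 1, up to rounding.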
Interpretation:
"The value of 0.69 suggests a moderately strong positive association"
4. Predict “y” where X=8
We have already calculated the formula: “Y= 0.42 +0.147(X)”. So we just need to replace the “X” with 8.
#Calculate:
0.42 +(0.147*8)
[1] 1.596
Clarification: Using the linear regression formula above we can say that
when "low birth weight per 100 births" is 8
the predicted "perinatal mortality per 100 births" is about 1.6
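As a small sketch, the prediction can also be wrapped in a function that uses the unrounded coefficients from the output. The function name predict_permor is hypothetical and only used for illustration; the result differs slightly from the hand calculation because the coefficients are not rounded.

```r
# Coefficients copied from the summary() output (unrounded)
intercept <- 0.42139
slope     <- 0.14676

# Hypothetical helper: predicted perinatal mortality per 100 births
predict_permor <- function(inclbw) {
  intercept + slope * inclbw
}

predict_permor(8)

# With the fitted model object available, the equivalent call would be:
# predict(lmHeight, newdata = data.frame(inclbw = 8))
```

Using predict() on the model object avoids copying coefficients by hand.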