2023-01-31. Session 1

Step by Step

Import data

Import the CSV file as a data frame named lbwpmor, with "," as the separator and "." as the decimal mark:

lbwpmor <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2/Linear Regresion Exercise Database/Datasets/lbwpmor.csv", sep=",", dec= ".")

We assume that all variables are ready to use (we don't need to transform or create any variables).
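One way to back up that assumption is to check that the two columns used below (inclbw and permor) exist and are numeric. This is a minimal sketch: check_vars is a hypothetical helper written for these notes, and it is tried on a small made-up data frame instead of the real file.

```r
# Hypothetical helper (not part of the original exercise):
# TRUE if every required column exists and is numeric
check_vars <- function(df, vars = c("inclbw", "permor")) {
  all(vars %in% names(df)) &&
    all(vapply(df[vars], is.numeric, logical(1)))
}

# Made-up toy data frame, NOT the lbwpmor dataset
toy <- data.frame(inclbw = c(1, 2), permor = c(0.5, 0.7))
check_vars(toy)                            # TRUE
check_vars(toy[, "inclbw", drop = FALSE])  # FALSE: permor is missing
```

After the import, check_vars(lbwpmor) would confirm (or refute) the assumption before any model is fitted.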



Starting the Exercise: Low birth weight and perinatal mortality

An ecological study has looked into the relationship between the incidence of low birth weight (“low birth weight per 100 births”) and perinatal mortality (“perinatal mortality per 100 births”) in health districts of a certain region.

1. Finding the Correlation (to measure the strength of the linear association)
#Code in R:
  cor(lbwpmor$inclbw, lbwpmor$permor)
[1] 0.6861169
Note

As a result: r = 0.686, which is a moderately strong positive correlation.
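By default cor() computes the Pearson correlation. A minimal sketch of that definition in R, checked against cor() on made-up toy vectors (not the lbwpmor data); pearson_r is a hypothetical name used only here:

```r
# Pearson correlation written out from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Made-up toy vectors, only to check the formula against cor()
x <- c(1, 2, 3, 4)
y <- c(2, 4, 5, 4)
pearson_r(x, y)  # same value as cor(x, y)
```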


2. Doing the Scatterplot
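#Code in R:
  # Make a scatterplot with labelled axes
  plot(lbwpmor$inclbw, lbwpmor$permor,
       main = "Low birth weight vs perinatal mortality",
       xlab = "Low birth weight per 100 births",
       ylab = "Perinatal mortality per 100 births")
  # Add the fitted regression line
  abline(lm(permor ~ inclbw, data = lbwpmor), col = "blue")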

Note

As a result: the points show a positive, roughly linear relationship.


3. Doing the Linear Regression
#Code in R:
# Fit a simple linear regression of perinatal mortality on low birth weight
fit <- lm(permor ~ inclbw, data = lbwpmor)
summary(fit)

Call:
lm(formula = permor ~ inclbw, data = lbwpmor)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.30030 -0.17013 -0.03691  0.21897  0.34469 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.42139    0.31880   1.322 0.202786    
inclbw       0.14676    0.03668   4.001 0.000837 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.218 on 18 degrees of freedom
Multiple R-squared:  0.4708,    Adjusted R-squared:  0.4414 
F-statistic: 16.01 on 1 and 18 DF,  p-value: 0.0008373
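Some lines of this output can be recomputed from the others. With 18 residual degrees of freedom and one predictor, n = 20. A minimal sketch in R that recomputes the adjusted R-squared and the F-statistic from the printed R-squared (the inputs are rounded, so they match only to the printed digits):

```r
# Values copied from the summary() output above
r2 <- 0.4708  # Multiple R-squared
n  <- 20      # 18 residual df + 2 estimated coefficients
k  <- 1       # one predictor (inclbw)

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)  # about 0.4414
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))   # about 16.01
```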
Note

-A. As a result: we can write the regression equation from the estimated coefficients

* y = a + b(x)
* a = 0.42139 (intercept)
* b = 0.14676 (slope for inclbw)
So, after rounding, the formula would be:
* Y = 0.42 + 0.147(X)
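lm() obtains a and b by ordinary least squares. With a single predictor the estimates have a closed form: b = cov(x, y) / var(x) and a = mean(y) - b * mean(x). A minimal sketch in R, using made-up toy vectors (not the lbwpmor data) and a hypothetical function name, checked against lm():

```r
# Closed-form least-squares estimates for simple linear regression
ols_fit <- function(x, y) {
  b <- cov(x, y) / var(x)     # slope
  a <- mean(y) - b * mean(x)  # intercept: the line passes through (mean(x), mean(y))
  c(a = a, b = b)
}

# Made-up toy vectors, only to compare with lm()
x <- c(1, 2, 3, 4)
y <- c(2, 4, 5, 4)
ols_fit(x, y)    # a = 2.0, b = 0.7
coef(lm(y ~ x))  # same numbers, labelled (Intercept) and x
```

Applied to lbwpmor$inclbw and lbwpmor$permor (assuming no missing values), ols_fit() should reproduce the intercept 0.42139 and slope 0.14676 printed by summary().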
  

-B. We also find the R2

* R2 is the "coefficient of determination".
* R2 = 0.4708
Interpretation: 
"About 47% of the variability in perinatal mortality is explained by the model (that is, by the incidence of low birth weight)"
    

-C. We also find the p value = 0.000837

Interpretation:
* The p value for inclbw says that there is a statistically significant association between perinatal mortality
and low birth weight (which is the same as saying that "the slope of the regression line
is significantly different from zero"). With only one predictor, the F-test p-value (0.0008373) gives the same result.
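The p value comes from the t statistic of the slope (estimate divided by its standard error) compared with a t distribution on the residual degrees of freedom. A minimal sketch in R using the values printed in the coefficients table; since those are rounded, the result matches only to the printed digits:

```r
# Values copied from the coefficients table above
estimate <- 0.14676  # slope for inclbw
se       <- 0.03668  # standard error of the slope
df_res   <- 18       # residual degrees of freedom

t_value <- estimate / se                  # about 4.001
p_value <- 2 * pt(-abs(t_value), df_res)  # two-sided p, about 0.000837
```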
    

-D. As another result we have the “correlation coefficient”

Remember that we calculated the correlation coefficient before, in part 1. But we can also obtain it here because,
in a simple linear regression (only one predictor):
      Correlation coefficient = Square root of R2 (with the same sign as the slope)
So:
      Correlation coefficient (r) = Sqrt(0.4708)
      Correlation coefficient (r) = 0.6861
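The arithmetic above can be checked in R with the values printed by summary(); the sign of the slope sets the sign of r. A minimal sketch:

```r
# Values copied from the summary() output
r2 <- 0.4708   # Multiple R-squared
b  <- 0.14676  # slope for inclbw: its sign gives the sign of r

r <- sign(b) * sqrt(r2)  # about 0.6861, close to cor() in part 1 (0.6861169)
```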
Interpretation: 
    "The value of 0.69 suggests a moderately strong positive association"

4. Predict Y when X = 8

We already calculated the formula "Y = 0.42 + 0.147(X)", so we just need to replace X with 8:

#Calculate:
0.42 +(0.147*8)
[1] 1.596
Note

Clarification: Using the linear regression formula above we can say that

when "low birth weight per 100 births" is 8,
the predicted "perinatal mortality per 100 births" is about 1.6 (1.596)
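The hand calculation uses the rounded equation. A minimal sketch in R comparing it with the prediction from the full coefficients printed by summary():

```r
a <- 0.42139  # intercept from summary()
b <- 0.14676  # slope from summary()
x <- 8        # low birth weight per 100 births

y_rounded <- 0.42 + 0.147 * x  # 1.596, as in the hand calculation
y_full    <- a + b * x         # about 1.595
```

Both give about 1.6 deaths per 100 births. With the lm object fitted in step 3 (call it model), predict(model, newdata = data.frame(inclbw = 8)) returns the unrounded value directly; the column in newdata must be named inclbw to match the model formula.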