Pearson Correlation Coefficient
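The Pearson correlation coefficient is a number between -1 and +1 that indicates how strongly two variables are linearly related. Values near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship.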
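For reference, the usual sample formula is given below. The notation is introduced here for illustration: $n$ paired observations $(x_i, y_i)$ with sample means $\bar{x}$ and $\bar{y}$.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables vary together, and the denominator rescales it so that the result always lies between -1 and +1.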
Python Code to Compute It
We will use Python to compute the Pearson correlation coefficient between pairs of variables, with pandas to load the data and pingouin to compute the statistics.
# Importing the required libraries
import pandas as pd
import pingouin as pg
# Load the CSV file containing the data
df = pd.read_csv('Assignment1_problem1_data.csv')
df.head(10)
# The file looks like this (the first, unnamed column is the row index):
|   | x1 | x2 | x3 |
|---|---|---|---|
| 0 | 2.5 | 1.2 | 8 |
| 1 | 3.6 | 1.0 | 15 |
| 2 | 1.2 | 1.8 | 12 |
| 3 | 0.8 | 0.9 | 6 |
| 4 | 4.0 | 3.0 | 8 |
| 5 | 3.4 | 2.2 | 10 |
#### Simple correlation between two columns (r is the correlation coefficient)
pg.corr(x=df['x1'], y=df['x2'])
# The output is as follows:
n | r | CI95% | r2 | adj_r2 | p-val | BF10 | power |
---|---|---|---|---|---|---|---|
6 | 0.529748 | [-0.49, 0.94] | 0.280633 | -0.198945 | 0.27971 | 0.814 | 0.199882 |
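So x1 and x2 are positively correlated in this sample (r = 0.529748): x2 tends to increase as x1 increases, and vice versa. With only n = 6 observations, however, the 95% confidence interval [-0.49, 0.94] includes zero and the p-value is 0.27971, so the correlation is not statistically significant at the 5% level.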
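To see where r comes from, here is a minimal sketch that computes it directly from the definition using only the standard library. The helper name `pearson_r` is introduced here for illustration, and the lists are typed in by hand from the six rows shown earlier, on the assumption that those rows are the whole file (n = 6).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences, from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (these go under the square root)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Values copied by hand from the data table
x1 = [2.5, 3.6, 1.2, 0.8, 4.0, 3.4]
x2 = [1.2, 1.0, 1.8, 0.9, 3.0, 2.2]

# Should agree with the r column reported by pg.corr (about 0.53)
print(round(pearson_r(x1, x2), 4))
```

The same function can be applied to any other pair of columns to cross-check the values that pingouin reports.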
#### Now for x1 and x3
pg.corr(x=df['x1'], y=df['x3'])
# The output is as follows:
n | r | CI95% | r2 | adj_r2 | p-val | BF10 | power |
---|---|---|---|---|---|---|---|
6 | 0.314443 | [-0.67, 0.9] | 0.098875 | -0.501876 | 0.54388 | 0.577 | 0.094945 |
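So x1 and x3 are weakly positively correlated (r = 0.314443): x3 tends to increase as x1 increases, and vice versa. Again, the confidence interval [-0.67, 0.9] includes zero and the p-value is 0.54388, so the relationship is not statistically significant.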
#### Now for x2 and x3
pg.corr(x=df['x2'], y=df['x3'])
# The output is as follows:
n | r | CI95% | r2 | adj_r2 | p-val | BF10 | power |
---|---|---|---|---|---|---|---|
6 | -0.129455 | [-0.85, 0.76] | 0.016759 | -0.638736 | 0.806902 | 0.504 | 0.057173 |
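So x2 and x3 are weakly negatively correlated (r = -0.129455): x3 tends to decrease slightly as x2 increases, and vice versa. With a confidence interval of [-0.85, 0.76] and a p-value of 0.806902, this is statistically indistinguishable from no linear relationship.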
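The CI95% column can be approximated by hand. A common method is the Fisher z-transformation. The sketch below assumes that this is how the interval is computed, which the output itself does not state. `fisher_ci` is a helper name introduced here for illustration, and 1.96 is the usual normal critical value for a 95% interval.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)              # map r onto an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)    # standard error of z (requires n > 3)
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # map the bounds back to the r scale

# r values copied from the three output tables; n = 6 in each case
pairs = [("x1, x2", 0.529748), ("x1, x3", 0.314443), ("x2, x3", -0.129455)]
for label, r in pairs:
    lo, hi = fisher_ci(r, n=6)
    print(f"{label}: [{lo:.2f}, {hi:.2f}]")
```

With n = 6 the standard error is large, which is why all three intervals are wide and include zero.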