Co-Relation
For example, consider the variables family income and family expenditure. It is well known that income and expenditure increase or decrease together. Thus they are related in the sense that change in any one variable is accompanied by change in the other variable.
Again, price and demand of a commodity are related variables; when price increases, demand tends to decrease, and vice versa.
If a change in one variable is accompanied by a change in the other, then the variables are said to be correlated. We can therefore say that family income and family expenditure are correlated, and so are price and demand.
Relationship Between Variables
Correlation can tell you something about the relationship between variables. It is used to understand:
- whether the relationship is positive or negative
- the strength of the relationship.
Correlation is a powerful tool that provides these vital pieces of information.
In the case of family income and family expenditure, it is easy to see that they both rise or fall together in the same direction. This is called positive correlation.
In the case of price and demand, change occurs in the opposite direction, so that an increase in one is accompanied by a decrease in the other. This is called negative correlation.
To examine the relationship of one categorical variable with another, we computed the Chi-Square statistic to tell us whether the variables are independent or not. While this type of analysis is very useful for categorical data, for numerical data the resulting tables would usually be too big to be useful. We therefore need different methods for deciding whether two numerical variables are related.
Example: Suppose that 5 students were asked their high school GPA and their College GPA, with the answers as follows:
| Student | HS GPA | College GPA |
|---|---|---|
| A | 3.8 | 2.8 |
| B | 3.1 | 2.2 |
| C | 4.0 | 3.5 |
| D | 2.5 | 1.9 |
| E | 3.3 | 2.5 |
We want to know: are high school GPA and college GPA related according to this data, and if they are, how can we use the high school GPA to predict the college GPA?
There are two questions to answer:
- first, are they related, and
- second, how are they related.
A casual look at this data suggests that the college GPA is always lower than the high school GPA, and that the lower the high school GPA, the lower the college GPA. How strong the relationship is, if there is one at all, is harder to quantify by eye.
We will first discuss how to compute and interpret the so-called correlation coefficient to help decide whether two numeric variables are related or not. In other words, it can answer our first question. We will answer the second question in later sections. First, let's define the correlation coefficient mathematically.
Coefficient of Correlation
Statistical correlation is measured by what is called the coefficient of correlation (r). Its numerical value ranges from -1.0 to +1.0. It gives us an indication of both the direction and the strength of the relationship.
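The text does not write the formula out. Assuming that r here means Pearson's product-moment coefficient (the most commonly used correlation coefficient), then for n paired observations (x_i, y_i) with sample means x̄ and ȳ it can be written as:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when the two variables tend to lie on the same side of their means together and negative when they tend to lie on opposite sides. The denominator rescales the result so that it always falls between -1.0 and +1.0.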
In general, r > 0 indicates a positive relationship, r < 0 indicates a negative relationship, and r = 0 indicates no linear relationship. Note that r = 0 does not by itself show that the variables are independent. Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation.
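As an illustration, the coefficient can be computed for the five-student GPA example above. This is a minimal sketch that assumes r is Pearson's coefficient; the function name `pearson_r` and the variable names are chosen here for illustration and do not come from the source.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: r is the Pearson product-moment coefficient,
    sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy the
    deviations from the respective means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # Raises ZeroDivisionError if either variable is constant,
    # in which case r is undefined.
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Data from the GPA table (students A to E).
hs_gpa = [3.8, 3.1, 4.0, 2.5, 3.3]
college_gpa = [2.8, 2.2, 3.5, 1.9, 2.5]

r = pearson_r(hs_gpa, college_gpa)
print(round(r, 2))  # 0.94
```

A value of about 0.94 is positive and close to +1.0, which agrees with the casual impression that the two GPAs rise and fall together. With only five students, this should be read as an illustration rather than as firm evidence.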
The closer the coefficient is to +1.0 or -1.0, the greater the strength of the relationship between the variables.
As a rule of thumb, the following guidelines on strength of relationship are often useful (though many experts would somewhat disagree on the choice of boundaries).
| Value of r | Strength of relationship |
|---|---|
| -1.0 to -0.5 or 0.5 to 1.0 | Strong |
| -0.5 to -0.3 or 0.3 to 0.5 | Moderate |
| -0.3 to -0.1 or 0.1 to 0.3 | Weak |
| -0.1 to 0.1 | None or very weak |
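The rule of thumb in the table can be sketched as a small lookup. The function name `strength_of_relationship` is hypothetical, and because the table's ranges share their endpoints, assigning each boundary value to the stronger category is an assumption made here.

```python
def strength_of_relationship(r):
    """Map a correlation coefficient to the rule-of-thumb label above.

    Assumption: boundary values (0.1, 0.3, 0.5) go to the stronger
    category, since the table's ranges overlap at their endpoints.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.5:
        return "Strong"
    if size >= 0.3:
        return "Moderate"
    if size >= 0.1:
        return "Weak"
    return "None or very weak"


print(strength_of_relationship(0.94))   # Strong
print(strength_of_relationship(-0.4))   # Moderate
```

Only the absolute value is compared, so a coefficient of -0.4 is as strong as one of +0.4; the sign indicates the direction of the relationship.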
Correlation is only appropriate for examining the relationship between meaningful quantifiable data (e.g. air pressure, temperature) rather than categorical data such as gender, favourite colour etc.
Disadvantages
While 'r' (correlation coefficient) is a powerful tool, it has to be handled with care.
- The most commonly used correlation coefficients measure only linear relationships. It is therefore perfectly possible for r to be close to 0, or even exactly 0, while there is a strong nonlinear relationship between the variables. In such a case, a scatter diagram can roughly indicate whether a nonlinear relationship exists.
- One has to be careful in interpreting the value of 'r'. For example, one could compute 'r' between shoe size and intelligence of individuals, or between height and income. Irrespective of the value of 'r', such a correlation makes no sense and is hence termed a chance or nonsense correlation.
- 'r' should not be used to say anything about a cause-and-effect relationship. Put differently, by examining the value of 'r' we could conclude that variables X and Y are related, but the same value of 'r' does not tell us whether X influences Y or the other way round. Statistical correlation should not be the primary tool used to study causation, because a third variable may be driving both.
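The first point in the list above can be illustrated with made-up numbers: y is determined exactly by x through y = x², yet the linear correlation is zero. This is a sketch assuming Pearson's coefficient; the data and the helper name `pearson_r` are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed definition of r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Illustrative data: a perfect but nonlinear (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # [4, 1, 0, 1, 4]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative halves of the curve cancel in the sum of products, so r reports no linear relationship even though y can be predicted from x without error. A scatter diagram of these points would show the curve immediately.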
Reference : http://explorable.com/statistical-correlation