Why Correlation (and Other Statistics) Can Be Misleading.
Correlation is often used to measure the association between two continuous variables. But it has a major flaw that often goes unnoticed.
Folks often draw conclusions from a correlation matrix without even looking at the data. However, the resulting statistics can be heavily driven by outliers or other artifacts.
This is demonstrated in the plots above. Adding just two outliers drastically changed both the correlation and the regression line.
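To see the effect in numbers, here is a minimal sketch using NumPy. The dataset is a made-up toy example (not the data behind the plots): 20 points whose y-values simply alternate between +1 and -1, so there is essentially no linear trend, plus two hypothetical outliers placed far to the upper right. The variable names are illustrative assumptions.

```python
import numpy as np

# Toy data (an assumption for illustration): y alternates between +1 and -1,
# so x and y have almost no linear relationship.
x = np.arange(20, dtype=float)
y = np.where(x % 2 == 0, 1.0, -1.0)

# Pearson correlation and least-squares line before adding outliers.
r_before = np.corrcoef(x, y)[0, 1]
slope_before, intercept_before = np.polyfit(x, y, 1)

# Append two hypothetical outliers far away from the rest of the data.
x_out = np.append(x, [40.0, 45.0])
y_out = np.append(y, [15.0, 18.0])

# Same statistics after adding the two points.
r_after = np.corrcoef(x_out, y_out)[0, 1]
slope_after, intercept_after = np.polyfit(x_out, y_out, 1)

print(f"Correlation: {r_before:.2f} -> {r_after:.2f}")   # roughly -0.09 -> 0.84
print(f"Slope:       {slope_before:.2f} -> {slope_after:.2f}")  # roughly -0.02 -> 0.37
```

Two points out of 22 are enough to turn a near-zero correlation into what looks like a strong positive one, and the fitted line tilts upward to chase them. A quick scatter plot of the same arrays would make the cause obvious at a glance.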
Thus, looking at the data and understanding its underlying characteristics can save you from drawing the wrong conclusions. Statistics are important, but they can be highly misleading at times.