Be Cautious Before Drawing Any Conclusions Using Summary Statistics
You never know what's inside the data.
While analyzing data, one may be tempted to draw conclusions solely based on its statistics. Yet, the actual data might be conveying a totally different story.
Here's a visual depicting nine datasets, each with approximately zero correlation between the two variables. Yet the summary statistic (Pearson correlation in this case) gives no clue about what's inside the data.
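To make this concrete, here is a minimal sketch in plain Python. The three shapes (a circle, a V, and an X) are hypothetical stand-ins I made up for illustration, not the nine datasets from the visual, and `pearson` is a small helper written for this example rather than a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

n = 200

# Circle: points evenly spaced around the unit circle.
angles = [2 * math.pi * k / n for k in range(n)]
circle = ([math.cos(a) for a in angles], [math.sin(a) for a in angles])

# V shape: y = |x| on a grid that is symmetric around zero.
grid = [-1 + 2 * k / (n - 1) for k in range(n)]
v_shape = (grid, [abs(x) for x in grid])

# X shape: the lines y = x and y = -x over the same grid.
cross = (grid + grid, grid + [-x for x in grid])

datasets = {"circle": circle, "v_shape": v_shape, "cross": cross}
correlations = {name: pearson(xs, ys) for name, (xs, ys) in datasets.items()}

for name, r in correlations.items():
    print(f"{name:8s} r = {r:.3f}")
```

All three correlations come out at essentially zero, even though each dataset has an obvious structure that a plot would reveal immediately.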
What's more, data statistics could be heavily driven by outliers or other artifacts. I covered this in a previous post here.
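A quick sketch of the outlier effect, using made-up numbers: 20 points with no real trend, plus one extreme point. The data and the `pearson` helper are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: y just alternates between 6 and 4, so there is no trend.
xs = [float(k) for k in range(20)]
ys = [6.0 if k % 2 == 0 else 4.0 for k in range(20)]
r_clean = pearson(xs, ys)

# Add a single extreme point far away from the rest of the data.
r_outlier = pearson(xs + [100.0], ys + [100.0])

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

One point out of 21 is enough to move the correlation from close to zero to above 0.9, which is why the statistic alone can be misleading.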
Thus, the importance of looking at the data cannot be stressed enough. It saves you from the wrong conclusions you might otherwise draw by looking at the statistics alone.
For instance, in the sinusoidal dataset above, Pearson correlation may make you believe that there is no association between the two variables. However, remember that it is only quantifying the extent of a linear relationship between them. Read more about this in another one of my previous posts here.
Thus, if the relationship is non-linear (quadratic, sinusoidal, exponential, etc.), Pearson correlation will not measure it faithfully: it can sit near zero or understate the strength of the association.
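Here is a small sketch of how Pearson correlation behaves on three perfectly deterministic, non-linear relationships. The ranges and functions are my own choices for illustration (a cosine over a range symmetric around zero stands in for the sinusoid), and `pearson` is a hand-written helper, not a library call.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# x values from -2*pi to 2*pi, symmetric around zero.
xs = [-2 * math.pi + 4 * math.pi * k / 400 for k in range(401)]

# y is an exact function of x in every case; there is no noise at all.
quadratic = pearson(xs, [x ** 2 for x in xs])
sinusoid = pearson(xs, [math.cos(x) for x in xs])

# Exponential growth on the range 0 to 5.
xs_pos = [5 * k / 100 for k in range(101)]
exponential = pearson(xs_pos, [math.exp(x) for x in xs_pos])

print(f"quadratic:   r = {quadratic:.3f}")
print(f"sinusoid:    r = {sinusoid:.3f}")
print(f"exponential: r = {exponential:.3f}")
```

The quadratic and sinusoid cases land at essentially zero despite a perfect dependence. The exponential case is positive but clearly below 1, so the linear measure understates a relationship that is fully deterministic.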
Thanks for reading Daily Dose of Data Science!