Exploring and analyzing data is a fundamental aspect of data science.
Here, visualizations play a crucial role in understanding complex patterns and relationships.
They offer a concise way to:
understand the intricacies of statistical models,
validate model assumptions,
evaluate model performance, and much more.
The visual above depicts 9 of the most important and must-know plots in data science.
KS Plot: It compares the cumulative distribution functions (CDFs) of a dataset to a theoretical distribution or between two datasets to assess the distributional differences.
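To make the KS idea concrete, here is a minimal sketch (not taken from the visual) that builds two empirical CDFs and measures the largest vertical gap between them, which is the quantity a KS plot highlights. The sample sizes, seed, and helper names `ecdf` and `ks_statistic` are my own assumptions for illustration.

```python
import random
from bisect import bisect_right

def ecdf(sample):
    """Return a function giving the empirical CDF of `sample`."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(a), ecdf(b)
    # Both CDFs are step functions that only change at observed values,
    # so checking the observed values is enough to find the largest gap.
    return max(abs(cdf_a(x) - cdf_b(x)) for x in list(a) + list(b))

# Hypothetical samples: two from the same normal, one with a shifted mean.
rng = random.Random(0)
same = [rng.gauss(0, 1) for _ in range(500)]
also_same = [rng.gauss(0, 1) for _ in range(500)]
shifted = [rng.gauss(1.5, 1) for _ in range(500)]

d_same = ks_statistic(same, also_same)
d_shifted = ks_statistic(same, shifted)
print(f"same distribution: D = {d_same:.3f}")
print(f"shifted mean:      D = {d_shifted:.3f}")
```

Plotting `ecdf(a)` and `ecdf(b)` on the same axes and marking the point of maximum gap gives the usual KS plot.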
SHAP Plot: It summarizes how much each feature contributes to a model’s predictions, taking interactions and dependencies between features into account.
QQ Plot: It is used to assess the distributional similarity between observed data and a theoretical distribution.
Here, we plot the quantiles of the two distributions against each other.
Deviations from the straight line indicate a departure from the assumed distribution.
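As a rough sketch of how the points of a QQ plot are obtained, the snippet below pairs sorted observations with standard-normal quantiles using only the standard library. The plotting position `(i + 0.5) / n` is one common convention among several, and the samples and helper names are assumptions made for illustration. The correlation between the two sets of quantiles is used here as a crude stand-in for "how close to a straight line" the points fall.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the matching standard-normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # (i + 0.5) / n is one common choice of plotting position; others exist.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, ordered

def correlation(xs, ys):
    """Pearson correlation, used as a rough measure of straightness."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Hypothetical data: one roughly normal sample, one clearly skewed sample.
rng = random.Random(0)
normal_sample = [rng.gauss(10, 2) for _ in range(1000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(1000)]

r_normal = correlation(*qq_points(normal_sample))
r_skewed = correlation(*qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

Scatter-plotting the two lists returned by `qq_points` gives the QQ plot itself; the skewed sample should bend away from the line in the tails.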
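Cumulative Explained Variance Plot: It shows the cumulative share of total variance explained as more principal components are retained, which helps decide how many dimensions to keep when using PCA. I covered this in a detailed post before: How Many Dimensions Should You Reduce Your Data To When Using PCA?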
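For a quick feel of the quantity being plotted, here is a minimal NumPy sketch. It is not the approach from the linked post, just one way to compute the curve; the synthetic data and the 95% target are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 5 features driven mostly by 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# The variance captured by each principal component is an eigenvalue
# of the covariance matrix of the data.
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # largest first
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative share reaches the target.
target = 0.95  # assumed threshold, pick whatever suits your use case
n_components = int(np.searchsorted(cumulative, target) + 1)

print("cumulative explained variance:", np.round(cumulative, 3))
print("components needed for 95%:", n_components)
```

Plotting `cumulative` against the component index gives the curve, and the point where it flattens suggests how many components are worth keeping.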
Gini-Impurity vs. Entropy: They are used to measure the impurity or disorder of a node or split in a decision tree.
The plot compares Gini impurity and Entropy across different splits. This provides insights into the tradeoff between these measures.
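To see how the two measures relate, here is a small sketch for the binary case, where a node's impurity depends only on the fraction `p` of samples in one class. The function names and the chosen values of `p` are assumptions for illustration.

```python
from math import log2

def gini(p):
    """Gini impurity of a binary node where class 1 has probability p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def entropy(p):
    """Shannon entropy (in bits) of the same node; 0 * log(0) is taken as 0."""
    return sum(-q * log2(q) for q in (p, 1.0 - p) if q > 0)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

Both measures are zero for a pure node and peak at `p = 0.5`, where Gini impurity reaches 0.5 and entropy reaches 1 bit. Gini avoids the logarithm, which makes it slightly cheaper to compute.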
Bias-Variance Tradeoff: It is used to find the right balance between the bias and the variance of a model as its complexity changes.
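One way to see the tradeoff numerically is to refit the same model class on many noisy copies of a dataset and measure how far the average prediction is from the truth (bias) and how much predictions fluctuate (variance). The sketch below does this for polynomial fits; the sine target, noise level, degrees, and number of repeats are all assumptions picked for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth function for this toy experiment.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0.05, 0.95, 50)
noise_sd = 0.3
n_repeats = 200

results = {}
for degree in (1, 3, 9):
    # Refit the same model class on many noisy copies of the training set.
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        y_train = true_fn(x_train) + rng.normal(scale=noise_sd, size=x_train.size)
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_test)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    # Variance: spread of predictions across the refits.
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

In this setup, the simplest model should show the largest bias and the most flexible one the largest variance, which is the pattern the bias-variance plot draws against model complexity.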
ROC Curve: It depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across different classification thresholds.
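As a minimal sketch of where the points on a ROC curve come from, the snippet below sweeps the threshold over the observed scores and records TPR and FPR at each step. The labels, scores, and helper names are hypothetical, and in practice you would likely reach for a library routine instead.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per threshold, strictest first."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, y_true))
        fp = sum(p and y == 0 for p, y in zip(predicted, y_true))
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )

# Hypothetical labels and classifier scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]

fpr, tpr = roc_points(y_true, scores)
roc_auc = auc(fpr, tpr)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc)
```

Plotting `tpr` against `fpr` gives the curve; the closer it hugs the top-left corner, the better the classifier separates the two classes.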
Precision-Recall Curve: It depicts the trade-off between Precision and Recall across different classification thresholds.
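The same threshold sweep can be sketched for precision and recall. The imbalanced labels and scores below are hypothetical, and `precision_recall_points` is a helper name I made up for illustration.

```python
def precision_recall_points(y_true, scores):
    """Return (precision, recall) lists, one point per threshold, strictest first."""
    positives = sum(y_true)
    precision, recall = [], []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, y_true))
        # At least one score meets each threshold, so this is never zero.
        predicted_positive = sum(predicted)
        precision.append(tp / predicted_positive)
        recall.append(tp / positives)
    return precision, recall

# Hypothetical imbalanced data: 3 positives out of 10 samples.
y_true = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.05, 0.1, 0.9, 0.3, 0.6, 0.8, 0.2, 0.15, 0.4, 0.55]

precision, recall = precision_recall_points(y_true, scores)
for p, r in zip(precision, recall):
    print(f"precision={p:.2f}  recall={r:.2f}")
```

As the threshold loosens, recall can only rise, while precision tends to fall toward the share of positives in the data. That is why this curve is often more informative than ROC when positives are rare.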
Elbow Curve: The plot helps identify the optimal number of clusters for the k-means algorithm.
Over to you: What other plots would you include here?
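To illustrate what the elbow curve plots, here is a small NumPy sketch that runs a basic k-means for several values of k and records the inertia (within-cluster sum of squared distances). Note the simplifications: it uses a deterministic farthest-first initialization and a fixed number of iterations, whereas library implementations typically use k-means++ with several restarts. The three-blob data is synthetic and assumed for illustration.

```python
import numpy as np

def farthest_first_centers(X, k):
    """Deterministic init: start at X[0], then keep adding the farthest point."""
    centers = [X[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [((X - c) ** 2).sum(axis=1) for c in centers], axis=0
        )
        centers.append(X[dist_to_nearest.argmax()])
    return np.array(centers)

def kmeans_inertia(X, k, n_iter=25):
    """Run a basic Lloyd's k-means and return the final inertia."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(sq_dists.min(axis=1).sum())

# Hypothetical data: three well-separated 2D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
X = np.vstack([c + 0.6 * rng.normal(size=(50, 2)) for c in blob_centers])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}  inertia={value:.1f}")
```

On data like this, inertia drops sharply up to k = 3 and only slightly afterwards, which is the "elbow". On real data the bend is often much less clear, so treat it as a heuristic.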
Thanks a lot, really very useful.
This summary is really awesome, so many useful ways to understand ML!
However, I would advise against the elbow method, as many articles have shown how wrong it can be. Here is a link to an excellent recent article, but there are many more:
Thanks again for your excellent work 😊