Week 5 – BALT 4363 – Probability & Statistics for Data Science


This week’s chapter introduced two of the most important tools in data science: descriptive statistics and probability distributions. Before working through the examples, I mostly thought of statistics as something abstract, used mainly in academic research. Chapter 5 showed how directly these concepts shape real data analysis, prediction, and model evaluation. Descriptive statistics such as the mean, median, variance, and standard deviation give a quick summary of what’s happening inside a dataset. Whether the data is the heights of a group of people or the ages of Titanic passengers, these values immediately reveal its central tendency (where values cluster) and its spread (how far they vary). A spreadsheet can compute them too, but Python does it more efficiently and reproducibly.
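As a small illustration of those four summary statistics, here is a sketch using Python's built-in statistics module on a made-up list of five heights (the values are hypothetical, not taken from the chapter). Note that statistics.variance and statistics.stdev compute the sample versions, dividing by n - 1; statistics.pvariance and statistics.pstdev give the population versions.

```python
import statistics

# Hypothetical heights in centimeters (illustrative values only)
heights = [160, 165, 170, 175, 180]

mean_height = statistics.mean(heights)      # central tendency: arithmetic average
median_height = statistics.median(heights)  # central tendency: middle value when sorted
var_height = statistics.variance(heights)   # spread: sample variance (divides by n - 1)
std_height = statistics.stdev(heights)      # spread: sample standard deviation

print(f"mean={mean_height:.1f}, median={median_height:.1f}, "
      f"variance={var_height:.1f}, std={std_height:.2f}")
```

Running it prints mean=170.0, median=170.0, variance=62.5, std=7.91. Because this sample is symmetric, the mean and median agree; a single very tall outlier would pull the mean up while leaving the median unchanged.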

What really stood out this week was seeing how probability distributions help us model uncertainty. The normal distribution examples made it easy to see how many real-world variables, like heights or measurement errors, tend to form that familiar bell curve. Plotting simulated height data in Google Colab using NumPy and Matplotlib connected the mathematical idea to an actual visual shape. The Iris dataset example went one step further by showing how real measurements (sepal length) form a distribution that we can plot, analyze, and compare. This made probability feel less like theory and more like a practical tool for understanding patterns in the world.
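The simulated-heights idea can be sketched without any plotting library. This version uses the standard library's random.gauss instead of NumPy (with NumPy, np.random.normal(170, 10, size=10_000) plays the same role), and the mean of 170 cm and standard deviation of 10 cm are assumed values chosen for illustration. It checks the empirical rule (about 68% of values within one standard deviation of the mean, about 95% within two) and prints a rough text histogram as a stand-in for Matplotlib's plt.hist.

```python
import random
import statistics
from collections import Counter

random.seed(42)  # fixed seed so every run gives the same sample

# Assumed parameters for illustration: mean 170 cm, standard deviation 10 cm
MU, SIGMA, N = 170, 10, 10_000
heights = [random.gauss(MU, SIGMA) for _ in range(N)]

# The sample statistics should land close to the parameters we chose
sample_mean = statistics.mean(heights)
sample_std = statistics.stdev(heights)
print(f"sample mean = {sample_mean:.1f}, sample std = {sample_std:.1f}")

# Empirical rule: ~68% of a normal sample lies within 1 SD of the mean, ~95% within 2 SD
within_1sd = sum(abs(h - MU) <= SIGMA for h in heights) / N
within_2sd = sum(abs(h - MU) <= 2 * SIGMA for h in heights) / N
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")

# Text histogram with 5 cm bins: bin k covers [MU + 5k, MU + 5k + 5)
bins = Counter(int((h - MU) // 5) for h in heights)
for k in sorted(bins):
    lower = MU + 5 * k
    print(f"{lower:>4}-{lower + 5:<4} {'#' * (bins[k] // 50)}")
```

The bars are tallest around 170 cm and shrink symmetrically toward both tails, which is the bell shape in text form. Swapping the simulated list for a real column, such as Iris sepal lengths, would let the same checks show how closely real measurements follow a normal curve.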
