Week 3 – BALT 4363 – Data Cleaning and Handling with Python
This week, I worked through Chapter 3 of the Data Toolkit, which focused on handling and cleaning data using two of the most important Python libraries for data science: Pandas and NumPy. These tools are the foundation of almost every data-driven project, so learning how to manipulate real-world datasets with them is a major step toward becoming comfortable with data work. The chapter explained the difference between Pandas' main structures, Series and DataFrames, and showed how these structures make it easy to import data, explore it, and prepare it for analysis. What stood out to me is how Pandas transforms messy raw data into something structured and usable with just a few lines of code.

I also learned how essential data cleaning is before any analysis can even begin. Chapter 3 walked through handling missing values, removing duplicates, renaming columns, and replacing incorrect values. Seeing each of these operations applied in context made it much easier to understand why clea...
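
To make the operations mentioned above concrete, here is a small sketch of what they can look like in Pandas. This is not the chapter's own code: the column names, the sample values, and the `clean` helper are all made up for illustration, and filling missing amounts with the median is just one possible choice among several.

```python
import numpy as np
import pandas as pd

# A Series is a single labeled column of values.
prices = pd.Series([19.99, 5.49, 12.00], name="price")

# A DataFrame is a table of columns that share one index.
# The rows below are invented sample data, not from the chapter.
raw = pd.DataFrame({
    "Cust Name": ["Ana", "Ben", "Ben", "Cara", "Dev"],
    "city": ["Tulsa", "Dalas", "Dalas", "Austin", None],
    "amount": [120.0, 40.0, 40.0, np.nan, 60.0],
})


def clean(df):
    """Hypothetical helper that applies four common cleaning steps."""
    # 1. Rename an awkward column name.
    out = df.rename(columns={"Cust Name": "customer"})
    # 2. Remove exact duplicate rows (the second "Ben" row here).
    out = out.drop_duplicates()
    # 3. Replace a known incorrect value (a misspelled city).
    out["city"] = out["city"].replace({"Dalas": "Dallas"})
    # 4. Handle missing values: a placeholder for text,
    #    the column median for numbers (an assumed policy).
    out["city"] = out["city"].fillna("Unknown")
    out["amount"] = out["amount"].fillna(out["amount"].median())
    # Renumber the rows after dropping duplicates.
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Each step returns a new object instead of changing `raw`, so the original data stays available for comparison. Whether to fill, drop, or flag missing values depends on the dataset, so the fill choices here should be read as placeholders.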