CST383 Week 2

    This week the focus was on working with data using the pandas library, followed by an introduction to probability density functions. One of the main topics was the pandas Series and how flexible it is for working with and visualizing data. We covered indexing, vectorized operations, and how a Series differs from a standard numpy array. While the two structures are similar, what stood out to me is that a pandas Series includes a labeled index, which makes the data much easier to interpret and work with. Instead of relying only on positional indexing like arrays do, having meaningful labels allows for clearer data manipulation. This seems especially useful when working with large real-world datasets, where context is just as important as the values.
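A small sketch of the labeled-index idea, using made-up city temperatures (the data here is just an illustration, not from class):

```python
import pandas as pd

# A Series pairs each value with a labeled index,
# so lookups can use names instead of positions.
temps = pd.Series([61, 58, 64], index=["Monterey", "Salinas", "Seaside"])

print(temps["Salinas"])   # label-based lookup
print(temps.iloc[1])      # positional lookup still works

# Vectorized operations apply to every element at once.
temps_c = (temps - 32) * 5 / 9
print(temps_c.round(1))
```

Both lookups return the same value, but the label makes it obvious which row you meant.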

    Pandas DataFrames and how they are used to organize data were also covered. This section felt pretty intuitive because it was like working with tables in a database or spreadsheet. Along with that came an introduction to aggregation and grouping in pandas. Aggregation feels like a very powerful tool given how quickly you can summarize and analyze large datasets with it. Being able to group data by a category and then compute averages or totals is an important ability. These tools did a great job of demonstrating how powerful pandas can be for turning raw data into meaningful information with very little work.
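A minimal group-and-aggregate sketch, with a hypothetical dataset of points scored by players on two teams:

```python
import pandas as pd

# Hypothetical dataset: points scored by players on two teams.
df = pd.DataFrame({
    "team":   ["A", "A", "B", "B", "B"],
    "points": [10, 14, 8, 12, 16],
})

# Group rows by team, then aggregate each group in one step.
summary = df.groupby("team")["points"].agg(["mean", "sum"])
print(summary)
```

One line of `groupby` plus `agg` replaces what would otherwise be a loop over every category.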

    Later we were introduced to probability density functions and cumulative distribution functions and how to analyze them. While simple enough in theory, it took me some time to become familiar with reading the graphs in a way that let me extract meaning from the numbers. In particular, accurately estimating percentages from the graph took some effort, since you have to carefully consider that the area under the curve represents probability, and how that relates to different regions of the graph like peaks and tails. After practicing more problems and getting more familiar with the concept of a continuous variable, I started to feel more comfortable identifying these regions and interpreting what they represent. I still think I need more practice before I can do it at a glance, but I feel more confident than I did at the start.
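The area-under-the-curve idea can be checked numerically. As a sketch (assuming a standard normal distribution, not any specific dataset from class), the CDF gives the area to the left of a point, so the probability of landing between two points is just a difference of CDF values:

```python
import math

# Standard normal CDF via the error function:
# cdf(x) is the area under the PDF to the left of x,
# so P(a < X < b) = cdf(b) - cdf(a).
def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Probability of falling within one standard deviation of the mean:
p = normal_cdf(1) - normal_cdf(-1)
print(round(p, 3))  # about 0.683
```

This matches the familiar rule that roughly 68% of a normal distribution lies within one standard deviation of the mean.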

    Something I found very interesting this week was revisiting concepts like mean, median, and mode. While I was already very familiar with these, the problems we worked on made me think more deeply about when each measure should actually be used. In the past, I tended to treat the average/mean as the default choice without questioning it much. Now, however, I have started to consider why the median can often be more useful, especially in situations with outliers or skewed data. Seeing how extreme values can pull the mean in one direction made it clearer why the median can sometimes give a more accurate representation of the “typical” value.
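The outlier effect is easy to see with a toy example (the income numbers here are made up for illustration):

```python
import numpy as np

# One extreme value pulls the mean far above the "typical" value,
# while the median barely moves.
incomes = np.array([40_000, 45_000, 50_000, 52_000, 1_000_000])

print(np.mean(incomes))    # inflated by the outlier
print(np.median(incomes))  # robust to the outlier
```

The mean lands well above every value except the outlier, while the median still sits in the middle of the typical incomes.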
