Posts

cst438 - week 1

This week was a nice introduction to the course and to some of the services we'll be using, specifically Spring Boot. I really enjoyed the labs because they gave me a chance to become familiar with Spring Boot again and refresh my understanding of its components. It was a good balance of reviewing concepts I had seen before while also preparing me for the rest of the course.

Before this week, I thought of software engineering primarily as the process of designing and integrating large systems. In my mind, software development was focused on writing the individual pieces of code, while software engineering was about connecting those pieces through APIs, microservices, and overall system architecture. After this week, I still think those ideas are an important part of software engineering, but I've realized that the development workflow is just as important. The processes and tools that teams use to build software efficiently are a major part of en...

cst383 - week 7

This week focused on encoding categorical variables, logistic regression, and overfitting. While the lectures introduced several important machine learning concepts, I think the homework was what helped reinforce them the most. Unlike many previous assignments, this one was much more open-ended and required me to make my own decisions about how to approach the problem. We were given a dataset and told to predict a target variable using the machine learning practices we've learned, so how to preprocess the data and tune the model was left to me.

Initially, the open-endedness of the assignment made it a bit more challenging because there was no clear step-by-step process to follow, but it certainly made the assignment feel much more realistic, since in an actual career problems aren't going to be presented with exact instructions. Being able to evaluate different approaches is an important skill to train. I felt that I had made good d...

cst383 - week 6

This week focused on hyperparameter tuning, KNN regression, linear regression, and evaluating regression models. There was a lot to cover in these topics, but interestingly enough they gave me an appreciation for what goes into creating machine learning models. Previously, I viewed machine learning as mostly selecting an algorithm and letting it produce results. But learning about concepts like hyperparameter tuning showed me that there is still a significant human element involved in the process. The performance of a model can depend heavily on the choices made by the developer, and finding the right settings requires testing and careful evaluation.

Additionally, the distinction between classification and regression was interesting because while both are forms of prediction, they are designed to solve different types of problems. Regression is useful in that many real-world situations involve predicting nu...
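To illustrate the point about settings needing testing and evaluation, here is a minimal sketch of tuning `k` for a KNN regressor by comparing validation error. It is not the course's code: the toy data, the helper names `knn_predict` and `mean_squared_error`, and the candidate values of `k` are all assumptions made for illustration.

```python
def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, validation, k):
    """Average squared error of KNN predictions on held-out points."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)


# Hypothetical toy data: (x, y) pairs roughly following y = 2x.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8),
         (5, 10.1), (6, 12.2), (7, 13.8), (8, 16.1)]
validation = [(2.5, 5.0), (4.5, 9.0), (6.5, 13.0)]

# Try several values of the hyperparameter k and keep the best one.
candidate_ks = [1, 2, 4, 8]
scores = {k: mean_squared_error(train, validation, k) for k in candidate_ks}
best_k = min(scores, key=scores.get)

print(best_k, scores)
```

With `k` equal to the full training set size, every prediction collapses to the overall mean, which is one way to see why the choice of `k` matters.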

cst383 - week 5

This week covered machine learning topics like handling missing data, data scaling, z-score calculations, KNN classification, test sets, cross validation, and evaluating models. These topics, while diverse, worked well together for preparing data and testing to make sure conclusions drawn from it are accurate.

I found learning about missing data and how it is represented interesting. I had originally not given much thought to the distinction between values like None and NaN, so it was fun learning how these values behave and why they are treated differently. Data is often filled with holes, and knowing how to handle missing information is an important part of the analysis process.

Cross validation was another topic that stood out to me. The idea of testing a model against different subsets of data to verify that the results are reliable seems intuitive and clever. It made me think about the people who originally developed these ...
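The None versus NaN distinction mentioned in this post can be sketched in plain Python. This is only an illustration using the standard library, not the course material: the `is_missing` helper and the sample values are assumptions.

```python
import math

nan = float("nan")   # IEEE 754 "not a number"; its type is float
missing = None       # Python's null object; not a number at all

# NaN is a float, so arithmetic works, but the result stays NaN.
nan_sum = nan + 1.0

# NaN is never equal to anything, including itself.
nan_equals_itself = (nan == nan)

# Arithmetic on None raises an error instead of propagating.
try:
    missing + 1
    none_arithmetic_fails = False
except TypeError:
    none_arithmetic_fails = True


def is_missing(value):
    """Hypothetical helper: treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


values = [1.0, None, nan, 4.0]
present = [v for v in values if not is_missing(v)]
print(present)  # [1.0, 4.0]
```

Because `nan == nan` is false, a check like `value == nan` never finds missing entries, which is why a dedicated test such as `math.isnan` is needed.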

cst383 - week 4

This week focused heavily on probability and different methods of calculating and interpreting it. We covered topics like boolean, conditional, joint, and marginal probability, along with working with data tables to organize the information. Having so many different forms of probability calculations felt a bit overwhelming at first because the concepts are so closely related, but after reviewing the lecture and working on the homework I started to become more comfortable with them.

Something I really enjoyed this week was the data visualization portion of the homework in Google Colab. Working with pandas to create graphs and visualizations was very satisfying. The process of building the visualization piece by piece and gradually adjusting the code until it displayed exactly what I wanted felt rewarding due to the immediate visual feedback. Manipulating a bunch of raw data into nice visual formats also reinforced how important visualizati...

cst383 - week 3

This week covered a large number of topics related to data visualization and statistical analysis using pandas. We learned different plotting systems, how to display information in meaningful ways, how to customize visualizations, and how to perform calculations using the data. At first it was a little overwhelming because there were many different visualization methods, and I had to understand where each was best used depending on the type of data. It took me some time and a decent amount of review before I became more comfortable recognizing where certain plotting systems are most useful and how to interpret them effectively.

We also covered concepts like correlation and covariance. One thing I found especially interesting was the idea that even though data may appear objective and straightforward, it still requires critical thinking to interpret correctly. Identifying correlation does not necessarily reveal the full truth behind the ...

cst383 - week 2

This week the focus was on working with data using the pandas library, and later an introduction to probability density functions. One of the main topics was pandas Series and how flexible they are for working with and visualizing data. We covered indexing, vectorized operations, and how Series differ from standard NumPy arrays. While both structures are similar, what stood out to me is how pandas Series include labeled indices, which makes the data much easier to interpret and work with. Instead of just relying on positional indexing like arrays, being able to have meaningful labels allows for clearer data manipulation. This seems especially useful when working with large real-world datasets where context is just as important as the values.

Pandas DataFrames and how they are used to organize data were also covered. This section felt pretty intuitive because it felt like working with tables in a database or spreadsheet. Along with that was an introducti...
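The difference between positional indexing in a NumPy array and labeled indexing in a pandas Series can be shown with a small sketch. The city names and temperatures below are made-up illustration data, not anything from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical data: high temperatures (Fahrenheit) for three cities.
temps_array = np.array([71.5, 64.0, 88.2])
temps_series = pd.Series(
    [71.5, 64.0, 88.2], index=["Monterey", "Seattle", "Phoenix"]
)

# A NumPy array is addressed by position only.
second_by_position = temps_array[1]

# A Series can be addressed by a meaningful label.
seattle_by_label = temps_series["Seattle"]

# Vectorized operations work on both; the Series keeps its labels.
celsius = (temps_series - 32) * 5 / 9

# Boolean filtering returns the matching labels along with the values.
warm = temps_series[temps_series > 70]
print(list(warm.index))  # ['Monterey', 'Phoenix']
```

With the array, the reader has to remember that position 1 means Seattle; with the Series, that context travels with the data through filtering and arithmetic.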