Okay, here’s my blog post about my little project, “PGA Major Winners.” I’m gonna walk you through how I tackled it.

Alright folks, so recently I got this little itch to dive into some data analysis. I’m no pro, just a regular dude who likes messing around with code and spreadsheets. I decided to focus on something I enjoy: golf! Specifically, I wanted to get a list of PGA Major winners.
First things first: Data Gathering. I started with the obvious: Google. I spent a good hour or two just surfing around, trying to find a reliable source. I bounced between the PGA Tour’s official site, Wikipedia, and some random golf stat websites. Honestly, it was a bit of a mess. Different sites had different formats, some were missing data, and others were just plain hard to navigate.
Eventually, I pieced together a pretty decent list from a few sources. I ended up copying and pasting a bunch of stuff into a Google Sheet. Yeah, super high-tech, I know! But hey, it worked.
Next up: Data Cleaning. This was probably the most tedious part. The raw data was all over the place. I had to standardize the names, make sure the years were consistent, and weed out any duplicates or errors. I used Google Sheets’ built-in functions for this – things like TRIM to remove extra spaces, UPPER to make everything uppercase, and some basic IF statements to correct inconsistencies. It was seriously like a digital version of spring cleaning.
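If you'd rather do the same cleanup in Python, the spreadsheet functions map fairly directly onto pandas string methods. This is a minimal sketch, not the spreadsheet workflow described above: the column names (year, tournament, player) and the alias table are made up for illustration. One gotcha: Sheets' TRIM also collapses repeated spaces inside the text, which Python's strip() alone won't do, so the sketch splits on whitespace and rejoins.

```python
import pandas as pd

# Hypothetical messy rows, like the ones you get from copy-pasting several sites.
raw = pd.DataFrame({
    "year": ["2019", " 2019", "2018", "2018 "],
    "tournament": ["Masters", "The Masters", "U.S. Open", "US Open"],
    "player": ["Tiger Woods", "  tiger  woods ", "Brooks Koepka", "Brooks  Koepka"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in ["tournament", "player"]:
        # Like TRIM: drop leading/trailing spaces and collapse inner runs of spaces.
        out[col] = out[col].str.split().str.join(" ")
        # Like UPPER: one consistent case so "tiger woods" matches "Tiger Woods".
        out[col] = out[col].str.upper()
    # Like the IF fixes: map known aliases (hypothetical list) to one canonical name.
    out["tournament"] = out["tournament"].replace({
        "THE MASTERS": "MASTERS",
        "US OPEN": "U.S. OPEN",
    })
    # Years as integers instead of text with stray spaces.
    out["year"] = out["year"].str.strip().astype(int)
    # Rows that are now identical (same result pasted from two sources) collapse to one.
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```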
Then came: The Code! I fired up Python. I’m still learning, so this was a good practice run. I used the pandas library to load my cleaned data from the Google Sheet (saved as a CSV file, of course). I wanted to do some basic analysis, like:
- Find out which golfers had the most Major wins.
- See which country produced the most winners.
- Plot the number of wins over time.
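Here's roughly what loading the CSV and answering the first two questions can look like. It's a sketch under assumptions: the column names are hypothetical, and the inline CSV is just the real 2018 and 2019 Major results standing in for the full file (with an actual export you'd pass a file path to pd.read_csv instead of the StringIO).

```python
import io

import pandas as pd

# Stand-in for the exported CSV; columns are hypothetical.
CSV_TEXT = """year,tournament,player,country
2018,MASTERS,PATRICK REED,USA
2018,U.S. OPEN,BROOKS KOEPKA,USA
2018,THE OPEN,FRANCESCO MOLINARI,ITALY
2018,PGA CHAMPIONSHIP,BROOKS KOEPKA,USA
2019,MASTERS,TIGER WOODS,USA
2019,PGA CHAMPIONSHIP,BROOKS KOEPKA,USA
2019,U.S. OPEN,GARY WOODLAND,USA
2019,THE OPEN,SHANE LOWRY,IRELAND
"""

majors = pd.read_csv(io.StringIO(CSV_TEXT))

# Question 1: Major wins per golfer, most first.
# groupby + count tallies the non-empty "tournament" cells in each group.
wins_by_player = (
    majors.groupby("player")["tournament"].count().sort_values(ascending=False)
)

# Question 2: Major wins per country (this counts wins, not distinct winners).
wins_by_country = (
    majors.groupby("country")["tournament"].count().sort_values(ascending=False)
)

print(wins_by_player.head(3))
print(wins_by_country)
```

If you want the literal "most winners" version of the country question, `majors.groupby("country")["player"].nunique()` counts distinct golfers instead of wins.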
The pandas code was fairly straightforward. I used groupby and count to aggregate the data, and matplotlib to create some basic charts. I even tried seaborn for a slightly fancier visualization, but honestly, my basic matplotlib charts were good enough for what I needed.
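A plain matplotlib sketch for the third question, wins per year, using the same hypothetical columns. The sample rows are the real 2019 to 2021 Major winners; those years were picked because the 2020 Open Championship was cancelled, so 2020 has only three Majors and shows up as a dip. Reindexing over the full year range also means a year with no rows at all would appear as a zero-height bar rather than quietly vanishing from the x-axis.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

CSV_TEXT = """year,tournament,player
2019,MASTERS,TIGER WOODS
2019,PGA CHAMPIONSHIP,BROOKS KOEPKA
2019,U.S. OPEN,GARY WOODLAND
2019,THE OPEN,SHANE LOWRY
2020,PGA CHAMPIONSHIP,COLLIN MORIKAWA
2020,U.S. OPEN,BRYSON DECHAMBEAU
2020,MASTERS,DUSTIN JOHNSON
2021,MASTERS,HIDEKI MATSUYAMA
2021,PGA CHAMPIONSHIP,PHIL MICKELSON
2021,U.S. OPEN,JON RAHM
2021,THE OPEN,COLLIN MORIKAWA
"""

majors = pd.read_csv(io.StringIO(CSV_TEXT))

# Question 3: Major wins per year; size() counts the rows in each year group.
wins_per_year = majors.groupby("year").size()

# Cover every year in the range so a year with no rows would show as 0.
all_years = range(wins_per_year.index.min(), wins_per_year.index.max() + 1)
wins_per_year = wins_per_year.reindex(all_years, fill_value=0)

fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(wins_per_year.index, wins_per_year.values)
ax.set_xticks(list(wins_per_year.index))
ax.set_xlabel("Year")
ax.set_ylabel("Major wins")
ax.set_title("Major wins per year")
fig.tight_layout()
fig.savefig("wins_per_year.png")  # hypothetical output filename
plt.close(fig)

print(wins_per_year)
```

This writes the chart to a PNG instead of opening a window; if you're working interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.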
And finally: Some Results. Okay, so here’s what I found. Jack Nicklaus topped the list with 18 wins, with Tiger Woods right behind him at 15 (no surprise there), followed by a few other legends. The US had the most winners overall, but other countries like the UK and Australia were also well-represented. I also saw some clear peaks and valleys in the number of Major wins per year, but I didn’t really dig too deep into the reasons behind those trends. Maybe something for another project!
What I Learned
This whole thing was a good reminder that data analysis is rarely glamorous. It’s a lot of grunt work – cleaning data, wrangling formats, and dealing with inconsistencies. But once you get past that, it’s pretty cool to see what insights you can uncover. I’m definitely going to keep practicing and exploring more complex datasets. Maybe next time I’ll try predicting future winners using machine learning. Who knows?
