Okay, so today I’m gonna walk you through this thing I was messing with – trying to pull some data about the Kia Tigers, you know, the Korean baseball team. Thought it would be a fun little project.

First off, I just started by scouting around for some data sources. I knew I wasn’t gonna build a whole API myself, so I was hoping to find some sites with stats already laid out. Ended up finding a few that had what I needed, but the formats were all over the place, which was kinda annoying.
Next up, I fired up Python, my go-to for this kinda stuff. I used requests to grab the HTML from the webpages. Pretty standard stuff, you know? Just import requests and then start hitting those URLs.
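If you're curious, the fetching part really is just a few lines. Here's a rough sketch, with the URL and user-agent as placeholders rather than the actual sites I scraped:

```python
import requests

# Placeholder URL -- the real stats sites will vary, so swap this out.
URL = "https://example.com/kia-tigers/stats"

# A user-agent header helps some sites serve the full page instead of blocking you.
headers = {"User-Agent": "Mozilla/5.0 (hobby-data-project)"}

response = requests.get(URL, headers=headers, timeout=10)
response.raise_for_status()  # bail early if the site returns an error code
html = response.text
```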
The real pain was parsing the HTML. The tables were all structured differently, and some of the sites used weird CSS classes. I ended up using BeautifulSoup. Spent a good chunk of time just figuring out the right selectors to get the data I wanted. It was a lot of "inspect element" in the browser, then tweaking the code, then running it again, and repeating that until it worked.
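To give a rough idea of what the parsing looked like, here's a sketch. The table class name is made up; every site had its own, which is exactly why the "inspect element" loop took so long:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

# "stats-table" is a hypothetical class name -- replace it with whatever
# the target site actually uses for its stats table.
table = soup.find("table", class_="stats-table")

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(cells)
```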
Once I had the data, it was a mess. Dates were in different formats, some numbers were strings, and there were just a bunch of random characters floating around. So I spent a while cleaning it up. Used a bunch of string methods like .replace() and .strip(), and tried to convert things to the right data types using int() and float(). It wasn't pretty, but it got the job done.
After cleaning, I organized everything into a Pandas DataFrame. This made it way easier to work with the data. I could start doing things like calculating averages, sorting by different stats, and all that jazz.
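Something along these lines, with the column names just standing in for whatever stats the real pages had:

```python
import pandas as pd

# Hypothetical columns ("player", "avg", "hr", "rbi") -- the real ones
# depended on which source site the rows came from.
df = pd.DataFrame(rows, columns=["player", "avg", "hr", "rbi"])
df[["avg", "hr", "rbi"]] = df[["avg", "hr", "rbi"]].apply(pd.to_numeric, errors="coerce")

print(df["avg"].mean())                       # team-wide batting average
print(df.sort_values("hr", ascending=False))  # sluggers at the top
```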
Finally, I wanted to visualize the data a little bit. I used matplotlib to make some simple charts. Nothing too fancy, just wanted to see some trends and compare different players. It was cool to actually see the numbers come to life, you know?
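For example, a basic bar chart like this one, reusing the made-up column names from the DataFrame sketch above:

```python
import matplotlib.pyplot as plt

# A simple bar chart of home runs per player -- nothing fancy,
# just enough to eyeball who's driving the offense.
df_sorted = df.sort_values("hr", ascending=False)
plt.figure(figsize=(10, 5))
plt.bar(df_sorted["player"], df_sorted["hr"])
plt.xticks(rotation=45, ha="right")
plt.ylabel("Home runs")
plt.title("Kia Tigers: home runs by player")
plt.tight_layout()
plt.show()
```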
Here’s a quick rundown of the libraries I used:
- requests: For grabbing the HTML from the websites.
- BeautifulSoup: For parsing the HTML and extracting the data.
- pandas: For organizing and manipulating the data.
- matplotlib: For creating some simple visualizations.
All in all, it was a fun little project. Definitely learned a few things about web scraping and data cleaning. And hey, now I know a little more about the Kia Tigers too!
