Alright, let’s dive into this. So, I had this itch to try and predict the Hearts vs. Celtic match. I know, I know, football predictions are a mug’s game, but hey, gotta try, right?

First things first: Data Gathering
- I started scraping data from a couple of sports websites. Stuff like recent form, head-to-head records, goals scored, goals conceded – the usual suspects.
- I also dug around for team news – injuries, suspensions, any key players missing. That can really swing a game.
Then, the Fun Part: Feature Engineering
This is where I tried to make sense of the data. I didn’t just want raw numbers; I wanted to create features that meant something. For example:
- Form Factor: Instead of just looking at the last 5 games, I weighted them. The most recent game got the highest weight, the oldest, the lowest.
- Home Advantage: I added a simple boolean – 1 if Hearts were at home, 0 if not. Obvious, but important.
- Attack Strength: Goals scored divided by games played. Simple, but effective. Did the same for defense, using goals conceded per game.
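Here’s a rough sketch of what those three features could look like in Python. It is not the exact code from this project: the linear weights (1 for the oldest game up to 5 for the newest), the function names `weighted_form` and `per_game`, and all the numbers are assumptions made up for illustration.

```python
from typing import Sequence


def weighted_form(points: Sequence[int]) -> float:
    """Weighted average of points per game, ordered oldest to newest.

    Weights are linear (1, 2, ..., n). That is an assumption: any scheme
    that gives the most recent game the highest weight would fit.
    """
    if not points:
        return 0.0
    weights = range(1, len(points) + 1)
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)


def per_game(total: int, games_played: int) -> float:
    """Goals per game: scored for attack, conceded for defense."""
    return total / games_played if games_played else 0.0


# Made-up illustrative numbers, not real results.
last_five_points = [0, 1, 3, 3, 3]  # oldest -> newest; 3 = win, 1 = draw, 0 = loss

features = {
    "form": weighted_form(last_five_points),
    "home": 1,  # 1 if Hearts are at home, 0 if not
    "attack": per_game(total=12, games_played=8),
    "defense": per_game(total=10, games_played=8),
}
print(features)
```

With linear weights, three recent wins count for a lot more than an old loss, which is the whole point of the form factor.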
Model Time!
I went with a Logistic Regression model. Why? It’s relatively simple to understand and interpret, and for a three-way outcome (win/lose/draw – I actually treated “draw” as a separate category at first), it’s a decent starting point. I used Python and Scikit-learn, obviously.
from sklearn.linear_model import LogisticRegression
Training and Testing
Split the data into training and testing sets. I used about 80% for training, 20% for testing. Then I fed the model the training data and let it do its thing. After that, I tested it on the testing data to see how well it performed.
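A minimal sketch of that split-and-fit step with Scikit-learn. The data below is random noise standing in for the real features, and the label encoding (0 = loss, 1 = draw, 2 = win), the seed and `max_iter` are all assumptions, so the printed accuracy means nothing by itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 matches, 4 features (form, home, attack, defense).
X = rng.normal(size=(200, 4))
# Random labels for illustration only: 0 = loss, 1 = draw, 2 = win.
y = rng.integers(0, 3, size=200)

# 80% for training, 20% for testing; stratify keeps the class mix similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# LogisticRegression handles the three classes directly.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

One thing worth considering: matches happen in time order, so a chronological split (`shuffle=False` in `train_test_split`, with the data sorted by date) may be a fairer test than a random one.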
Results (brace yourself…)

Okay, so the accuracy wasn’t exactly mind-blowing. It was hovering around 60-65%. Not terrible, but not good enough to bet the house on! It correctly predicted Celtic as the favourite, but it failed to pick up a couple of shock draws that actually happened.
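A single accuracy number hides where the misses are. A confusion matrix makes something like the missed draws visible. The sketch below uses made-up labels, not the real predictions from this project, purely to show the idea.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["loss", "draw", "win"]

# Hypothetical outcomes and predictions, invented for illustration.
y_true = ["win", "win", "draw", "loss", "draw", "win", "loss", "win"]
y_pred = ["win", "win", "win", "loss", "win", "win", "loss", "loss"]

acc = accuracy_score(y_true, y_pred)
# Rows are the true outcome, columns the predicted one, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Share of actual draws that the model also called a draw.
draw_idx = labels.index("draw")
draw_recall = cm[draw_idx, draw_idx] / cm[draw_idx].sum()

print(f"Accuracy: {acc:.3f}")
print(cm)
print(f"Draw recall: {draw_recall:.2f}")
```

In this toy example the accuracy looks passable, but the draw row shows that not a single draw was predicted correctly.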
What Went Wrong? (Probably everything!)
- Limited Data: I didn’t have a ton of data to work with. With so few matches, the model has little to learn from and the accuracy figure itself is noisy.
- Feature Selection: Maybe my features weren’t the best. I could have tried adding more advanced metrics or looking at player-specific stats.
- Model Choice: Logistic Regression might not have been the right choice. A more complex model, like a Random Forest or a Neural Network, could potentially have performed better.
- Luck: Let’s be honest, football is unpredictable! Sometimes, the better team just doesn’t win.
What I Learned
Even though the prediction wasn’t perfect, it was a fun exercise. I got to practice my data scraping, feature engineering, and model building skills. Plus, it reinforced the idea that predicting football matches is HARD! But hey, I’ll probably try again next week. Maybe I’ll even use a different model. Who knows, maybe I’ll get lucky!
Next Steps
- Try a different model. Maybe a Random Forest or Gradient Boosting.
- Gather more data. Maybe from more sources.
- Look at player-specific stats and individual matchups.
- Accept that I’ll probably still be wrong most of the time!
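For the "try a different model" step, one way to compare candidates is cross-validation on the same features. This is only a sketch on random stand-in data, with assumed settings (5 folds, 50 trees, fixed seeds), so the scores it prints should sit around chance level and say nothing about real matches.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 150 matches, 4 features, 3 outcome classes.
X = rng.normal(size=(150, 4))
y = rng.integers(0, 3, size=150)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Averaging over several folds gives a steadier comparison than a single 80/20 split, which matters when there isn’t much data.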
That’s my take on the Hearts vs Celtic prediction. It wasn’t a roaring success, but I had a good time trying. Let me know if you have any tips or suggestions for improving my approach. Always up for learning something new.