Which horse is the GOAT in PFL?

Which horse is the greatest ever to enter a race in PFL? This post borrows Trueskill from another genre of game, uses it to assign a rating to each horse, and evaluates what we find.

Petrocker

5/10/2024 · 7 min read

a jockey in red silks

How do you go about figuring out which is the greatest horse of all time (GOAT) in PFL? We could look at wins, win percentage, career earnings, profitability or return on investment. We could use wins in the epic events like the virtual Kentucky Derby. That would make for quite a dull post, and you could just go and look up these kinds of stats on the stats page of the PFL website (here).

Rating Horses With Trueskill

Have you ever played a competitive online game like Overwatch, or League of Legends, and complained about the rank you have been given? I should be a top 500 player but the algorithm doesn't like me and that's why I am in Bronze rank 5. If only we had a tank who knows what they are doing...

Those games use various algorithms to assess your level of skill and attempt to match you with similarly skilled players so that your games are competitive, each victory feels hard won and deserved, and ultimately you remain sticky to the game, just trying to eke out another promotion through the ranks before the end of the season.

For many, the first such algorithm is the Elo rating, originally used in chess. We need a slightly more sophisticated approach here because PFL is a many-versus-many PvP game (whether you like it or not). To get accurate ratings we need something that can handle varying numbers of runners in races and come up with a rating for each horse for each of its runs.

Enter Trueskill, developed by Microsoft for online Xbox game matchmaking. Trueskill is also handily available as a Python library, which makes for easy implementation. Trueskill calculates two simple metrics, an average score (mu) and its standard deviation (sigma). Each new horse is initialised with a mu of 25 and a sigma of 8.3333. After every run, each horse's score is updated based on which horses it beat and lost to, and the respective scores of those horses. Beating a strong horse increases your score more than beating a weak horse, and losing to a weak horse costs you more than losing to a strong one. Over time, as more observations are gathered, the sigma decreases, representing the model becoming more confident in its predictions. To calculate the Trueskill score we use Trueskill = mu - (3 x sigma). So each fresh horse starts with a score of 0 (25 - 3 x 8.3333).

Here is the career of Kan Kan Pork Chop illustrated through Trueskill. See how the confidence band tightens up as we get more observations and the mean score varies less and less over time. This is typically how the system works: potentially inaccurate to begin with, but over time we get a clearer picture.

After 40+ races, the rating and the standard deviation don't really move very much.

Applying Trueskill to the entire racing population

So, according to Trueskill, which horse is the GOAT? The analysis covers all runs by all horses and takes each horse's maximum rating, its peak performance (remember, mu minus 3 x sigma), so we are comparing the peaks of every horse. We could use the current ratings of the horse population instead, but that wouldn't allow us to compare current and retired horses.

Below is the entire population plotted in a scatterplot of Trueskill versus the number of races a horse has run. I have only included horses with a minimum of 10 races (to remove the noisy outliers).
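As a rough sketch of that peak calculation, assume we already have one (horse, mu, sigma) row per run. The function name, data layout and sample numbers below are hypothetical and only illustrate the idea: score every run as mu - 3 x sigma, keep each horse's best score, and drop horses with fewer than 10 races.

```python
from collections import defaultdict

MIN_RACES = 10  # minimum number of runs used for the scatterplot in this post


def peak_scores(history, min_races=MIN_RACES, k=3.0):
    """Return {horse: (peak_score, races)} for horses with enough runs.

    history is an iterable of (horse, mu, sigma) tuples, one per run.
    """
    scores = defaultdict(list)
    for horse, mu, sigma in history:
        scores[horse].append(mu - k * sigma)
    return {
        horse: (max(values), len(values))
        for horse, values in scores.items()
        if len(values) >= min_races
    }


# Made-up data: one horse with 12 runs and one with only 3.
history = [("Veteran", 25 + i, 8.0 - 0.5 * i) for i in range(12)]
history += [("Rookie", 30.0, 7.0) for _ in range(3)]

peaks = peak_scores(history)
print(peaks)  # only "Veteran" passes the 10-race filter
```

The (races, peak_score) pairs are what would go on the x and y axes of a scatterplot like the one below.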

scatterplot of trueskill scores

A few things to note on this chart. Firstly, as previously discussed, the data is messy on the left and converges on the right as the number of races increases. Many horses have raced only a few times and a few horses have raced many times (100+). It is also likely that only good horses continue to be run over a long period. Obviously mares are likely to be retired early, and sometimes really successful stallions are retired early for breeding purposes.

The interesting horses for us are the ones that appear at the top of the chart, that stand above the masses. Who are these horses, and what can we learn from this we couldn't see from the stats page?

The top rated horse is Mr Wonderful. A 3 year old colt out of Bender Wins stable with 14 wins from 19 starts including a Grade I this season. It was originally bought for 499,000 Derby on 4th March 2024, the new owner attempting to flip it for 675,000 for two days before coming to his or her senses. Shades of American Pharoah IRL there. In those first 19 races, Mr Wonderful has generated a purely Derby racing profit of 600,000. Will Mr Wonderful maintain this level over time or will it regress closer to the mean?

At the other end of the races axis is Kan Kan Pork Chop. Together with One .21 Gigawatts, they represent the previous generation of horses who have been raced for a long time and still maintain an extremely high Trueskill. The previous chart calls out the top 20 horses by peak Trueskill, with their dots coloured red.

The top 20 horses by Trueskill with some more traditional stats are:

top 20 horses by trueskill

You might find that some of the lesser run horses will regress towards the mean as they run more races and are exposed to more different horses. For Kan Kan and .21 Gigawatts to be so close to the top with over 100 races is an impressive achievement.

If 19 races isn't enough for you to call Mr Wonderful the GOAT, then you might consider Kan Kan Pork Chop with his 71 wins and 120 podiums in 133 races, and 1.75MM in racing profit from Derby, as the real GOAT.

If we remove the SS-, S+ and S horses, which horses have the highest Trueskill score of the lower grades?

Show Me The Money is an exceptional A+ 4 year old stallion, with a 47% win rate, a 2.45 ROI and a very high Trueskill rating. That rating would go down if he raced against S+ horses all the time, but the reality is most of his victories will have been against similar graded horses. After showing some initial success on the track, Show Me The Money was sold for 80,000 Derby, raced a few times more and then bought by SLV Stables LDF for 200,000 Derby in late March 2024.

Gebrasalong appears to be an A graded horse at the top end of the grade, with a 52% win rate and an ROI of 1.67. Not bad for an A graded horse. Finally, Burma is the top rated A- horse, with almost a 50% win rate across 120+ races, generating over 125,000 in Derby profits. A gen zero horse with a strong roll for finish, it was a 12F monster at the lower grades, also owned by SLV Stables.

What did we learn?

Trueskill was designed for online multiplayer PvP games to isolate the skill of the player. It can fulfil a similar role in PFL, given that PFL is itself an online multiplayer PvP game. It's a single metric that enables us to compare the relative strength of horses. Single metric systems are useful but sometimes misleading. Anyone who has ever had the misfortune of having to analyse NPS data will know what I mean.

Having scores like this may allow us to spot potential breeding options before horses retire. We can create rankings and compare horses from different generations that have never faced each other. We can spot high potential horses that may become the next Kan Kan. Maybe start with the top 20 horses and see how many you can safely breed with, should they appear in the breeding stable.

Perhaps the most useful application is as an input to prediction models for picks. Whilst of some use on its own, it can add insight into selections. Yes, horses run against each other in more and less favourable conditions, but ultimately the cream will rise to the top. On its own it's a form of handicapping model that can help you understand the overall quality of a horse.

As I have said before, the cliche goes "all models are wrong, some are useful". Trueskill is useful. I definitely shouldn't be in the bronze lobby for healers in Overwatch 2, so maybe it's not that accurate? We all need to come up with ways to quickly rate horses and make snap decisions: should I claim it, should I buy, where should I race it, when should I retire it and how much should I charge for stud fees? You don't need to use Trueskill, but having a rating system so you can benchmark your horses and those of your rivals is probably a good idea.

Join the fun and put these insights into practice at PhotoFinish.Live and if you are considering starting your own stable please consider using my referral code: PADDOCK or just click on this link: https://signup.photofinish.live/?referralCode=8EUMC4P2

Please remember this is a web3 game where you spend your own money. Nothing I write about should be considered financial or investment advice.