Trying to understand breeding in PFL
How does breeding work? I don't know. So let's take a look at what we can quickly figure out and go on a journey of discovery together.
Petrocker
5/4/2024 · 10 min read
Genetics. Sounds complicated and probably is. It feels like everyone has some knowledge of breeding in PFL, but there is very little common knowledge. The best place to start is the official blog (“read the documentation”, as any software developer will tell you). You can find the posts here in three parts:
https://thirdtimegames.com/pfl-inside-track-genes-matches/
https://thirdtimegames.com/pfl-inside-track-part-2-gen-0-traits/
https://thirdtimegames.com/pfl-inside-track-part-3/
This post is not a discussion of these articles, but reading them is a good idea if you want to form your own opinions on how breeding works, and we will reference some of their topics here. And if you are lazy, just cut and paste them into your favourite large language model (LLM) and ask it to tell you what they say and speculate on the meaning.
What we are going to look at today is what we can literally see in breeding reports and the data. I start this process not knowing anything about breeding, so please bear in mind that this post is more speculative than some of the others. Does it really matter how breeding works if we can successfully predict outcomes regardless? What if we have a much simpler view of the world that kind of works and informs our breeding investment decisions? That would probably be a pretty good outcome. So let’s figure some basics out.
Understanding Breeding Reports
At 4000 DERBY, breeding reports are not cheap for the average player. But they clearly contain a great deal of value. Today we are going to focus on attributes (start, speed, stamina, finish, heart and temper) and leave preferences aside (the stars - more about preferences here).
The first thing of interest is that the breeding report is probabilistic, not deterministic. What does that mean? Well, the report is built from 1,000 trials and the results show the distribution of outcomes from those trials. Deterministic means that under a consistent set of circumstances we expect a given answer. That is not the case here. So lesson one - we are at the mercy of a random process, regardless of how much planning we do. There is no guaranteed outcome.
If you read the official blogs linked earlier in the post you will see an example given where each horse has a sequence of eight integers that may represent the gene for an attribute. If we believe the example, at each of the eight positions the offspring has a 50/50 chance of inheriting either parent's integer. That means there are 2^8 = 256 different combinations of outcomes. Why is this important? Well, we have 1,000 trials, so the chance that one specific combination of genes (say, the best possible roll) does not appear on a breeding report is (255/256)^1000, or circa 2%. You have 6 independent events - one for each attribute - so the chance that such a roll is missing from the report for at least one attribute is about 11.4% (1 - 98%^6, ish). The reason for going down that statistical rabbit hole was to illustrate that these reports still have the ability to surprise you.
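If you want to check that arithmetic yourself, here is a minimal Python sketch. It assumes the 8-position, 50/50 model from the official blog example and 1,000 independent trials per report. The variable names are mine, not anything from the game.

```python
# Assumptions: 8 gene positions, each a 50/50 pick between the parents,
# 1,000 independent trials per report, 6 independent attributes.
POSITIONS = 8
TRIALS = 1000
ATTRIBUTES = 6

combinations = 2 ** POSITIONS                      # 256 possible rolls
p_roll_missing = (1 - 1 / combinations) ** TRIALS  # one specific roll never shows up
p_any_missing = 1 - (1 - p_roll_missing) ** ATTRIBUTES

print(f"{combinations} combinations")
print(f"specific roll missing for one attribute: {p_roll_missing:.1%}")
print(f"missing for at least one of {ATTRIBUTES} attributes: {p_any_missing:.1%}")
```

Bear in mind that different combinations can add up to the same grade, so a missing roll does not always mean a missing grade on the report.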
The reports themselves contain a lot of data that isn't obvious to the naked eye. But with a few tweaks, we can fill in the blanks. Firstly, I think the reports make much more sense when you have a grid overlaid on them.
I feel better already. I just used PowerPoint to literally draw a grid, and now I can see that the boxes and whiskers of the boxplots do not neatly align with my grid. These boxplots, like the other boxplots in the game, display 4 points. The whiskers end at the extremes and the edges of the box mark the IQR (interquartile range). 50% of observations will occur within the box and 25% on each whisker either side. Here we see that the boxes and the whiskers are of different lengths, and the whiskers themselves are not symmetrical. We are missing two key data points that we already know the answer to, so let's add them...
Why don't breeding reports contain this information out of the gate? These points provide important information. To be clear, the dots are the parents' grades: the sire (blue) and the dam (pink) - sorry for the stereotyping. Here temper is interesting. The pink dot is within the box and the blue dot is at an extreme. It looks like the offspring cannot improve on its sire's temper by combining the best elements of its parents, because the sire already has the better set of values.
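To make the four points concrete, here is a small Python sketch of how such a summary could be computed from a list of trial outcomes. The trial values are made up for illustration, and I am assuming the game uses ordinary quartiles. I don't know how it actually draws the boxes.

```python
import statistics

def four_point_summary(outcomes):
    """Return (low whisker end, box start, box end, high whisker end)."""
    # quantiles(n=4) gives the three quartile cut points; the median is not drawn.
    q1, _median, q3 = statistics.quantiles(outcomes, n=4)
    return min(outcomes), q1, q3, max(outcomes)

# Made-up trial results on a numeric grade scale, purely for illustration.
trials = [11.2, 11.8, 12.0, 12.1, 12.3, 12.4, 12.9, 13.5, 14.6]
low, box_start, box_end, high = four_point_summary(trials)
print(low, box_start, box_end, high)
```

In this made-up sample the upper whisker is much longer than the lower one, yet each whisker still covers roughly a quarter of the outcomes. A long whisker means those outcomes are spread thinly, not that they are more likely.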
It also feels, looking at a number of such reports, that behind the grades is a level of fidelity that we don't get to see. If we changed the grade letters to a number scale from 1 to 20, where S- is 13 (the 13th grade, starting at D- = 1 and ending at SSS = 20), it feels like S- actually represents 12.5 to 13.4, given that neither the whiskers nor the boxes align exactly with the letters. Some slight movement within each grade would account for this: there may be stronger and weaker versions of a grade (say to 1 decimal place) that the game does not display to the player. This would also go some way to explaining why some horses are graded the way they are despite having individual attributes that are predominantly above or below the average grade.
Now that we have built out the chart, we can go back to the problem of not all outcomes being covered. The above example shows a heart outcome where the best possible outcome looks worse than the sire's starting heart. If breeding is as described this shouldn't be possible. A roll of all sire genes would mean the same grade as the sire, so that outcome should exist even if the 1,000 trials never produced it. So just remember the reports don't tell the whole truth all of the time. And anyone can add some grid lines (you can do it with a pen) and some coloured dots to give you more information.
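Here is a quick Python sketch of that number scale. The anchor points (D- = 1, S- = 13, SSS = 20) come from the paragraph above. The exact ladder of grades in between and the round-to-nearest rule are my assumptions, not something the game confirms.

```python
# Assumed 20-step ladder: D- .. SS+ in steps of (-, plain, +), then SSS- and SSS.
GRADES = [
    f"{letter}{suffix}"
    for letter in ("D", "C", "B", "A", "S", "SS")
    for suffix in ("-", "", "+")
] + ["SSS-", "SSS"]

def grade_to_number(grade):
    """Map a displayed grade to its position on the 1-20 scale."""
    return GRADES.index(grade) + 1

def displayed_grade(hidden_value):
    """Map a hypothetical hidden decimal value to the grade the player sees.

    Rounds halves up, so 12.5 to 13.4 all display as S-.
    """
    step = int(hidden_value + 0.5)
    step = max(1, min(len(GRADES), step))  # clamp to the 1-20 scale
    return GRADES[step - 1]

print(grade_to_number("S-"))                         # 13
print(displayed_grade(12.5), displayed_grade(13.4))  # S- S-
```

Under this reading a "strong" S- at 13.4 and a "weak" S- at 12.5 look identical on the report, which is one way the boxes could end up sitting between grid lines.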
Our preference for variance
Ultimately a large number of players want their foals to gain grades, to compete at the highest levels of the game. There are two factors at play here: the length of the whiskers, which reflects how compatible the horses are, and the difference between the starting grades of the parents.
Compatibility here, taking the insight from the official blog, suggests an alignment of those 8-digit strings where each parent has high values in different locations on the string of numbers, and a good roll would select all the high numbers from each parent. Two parents of equal quality with the high numbers in the same spots would give short whiskers and low variance. This is to be avoided if we want a chance of increasing the grade of the offspring. In this light, my take is that when people talk about "introducing quality" blood to a bloodline, they are referring to a high value in a certain location in the gene sequence that is maybe not commonly available in, say, RTS or LDF, and the end goal of breeding science is to figure out which horses have high values in which positions of the sequence. Doing this will allow you to maximise the grade of a horse. However, this is my speculation and there is probably much more to what parents pass on than pure grade score.
The second factor is the difference between the parents' starting grades. It would appear that the wider the gap between starting grades, the less chance the offspring has of achieving a higher grade than its dominant parent.
The previous image from the same breeding report shows the impact of the gap in starting grades. For speed, it is impossible for the foal to be better than its sire. The sire is 5 grades higher than the dam. The boxplot suggests there is a high likelihood that the foal lands right in the middle. For stamina, however, there is a 1 grade gap and a whisker of 3+ grades up or down. This feels like the first example of high compatibility. The parents have similar grades but possibly arrive at them in completely different ways. Get the right roll (all the high scores) and we might see a 3 grade increase over either parent. Most people like the long whisker distributions because they might lead to a significant jump in grade, desirability and value.
I am sure it would be possible, given enough breeding reports, to start to map the genetics of each horse and bloodline, but that is another problem for another day, and a serious undertaking. If you have several breeding reports for the same mare, perhaps see which potential matches provide the longest whiskers and figure out which bloodlines give you the greatest upward momentum. Tools are being created by the teams at sites like https://gapdata.racing/ and https://photofinishedge.com/ that can help players plan their breeding strategies.
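To see why the position of the high numbers matters, here is a toy Python sketch. It assumes the model from the blog example (8 positions, a 50/50 pick at each) and, as my own simplification, that an attribute's strength is just the sum of the 8 picked numbers. The gene values are invented.

```python
from itertools import product

def offspring_sums(sire, dam):
    """Every equally likely foal under the assumed model: at each position
    pick the sire's or the dam's number, then add the picks up."""
    return [sum(picks) for picks in product(*zip(sire, dam))]

def whisker_range(sire, dam):
    """Worst and best possible foal, i.e. where the whiskers would end."""
    sums = offspring_sums(sire, dam)
    return min(sums), max(sums)

# Invented genes: all three horses add up to 40, so they would share a grade.
sire = [9, 9, 9, 9, 1, 1, 1, 1]
dam_complementary = [1, 1, 1, 1, 9, 9, 9, 9]  # high numbers in different spots
dam_overlapping = [9, 9, 9, 9, 1, 1, 1, 1]    # high numbers in the same spots

print(whisker_range(sire, dam_complementary))  # (8, 72): long whiskers
print(whisker_range(sire, dam_overlapping))    # (40, 40): no whiskers at all
```

Same grades on paper, completely different reports. The complementary pairing can produce a foal far better (or far worse) than either parent, while the overlapping pairing can only reproduce what is already there.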
Enough speculation, what does the data say?
Good question. Based on some data that is a few weeks out of date (so will need to be updated with all the newly retired horses) there are clear patterns to breeding. Below is a table that needs some explaining, so here goes...
The tables below contain data from retired horses where we can see the attributes for the horse and both parents. Some simple analysis has been performed on this data to compare the parents' attributes with the horse's. The vertical axis shows the gap between the parents' starting grades for start, speed, stamina and finish. The horizontal axis shows the difference between the offspring's grade and that of the higher of the two parents, regardless of gender. The percentages add up horizontally, across each row.
The pattern is the same for each attribute. Only when both parents have the same starting grade can the offspring gain +3 grades (and there were no examples of this for stamina). Remember this data is a few weeks old, so it may have happened since. Taking the first row, with 0 difference between the parents' starting grades, there is almost a 50% chance that the grade will be the same, and then pretty much a normal-looking distribution across the other options, higher or lower. As the starting grades diverge, it becomes harder to exceed the dominant parent's grade. There is some noise on the negative side. I would put this down to inbreeding, but I haven't analysed it at all.
Looking at another example on the speed chart, if the gap in speed is 4 grades between the parents, then there is a 5% chance that the offspring will have the same speed as the dominant parent, a 23% chance of being one grade lower, a 43% chance of being in the middle, a 23% chance of being one grade above the lower parent and a 5% chance of being the same as the lower parent. Perfect symmetry. Those are the odds.
Knowing some of these percentages might help you when planning your own breeds. Below is a list of those horses who received a 3 grade bump from their parents' matching grades - there aren't many.
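If you have your own data, a table like the one described above could be rebuilt along these lines. This is a rough Python sketch, not the code behind my tables. It assumes each record is a (sire grade, dam grade, foal grade) triple for a single attribute, already converted to numbers, and the sample records are invented.

```python
from collections import Counter, defaultdict

def gap_table(records):
    """Rows: gap between the parents' grades.
    Columns: foal grade minus the higher parent's grade.
    Values: percentage of foals in that row, so each row sums to 100."""
    counts = defaultdict(Counter)
    for sire, dam, foal in records:
        gap = abs(sire - dam)
        delta = foal - max(sire, dam)
        counts[gap][delta] += 1

    table = {}
    for gap, row in sorted(counts.items()):
        total = sum(row.values())
        table[gap] = {delta: 100 * n / total for delta, n in sorted(row.items())}
    return table

# Invented records for one attribute, purely to show the shape of the output.
sample = [(13, 13, 13), (13, 13, 14), (15, 11, 13), (15, 11, 13)]
for gap, row in gap_table(sample).items():
    print(gap, row)
```

With real data you would run this once per attribute (start, speed, stamina, finish) and read along the row that matches the gap between the two horses you are thinking of pairing.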
start: Wax On Wax Off, BettyBoo DoinTheDo, Green River Dawn, Infinity
speed: Bilog Ann, That's A Fake Beard
Maybe you can see the patterns in the bloodline that caused this - but my skeptical view is that these are lucky rolls. I haven't included heart and temper in this analysis, but I am sure we would find similar outcomes.
This post has just scratched the surface of how breeding works. I am sure this is a topic we will come back to in the future as our understanding grows. However, there is an alternative to trying to figure out how breeding works, and that's to just follow the money.
Just give me the answer already
Some people believe there is a hidden performance gene. Some say that genetics are irrelevant, just look at who wins and breed with that stud. Ultimately if the point of breeding is to produce better horses then if I breed with the best horses am I not gaining all their superior genetics? We are in Occam's Razor territory here.
The stable owners that promote their studs do a tremendous job. Just in the last week, the quality of graphics being produced on Twitter/X has kept getting better. There is also a great channel on the official game Discord server for both breeding and stud advice. So I am sure you know all the usual suspects, and know who you can and cannot breed with.
We will look at foal performance of parents in a subsequent post. Today let's look at the best performing male S+ horses from season 14 so far. Some of these horses will be put out to stud, and if you believe that the simplest solution to a problem is the best, then you might want to skip the previous analysis and just try to breed with the highest performing horse you can find. The following table is a list of the top 20 performing male horses ranked by ROI (purely DERBY). Many of these horses will not retire - but some surely will, with no track record of their progeny to back them up - so this might be an opportunity to get ahead of the curve:
It's not all about S+ studs, so here are some others worth enquiring about:
Breeding is a complex topic and there is a rabbit hole you can disappear down very easily. There is lots of nuance that is obviously missing in a post like this. Speculation is high and I have no hands-on experience of successful breeding myself. I am just trying to understand how it works, share that understanding and help others make better informed decisions. So please take all of this with a pinch of salt. Engage with the Discord channels, use the community tools, and think before you spend.
Join the fun and put these insights into practice at PhotoFinish.Live and if you are considering starting your own stable please consider using my referral code: PADDOCK or just click on this link: https://signup.photofinish.live/?referralCode=8EUMC4P2
Please remember this is a web3 game where you spend your own money. Nothing I write about should be considered financial or investment advice.