Build your own AI harness racing machine learning betting model
by Brett Sturman
Much has been made through the years of the gap between commercial betting syndicates and computer-assisted wagering (CAW) operations and the typical retail harness player. On one side is the average harness handicapper, armed with nothing more than a race program whose format has gone essentially unchanged for 100 years. On the other side is an army of groups with seemingly unlimited resources, data scientists, and access to proprietary data.
The gap between the two groups has always seemed impossible to close, but times are changing fast and dramatically.
The latest AI tools have helped level the playing field in ways that wouldn’t have been imaginable until very recently. I would say anyone with knowledge of the sport (and if you’re reading this, that’s you), combined with curiosity and a foundational understanding of data, can now build something that in the past would have taken a team of developers months or years.
Easily the most impressive AI tool I’ve seen is Claude, by Anthropic. The extent to which it can take domain knowledge provided by a motivated handicapper and use it to build a working predictive model is truly remarkable. Even a year ago, using an AI tool to help a non-developer build a serious model would have been significant work. Now, everything from transforming data into readable formats, writing end-to-end code in any programming language, and implementing the right predictive methods to building diagnostics and validation checks for troubleshooting is almost entirely automatic.
Here’s a real-life guide to what that looks like.
Using TrackMaster as the data provider in this example, go to the TrackMaster website and, rather than purchasing a standard program as a PDF, download the data file in XML format. The XML contains all the data behind the standard race program, plus some additional data not always shown on the program page. Once you get into data at this level, you’ll start to understand the advantages these larger groups have had all these years, and this only scratches the surface.
Using Claude or your AI of choice, you can upload the data file, direct it to parse the thousands of lines of XML, tell it which data you want extracted, and have it transform the result into what is essentially a standard Excel file. I did this the other day with a file from The Meadowlands card of last Friday (March 13), and the raw data was illuminating.
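For readers curious about what that AI-generated code might look like, here is a minimal Python sketch of the XML-to-spreadsheet step. It assumes a hypothetical layout in which each `<race>` element contains `<starter>` elements, which in turn contain `<pp>` (past performance) elements with attributes such as `class_rating` and `broke`. These element and attribute names are illustrative only; the actual TrackMaster schema will differ, and the field list would need to be adjusted to match it. The output is a CSV file, which Excel opens directly.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical element/attribute names: adjust to the real TrackMaster schema.
FIELDS = ["race", "horse", "post", "date", "class",
          "speed_rating", "class_rating", "broke"]


def xml_to_rows(xml_data):
    """Flatten race -> starter -> past-performance nesting into one row per PP line.

    xml_data may be str or bytes (bytes lets the parser honor the file's encoding).
    """
    root = ET.fromstring(xml_data)
    rows = []
    for race in root.iter("race"):
        for starter in race.iter("starter"):
            for pp in starter.iter("pp"):
                rows.append({
                    "race": race.get("number"),
                    "horse": starter.get("name"),
                    "post": starter.get("post"),
                    # Past-performance attributes are copied as raw strings.
                    **{field: pp.get(field) for field in FIELDS[3:]},
                })
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text (header first) that Excel can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # "card.xml" and "card.csv" are placeholder file names.
    with open("card.xml", "rb") as f:
        rows = xml_to_rows(f.read())
    with open("card.csv", "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv(rows))
```

Keeping one row per past-performance line, rather than one row per horse, matches the structure described below and makes it easy to filter or group lines later.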
In this particular file, which contained over 1,600 rows of data with each row reflecting an individual past performance line, what I found most useful was the class rating field. Every record has a class rating assigned to it (also shown in a standard TrackMaster program next to the speed rating), but having it in a machine-readable format that a model can work with opens up many more ways to use the figures.
In any given race, depending on the quality of the horses in it, an MADC 1 race could be essentially equivalent to a TM 71 field. But some MADC 1 races rate lower and some TM 71 races rate higher, or vice versa, and it all depends on which specific horses were in those races. Which $10,000 claiming race rated higher? What about a particular N/W $2,500 L5, or that random NJ SDF race? The class rating takes much of the guesswork out of it, and now you can have a model treat these races differently based on their ratings even though they all look like the same class at first glance.
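To make the class-rating idea concrete, the sketch below groups past-performance lines by their printed class label and summarizes the class ratings found under each label, then tags every line with how far its rating sits above or below the average for that label. It assumes rows with hypothetical `class` and `class_rating` columns like those in a flattened data file; the function names are illustrative, not part of any TrackMaster tool.

```python
from collections import defaultdict
from statistics import mean


def class_rating_spread(rows):
    """Summarize class ratings for each nominal class label (e.g. 'MADC 1')."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["class"]].append(float(row["class_rating"]))
    return {
        label: {"n": len(r), "min": min(r), "mean": mean(r), "max": max(r)}
        for label, r in by_label.items()
    }


def rating_vs_label(rows):
    """Return copies of rows with 'vs_label': rating minus the average for that label.

    A positive value means the race rated tougher than a typical race carrying
    the same class label; a negative value means it rated softer.
    """
    spread = class_rating_spread(rows)
    return [
        {**row, "vs_label": float(row["class_rating"]) - spread[row["class"]]["mean"]}
        for row in rows
    ]
```

A model can then use `vs_label` (or the raw rating itself) as an input, so two races carrying the same label are no longer treated as interchangeable.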
From there, use the data any way you see fit. You can tell the AI to incorporate not only the class ratings but also speed ratings, driver and trainer tendencies, track and post position statistics, and keep going from there. It’s entirely up to you how to weight recent races, how to treat races in which a horse broke stride, and whether to factor in pace projections, custom calculations, barn changes, or which trainers usually go easy on a horse following a qualifier; there’s no limit. All you do is tell the AI what you’re thinking, and it writes the code to execute it.
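As one example of a choice left up to you, here is a minimal sketch of weighting recent races more heavily while discounting lines where the horse broke stride. The exponential half-life, the break discount, and the `'Y'`/`'N'` break flag are all hypothetical, arbitrary choices made for illustration; in practice you would tune and test them yourself.

```python
def weighted_speed_rating(lines, half_life=3.0, break_weight=0.25):
    """Recency-weighted average speed rating.

    `lines` is ordered most recent first; each item is a dict with
    'speed_rating' and a hypothetical 'broke' flag ('Y' or 'N').
    Returns None when there are no lines to average.
    """
    total = 0.0
    weight_sum = 0.0
    for starts_back, line in enumerate(lines):
        # Weight halves every `half_life` starts back.
        weight = 0.5 ** (starts_back / half_life)
        if line["broke"] == "Y":
            # Discount (rather than discard) lines where the horse broke stride.
            weight *= break_weight
        total += weight * float(line["speed_rating"])
        weight_sum += weight
    return total / weight_sum if weight_sum else None
```

Discounting instead of dropping break lines keeps some information from those starts; setting `break_weight=0.0` would ignore them entirely.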
One major caveat: there’s only so much a model can do with raw data, especially in harness racing. Unlike its sister sport, thoroughbred racing, where these models are highly prevalent and vast numbers of data points can go into a well-constructed model, harness racing in my opinion has more “feel” to it, and thus greater volatility.
In a thoroughbred race, any horse that’s fast enough stands a chance of being put into the race and winning. But in harness racing, it’s much more difficult to account for a driver who has shown early speed with the same horse in five straight races but tonight randomly takes back to ninth at 3-1, a horse that sits in from fourth and gets shuffled, or a horse that ends up farther back than planned because his driver let two horses drop in front of him. The human factors go on and on.
This is purely speculation, but it’s for those human reasons that I believe the best commercial operations using models in harness racing must have some direct driver and trainer information going into their calculations. It’s just too important a factor to leave out, and it may still be a proprietary edge these groups have. That said, if you’re closer to the physical horses than I am and have your own information, then by all means incorporate it into your model.
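One hedged way to layer your own information on top of a model is sketched below: convert each horse’s model score into a win probability with a softmax (the multinomial-logit form, a common choice in racing models but only one option), multiply by your own adjustments, renormalize, and compare the resulting fair odds with the tote. The scores, the multiplier convention, and the `temperature` parameter are all illustrative assumptions, not a description of how any commercial operation works.

```python
import math


def win_probabilities(scores, adjustments=None, temperature=1.0):
    """Softmax over model scores, scaled by optional handicapper multipliers.

    scores: {horse: model score}; higher means more likely to win.
    adjustments: {horse: positive multiplier}, e.g. 1.5 for a positive read,
    0.5 if you expect the driver to take back. Missing horses default to 1.0.
    """
    adjustments = adjustments or {}
    top = max(scores.values())  # subtract the max for numerical stability
    raw = {h: math.exp((s - top) / temperature) for h, s in scores.items()}
    adjusted = {h: p * adjustments.get(h, 1.0) for h, p in raw.items()}
    total = sum(adjusted.values())
    return {h: p / total for h, p in adjusted.items()}


def fair_odds(prob):
    """Odds-to-1 at which a win bet at this probability breaks even."""
    return (1.0 - prob) / prob


def overlays(probs, tote_odds):
    """Horses whose current tote odds exceed the model's fair odds."""
    return [h for h, p in probs.items() if p > 0 and tote_odds[h] > fair_odds(p)]
```

Because tote odds already reflect the takeout, a horse shows up as an overlay only when the crowd is offering more than the break-even price your adjusted probabilities imply.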
All of this is to say that the technology available for building your own model has never been anywhere close to what it is today. Built on sound principles and layered with your own knowledge and judgment about the human interactions within the sport, a model can create genuine value.
For all the massive edges in big data that the larger operations have had for years, this is the first time a regular player can begin to bridge that gap.