The Forecasters Challenge 2009–final results

I’ve written about Tom Tango’s Forecasters Challenge here before. Tom asked many of us to contribute our preseason rankings of baseball players, based on a metric he devised to calculate a player’s contributions on the field. His plan was to run thousands of drafts from these lists. The list whose teams performed best would be judged the best, most useful projection system.

There was lots to like in this approach, though as Tom details in the report linked to here, there were also some surprises. He writes about some of the key structural ones, which have led him to run other iterations of the drafts, trying to find a format that gives a more nuanced judgment of the lists’ relative merits.

There are three other points that I think should be made.

First, there is a good chance that the weighting between hitting and pitching is off. This is certainly true of my team (which finished fifth of 22 in the original contest). Whether it was because I weighted hitting and pitching the same, which I did, or because I didn’t discount pitchers for their unreliability, which I didn’t, or because I simply undervalued hitters, something was off. Looking at the two components individually, which Tom has said he will do, should help us better understand how the original contest worked.

Secondly, not everyone used straight projections. Some systems weighted for position scarcity. This wasn’t prohibited, so I’m not complaining, but when it comes time to analyze the results it should be understood that in at least a few cases sardines are being compared to mackerels. A simple correlation of each projection system’s ranking to the final actual ranking would be of interest.
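For what it’s worth, here is a minimal sketch of what that simple correlation might look like, using Spearman’s rank correlation on the players two lists have in common. The function name and the player labels are hypothetical, and it assumes each list is just an ordered sequence of player names with no ties. It is not how Tom scores anything.

```python
def rank_correlation(system_order, actual_order):
    """Spearman rank correlation between two ordered lists of player names.

    Assumes no ties and compares only players present in both lists,
    re-ranking them within that shared group.
    """
    actual_pos = {player: i for i, player in enumerate(actual_order)}
    common = [p for p in system_order if p in actual_pos]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two players in common")

    # Rank within the shared group: 0 is best, n - 1 is worst.
    system_rank = {p: i for i, p in enumerate(common)}
    by_actual = sorted(common, key=lambda p: actual_pos[p])
    actual_rank = {p: i for i, p in enumerate(by_actual)}

    # Classic no-ties formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    d_squared = sum((system_rank[p] - actual_rank[p]) ** 2 for p in common)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up example: placeholder labels, not real players or real results.
projected = ["A", "B", "C", "D"]
actual = ["A", "C", "B", "D"]
rho = rank_correlation(projected, actual)
```

A value near 1 means the list ordered players much as the season did, and a value near -1 means it had them backwards. A list adjusted for position scarcity would presumably score differently here than it does in a draft, which is the sardines and mackerels problem.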

Thirdly, as Tom notes in describing how Marcel handles players with no major league playing time, all systems use a sort of generic noise projection for the marginal players. In a correlation study, that noise can overwhelm the estimates of what players expected to have regular playing time will do. For this reason, I don’t think it would be a bad idea for Tom to run the drafts using a 12- or 15-team league format, so that not every projection system is in every league. This would mitigate the problem of small ranking differences being exaggerated by the draft procedure, and may give us a better result. His head-to-head matchups are interesting, too, especially since so many ranks changed dramatically, but another angle of analysis on the data would certainly help us figure out which lists are better.
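To make the smaller-league idea concrete, here is a rough sketch of how the league rosters might be drawn so that each draft includes only 12 of the 22 systems. The system names, the function name, the number of leagues and the fixed seed are all placeholders of mine, and the draft itself is left out entirely, since that part is Tom’s.

```python
import random
from collections import Counter


def sample_leagues(systems, league_size, n_leagues, seed=0):
    """Draw n_leagues random leagues, each a subset of the systems.

    Every league holds league_size distinct systems, so no single
    draft has to include every list.
    """
    if league_size > len(systems):
        raise ValueError("league_size cannot exceed the number of systems")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [rng.sample(systems, league_size) for _ in range(n_leagues)]


# Placeholder names standing in for the 22 entries in the contest.
systems = [f"system_{i:02d}" for i in range(1, 23)]
leagues = sample_leagues(systems, league_size=12, n_leagues=1000)

# How often each system was drawn; over many leagues these should even out.
appearances = Counter(name for league in leagues for name in league)
```

Each system would then be judged on its average finish across the leagues it happened to land in, rather than on one draft against all 21 rivals at once.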

These notes are not meant to be critical in any way. Tom’s enterprise has thrown off a whole bunch of interesting data, which I hope he will keep returning to all winter long. Once the magazine is done, I expect to dig in, too. He deserves a mountain of credit for conceiving this project and seeing it through. Ideally, we’ll be able to do it again next year with a better idea of what we’re going for. Thanks, Tom!