The Forecasters Challenge 2009–final results

TangoTiger.net

I’ve written about Tom Tango’s Forecasters Challenge here before. Tom asked many of us to contribute our preseason rankings of baseball players based on a metric he devised to calculate a player’s contributions on the field. His plan was to run thousands of drafts from these lists. The team that performed best would be judged to be the best, most useful projection system.

There was lots to like in this approach, though as Tom details in the report linked to here, there were also some surprises. He writes about some of the key structural ones, which have led him to run other iterations of the drafts, trying to find a format that gives a more nuanced judgment of the relative lists.

There are three other points that I think should be made.

First, there is a good chance that the weighting between hitting and pitching is off. This is certainly true of my team (which finished fifth of 22 in the original contest). Whether this is because I weighted hitting and pitching the same, which I did, or because I didn’t discount pitchers for their unreliability, which I didn’t, or because I just undervalued hitters, something was off. Looking at the two components individually, which Tom has said he will do, should help us better understand how the original contest worked.

Secondly, not everyone used straight projections. Some systems weighted for position scarcity. This wasn’t prohibited, so I’m not complaining, but when it comes time to analyze the results it should be understood that in at least a few cases sardines are being compared to mackerels. A simple correlation of all the projection systems to the final actual ranking would be of interest.
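For anyone who wants to try that correlation at home, here is a minimal sketch of a Spearman rank correlation between one system's projected values and the actual results. The function names and the five sample values are made up for illustration; nothing here comes from Tom's data or his method.

```python
def ranks(values):
    """Return 1-based ranks (highest value gets rank 1), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(projected, actual):
    """Spearman correlation is Pearson correlation computed on the ranks."""
    return pearson(ranks(projected), ranks(actual))


# Made-up values for five players, in the same player order in both lists.
projected = [50, 40, 30, 20, 10]
actual = [45, 48, 12, 25, 5]
rho = spearman(projected, actual)
print(round(rho, 3))
```

Running this once per projection system against the same actual results would give one number per system to line up against the draft-based standings.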

Thirdly, as Tom notes about how Marcel handles players with no ML playing time, all systems use a sort of generic noise projection for the marginal players. This means in a correlation study that the noise can overwhelm the estimates of what players expected to have regular playing time will do. For this reason, I don’t think it would be a bad idea for Tom to run the drafts using a 12 or 15 team league format, so that not every projection system is in every league. This would mitigate the problem of small ranking differences being exaggerated by the draft procedure, and may give us a better result. His head-to-head matchups are interesting, too, especially since so many ranks changed dramatically, but another angle of analysis on the data would certainly help us figure out what is better.
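To make the 12-team idea concrete, here is a rough sketch of how the 22 lists could be dealt into many smaller leagues at random. It only picks who sits at each draft table; the draft itself is Tom's procedure and isn't modeled here. The system names, the number of leagues, and the seed are placeholders.

```python
import random


def draw_leagues(systems, league_size, n_leagues, seed=0):
    """Pick a random subset of systems for each simulated league."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return [rng.sample(systems, league_size) for _ in range(n_leagues)]


# Placeholder names standing in for the 22 entries.
systems = [f"System {i}" for i in range(1, 23)]
leagues = draw_leagues(systems, league_size=12, n_leagues=1000)

# Count how often each system was seated, to confirm the draw is roughly even.
appearances = {name: 0 for name in systems}
for league in leagues:
    for name in league:
        appearances[name] += 1

print(min(appearances.values()), max(appearances.values()))
```

With enough leagues every list faces a wide mix of opponents, so no single matchup of near-identical rankings dominates the result.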

These notes are not meant to be critical in any way. Tom’s enterprise has thrown off a whole bunch of interesting data, which I hope he will keep returning to all winter long. Once the magazine is done I expect to dig in, too. He deserves a mountain of credit for conceiving this project and seeing it through. Ideally, we’ll be able to do it again next year with a better idea of what we’re going for. Thanks Tom!

The Cluelessness of WHIP

Tout Wars AL Standings- CBSSports.com

For years, in the Fantasy Baseball Guide (which I edit), we ran the pitching stat called Ratio. Every year, people would complain and tell me that in their league they used the pitching stat called WHIP, and ask why we didn’t publish that instead.

For years, I replied that:

1) Ratio (((Hits+Walks)*9)/IP) is much more descriptive/granular than WHIP ((Hits+Walks)/IP), and that,

2) Ratio looks better, since it’s on the same scale as ERA.

I usually also noted that I used Ratio in the leagues I played in, and that if they had a problem they should do the same.

I didn’t win this argument. Many readers said they saw my point, but even if they agreed with me, the other people in their league did not, and so weren’t inclined to change. After a lengthy discussion with such readers a few years ago, I changed the magazine. We now publish WHIP instead of Ratio.

To ease the transition, the first year I included a handy WHIP to Ratio converter to cut out of the magazine, which I assume some people are still using. It featured a bodacious picture of WHIP kitten Anna Benson. Unfortunately, I’ve lost mine.
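If you've lost yours too, the converter is easy to rebuild. This is a small sketch based on the two formulas above; the function names are mine, and innings are assumed to be a plain decimal number (so 150 and a third is 150.333, not the box score's 150.1).

```python
def whip(hits, walks, innings):
    """Walks plus hits per inning pitched."""
    return (hits + walks) / innings


def ratio(hits, walks, innings):
    """Walks plus hits per nine innings, on the same scale as ERA."""
    return (hits + walks) * 9 / innings


def whip_to_ratio(w):
    """Ratio is WHIP multiplied by nine."""
    return w * 9


def ratio_to_whip(r):
    """WHIP is Ratio divided by nine."""
    return r / 9


print(whip_to_ratio(1.00))           # a 1.00 WHIP is a 9.00 Ratio
print(round(ratio_to_whip(12.367), 3))
```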

I bring this up now because I was looking at the Tout Wars AL standings just now and was struck by the WHIP category:

Team WHIP Pts Dif
Siano – MLB.com 1.32 12 0
Colton/Wolf – RotoWorld 1.34 11 0
Sam Walker – FantasyLandtheBook.com 1.34 10 0
Moyer – Baseball Info Solutions 1.37 9 2.5
Erickson – Rotowire.com 1.37 8 -1
Michaels – Creative Sports.com 1.37 7 0.5
Berry – ESPN.com 1.37 6 -2
Shandler – Baseball HQ 1.37 5 0
Peterson – STATS LLC 1.38 4 0
Collette – OwnersEdge.com 1.38 3 0
Grey – ESPN 1.41 2 0
Sheehan – Baseball Prospectus 1.42 1 0

My first reaction, assessing the three-way race between Siano, Michaels, and Shandler, is that this is unbearably close. After all, there are five teams at 1.37 and two more at 1.38. Siano is safely atop the category, but couldn’t Michaels easily gain two points? Couldn’t Shandler easily gain four?

In both cases, such gains would erase Siano’s lead. And certainly the numbers say it’s that close. It’s a virtual tie, for pete’s sake.

In fact, it’s not, but WHIP isn’t granular enough to tell you that. Here are the same rankings using Ratio.

Team Ratio Pts Dif
Siano – MLB.com 11.84 12 0
Colton/Wolf – RotoWorld 12.02 11 0
Sam Walker – FantasyLandtheBook.com 12.10 10 0
Moyer – Baseball Info Solutions 12.292 9 2.5
Erickson – Rotowire.com 12.294 8 -1
Michaels – Creative Sports.com 12.312 7 0.5
Berry – ESPN.com 12.330 6 -2
Shandler – Baseball HQ 12.367 5 0
Peterson – STATS LLC 12.387 4 0
Collette – OwnersEdge.com 12.451 3 0
Grey – ESPN 12.65 2 0
Sheehan – Baseball Prospectus 12.75 1 0
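Here is a quick sketch of how the two-decimal display creates the apparent tie. It takes the Ratio figures from the table above, divides by nine, and formats the result to two places the way the standings page does. The dictionary keys are shortened team names.

```python
# Ratio values copied from the standings table above.
ratios = {
    "Siano": 11.84,
    "Colton/Wolf": 12.02,
    "Walker": 12.10,
    "Moyer": 12.292,
    "Erickson": 12.294,
    "Michaels": 12.312,
    "Berry": 12.330,
    "Shandler": 12.367,
    "Peterson": 12.387,
    "Collette": 12.451,
    "Grey": 12.65,
    "Sheehan": 12.75,
}

# WHIP as displayed: Ratio divided by nine, shown to two decimal places.
displayed = {team: f"{r / 9:.2f}" for team, r in ratios.items()}

for team, r in ratios.items():
    print(f"{team:12s} Ratio {r:7.3f}  WHIP {displayed[team]}")

# Teams that all show up as 1.37, and the Ratio gap hidden inside that group.
tied = [team for team in ratios if displayed[team] == "1.37"]
spread = max(ratios[t] for t in tied) - min(ratios[t] for t in tied)
print(tied, round(spread, 3))
```

Five teams print as 1.37 even though the Ratio gap from the top of that group to the bottom is 0.075.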

I went to the third decimal place among the “tied” teams to show a little more information. To show how much distance there is between these tied teams, here are a few facts, looking at Shandler since he’s the last of the teams with a 1.37 WHIP:

If Shandler gets 10 innings with no hits or walks his Ratio drops to 12.263, enough to pass everyone else with a 1.37 WHIP, and his WHIP drops to 1.363.

If Shandler gets 10 innings with 10 hits+walks, a pretty good performance, his Ratio drops to 12.338, and he gains no points.

What if Shandler pitches 25 innings the rest of the way with an excellent Ratio of 9.00 (a WHIP of 1.00)? That would be way good. His WHIP would end up at 1.366, which would gain him two points but would still look like 1.37 on the CBSSports reports. His Ratio would drop to 12.297.
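These what-ifs can be sketched in a few lines. One big caveat: Shandler's innings total isn't given in the standings, so the 1,180 innings below is my assumption, back-solved from the figures above. The function name is also just for illustration.

```python
def project(ratio_now, innings_now, extra_innings, extra_baserunners):
    """Return (new Ratio, new WHIP) after adding innings and hits+walks."""
    baserunners_now = ratio_now * innings_now / 9
    total_baserunners = baserunners_now + extra_baserunners
    total_innings = innings_now + extra_innings
    new_ratio = total_baserunners * 9 / total_innings
    return new_ratio, new_ratio / 9


SHANDLER_RATIO = 12.367
ASSUMED_INNINGS = 1180  # assumption: not in the standings, back-solved

scenarios = {
    "10 IP, 0 hits+walks": (10, 0),
    "10 IP, 10 hits+walks": (10, 10),
    "25 IP, 25 hits+walks": (25, 25),
}

results = {}
for label, (ip, baserunners) in scenarios.items():
    results[label] = project(SHANDLER_RATIO, ASSUMED_INNINGS, ip, baserunners)
    new_ratio, new_whip = results[label]
    print(f"{label}: Ratio {new_ratio:.3f}, WHIP {new_whip:.3f}")
```

With that assumed innings total, each scenario lands within a thousandth or two of the numbers quoted above. A different innings total would shift them slightly.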

The point is that using WHIP, especially displayed to the second place, it looks like there’s a virtual tie, when the reality is that the standings are close, but it would take an extraordinarily good effort for one team to break ahead of the others. Ratio better illustrates this and it provides better and more information, which is why I still think it is a vastly superior stat.

Which is why I think you should change. Let me know when you do.

I Love New Metrics!

Except when I don’t.

This story is about O-Swing %, which measures the percentage of pitches out of the strike zone that a batter swings at. The writer says that O-Swing % is really interesting, and then goes on to prove (unless his numbers are wrong) that it is pretty much meaningless.

What is actually interesting is that the writer does a decent job of demonstrating why the apparently broad swing in O-Swing % numbers is meaningless. It boils down to the fact that some batters swing more, and so they hit the ball more, while some batters swing less, and hit the ball less. Consider O-Swing % exhausted, at least for now.

When there is reliable pitch location information there will doubtless be information derived from these numbers that will be of interest, but it certainly won’t be simple or absolute. The game isn’t simply a matter of cause and effect, but a complex system of adjustments and readjustments that change how everything happens. It seems to me the miracle is that the game is played on the same sized field now as it was 100+ years ago. In that context, the variation in results should lead us to explore what changes have been made.

But that has nothing to do with O-Swing %.

Origins of Major League Starting Pitchers, 2008

Minor League Ball

John Sickels looks at all the starting pitchers with 10 or more win shares in 2008 and at where they were when they stepped over to the professional game. First rounders have a big edge, but what stands out is that successful pitchers come from everywhere.

A similar list tracking the last 20 or more years would be of great interest, if anyone has time tomorrow (or the next day), since the list itself isn’t exactly objective. I would assume that the way scouts and organizations work has changed over the years, and this would be reflected. Or, more tantalizingly, maybe not.

Crawford v. Papelbon

I don’t have ESPN and the Sunday late games are blacked out on mlb.tv, so I listened to the end of the Rays-Sox tilt tonight.

Well, I didn’t only listen. I also watched the data flow on mlb’s Gameday app, which shows location and speed and percentage of break and in what direction. I’m not a total believer in this technology, though the potential is obviously huge. The problem is that the margin of error is great.

In tonight’s climactic Crawford versus Papelbon at bat, two out, men on second and third, the Rays down a run, according to Gameday, Papelbon didn’t throw a strike, but Crawford swung through three high balls out of the zone.

I’m not saying this didn’t happen, I’m sure it has, but Crawford’s aggressiveness up and out of the zone shown makes me think the zone shown isn’t kosher. I have had the same problem with similar technologies on Fox and ESPN.

The point is, if you show this display but the game doesn’t follow it, all you’ve done is undermine the game. Umpires are far from perfect, but we have to assume they’re doing their best. Data presented as “objective” that doesn’t hew to the common perception is a problem.

Maybe Carl Crawford swung at all those high strikes, I’ll have to go back to the archive to see. If they let me. But until we can have a high standard of confidence in the recent hi-tech tools MLB is selling us, be a little skeptical. It’s what the umpire says, after all, that matters.

Eriq Gardner Looks at Player Raters

THT Fantasy Focus

My April prices will finally get posted here, later today. They are like a primitive version of a player rater like those found at ESPN, Yahoo, CBS and Rototimes, the kind Eriq Gardner writes about at Hardball Times today. Primitive in the sense that the sophisticated big-media versions calculate their values automatically, plugging the stats into a formula and spitting it forth, while I cut and paste stats into a rather elaborate spreadsheet, make some adjustments because of the number of samples, and then generate a report.

Gardner’s point about what the player raters are actually measuring is a good one. Head to head values are a lot less useful to a classic Rotisserie player than straight 4×4 values. And vice versa. And, it doesn’t need to be repeated, values generated from current stats measure what has happened, not what will happen. But I think Eriq misses the main point with the raters and why they’re of value: they synthesize (or should synthesize) the fantasy categories into one score.

Looking at the stats of two players with different profiles, it can be hard to judge which is more valuable. A player rater that properly reflects the values of your league (or at least lets you know what it is measuring) lets you assess the aggregated value of a player in all the categories. That Jason Frasor is more valuable than Felix Hernandez thus far tells us something about the teams these guys are on, and it tells us something about their stability in the standings.
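To show what "synthesize the categories into one score" can look like, here is a bare-bones sketch that sums standard scores across the four pitching categories. This is one common approach, not the formula any of the big-media raters or my spreadsheet actually uses. The three stat lines are invented, and it ignores innings when weighing ERA and WHIP, which a real rater would not.

```python
from statistics import mean, pstdev


def rate_players(stats, lower_is_better=()):
    """Sum each player's z-score across categories into a single score."""
    categories = list(next(iter(stats.values())).keys())
    scores = {name: 0.0 for name in stats}
    for cat in categories:
        values = [line[cat] for line in stats.values()]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # everyone is identical here, so the category adds nothing
        sign = -1 if cat in lower_is_better else 1
        for name, line in stats.items():
            scores[name] += sign * (line[cat] - mu) / sigma
    return scores


# Invented early-season lines for three hypothetical pitchers.
stats = {
    "Closer A": {"W": 1, "SV": 8, "ERA": 1.50, "WHIP": 0.90},
    "Starter B": {"W": 3, "SV": 0, "ERA": 3.00, "WHIP": 1.20},
    "Starter C": {"W": 2, "SV": 0, "ERA": 4.50, "WHIP": 1.50},
}

scores = rate_players(stats, lower_is_better=("ERA", "WHIP"))
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {score:+.2f}")
```

In this toy pool the closer rates ahead of the starter with more wins, because the single score rewards him for saves and for the rate stats at the same time.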

The player rater also tells us which players are running ahead of pace, and which are running behind. If we know that Ian Kinsler is currently earning $52, we can judge that he is likely to earn relatively less for his team the rest of the way than he has thus far. If we know that Big Papi is earning $2 right now, we can hope that his contribution is going to increase dramatically the rest of the way (though, if we expected him to earn $20 on the year, and he’s earning $2 now, he needs to earn about $23 each of the remaining five months to get back on par). 

I don’t think you could tell the difference between a set of $20 and $23 stats, unless they were side by side.
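The Big Papi arithmetic is worth spelling out. This sketch assumes a six-month season with one month played, and treats the dollar figures as full-season paces that average out evenly across months. The function name is mine.

```python
def required_pace(target, pace_so_far, months_played, season_months=6):
    """Pace needed over the remaining months to finish at the target."""
    remaining = season_months - months_played
    return (target * season_months - pace_so_far * months_played) / remaining


# Expected $20 on the year, earning at a $2 pace after one month.
needed = required_pace(target=20, pace_so_far=2, months_played=1)
print(needed)
```

That works out to $23.60, which is the "about $23" above.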

Player Raters are overrated and underrated, too, it seems. Like most tools, it depends on what you do with them.

Is it too late to win?

KFFL – Baseball HQ

There is something weird going on at KFFL. Old stories from BaseballHQ are showing up, which is fine, but with new dates. This story is from 2003, I think, but the data is important. I’m not so sure about the conclusion.

It is good to know when you can count on the overall volatility of the standings to have “settled.” I’m not sure I wouldn’t have guessed mid May, but I like some evidence.

I’m also sure that the volatility by category indexes, showing that stolen bases and saves change the least, is counterintuitive and correct. Alas, I’m pretty sure that the article’s conclusion, that this means buy steals and saves on draft day and trade for power later, is wrong, for all the reasons the article points out these categories are the most stable.

Still, despite its date of birth, this and probably other baseballHQ goldies are well worth checking out at KFFL.

Alex says the software is the Cadillac

The Final Update was posted at 9pm on April 9th. Both software and data packages are updated, with lots of adjustments because of the spring surprises. I mean, Andres Torres? That said, he’s coming off a fine season, so who knows?

One player I didn’t update in the update was Emilio Bonifacio. His claiming of the 3B job in Florida was a surprise, as have been his heroics thus far. I probably should have bumped him up to 375 at bats (he’s in there for 275 now), but I don’t like to react too strongly to first week events by changing prices. And I had gotten Bonifacio up to 275 AB because I was high on him as a super utility guy, who steals bases but doesn’t field well enough to hold down a full time job. I still think that’s what he’s going to end up being.

This year we created a data only Patton $ Software product, for those who didn’t want to use the software. You can buy either by visiting askrotoman.com/patton but if you’re undecided which product fits your needs better, go to Alex’s pitch for the software at Patton & Co.

Thanks to all who purchased this year’s software, and special thanks to the incredible group who have been buying it year after year after year. Your loyalty is a great compliment. Have a great season! Peter and Alex

An article about Road Home Run Rates

Derek Carty THT Fantasy Focus

The Hardball Times’ fantasy writer looks at which teams and players have the biggest changes in the home run rates of their road ballparks in the coming season. As he says at the end of the story, this is fun stuff, especially if you learn that one of your freezes (Josh Hamilton, let’s say) had one of the toughest road park schedules for homers last year. On the other hand, the team that gains the most this year is the Phillies, up 2.2 percent!

If they hit 105 road homers last year, this information suggests that this year they might hit 107! The last three years the Phillies have averaged 102 road home runs. Make of this what you will.
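The arithmetic, for the record, as a tiny sketch. The 105 is the hypothetical from the paragraph above and the 2.2 percent is the figure from the article. The function name is made up.

```python
def adjusted_road_homers(last_year_homers, pct_change):
    """Scale last year's road home runs by a percentage change in park rates."""
    return last_year_homers * (1 + pct_change / 100)


projected = adjusted_road_homers(105, 2.2)
print(round(projected, 1))
```

Two extra home runs over a season, in other words.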

If the season were a horse race. Isn’t it?

BaseballRace.com

When I was a kid I had a toy race track and I spent an inglorious number of hours turning the dice to see which horse prevailed in that race.

As we all know now, but I didn’t as a magical thinking second grader, the winners came completely at random (though I may have given blue an advantage, since it was my color).

Baseballrace.com animates each season’s pennant race, so you can see in a picturesque display how far ahead the front runners were and how far behind were the laggards.

I’m not sure there’s much actual utility here, but the imaginative display of information may well help you or me or someone else to come up with an idea that changes the way we think. And even if it does not, coming up with something no one else is doing is reason enough to be proud. And wouldn’t it be a great idea for him to license the software to fantasy league stats providers, so that we can live and relive the year of our grief in a horse racey animation?

Okay, maybe not. But maybe.