2009 Luckiest and Unluckiest Hitters and Pitchers

Tristan Cockcroft for ESPN.com

I missed this story by the esteemed Tristan Cockcroft in February, and mention it now only because I have some concerns, despite his consumer warning at the start (a low BABIP doesn't necessarily mean that a hitter has been unlucky) and his interesting use of Expected BABIP.

1) Tristan's Expected BABIP is calculated without regard for a pitcher's defense or a batter's speed. No wonder Jarrod Washburn had a low BABIP last year in Seattle (as Tristan points out): he was pitching in front of a defense that turned hits into outs. Sticking with Seattle, isn't it clear from Ichiro's career BABIP that his expected BABIP, calculated from the components of his at-bats, is wrong? In this context, what use is expected BABIP? Maybe some, but since it tells us less than it promises, it seems a little dangerous.

2) Component stats are useful tools, but they are subject to random variation, too. Just because you’re measuring the type of hit by a batter or a pitcher doesn’t mean that the results will hew to the expected number of hits and outs. A small sample is a small sample, and there will be error. How much and in which direction is impossible to say, which is a good reason not to count on players regressing to the mean based on expected BABIP.

3) But players do regress. Robert Sikon, at Fantasybaseballtrademarket.com, studied 2008 BABIP to determine whether unlucky players' batting averages improved the next year and lucky players' declined. He reports that 64 percent of unlucky hitters improved the next year, and 90 percent of lucky hitters declined.

4) In 2008, Chris Dutton and Peter Bendix published an improved version of Expected BABIP at The Hardball Times. It improved on Dave Studeman's original formula, which was a rather simplistic Line Drive Percentage + .120. Dutton and Bendix ran regression analysis on years of data to determine which inputs were relevant, and they claim their formula explains 39 percent of the variance in BABIP. They don't publish the formula in this paper, however, so I don't know how it has stood up and can't test it myself. They do have an online tool to calculate xBABIP, which Derek Carty wrote about last year, but you have to enter the info by hand.
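Points 2 and 4 can be made concrete with a minimal Python sketch. It computes BABIP using the standard (H - HR) / (AB - K - HR + SF) formula. It also computes Studeman's simple Line Drive Percentage + .120 estimate and the binomial standard error of an observed BABIP. The function names and the sample stat line are hypothetical. The Dutton/Bendix formula is not reproduced, because it isn't published. The simulation assumes every ball in play is an independent trial with the same hit probability. That assumption ignores exactly the speed and defense issues raised in point 1.

```python
import math
import random


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def simple_xbabip(line_drive_pct: float) -> float:
    """Studeman's original estimate: line-drive rate (as a fraction) plus .120."""
    return line_drive_pct + 0.120


def babip_standard_error(true_babip: float, balls_in_play: int) -> float:
    """Binomial standard error, treating each ball in play as an independent trial."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


def simulate_babips(true_babip: float, balls_in_play: int, seasons: int,
                    seed: int = 2009) -> list[float]:
    """Observed BABIPs over many seasons for a hitter whose true BABIP never changes."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_babip for _ in range(balls_in_play)) / balls_in_play
        for _ in range(seasons)
    ]


# Hypothetical season line, for illustration only.
actual = babip(h=150, hr=20, ab=550, k=100, sf=5)
expected = simple_xbabip(0.19)
print(f"BABIP {actual:.3f} vs. simple xBABIP {expected:.3f}")

# Same "true" .300 hitter, 1,000 simulated seasons of 400 balls in play each.
se = babip_standard_error(0.300, 400)
sims = simulate_babips(0.300, 400, 1000)
print(f"one standard error over 400 balls in play: {se:.3f}")
print(f"simulated range: {min(sims):.3f} to {max(sims):.3f}")
```

Under that coin-flip assumption, a true .300 hitter's BABIP over 400 balls in play has a standard error of about .023. A season anywhere from roughly .254 to .346 (two standard errors either way) is therefore unremarkable, before defense or speed even enter the picture.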

I think this BABIP work is really important and I’m glad smart people are working on solving it, but it seems worthwhile to point out that all conclusions are somewhat tentative at this point. We’re still working out how much genuine info is found in these data, and how much it will help us improve our projections, for isn’t that its real value?

Cardrunners: My first auction of the year

I joined a new high-stakes 5×5 AL-only auction league this year. Some of the prize money is put up by a poker education site, cardrunners.com, and some by the participants, who are a mix of fantasy experts, professional poker players, and financial pros. There are only 10 teams, but you can spend your money on all 28 of your rosterable players (you don't have to; once all teams are out of money, there is a draft). This changes the endgame some, as Rotowire's Chris Liss notes in his post at Rotosynthesis (where he also posted the draft results).

Another wrinkle is that you can buy NL players. I spent some time trying to figure out what Adrian Gonzalez would be worth, and considered throwing him out early, but someone (I don't remember who) beat me to it. My back-of-the-envelope calculation was that a 50/50 chance at half a season of Gonzalez was worth $8, though that number would change as the auction progressed. As teams recognized their strengths and weaknesses, it might make sense to bid more for the high-risk play, so nominating him early could mean a bargain. In fact, I bumped a $3 bid to $4, and Daniel Dobish, Dave Gonos' partner, muttering "I'm not letting him go to someone for free," bid $7 and won him. Not a huge risk, but a nick in his budget he'll feel if Gonzalez doesn't come over.
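The Gonzalez math reads like a simple expected-value calculation, sketched below in Python. The sketch assumes that value scales linearly with playing time. It also assumes Gonzalez is worth nothing to an AL-only team if he stays in San Diego. The $32 full-season figure did not come from the auction. It is only what $8 implies under those assumptions. Both function names are hypothetical.

```python
def expected_auction_value(full_season_value: float, prob_arrives: float,
                           season_fraction: float) -> float:
    """Expected dollars from a player who might switch leagues midseason.

    Assumes value scales linearly with playing time and that the player is
    worth $0 to an AL-only team if he never comes over.
    """
    return full_season_value * prob_arrives * season_fraction


def break_even_probability(price: float, full_season_value: float,
                           season_fraction: float) -> float:
    """Chance of a trade needed for a given price to pay off, same assumptions."""
    return price / (full_season_value * season_fraction)


print(expected_auction_value(32, 0.5, 0.5))  # 8.0
print(break_even_probability(7, 32, 0.5))    # 0.4375
```

Under these assumptions, a $7 winning bid needs roughly a 44 percent chance of a midseason move to break even. The linear model is the weak spot. As noted above, the right price shifts as the auction goes on and teams learn what they need, and this sketch ignores that.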

There was a similar calculation behind my most uncharacteristic moment in the auction. After adjusting my prices for the smaller league, I was pleased to find that in nearly every case but one they accurately described the action. (There was also a blip late in the early going, when outfielders who steal, namely Ichiro and Denard, went for scandalously low prices.) The exception was the catchers, where huge inflation persisted all night. The action players, at least the guys who won the high-priced catchers through most of the auction, were the non-experts. They took Mauer to $40, Victor Martinez to $35, and Napoli and Suzuki to $18. Even at the low end, guys I had listed for $2 were going for $5. Matt Wieters' name was called fairly late, but there was still plenty of money around. His price surged past my $16 bid limit, but I had money to spend, and when the bidding slowed at $20 I bid $21 and won the sophomore backstop. The move effectively changed my team from Nolan Reimold and two scrub catchers to Wieters, Jose Guillen and a scrub who turned out to be Brayan Pena.

I don't remember who had the penultimate bid on Wieters, but if it was one of the Cardrunners boys, my brash reach wrecked the purity of a position-scarcity experiment, with the so-called experts buying cheap catchers and the so-called amateurs buying the pricey ones. All the catchers, as noted before, went for inflated prices.

This morning I ran the projected stats of all the teams using the CHONE projections, mostly because I have Chone Figgins on my team. The key is to avoid testing your team with your own projections, since they naturally favor the players you picked up. I don't want to give away any competitive edge the details of this exercise offer, but I'm delighted to share for posterity the projected final standings, which surely won't look anything like this next October.

TEAM PTS
Phipps 62
Carty 56
Rotoman 53
Hastings 51
Chad 50
Gonos 49
Wiggins 49
Eric 46
Liss 40
Erickson/Sheehan 38

These standings include active rosters and reserves, plus NLers Gonzalez and Ricky Nolasco. CHONE's projections are also generous with playing time, which inflates the value of guys who may not even play, so take them with a grain of salt. But they're a start while we wait for games that matter.

John Burnson’s The Graphical Player

I am a big fan of John Burnson’s Heater Magazine, a weekly pdf of baseball stats and analysis that makes the Sports Weakly baseball stats pages look like the Weekly Reader.

John sent me a copy of his annual book, The Graphical Player, when it came out in January. I glanced at it, but I was busy, it ended up on a shelf, and I didn't write about it then, which is too bad. Like Heater Magazine, the Graphical Player is crammed full of information. John is evolving a set of graphical rules for presenting data that make it increasingly useful and understandable, and that help put a player's skills in the context of his team and of the game as a whole.

This is not a book to use to look up a fact, though there are plenty of those in here. This is a book to browse through, to hunt for patterns in, to savor as a baseball fan the way a gourmand might taste a sauce. The good news, even at this somewhat frantic moment, is that much of the information in the Graphical Player will still pertain after the season starts. If you want to see whether a player has historically been a slow starter, this book has graphs that show whether he has. Once you get used to the way the information is presented, this sort of research is a pleasure. The data and its context are presented as a picture.

Other features of note: John asked three writers who follow prospects to name their 60 top rookies for this year. He has compiled their rankings and notes for these 111 ROY-eligible players, with their stats (presented in a very useful format) for the last three years. This is a very helpful survey of this year’s top prospects, though it does omit my decidedly dark horse candidate Thomas Neale (who didn’t make The Guide, which shows just how dark a horse Neale is).

I also think the team profile pages in the back of the book are full of useful information, as documentary. They won't surprise readers of Heater, but as with much of the book, once you get past the sheer data density you'll be surprised how satisfying it is to see a chart of who played each position the most each month for each team. The charts that compare each team's production in various categories to the league average have sparked only ideas so far, but they clearly help us understand what was going on. This is a new way to experience this data, and an invigorating one.

I’ve only scratched the surface of the types of information included in the Graphical Player. Some is of help analyzing baseball, while other stuff is geared totally to fantasy players. I don’t want to be grandiose, but it is an amazing accomplishment.

UPDATE: So I posted the above glowing review only to find out that the only copy of the book you can buy at Amazon currently costs $91. It’s worth every penny, of course, but that’s a little steep. It seems the Graphical Player is also sold out at Acta Publishing, the company that published it. Barnes and Noble doesn’t have it. I’ll tell you what, I’ll sell my copy to the first bidder for $75. And in the meantime, I hope this means that John Burnson sold out his print run and made a small fortune.

Get Off My Lawn – Minor League Ball

by John Sickels

John writes one of those tough screeds that sound, about halfway through, like the complaining crap of an old man. But John isn’t nearly as old as he thinks he is, and what he’s writing about is something I hope all of us who care about baseball and stats and the data have already thought about.

The point is that thanks to PITCHf/x and the efforts of BIS and MLB and everyone else scoring baseball games, we're getting a ton more information about every pitch in every major league game. And automating this process promises even more in the coming years.

Much of this data, thanks to MLB by the way, is available to everyone, and so it has become a happy sandbox for baseball fans with a fondness for math.

John’s gripe, if you can call it that, is that all these analysts are sorting through the data and ending up with micro conclusions that don’t really mean much to someone watching any particular game.

What I would add is that we know an awful lot about baseball because of the things we’ve learned before this great outpouring of pitch by pitch data. Much of what we learn after all the new data has been processed and tested and used is going to support the observations of those who watched the game closely before all the data was known.

When I'm grumpy I wonder why I'm reading yet another study that confirms what we already knew about this or that baseball situation. But that doesn't mean those studies aren't important. We gain the most knowledge by testing everything, each situation and contingency and viewpoint, and then seeing what shakes out. Confirmation means as much as a fresh idea.

Despite all the noise out there, that’s what’s happening now. John recognizes that, but he’s honest enough to point out that it makes him weary. Me, too.

Forecaster and Handbook are out!

I got my copy of the Baseball Forecaster about 10 days ago, but closing the magazine meant I didn't crack it until now, even though I've got a short bit in it (about WHIP vs. WH/9, which happened to run here first).

Ron’s lead essay is very smart. It’s about how wrong we are about players, year after year, and he wonders why we pursue exacting but nearly always wrong projections. Then he comes up with something new, called the Mayberry Method.

There's a lot to like about the way the MM summarizes a player's skills descriptively. Yet despite its simplicity, I'm not convinced it is going to catch on. New stuff often doesn't, even when it has real merit. On the other hand, the benchmarks MM describes so succinctly are becoming increasingly entrenched as leading indicators. That makes me wonder why, if we're getting better at defining leading indicators, we're not getting better at predicting breakouts.

As Ron says in the piece, we may be smarter now than we were 20 years ago, but that may not be such a good thing.

Steve Moyer always gives us so-called experts a copy of the hot-off-the-press Bill James Handbook at First Pitch Arizona, for which I am very grateful. Not that I wouldn't buy it (I have, many times), but this way it ends up in my hands even sooner.

The book continues to grow, with increased attention to defensive awards and rankings and to baserunning skills, plus the ever-useful park factors. I'm a great fan of baseball-reference.com and fangraphs.com, both of which I use all day long, but I sit and read the Bill James Handbook, poring over its pages as if it were a ripping good yarn, which in many ways it is.

I’m glad for both these books and recommend them highly.

The Forecasters Challenge 2009–revisited again

The Forecasters Challenge 2009

Tom Tango said today he’ll be running the Forecasters Challenge again in 2010. The primary judging will come from the Pros-Joes format, which is described in the link above. The idea, basically, is to have each pro draft against 21 inferior lists. In last year’s challenge my projections ranked 3rd using this method.

For the record, using the 22 pros against all the other pros, my projections ranked 5th.

In the head-to-head scoring system, I finished in the second division.

Overall, Rotoworld and John Eric Hanson seemed to score the best.

salary vs performance

ben fry

There has been a lot of talk about the Yanks buying the pennant, which ignores the fact that for eight years before this they tried to buy it and failed. I have a hard time working up any umbrage, but I do think it's hard to judge the Yanks a great team because of all the extra money they spent.

Or rather, they may be a great team, but that’s because of all the money that was spent. The good news is that Cashman finally got it kind of right.

Ben Fry has charted the 30 teams' payrolls against their standings throughout the season. I'm not sure you learn anything concrete from this, but it's a beautiful chart nonetheless. Have fun with the slider up top.