Tristan Cockcroft for ESPN.com
I missed this story by the esteemed Tristan Cockcroft in February, and I mention it now only because, despite his consumer warning at the start (a low BABIP doesn't necessarily mean that a hitter has been unlucky) and his interesting use of Expected BABIP, I have some concerns.
1) Tristan's Expected BABIP is calculated without regard for a pitcher's defense or a batter's speed. No wonder Jarrod Washburn had a low BABIP last year in Seattle (as Tristan points out): he was pitching in front of a defense that turned hits into outs. Sticking with Seattle, isn't it clear from Ichiro's career BABIP that his expected BABIP, calculated from the components of his at-bats, is wrong? In this context, what use is expected BABIP? Maybe some, but since it tells us less than it promises, it seems a little dangerous.
2) Component stats are useful tools, but they are subject to random variation, too. Just because you're measuring the type of batted ball a batter hits or a pitcher allows doesn't mean the results will hew to the expected number of hits and outs. A small sample is a small sample, and there will be error. How much, and in which direction, is impossible to say, which is a good reason not to count on players regressing to the mean based on expected BABIP.
3) And yet players do seem to regress. Robert Sikon, at Fantasybaseballtrademarket.com, studied 2008 BABIP to determine whether unlucky hitters' batting averages improved the next year and lucky hitters' averages declined. He reports that 64 percent of unlucky hitters improved the next year, and 90 percent of lucky hitters declined.
4) In 2008, Chris Dutton and Peter Bendix published an improved version of Expected BABIP at hardballtimes.com. It improved on Dave Studeman's original formula, which was a rather simplistic line-drive percentage plus .120. Dutton and Bendix ran regression analysis on years of data to determine which inputs were relevant, and they claim their formula explains 39 percent of the variance in BABIP. They don't publish the formula in this paper, however, so I don't know how it has held up and can't test it myself. They do have an online tool to calculate xBABIP, which Derek Carty wrote about last year, but you have to enter the data by hand.
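To make items 2 and 4 a little more concrete, here is a minimal Python sketch. It implements only Studeman's simple line-drive-plus-.120 formula, because the Dutton and Bendix formula isn't published. It also shows how "percent of variance explained" is usually computed (R-squared), and it puts a rough size on small-sample noise. That last part assumes each ball in play is an independent trial with the same hit probability, which is a simplification. The function names are hypothetical, chosen for illustration.

```python
import math

def studeman_xbabip(line_drive_pct: float) -> float:
    """Dave Studeman's original expected BABIP: line-drive rate plus .120.
    `line_drive_pct` is a fraction, e.g. 0.20 for a 20 percent line-drive rate."""
    return line_drive_pct + 0.120

def babip_standard_error(true_babip: float, balls_in_play: int) -> float:
    """Binomial standard error of an observed BABIP. Assumes every ball in play
    is an independent trial with the same hit probability (a simplification)."""
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)

def variance_explained(actual: list[float], predicted: list[float]) -> float:
    """R-squared style share of variance explained: 1 - SS_residual / SS_total.
    For a fixed formula that was not fit to this data, the value can fall below zero."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total

if __name__ == "__main__":
    # A hypothetical hitter with a 20 percent line-drive rate.
    xb = studeman_xbabip(0.20)
    se = babip_standard_error(xb, 400)
    print(f"xBABIP {xb:.3f}, standard error over 400 balls in play {se:.3f}")
    # prints: xBABIP 0.320, standard error over 400 balls in play 0.023
```

Under that binomial assumption, two standard errors is about .047. A hitter whose true rate really is .320 could therefore post anything from roughly .273 to .367 over 400 balls in play in about 95 percent of seasons, before defense or speed even enter the picture. A claim like "explains 39 percent of the variance" corresponds to an R-squared of 0.39 on this kind of calculation.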
I think this BABIP work is really important, and I'm glad smart people are working on it, but it seems worthwhile to point out that all conclusions are somewhat tentative at this point. We're still working out how much genuine information these data contain and how much they will help us improve our projections, which is, after all, their real value.