Well, this actually broke a month ago, by Mike Fast at Baseballprospectus.com, but I just saw it. In a nutshell, Mike uses pitch data to show that pitchers and hitters influence the Horizontal Speed of the Ball off the bat. This metric is derived, if I’m understanding correctly, by dividing the distance the ball traveled by the time it takes to go that far. Hard hit balls get places faster, more softly hit balls get places slower. Pop ups, almost always outs, don’t go far at all and take a long time to get there. By virtue of some commonsensical tests, Mike shows that pitchers have some effect on how hard the ball is hit against them. This appears to be groundbreaking work that confirms and quantifies what all of us sensed intuitively: that BABIP isn’t completely random for pitchers. Good stuff.
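As I read it, the metric is just distance over time. Here is a minimal sketch in Python, assuming feet and seconds as the units. The function name and the two sample batted balls are mine, made up for illustration, and are not Mike's data or code:

```python
def horizontal_speed(distance_ft: float, hang_time_s: float) -> float:
    """Average horizontal speed of a batted ball, in feet per second."""
    if hang_time_s <= 0:
        raise ValueError("hang_time_s must be positive")
    return distance_ft / hang_time_s


# Hypothetical batted balls (illustrative numbers only).
line_drive = horizontal_speed(300.0, 3.0)  # travels far, gets there fast
pop_up = horizontal_speed(120.0, 6.0)      # travels little, hangs a long time
print(line_drive, pop_up)
```

The pop up comes out far slower than the line drive, which is the whole idea: a low number means a ball that is easy to turn into an out.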
Deadspin has published a takeout of “King of Sleaze Mountain” super agent Dan Lozano based on anonymous files it was sent recently. It is no endorsement of Lozano and his behavior over the years to say that this story of preternaturally adept, chameleonic salesmanship and cheesy hooker procurement leaves one feeling a little dirty, because there are only two real issues that seem to have legal legs and a sports implication:
Does Alex Rodriguez own part of Lozano’s business? This is not allowed under the rules of baseball, I gather, for all the obvious ethical reasons you can imagine, but there is some evidence that he does, and Deadspin doesn’t dig beneath the surface of the accusation to find out if that evidence is real or not. And,
Did Lozano push Albert Pujols into an ill-advised contract back in 2004 in order to generate cash flow he needed personally? Again, Deadspin makes the accusation and leaves it at that.
My friend, Cardinals-watching Brian Walton, didn’t leave it at that, and takes a look at the facts of what happened in 2004 with Lozano, the Cards and Pujols at scout.com. You can read his excellent piece at stlcardinals.scout.com.
All we can say to Deadspin is, That wasn’t hard now, was it?
I found this 2007 paper by David A. Levine at the Retrosheet.org site. It makes big use of the at bat data there, and is very energetically argued and fun to read. Can you trust the answer? I think so, since it agrees with Bill James’ earlier conclusion, which if I’m remembering correctly used different methodology (and didn’t have access to all that data).
Every ballpark, every year, every dimension, every park factor. The new Seamheads Ballpark Database promises to be an invaluable resource.
Having just finished and released my projections for the Patton $ Online Software product I’m thinking about the accuracy and usefulness of projections more than usual (and I usually think about this subject a lot).
Those of us who make projections want our projections to be the most accurate, but it turns out that measuring a set of projections versus what actually happened is a complicated business. Just how complicated becomes clear if you read the first two parts of Tom Tango’s analysis of five different projection systems from 2007-2010.
But you don’t have to. Tom says you can skip those parts and still appreciate the results, which show that CHONE was probably the best projection system in recent years, but that it wasn’t much better than Marcel, which Tango invented as a simple baseline projection that could be measured against more sophisticated systems to evaluate them. If they don’t do better, they aren’t adding value.
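The baseline idea can be sketched in a few lines of Python. This is not Tango's actual method, which is a good deal more involved. It is only a toy comparison using mean absolute error, and the batting averages are invented for illustration:

```python
def mean_abs_error(projected, actual):
    """Average absolute miss between projected and actual values."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)


def value_added(system, baseline, actual):
    """Positive when the system misses by less than the baseline does."""
    return mean_abs_error(baseline, actual) - mean_abs_error(system, actual)


# Made-up batting averages for three hitters (not real projections).
actual = [0.300, 0.250, 0.280]
baseline = [0.280, 0.270, 0.270]  # stands in for a Marcel-like baseline
fancy = [0.290, 0.265, 0.275]     # stands in for a more elaborate system
print(round(value_added(fancy, baseline, actual), 4))
```

If `value_added` comes back at zero or below, the fancier system isn't earning its keep against the baseline.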
The question is how much value any of the systems is adding. The answer depends on what you’re looking for, but the assertion by one of the commenters that accurate projections probably matter most to fantasy players rubs me the wrong way. As the survey results show, using projections to value players for your fantasy league isn’t going to get you very far. The margin of error for each projection is far wider than the range of projections from all the various sources.
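To get a feel for the size of that margin of error, here is a rough sketch of how much luck alone moves a batting average. It assumes every at bat is an independent trial with a fixed chance of a hit, which is a simplification of mine and not something taken from Tango's analysis:

```python
import math


def batting_average_sd(true_avg: float, at_bats: int) -> float:
    """Standard deviation of observed batting average from luck alone."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


# A hypothetical true .270 hitter over a 500 at bat season.
sd = batting_average_sd(0.270, 500)
print(f"one standard deviation: about {sd * 1000:.0f} points of average")
```

Under those assumptions a perfectly projected .270 hitter still swings by roughly 20 points in either direction over 500 at bats, before any projection error enters the picture.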
Different projection systems incorporate different aspects of baseball analysis. My projections use complex regression analysis of previous performance, filtered first by age, and then by my tweaking.
Other systems use other inputs. PECOTA draws on similar player/career arcs to project into the future, for instance, while ZIPS and CHONE incorporate some of the newer stats to establish complex systems of regressing outlying performance to the mean.
I have my doubts about how far such empirical formulation will take us toward the grail of accurate projections. The ball hasn’t moved much in recent years despite lots of new data, but all the work is necessary to tease out what real information there is to be found in the numbers. Tango’s report and the many comments that follow it are invaluable for showing what the challenges are, and perhaps eventually suggesting a way forward.
Based on this story by Yahoo’s fine Jeff Passan, Voros McCracken leads a Defense Independent Life.
Like much writing on the internet, this story is probably twice as long as it should be, and because of repetition suffers from a sentimentality that makes me less sympathetic than I might be. But it is a good and sad story, and helps explain that whole Voros thing that always gets folks worked up, and puts a human face on it, too.
I do think that it isn’t reasonable to expect to make a living from thinking about baseball, or, for instance, inventing a game like Rotisserie. It could happen, but more than likely won’t. Them’s the breaks.
UPDATE: A story in Slate today looks at efforts to discover Moneyball-like efficiencies in soccer stats. Curiously, these efforts are led by Billy Beane, and the story ends noting that Voros is working on soccer these days. But the real insight is that while efforts to decode baseball are largely open source, the push into soccer (which has no meaningful collective “sabermetrics”) is being led by proprietary interests, just as Voros’ revolutionary insight was made in public, and his work life these days for a European soccer club is private.
For much of my long adult life, Murray Chass wrote about baseball for the New York Times, my hometown paper. His old-school ways provoked the enmity of bloggers and sabermetricians and a few years ago the Times chose not to continue to employ him. But thankfully Murray soldiers on, because despite his myopia about the numbers of baseball, he is a fine prose stylist with a well-stocked rolodex of baseball contacts. His voice is of value, even if he’s not au courant.
I’m writing this because of a recent Chass post on his website (at which he writes short articles about things that interest him twice weekly while abjuring blogs) about the relationship between the Hall of Fame Ballot, which was due last Friday, and stats like Wins Above Replacement, which try to objectify a player’s value to his team. You can read Murray Chass’s blog post, er, article here.
I don’t read Murray Chass’s site regularly, and in fact came to this story via Tom Tango’s The Book Blog, where Tom tried to answer some of Murray’s questions about WAR the other day. Interestingly, his post provoked an avalanche of debate at Baseball Think Factory about whether Tom’s tone was inclusive or condescending.
Tom says he was trying to be helpful. Murray says he thought Tom was trying to be helpful. Case closed. But the lengthy discussion reveals lots about the issues. We love baseball because it’s a game played by humans, in all their variety, that excites us because of the skill of the players.
But we also love baseball because it is a game played outdoors in warm weather. Baseball provides spectacle to fans and family members alike who couldn’t care less about the actual game, but enjoy the experience that visiting the ballpark provides.
And some of us love baseball because it is a closed statistical system that allows us to munch and crunch the numbers in many clever ways to discover things that may not be directly related to describing the humans who play the games, but do give us insight into the way the game works.
I think Murray Chass is wrong about bloggers and sabermetricians, but I think bloggers and sabermetricians are wrong about Murray Chass. We need all the voices who know anything at all about baseball contributing if we’re to get our analysis and history right.
You should read all of David’s summaries of the Abstracts, and you should read all of Bill James, from the Abstracts to after.
I hope you knew that, but if you didn’t, now you do.
David’s summary of Bill James’s last Baseball Abstract is most excellent. A place to start if you don’t know all this stuff, and a place to collect your thoughts if you already do.
BTW, I have probably written about this post multiple times before. Nuff said.
Ps. One of the greatest insights in this piece is Bill James’s notice of how great an influence defense has on pitchers. We’ve all been noticing this the last few years, and major league teams have been acting on this idea, but Bill James pointed it out 22 years ago. Plus, he could write.
In a previous post I wrote about the Cardrunners League I’m playing on, pitting quants vs. so-called fantasy experts. This has become a rather unwieldy mess, in part because the central issues keep erupting into flashes of debate about whether analysis or intuition matters more. The funny thing is that even when there is too much blather in this pissing match, there are interesting issues that come up about what we know and what we don’t know about the game of fantasy baseball itself.
Now, some THTFantasy writers are weighing in at their own site. Derek Carty is also a Cardrunners League competitor, but I like Derek Ambrosino’s take, which makes many of the points I’ve been trying to make, often with more wit. Derek also quotes a Mike Podhorzer piece about what makes an expert, which is a must read. Paul Singman also talks about the problem of identifying which players and which fantasy strategies actually work, which is certainly a huge issue. How do you decide what works if there’s no definitive way to test it?
For my part, I would love a tool that let me test different strategies in thousands of runs, to see what range of possibilities there really are. But I think Derek defines the nature of the game in a most instructive way when he compares it to chess (a head to head game) and the stock market (a one against many game with many winners and many losers). Roto is a game of one against many with only one winner, which is different. Setting yourself apart would seem to be essential to win, but how is this done? The quants seem to think incrementally, by buying value. I think the so-called experts see more need for radical action, though it is certainly open to debate whether these are genius picks or zagging while others zig. All in all, a fascinating discussion if you have the time.
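A tool like that might start from something as crude as the sketch below. It is a toy model of my own, not a real roto simulator: every team gets the same expected score, each season is a single random draw per team, and the only thing that differs is how widely our own result swings. The eleven rivals, the spreads, and the function name are all assumptions for illustration:

```python
import random


def win_rate(my_spread, rivals=11, runs=5000, seed=1):
    """Share of simulated seasons in which our team finishes first.

    Every team has the same expected score (0). Each rival's score varies
    with a standard deviation of 1; ours varies with my_spread.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    wins = 0
    for _ in range(runs):
        mine = rng.gauss(0.0, my_spread)
        best_rival = max(rng.gauss(0.0, 1.0) for _ in range(rivals))
        if mine > best_rival:
            wins += 1
    return wins / runs


steady = win_rate(my_spread=0.5)  # plays it safe
swingy = win_rate(my_spread=3.0)  # takes big gambles
print(f"steady: {steady:.3f}  swingy: {swingy:.3f}")
```

In this toy setup the team with the bigger spread finishes first more often, because an average season never beats the best of eleven rivals. Whether that carries over to a real league, where gambles also change your expected score, is exactly the open question.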
I missed this story by the esteemed Tristan Cockcroft in February, and mention it now only because, despite his consumer warning at the start (a low BABIP doesn’t necessarily mean that a hitter has been unlucky) and his interesting use of Expected BABIP, I have some concerns.
1) Tristan’s Expected BABIP is calculated without regard for a pitcher’s defense or a batter’s speed. No wonder Jarrod Washburn had a low BABIP last year in Seattle (as Tristan points out), he was pitching in front of a dee that turned hits into outs. Sticking with Seattle, isn’t it clear from Ichiro’s career BABIP that his expected BABIP, calculated from the components of his AB, is wrong? In this context, what use is the expected BABIP? Maybe some, but since it tells us less than it promises, it seems a little dangerous.
2) Component stats are useful tools, but they are subject to random variation, too. Just because you’re measuring the type of hit by a batter or a pitcher doesn’t mean that the results will hew to the expected number of hits and outs. A small sample is a small sample, and there will be error. How much and in which direction is impossible to say, which is a good reason not to count on players regressing to the mean based on expected BABIP.
3) But they do. Robert Sikon, at Fantasybaseballtrademarket.com, did some studies looking at 2008 BABIP to determine whether unlucky players improved the next year and lucky players’ batting averages declined. He reports that 64 percent of unlucky hitters improved the next year, and 90 percent of lucky hitters declined.
4) In 2008, Chris Dutton and Peter Bendix published at hardballtimes.com an improved version of Expected BABIP. This was improved over Dave Studeman’s original formula, which was a rather simplistic Line Drive Percentage + .120. Dutton and Bendix ran regression analysis on years of data to determine which inputs were relevant and they claim their formula explains 39 percent of the variance in BABIP. They don’t publish the formula in this paper, however, so I don’t know how it has stood up, and can’t personally test it. They do have an online tool to calculate xBABIP, which Derek Carty wrote about last year, but you have to enter the info by hand.
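The simple original formula, at least, is easy to put into code. This sketch covers only the Line Drive Percentage + .120 version described above, not the Dutton and Bendix regression, which I don't have. The function names and the sample hitter are hypothetical:

```python
def quick_xbabip(line_drive_pct: float) -> float:
    """Simple estimate: line drive rate (as a fraction) plus .120."""
    if not 0.0 <= line_drive_pct <= 1.0:
        raise ValueError("line_drive_pct must be a fraction between 0 and 1")
    return line_drive_pct + 0.120


def babip_gap(actual_babip: float, line_drive_pct: float) -> float:
    """Actual minus expected; negative means the hitter ran below expectation."""
    return actual_babip - quick_xbabip(line_drive_pct)


# Hypothetical hitter: 20 percent line drives, .290 actual BABIP.
print(f"{quick_xbabip(0.20):.3f}")
print(f"{babip_gap(0.290, 0.20):+.3f}")
```

A negative gap is the sort of thing that gets a hitter labeled unlucky, with all the caveats in points 1 and 2 still applying.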
I think this BABIP work is really important and I’m glad smart people are working on solving it, but it seems worthwhile to point out that all conclusions are somewhat tentative at this point. We’re still working out how much genuine info is found in these data, and how much it will help us improve our projections, for isn’t that its real value?