LINK: OBP

Lawr Michaels tackles on-base percentage, as Tout Wars’ AL and NL leagues transition to the greatest stat ever! Um, to a better stat than Batting Average. Read it here.

The Future of Sabermetrics

I’m sure the SABR Analytics conference would be great fun.

For one, Phoenix in March is baseball heaven, if you can get tickets.

Plus, as this story by Christina Kahrl makes clear, there are lots of smart baseball analysts in one place talking about the game and the analysis of its numbers. But two things struck me about the answers Christina got to her question about sabermetrics: evolution or revolution?

First, that this year’s big story was pitch framing. I don’t know what was presented at the conference, but what I like about much of the pitch framing work I’ve seen is just how teased out it is. Like a detective story or a bit of counterhistory, the idea has existed for a long time. The PitchF/X numbers don’t obviously lead to framing data, but when churned and cleaned, new information emerges. That’s neat.

The other is more important. A few of the respondents talk about the importance of the PitchF/X data and some mention the BIS fielding data, the importance of which cannot be overestimated. The more data is available, and the more eyes see it and are inspired to work with it, the more real information is developed.

Which is why MLB’s VP of Stats Cory Schwartz’s statement seems like the most significant in the piece: “I think once we are able to roll out the complete field-tracking system and start to introduce some of that data into public space to whatever extent it might be, I think that will further increase the pace of evolution and perhaps bring about what we would consider revolutionary turning points.”

Emphasis mine. “Some of that data,” “to whatever extent it might be”: these qualifiers are going to make a huge difference to the future of the analytic community in the coming years. MLBAM surely recognizes the incredible dynamic force they unleashed by making the PitchF/X data available, something we should not fail to remind them of every chance we get.

RESOURCE: The Hardball Times Correlation of Every Batted Ball Tool

I sure hope I linked to the Hardball Times Correlation of Every Pitched Ball Tool last year. It is a web app that helps you compare two stats and see how they correlate, either in one year or compared to the next or preceding years.

It is an easy way to quickly test ideas, to see whether the data supports that one stat is a leading indicator of another.
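If you want to try the same kind of check at home, here is a minimal sketch in Python. I don’t know how the Hardball Times tool works under the hood, so this is not that; it is just an ordinary Pearson correlation between one stat in one year and another stat the next. The function name and the sample numbers are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a stat never varies")
    return cov / sqrt(var_x * var_y)


# Invented numbers for five hypothetical hitters, in the same player order:
# ground ball rate in year 1, BABIP in year 2.
year1_gb_rate = [0.38, 0.42, 0.45, 0.50, 0.55]
year2_babip = [0.285, 0.300, 0.295, 0.310, 0.320]

print(round(pearson_r(year1_gb_rate, year2_babip), 3))
```

A value near 1 or -1 says the first stat tracks the second closely; a value near 0 says it tells you little. With only a handful of players, as here, the number means next to nothing, which is why the real tool’s full data set matters.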

Steve Staude has just released a hitting version of the tool.

Alongside the two tools (and a spreadsheet with all the data), he explores some fundamental issues about batted ball data and strike zone data that point to all sorts of evidence to support or crush conjectures.

I happen to like this chart, which shows various rates on batted ball types.

[Screenshot: chart of rates by batted ball type]

Steve points out that BABIP on Fly Balls compared to Ground Balls makes it look like Ground Balls are better, but reminds us that since Home Runs aren’t included in BABIP, it is misleading. All the other stats give a better idea of the relative value of Fly Balls and Ground Balls.
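To see why leaving out Home Runs matters, here is a rough sketch with invented totals (these are not Steve’s numbers). It assumes a simplified BABIP for a single batted ball type, with homers removed from both the hits and the chances, and it ignores sacrifice flies.

```python
def babip_on_type(hits, home_runs, batted_balls):
    """Simplified BABIP for one batted ball type: homers leave hits and chances."""
    balls_in_play = batted_balls - home_runs
    if balls_in_play <= 0:
        raise ValueError("no balls in play once home runs are removed")
    return (hits - home_runs) / balls_in_play


def average_on_type(hits, batted_balls):
    """Plain hits per batted ball, with home runs counted as hits."""
    if batted_balls <= 0:
        raise ValueError("need at least one batted ball")
    return hits / batted_balls


# Invented totals, for illustration only.
fly_balls = {"batted_balls": 1000, "hits": 220, "home_runs": 100}
ground_balls = {"batted_balls": 1000, "hits": 240, "home_runs": 0}

for label, t in (("Fly balls", fly_balls), ("Ground balls", ground_balls)):
    babip = babip_on_type(t["hits"], t["home_runs"], t["batted_balls"])
    avg = average_on_type(t["hits"], t["batted_balls"])
    print(label, round(babip, 3), round(avg, 3))
```

With these made-up totals the fly balls trail badly on BABIP but nearly catch up once the homers count as hits, and that is before giving them any extra credit for being homers.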


ASK ROTOMAN: My League Is Using New Categories. Help!

Dear Rotoman,

My 5×5 Rotisserie – 10 team NL-Only Yahoo League is switching categories this year: New Categories are XBH, OBP and E, replacing HR, BA & SB, to go with RBI and R for five categories. In pitching we are keeping W, Sv, ERA & WHIP and replacing K with K/BB. How do I project what I will need in categories without a previous history of scoring?

“Categorically Insane”

Dear CI:

Wow. I’m a big fan of experimentation and innovation, and I love the fact that your league is jumping into it head first, but I’m sorry to inform you that you are uncorking an Albert Belle bat’s worth of complication with your changes (the least of which is projecting how much of any category you’re going to need to win). Here’s why:

Valuing stats is easy. Knowing how many you’ll need to win isn’t, but it also isn’t necessary unless your league doesn’t allow you to trade. And even then you’ll be better off knowing how much each player is worth than targeting category totals.

Your goal is to amass value, which means buying stats that others are undervaluing. Targeting category totals too often leads to teams overbidding to reach their goals.

Obviously, there is a point when too much is too much, when you have way more steals or saves than you can gain points for in roto scoring, but common sense should be enough to guide you there. In the meantime, collect value.

The problem for your league is that some of your changes are provocative and disrupt the way we usually play the game.

Not XBH, which is just like HR, only it rewards Doubles and Triples hitters. And not OBP, which is just like BA, but rewards guys who take a walk. But Errors? Hell yes.

Errors is a backward category. The lower the number, the better. The problem is that fielders make errors in proportion not only to how well they field, but to how much they play. The more they play, the more errors they make.

More playing time has long been a key strategy for 5×5 roto. You want to win the AB race, even though AB isn’t a category, because the more AB your team puts up the more Runs and RBI and HR it will accrue.

So, if we look at the top 15 NL shortstops last year in fewest Errors committed (200 AB minimum), they averaged 621 innings played and 7 errors (84 innings per error), while the top 15 NL shortstops last year in Offensive contribution (not including steals, which you’re replacing) averaged 990 innings played and 12 errors (83 innings per error).

As you can see, there’s almost no difference in quality as a group, but the heavier offensive contributors play more and hurt more in your Error category.

While there are clear winners (Troy Tulowitzki, maybe Jose Iglesias) and losers (Jonathan Villar! Dee Gordon!), it isn’t clear to me how you go about choosing whether to roster Brandon Crawford, good defender but makes errors because he plays a lot and is a marginal offensive talent, or Daniel Descalso, who played much less, contributed less offensively, but hardly made half as many errors.

And since the player pool determines the value of players, every change to the pool has the potential to shift all the prices. Fascinating stuff. And good luck with it.

(SIDEBAR: To value the reverse category you would credit each player with Each Error He Didn’t Make. So, Starlin Castro made the most errors as a SS in the NL last year, with 22. Every other player Didn’t Make 22 minus the number of errors he did make.)
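The sidebar arithmetic can be sketched like this. Only Castro’s 22 comes from above; the other two shortstops and their error totals are placeholders I made up.

```python
def errors_not_made(errors_by_player):
    """Credit each player with the worst error total minus his own."""
    worst = max(errors_by_player.values())
    return {name: worst - errors for name, errors in errors_by_player.items()}


shortstop_errors = {
    "Starlin Castro": 22,  # from the sidebar
    "Shortstop A": 12,     # hypothetical
    "Shortstop B": 7,      # hypothetical
}

credits = errors_not_made(shortstop_errors)
for name, credit in credits.items():
    print(name, credit)
```

Flipping the category this way turns Errors into a counting stat where more is better, so it can be valued like any other one. Note that it rewards guys who barely play just as much as it rewards guys who field well.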

Converting from Strikeouts to Strikeouts Divided By Bases on Balls is a whole ‘nother matter. Here you’re switching from a quantitative stat that measures playing time almost as much as quality (there are many leagues that play with IP as their fifth 5×5 category rather than strikeouts) to a qualitative one. Put this together with ERA and WHIP, also qualitative stats, and you’re almost begging for teams to try to pare their innings pitched to a minimum.

Remember that no starters earn Saves, and few closers rank highly in Wins, so you’re basically measuring pitchers on their quality innings. I’m a bit skeptical about this innovation being a good idea, but if you have a stringent minimum IP limit it might work.

Still, if you’re playing with real Yahoo rosters, guys who qualify as SP but work in relief are going to be gold.

To get back to your question. In standard roto leagues, a good benchmark for last place in the qualitative categories is the major league average. Players who do better than that are of some roto help. In your somewhat smaller league the right number is going to be better. To figure out K/BB I recommend sorting last year’s stats based on different IP thresholds.

With a minimum IP of 40 last year, 22 of the Top 30 pitchers in K/BB were relievers.
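Here is a rough sketch of that sorting exercise, assuming you have last year’s pitching lines in some usable form. The `Pitcher` record, the function names and every number in the sample are invented for illustration; swap in real stats to get real answers.

```python
from typing import NamedTuple


class Pitcher(NamedTuple):
    name: str
    role: str   # "SP" or "RP"
    ip: float
    k: int
    bb: int


def top_k_per_bb(pitchers, min_ip, top_n):
    """Top pitchers by K/BB among those with at least min_ip innings."""
    # Pitchers with zero walks are skipped here to avoid dividing by zero.
    qualified = [p for p in pitchers if p.ip >= min_ip and p.bb > 0]
    return sorted(qualified, key=lambda p: p.k / p.bb, reverse=True)[:top_n]


def reliever_share(pitchers, min_ip, top_n):
    """Return (relievers in the top group, size of the top group)."""
    top = top_k_per_bb(pitchers, min_ip, top_n)
    return sum(1 for p in top if p.role == "RP"), len(top)


# Invented sample, not real stat lines.
SAMPLE = [
    Pitcher("Starter A", "SP", 200.0, 200, 50),
    Pitcher("Starter B", "SP", 180.0, 150, 60),
    Pitcher("Starter C", "SP", 160.0, 120, 40),
    Pitcher("Reliever A", "RP", 65.0, 80, 16),
    Pitcher("Reliever B", "RP", 45.0, 45, 10),
    Pitcher("Reliever C", "RP", 30.0, 40, 5),
]

for threshold in (40, 100):
    relievers, size = reliever_share(SAMPLE, threshold, 3)
    print(threshold, relievers, "of", size)
```

Run it at a few thresholds and watch how quickly the relievers drop out as the IP minimum goes up. That tells you how much your league’s minimum will shape what K/BB is worth.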


Playing for the Middle

A few years after I joined the American Dream League I had a decent season and finished in fourth place. I’d battled pitcher injuries and failures all year long, and had the sense that–in this 4×4 league–I was losing ground in wins. I had struggled valiantly to hang in there.

A few days after the season was over I received (in the mail! that’s how long ago this was) the final report from Heath Data, which confirmed the numbers those of us who were in the hunt got when we updated the stats each day manually during the final week. I wasn’t happy about the result, but I was glad to be in the money. But then I looked at the report that Heath called the hypothetical standings, based on our teams’ rosters coming out of the draft, as if we’d made no moves all season long. The hypotheticals are a good way to look at the team you were dealt, as it were. If you suffered a lot of injuries or PED suspensions that year your team would suffer in the hypotheticals.

But the draft day hypotheticals are also a good way to see how you played during the year. If you suffered a lot of injuries but turned your poor hypothetical showing into a strong real showing, you did good. Or vice versa, which is what happened to me in that long ago year. In fact, when I looked more closely, I discovered that if I ranked my draft day stats in the actual end of year standings against the actual non-hypothetical stats of the other ADL teams, I’d bought enough stats in the draft to have won the league. Another way to put it: if a safe fell on my head as I left the draft at O’Reilly’s that year, I would have finished first even with the other teams picking up injury replacements and making trades all season long.

Instead I had finished fourth. I found this to be profoundly depressing.

I bring this up now because something similar is happening in the American Dream League this year. Our current stats service sensibly offers up hypothetical standings all season long. Push a button and you get the draft day standings up to that date. I was aware all season long that I was hypothetically doing much better than I was in the real game, but I chalked that up to a disastrous series of moves I made back in May. That was when I traded Elvis Andrus and Junichi Tazawa for Justin Verlander. I knew I had a big surplus in steals and I thought adding the best pitcher in baseball would help my team. I was able to leverage Tazawa’s presumed role of closer to swing the deal, and was glad to see my hunch pay off when Tazawa didn’t hold the job. Unfortunately, Verlander didn’t do the job–perhaps distracted by girls–but that wasn’t my mistake. I made the right move but it didn’t work out. My actual mistake came from the blind side.

At that point in the season, mid May, my pitcher Felix Doubront was looking dismal. His velocity was down, his control stunk, his ERA was something above 6.00 with a giant WHIP. I had liked him a lot coming into the season, but I despaired that he was damaged and killing my pitching stats. Also, a few weeks before, I’d picked up Cleveland’s Corey Kluber, who had looked like a strong strikeout pitcher with good skills for a few games, but then he got pounded in a game and his ERA ballooned up above six, too. Which pitcher would he be going forward? I figured with Verlander and Shields I didn’t need the risk.

Another factor was our rule that teams that don’t get seven saves during the season get zero points in the saves category. This isn’t a huge deal, but points are points and having just traded Tazawa, who I assumed would get a few, I was scouring the waiver wire and found two potential sources in Oliver Perez in Seattle, where Tom Wilhelmsen was struggling, and fireballer Josh Lueke in Tampa Bay, where Fernando Rodney had suddenly gone all, um, historical-Rodney-like with his control. I decided to go after these putative closers, deciding to release Doubront and Kluber–who each had two starts against tough teams coming up that week–if I got them (another rule we have limits you to three Special Reserves a season of players who aren’t on the DL or in the minors). I got them.

Both Doubront and Kluber pitched surprisingly well in their four games that week, but to my satisfaction didn’t win any of them. The next week I made a substantial bid on Kluber but was beaten out. I didn’t bid on Doubront because I was still convinced he was damaged, but he soon showed he wasn’t. He immediately, punishingly, became the pitcher I had frozen to start the year, thinking, hoping, I had a major breakout candidate. Kluber pitched very well until he got hurt, and certainly would have helped my staff a lot if I’d Special Reserved him, and until a rough patch in September, Doubront was excellent. Perez and Lueke turned out to be nothing, which is why I thought pitching was the problem with my team. But when I looked more closely at the hypotheticals today, they show something different:

My team has 14 fewer homers and 18 fewer steals now than I bought on draft day. I do have 18 more wins, but my ERA and WHIP are both higher than the team I bought back then, even after trading for Verlander. That’s in part because when Verlander was on my team he had a 3.91 ERA, just barely better than my team’s, and a hurtful 13.24 Ratio. (I subsequently traded him for Chris Sale, who will be a decent freeze next spring.) But the real damage to my team came to my offense, which was the result of a series of trades with which I intended to add homers and batting average.

Following Andrus and Tazawa for Verlander and Marwin Gonzalez I did the following:

I traded Jacoby Ellsbury and Mike Zunino for Alex Rios, Ryan Lavarnway and John Jaso. Rios had 10 homers and 10 steals at that point; Ellsbury had 1 homer and 24 steals, and didn’t hit for much power last year. I needed homers badly, I thought, even though I expected the weakling speed-corner Hosmer to hit a few and the feckless DH Butler to get going at some point. Nothing happened for a week, and then Rios started running like crazy and hit no homers for the 89 at bats I had him for. Ellsbury went crazy and hit .350 over the next month, with a few homers, and has now hit seven homers for the Peppers. I needed catcher at bats, too, and while Lavarnway was a long shot Jaso was getting them. Zunino was not hitting in Triple-A, so who knew when he’d get the call, and even if he did, he might not hit .200.

Five weeks later I traded Rios, Tommy Milone and Jimmy Paredes for Ian Kinsler, Erasmo Ramirez and Leury Garcia. Kinsler would surely hit some homers, I thought, and with him and Zobrist and Asdrubal Cabrera my infield was pretty strong. There was a lot of griping in the league about the Milone for Ramirez component: the team that got Milone badly needed pitching and for some reason had been talking Ramirez up like a huckster, but I thought there was a pretty good chance Ramirez would be the better of the two the rest of the way. I wanted him in the deal and I was right, though Milone set a low bar by landing in the minors for most of the second half.

A few weeks later I traded the newly FAAB-acquired speedster Jonathan Villar for Derek Jeter, who was just back off the DL. That didn’t work out, since the Captain proved unable to play, but I’m still second in steals at this point, so it wasn’t too costly.

The problem is that if I’d stuck with my draft day team I would have seven points in HR, instead I have two, and I would have 12 points in BA, instead I have eight. My draft day team inserted into the current real world standings against the actual stats of the other ADL teams would have 60 points, a solid fourth place, instead of 52 and a mad six-team scramble for places four through nine.
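For anyone who wants to run this kind of what-if on their own league, here is a minimal sketch of the idea: score each category by rank, then rescore with one team’s draft day totals swapped in against everybody else’s actual stats. The three teams, two categories and all the numbers are invented, and I’m assuming ties split points evenly, which may not match your league’s rules or how any stat service actually does it.

```python
def category_points(totals, higher_is_better=True):
    """Roto points for one category: worst team gets 1, ties split evenly."""
    points = {}
    for team, value in totals.items():
        if higher_is_better:
            beaten = sum(1 for v in totals.values() if v < value)
        else:
            beaten = sum(1 for v in totals.values() if v > value)
        tied = sum(1 for v in totals.values() if v == value) - 1
        points[team] = 1 + beaten + tied / 2
    return points


def total_points(standings, lower_is_better=()):
    """standings maps category -> {team: total}; returns {team: summed points}."""
    result = {}
    for category, totals in standings.items():
        pts = category_points(totals, category not in lower_is_better)
        for team, p in pts.items():
            result[team] = result.get(team, 0) + p
    return result


def with_draft_day_team(standings, team, draft_day_totals):
    """Copy of the standings with one team's totals replaced by its draft day totals."""
    return {
        category: {**totals, team: draft_day_totals[category]}
        for category, totals in standings.items()
    }


# Invented three-team league with two categories.
actual = {
    "HR": {"A": 150, "B": 170, "C": 160},
    "ERA": {"A": 4.10, "B": 3.80, "C": 3.95},
}
draft_day_a = {"HR": 175, "ERA": 3.90}

real = total_points(actual, lower_is_better={"ERA"})
hypothetical = total_points(
    with_draft_day_team(actual, "A", draft_day_a), lower_is_better={"ERA"}
)
print(real["A"], hypothetical["A"])
```

The hard part in real life isn’t the scoring, it’s keeping a record of what your draft day roster did all year. That is what the stat service is doing for you when it offers hypothetical standings.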

That team includes Al Alburquerque, Jake Arrieta and Brett Anderson, active all year, as well as Tommy Milone and Fernando Martinez, a big hole on offense. Just adding Corey Kluber to the mix for Arrieta (AL stats only) and sticking with it would move the team up a few points and into solid contention with the BBs, Veecks and Jerrys for the championship. Yeesh.

The real mystery is how did I add 225 AB on the year and lose 14 homers (and seven RBI) and .007 of BA when what I was trying to do was add homers and BA? The answer is timing.

In 470 or so AB for my team, Jacoby Ellsbury, Elvis Andrus and Ryan Flaherty hit one homer. In 900 at bats not for my team, that trio has hit 19 homers. Plus Alex Rios, who hit zero homers in the 90 AB while I had him, hit 11 in the 246 AB before I acquired him, and six in the 250 at bats since I traded him. There was a power outage at Bad K Park this year, to be sure, and to catch up I churned the waiver and FAAB wires, trying to add homers, and didn’t succeed while damaging my BA.

The result is that rather than fighting with three other teams for the money spots, I’m in a wrangle with six teams for fourth place. I’m not sure what the lesson of this is. I played aggressively and got burned by bad timing. Verlander’s ERA when I traded for him was 3.17. For me he was 3.91. Since I traded him he’s been 3.74, but his WHIP since I traded him was by far his best of the year (1.15). Whether or not it was my fault, clearly my activity was damaging.

Maybe next year I’ll practice stillness, and see how that works out.

All Star Stats Is Down. For the count.

Perhaps because of Rotoman’s kvetch last May about the terrible and terribly expensive service All Star Stats was providing, All Star Stats announced today that they are shuttering their terrible stat service.

They’ve made a deal with to provide services free in 2013. I haven’t used’s stat service in a couple of years, but then it was not friendly for league commissioners. It does have a lot of bells and whistles and I can say that they were trying to improve things, so they may be fine.

What I can say for sure is that is an excellent baseball stat service with a good price and lots of fun features. I can’t recommend them highly enough.

KVETCH: All Star Stats is Horrible!

It is 9 am on Thursday morning. My coffee is gone and my preworkday ritual of checking my fantasy baseball teams has been disrupted because All Star Stats hasn’t updated yesterday’s stats. Instead I see this (click to expand):

This is the third day this week ASS (as we mockingly call them) has failed to do what every other stat service seems able to do; that is, update the standings before midmorning. Yesterday they didn’t update until after 11 am!

This is absurd because ASS is an expensive service. The book rate to run a league for the season is more than $500. We complained a few years ago and they cut the price by a couple hundred dollars, but we still pay a premium. (I should note that the screen capture is from the XFL, which is comped by ASS because we’re a longtime “industry” league. I can’t complain about that. It is the American Dream League that is being bled by the service thieves at All Star Stats.)

If this was the first time there were service problems, so be it, but I was just searching through my email and discovered that we were complaining about this exact same issue in 2007! There have been many days with the same problem every year! Many days. Yes, we are idiots.

What has kept us at ASS all these years is what kept us at USA Stats in the years before that company was bought by All Star Stats. (Before that we were with the venerated and brilliant Heath Data Services, perhaps the game’s first stat service, which has not ever been matched, but was sold to USA Stats in the mid-90s.) That is inertia. A league full of older guys fears having to learn a new system. The discomfort zone is high, even when the company delivering the goods now is doing a terrible job of it, costing 350 percent more for fewer features and less reliability.

I can promise you, we will not be with ASS next year. Even the most frightened of our cohort is realizing that this level of indifference is degrading, insulting, cannot be tolerated by reasonable people. Not being able to get our stats for a few hours from time to time isn’t the biggest deal in the world, for sure, but why should we pay extra for this? I should note that the ASS support people write apologetically very well, what with all the practice they’ve had.

ASS is owned by NBC, which is owned by Comcast, neither of which has any organic connection to the fantasy baseball world. They are playing us for fools, and it is well past time we all move on.

John Burgeson on the variability of baseball statistics

This collection of listserv posts from 2000 is the clearest expression of how baseball stats don’t tell the whole story about a ballplayer’s talents. Nor are statistics the last word about teams. Must reading for anyone interested in baseball statistics and sabermetrics.


DIL: The Voros McCracken Story

Based on this story by Yahoo’s fine Jeff Passan, Voros McCracken leads a Defense Independent Life.

Like much writing on the internet, this story is probably twice as long as it should be, and because of repetition suffers from a sentimentality that makes me less sympathetic than I might be. But it is a good and sad story, and helps explain that whole Voros thing that always gets folks worked up, and puts a human face on it, too.

I do think that it isn’t reasonable to expect to make a living from thinking about baseball, or, for instance, inventing a game like Rotisserie. It could happen, but more than likely won’t. Them’s the breaks.

UPDATE: A story in Slate today looks at efforts to discover Moneyball-like efficiencies in soccer stats. Curiously, these efforts are led by Billy Beane, and the story ends noting that Voros is working on soccer these days. But the real insight is that while efforts to decode baseball are largely open source, the push into soccer (which has no meaningful collective “sabermetrics”) is being led by proprietary interests, just as Voros’ revolutionary insight was made in public, and his work life these days for a European soccer club is private.

2009 Luckiest and Unluckiest Hitters and Pitchers

Tristan Cockcroft for

I missed this story by the esteemed Tristan Cockcroft in February, and mention it now only because, despite his consumer warning at the start (a low BABIP doesn’t necessarily mean that a hitter has been unlucky) and his interesting use of Expected BABIP, I have some concerns.

1) Tristan’s Expected BABIP is calculated without regard for a pitcher’s defense or a batter’s speed. No wonder Jarrod Washburn had a low BABIP last year in Seattle (as Tristan points out): he was pitching in front of a defense that turned hits into outs. Sticking with Seattle, isn’t it clear from Ichiro’s career BABIP that his expected BABIP, calculated from the components of his AB, is wrong? In this context, what use is the expected BABIP? Maybe some, but since it tells us less than it promises, it seems a little dangerous.

2) Component stats are useful tools, but they are subject to random variation, too. Just because you’re measuring the type of hit by a batter or a pitcher doesn’t mean that the results will hew to the expected number of hits and outs. A small sample is a small sample, and there will be error. How much and in which direction is impossible to say, which is a good reason not to count on players regressing to the mean based on expected BABIP.

3) But they do. Robert Sikon did some studies looking at 2008 BABIP to determine whether unlucky players improved the next year and lucky players’ batting averages declined. He reports that 64 percent of unlucky hitters improved the next year, and 90 percent of lucky hitters declined.

4) In 2008, Chris Dutton and Peter Bendix published an improved version of Expected BABIP. This was improved over Dave Studeman’s original formula, which was a rather simplistic Line Drive Percentage + .120. Dutton and Bendix ran regression analysis on years of data to determine which inputs were relevant, and they claim their formula explains 39 percent of the variance in BABIP. They don’t publish the formula in this paper, however, so I don’t know how it has stood up, and can’t personally test it. They do have an online tool to calculate xBABIP, which Derek Carty wrote about last year, but you have to enter the info by hand.
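Since the Dutton and Bendix formula isn’t published, the only version that can be sketched here is the simple one: Line Drive Percentage + .120. The code below assumes line drive rate is entered as a fraction (.190, not 19), and the hitter in the example is made up.

```python
def quick_xbabip(line_drive_pct):
    """The simple rule of thumb described above: line drive percentage + .120."""
    if not 0.0 <= line_drive_pct <= 1.0:
        raise ValueError("line_drive_pct should be a fraction between 0 and 1")
    return line_drive_pct + 0.120


def babip_gap(actual_babip, line_drive_pct):
    """Actual BABIP minus the quick expected BABIP; negative reads as 'unlucky'."""
    return actual_babip - quick_xbabip(line_drive_pct)


# Hypothetical hitter: 19 percent line drives, .280 actual BABIP.
print(round(quick_xbabip(0.19), 3), round(babip_gap(0.280, 0.19), 3))
```

A negative gap is the sort of thing that gets a hitter labeled unlucky, but per points 1 and 2 above, the gap ignores defense and speed, and the line drive rate itself bounces around in small samples.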

I think this BABIP work is really important and I’m glad smart people are working on solving it, but it seems worthwhile to point out that all conclusions are somewhat tentative at this point. We’re still working out how much genuine info is found in these data, and how much it will help us improve our projections, for isn’t that its real value?