Science – Ask Rotoman

The lede of this Daily Beast story gets it right: There were two big winners on election night. I don’t recall writing much about Barack Obama on this site in the past, but Nate Silver has made regular appearances over the years, first because of his PECOTA baseball projection system, and then because of his efforts to clarify political polling.

For all the discussion about Nate’s innovations in baseball projection and political polling, one rather significant point has been missed: Nate Silver is much more a marketing guy than a statistician.

In fact, you’ll find critics all over the web who point out that Silver isn’t a statistician at all. But they’re missing the point. What Silver did with PECOTA and fivethirtyeight.com (now fivethirtyeight.blogs.nytimes.com) was to present fairly mundane “projections” and “polls” in an invigorating and easily digested way.

With PECOTA, Silver created fairly traditional weighted-average player projections, similar to Tom Tango’s famous MARCEL projections (so simple to compute they’re named after the monkey on Friends). These are solid middle-of-the-road projections. But Silver went one step further. He compared each player to historically similar players and used those similar players’ historical outcomes to create a wide range of possible projections (plus percentage calls for a player to Breakout or Fail) for each current player. He then assigned confidence intervals to the various outcomes, which brilliantly turned the language of predicting on its ear.
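To make the two steps concrete, here is a rough sketch in Python. It is not PECOTA's actual code or MARCEL's full recipe: the 5/4/3 season weights, the function names, and the sample numbers are all assumptions chosen for illustration. The first function builds a plain weighted-average baseline; the second applies the year-over-year changes of hypothetical comparable players to that baseline and reads off percentiles.

```python
import math


def weighted_projection(recent_rates, weights=(5, 4, 3)):
    """Weighted average of season totals, most recent season first.

    The 5/4/3 weights are an illustrative assumption, not PECOTA's values.
    """
    pairs = list(zip(recent_rates, weights))
    return sum(rate * w for rate, w in pairs) / sum(w for _, w in pairs)


def outcome_percentiles(comparable_changes, baseline, percentiles=(10, 50, 90)):
    """Spread a baseline into a range using comparable players' history.

    comparable_changes holds each similar player's fractional change in the
    following season (hypothetical data), e.g. -0.2 means a 20% drop.
    Percentiles use the simple nearest-rank method.
    """
    outcomes = sorted(baseline * (1 + change) for change in comparable_changes)
    n = len(outcomes)
    result = {}
    for p in percentiles:
        rank = max(1, math.ceil(p * n / 100))
        result[p] = outcomes[rank - 1]
    return result


# Hypothetical home run totals for the last three seasons, newest first.
baseline = weighted_projection([30, 24, 18])

# Hypothetical next-season changes for ten historically similar players.
changes = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.5]
spread = outcome_percentiles(changes, baseline)
print(baseline, spread)
```

The point of the second function is the reframing described above: instead of one number that is either right or wrong, the player gets a range, and his actual season lands somewhere inside it.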

Rather than say, “the predictive model failed to account for half the home runs Player X hit,” Nate could say, “Player X hit the 20th percentile of his home run projection, perhaps because pitchers discovered he was slow identifying sliders and saw a steady diet of those all season long.” Suddenly, the predictive model was a benchmark to help identify aberrant player performance, not a faulty prediction.

Sidenote: A great deal of baseball player performance is determined by luck, so all player projections are going to deviate widely from actual performance. Accounting for that deviation while propping up the projection itself was a brilliant stroke.

With 538, Silver did something so obvious that others were already doing it: averaging public opinion polls. In the process he not only created a rather successful business but also transformed the way people look at journalism these days.

No doubt part of his success with 538 was the hard work he put in finding good weights for each of the polls he sampled, but his real innovation was the creation of the Chance of Winning numbers. Chance of Winning is an easily digested number that tells you something concrete both over time, in the Chance of Winning graph, and in the moment, when it lets you know the current odds that a candidate will win on election day.
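The poll-weighting idea itself is simple enough to sketch in a few lines of Python. This is only an illustration: the weights and margins below are made up, and how 538 actually derives its weights is not something this sketch tries to reproduce.

```python
def weighted_poll_average(polls):
    """Average poll margins, giving more trusted polls more say.

    polls is a list of (margin, weight) pairs. margin is the candidate's
    lead in points (negative if trailing); weight is a hypothetical
    quality score, not 538's real rating.
    """
    total_weight = sum(weight for _, weight in polls)
    return sum(margin * weight for margin, weight in polls) / total_weight


# Three hypothetical polls of the same state.
polls = [(3.0, 2.0), (1.0, 1.0), (-1.0, 1.0)]
print(weighted_poll_average(polls))
```

All the hard work is in choosing the weights; the arithmetic that follows is mundane, which is rather the point.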

On election morning this year, Silver gave President Obama a 91 (actually 90.7) percent chance of winning. He says this number is derived by running simulations, which I think must be random resets of each state’s results inside the margin of error for all the state polls he collects (I haven’t seen this process explicitly described, though it may well have been). This is a clever way to create a horse-race number out of a lot of small-differences-in-the-states contests.
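Here is roughly what I imagine such a simulation looks like, sketched in Python. To be clear, this follows my guess above, not any published description of Silver's method. The assumptions are mine: each state's result is drawn from a normal distribution centered on its polled margin, the margin of error is treated as a 95 percent interval, states are simulated independently (real polling errors are surely correlated), and the three "states" are invented.

```python
import random


def chance_of_winning(states, n_sims=10000, seed=0):
    """Estimate a candidate's win probability by repeated simulation.

    states is a list of (electoral_votes, polled_margin, margin_of_error)
    tuples; all values passed in below are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total_votes = sum(ev for ev, _, _ in states)
    needed = total_votes // 2 + 1  # majority of the votes in play
    wins = 0
    for _ in range(n_sims):
        votes = 0
        for ev, margin, moe in states:
            # Assumption: margin of error ~ 95% interval, so sd = moe / 1.96.
            simulated_margin = rng.gauss(margin, moe / 1.96)
            if simulated_margin > 0:
                votes += ev
        if votes >= needed:
            wins += 1
    return wins / n_sims


# Three made-up states: (electoral votes, lead in points, margin of error).
toy_states = [(18, 2.0, 3.5), (29, -0.5, 3.5), (13, 1.0, 4.0)]
print(chance_of_winning(toy_states))
```

The share of simulated elections the candidate wins is the horse-race number. Small leads in enough states add up to a large overall probability, which is how a race that looks close poll by poll can still come out at 91 percent.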

There is nothing statistically bold about either PECOTA or 538, but there is a lot that is informationally clear and valuable about both. That’s right in line with Silver’s thinking about predicting future events, as he makes clear in his new book, The Signal and the Noise: Why So Many Predictions Fail - but Some Don’t.

Silver’s interest is in identifying and isolating the knowable empirical information in a system, be it baseball, political voting, Oscar voting, or real estate preference, and then creating a model that objectively weights discrete values so that changing conditions lead to useful predictive outcomes. The most interesting thing about this is that Silver is completely upfront about the limitations. In many cases, as he details in the book, there is not enough signal to escape the noise’s gravity. That doesn’t mean we shouldn’t try to make predictions, or figure out what useful information is known about a system, but that we should honestly detail the limits we’re dealing with. Transparency, up to a point, is king.

Which seems to me remarkably clearheaded and honest and kind of brave, because contingent thinking and analysis is often looked at as dull or unimportant. Shades of gray, except when there are 50 of them, can be soporific, but Silver (almost a shade of gray himself, namewise) is usually a clear and energetic enough writer and correspondent to make his book a pleasure, if you’re ready to hear that there are limits to predictive systems. If you’re not, you should think again, because Silver’s big point is about how much we don’t know.

Which means that most of the noise about his achievements comes from the fact that he presents such a clean signal. That’s the marketer in him, an affable everyman who isn’t afraid to look like a nerd (maybe he can’t help it, maybe it’s part of his method), who has figured out ways to popularize his way of looking at the data. It doesn’t hurt that he’s careful to make sure that his numbers add up.

Nate Silver Really is a Data God.

The break of the curveball