But I Regress…

Dave Studeman — The Hardball Times

I always thought that Regression Analysis took its name from the fact that you start with the outputs and work backward, regressing to see how important the inputs are, though I now have no idea whether that’s based on anything but my own magical thinking.

Dave Studeman’s story here is fascinating for the historical content (which has nothing to do with baseball, a little to do with statistics and much to do with other things) and because he does such clear work showing the dynamics of regression to the mean (which may well be the origin of the term) as they pertain to baseball.

At the end he references a story by Chone Smith about player projection which turns out to be an interesting rabbit hole in its own right, but that’s for another time.

Marcel 2007

The World’s Greatest Baseball Performance Projection System

Those of us who spend a lot of time making baseball projections have to tip our cap to Marcel the Projection Monkey. Marcel doesn’t spend that much time, even though time doesn’t mean that much to monkeys.

The point of Marcel the Monkey projections is that most of the information a projection system can contain is available in three-year weighted numbers, with factors that adjust for age.

Those of us who make better projections (usually) than Marcel have to be humbled by how close he comes to besting us.
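A rough sketch of a Marcel-style projection, for anyone who wants to see the moving parts. The season weights, the amount of league-average ballast, the peak age and the age steps below are assumptions chosen for illustration, not numbers from the post, and the function name is made up.

```python
from dataclasses import dataclass

# Assumed, illustrative settings (not taken from the post).
SEASON_WEIGHTS = (5, 4, 3)   # most recent season first
REGRESSION_PA = 1200         # plate appearances of league-average ballast
PEAK_AGE = 29
AGE_STEP_YOUNG = 0.006       # boost per year under PEAK_AGE
AGE_STEP_OLD = 0.003         # penalty per year over PEAK_AGE


@dataclass
class Season:
    pa: int   # plate appearances
    hr: int   # home runs


def project_hr_rate(seasons, league_hr_rate, age):
    """Project home runs per PA from up to three seasons, most recent first."""
    weighted_hr = sum(w * s.hr for w, s in zip(SEASON_WEIGHTS, seasons))
    weighted_pa = sum(w * s.pa for w, s in zip(SEASON_WEIGHTS, seasons))
    # Regress toward the league rate by mixing in league-average PA.
    rate = (weighted_hr + league_hr_rate * REGRESSION_PA) / (weighted_pa + REGRESSION_PA)
    # Simple age factor: small boost for the young, small penalty for the old.
    if age < PEAK_AGE:
        age_factor = 1 + AGE_STEP_YOUNG * (PEAK_AGE - age)
    else:
        age_factor = 1 - AGE_STEP_OLD * (age - PEAK_AGE)
    return rate * age_factor


# Hypothetical hitter: three straight seasons of 30 homers in 600 PA.
history = [Season(600, 30), Season(600, 30), Season(600, 30)]
rate = project_hr_rate(history, league_hr_rate=0.03, age=29)
print(round(rate * 600, 1), "projected HR per 600 PA")
```

Under these assumed settings the hitter's 30 homers get pulled a little toward the league rate, to about 28 per 600 PA, which is regression to the mean doing its job.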

The Official Matsuzaka Prediction Thread

BBTF’s Newsblog Discussion

Somebody will average these all out, dropping the guy who’s predicting 29 wins (and a 1-0 loss on the last day of the season). I remember that last year around this time we were having similar discussions about Johjima (except we weren’t sure if his last name had an “h” or not). I’ve been working on an updated set of predictions for all players (not just positively valued ones) for a game mlb.com is putting out, and discovered a little too late that I didn’t have a Dice-K projection (he was far from signed when the magazine went to bed). So, here it is:

200 IP, 3.74 ERA, 17 wins, 9 losses, 55 walks, 180 strikeouts, 23 homers, 1.21 WHIP.

That’s a tweener, probably worth about $25 in an AL-only league. I think he could be much better, but injury risk and the real chance that he’s not going to dominate would cause me not to chase him. I have him in the magazine at $14, but now that he’s signed and the adrenaline is flowing I’d probably go to $18, depending on what we see in spring training.

Milledge caught looking

New York Daily News

I saw Lastings Milledge play two games in the Arizona Fall League last November, and the observation I came away with was that he was not as fast as advertised. Obviously a couple games here, another game there, with the youngster getting caught stealing and thrown out at the plate a couple times will shape your opinions. And maybe he did spend too much time spectating on this play. But I think anyone expecting five-category production out of Milledge is going to be disappointed. He’s not as fast as he looks like he should be.

Games played streak ends after Matsui breaks wrist

ESPN.com – MLB

One of the reasons I went after Hideki Matsui this year in the American Dream League (AL only) is because he’s been so consistent. Reliable. No more. Which got me thinking about a couple of attempts to gauge reliability that have surfaced in recent years.

One of these is Sig Mejdal’s Injury Projections. Mejdal lists each player’s percentage chance of getting hurt. His Top 10 Hitters Most Likely to Get Hurt (published last November) was: Griffey (just getting off DL), Jordan (not yet), Cliff Floyd (playing like it), Gary Sheffield (on DL), Rondell White (playing like it), Sammy Sosa (retired), Reggie Sanders (gets off DL tomorrow), Jose Valentin (29 AB so far), Alomar Jr (back spasms and shoulder pain because he couldn’t play every fifth day), Geoff Jenkins (okay so far).

On the pitching side: Kerry Wood (like fish in a barrel), Orlando Hernandez (DL), Wade Miller (DL), Carl Pavano (DL), Jaret Wright (only 16 IP so far), Oscar Villarreal (healthy), Randy Wolf (DL, out for season), Matt Mantei (DL), Rudy Seanez (ineffective, but pitching), Brad Penny (sharp).

That’s a lot of hits so far, but I’m not sure how useful that is.

Ron Shandler gave Matsui an 88 reliability score, on a scale of 100, reflecting his consistent performance and health over the past three years. Ken Griffey, on the other hand, scored a 7. Nomar? 22. All will have spent time on the DL this spring.

This Week’s Ask Rotoman

Major League Baseball Fantasy

I didn’t anticipate that, with Ian Kinsler going down with a bum finger, Gary Matthews would be called up right away. Knowing what I know now, especially given Matthews’ three-run triple tonight, take him over the recommended Kevin Millar. . . Wait! I recommended Millar instead of Matthews and Millar hit two homers, drove in four runs. Sweet! As usual, you should sort it all out for your league.

Elsewhere in this week’s model, frank discussion about Jeff Francoeur, the breakout (break down?) of a trade of big players (for educational purposes only), and some chatter about some Hots and Nots.

Be there! As some of us used to say in college.

Ask Rotoman!

Major League Baseball : Fantasy : Fantasy

Dear Rotoman:

While I have a great deal of respect for anyone that
compiles as much data for baseball fans and fantasy
addicts as you, I did want to point out some major
flaws with your projections for 2006 regarding
pitchers’ statistics. I am assuming you use some sort
of stat compiler or program you invented to project
season statistics. Of the pitchers you estimated
statistics for, you’ve projected only eight MLB
pitchers totaling 14 wins or more in 2006. Was this an
oversight? Colon finishes with 16, Oswalt and Santana
with 15 each, and Suppan, Sabathia, Lee, Prior and
Halladay with 14. I understand that it’s difficult to
pay individual attention to each pitcher, but there
must be a way to plug in the frequency of 20-game
winners, 19-game winners, 18-game winners, etc., into
whatever formula you are using to project stats. It’s
unrealistic to project such marginal stat lines, and
it’s a disservice to your fans to kick out these
figures like a robot, especially if you were being
intentionally conservative. SOMEBODY in the majors is
going to win at least 17 games. I’m willing to bet my
life savings on it. Wouldn’t you?

“Go for Broke”

Dear Go:

Absolutely, someone will win 18 games this year, but what good does it do to project the wrong guy to win 20 or 19 or 18 games?

It’s fun to model the whole year so it looks like the whole year (with 20 game winners and guys with 50+ homers) but it also means being wrong more often and by a larger amount, which doesn’t do anyone playing our game much good.
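A toy simulation makes the tradeoff concrete. Everything here is made up for illustration: the pitcher pool, the spread of expected wins and the amount of luck are assumptions, not anything taken from the actual projections. It compares a conservative set of projections against a "stretched" set that keeps the same ranking but hands out a realistic-looking spread of win totals.

```python
import random


def rmse(projected, actual):
    """Root mean squared error between two equal-length lists."""
    n = len(actual)
    return (sum((p - a) ** 2 for p, a in zip(projected, actual)) / n) ** 0.5


def simulate(n_seasons=2000, seed=2006):
    """Return (conservative, stretched, actual) win totals for made-up pitchers."""
    rng = random.Random(seed)
    # Assumed numbers: expected wins cluster around 11, and luck adds noise.
    expected = [rng.gauss(11, 2) for _ in range(n_seasons)]
    actual = [e + rng.gauss(0, 3.5) for e in expected]

    # Conservative projection: just each pitcher's expected wins.
    conservative = list(expected)

    # "Looks like a real season" projection: keep the same ranking, but hand
    # out the actual spread of win totals, so the best projection gets the
    # league-leading total, and so on down.
    order = sorted(range(n_seasons), key=lambda i: expected[i])
    stretched = [0.0] * n_seasons
    for i, wins in zip(order, sorted(actual)):
        stretched[i] = wins
    return conservative, stretched, actual


conservative, stretched, actual = simulate()
print("conservative RMSE:", round(rmse(conservative, actual), 2))
print("stretched RMSE:   ", round(rmse(stretched, actual), 2))
print("top conservative projection:", round(max(conservative), 1))
print("top stretched projection:   ", round(max(stretched), 1))
```

The stretched set has its big winners, but because luck decides who actually gets those totals, it misses by more on average than the boring set does.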

My projections are based on regressions of historical data modified by a few factors, the biggest one being age, combined with a different set of rate calculations, all of which are combined with a mechanical estimate of times at bat. I then go in and adjust based on probable changes in playing time and assessments of talent that don’t seem to be reflected in the projection. I call this tweaking, and it makes the boring regressions a little more lively and the overall correlation of the projections a good bit higher.

Then I scale everything so that the top 400 players’ projections are similar in total to the top 400 players’ actual stats. But the extremes in each stat just aren’t there, because the leaders in all categories usually change from year to year.
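The scaling step could be sketched like this. It is a guess at the general idea rather than the actual procedure: the function name, the single scaling factor and the tiny player pool are all hypothetical.

```python
def scale_to_pool(projected, target_total, pool_size=400):
    """Scale every projection so the top `pool_size` values sum to `target_total`.

    `projected` maps player name -> projected value for one stat.
    `target_total` is what the top players actually totaled historically.
    """
    top = sorted(projected.values(), reverse=True)[:pool_size]
    factor = target_total / sum(top)
    return {player: value * factor for player, value in projected.items()}


# Hypothetical four-player pool where the top two should total 63 homers.
raw = {"A": 40, "B": 30, "C": 20, "D": 10}
scaled = scale_to_pool(raw, target_total=63, pool_size=2)
print({player: round(value, 1) for player, value in scaled.items()})
```

Everyone moves by the same proportion, so the ranking is untouched and only the totals change.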

Thanks for writing,
Rotoman

P.S. I would not bet my life savings on the accumulation of any counting stat in a year when the Basic Agreement between the players and owners expires.

Rotoman!

You also have the MLB HR leader with 39 home runs.
When was the last time NOBODY hit 40 home runs?

“GFB”

Dear Go:

One of the byproducts of the system is that the AB of regular players are reduced about 10 percent off their usual peaks, with a similar reduction in the other stats. So all the guys who would project to 43 homers end up at 39 or 38.

The reason the AB get reduced is that about 10 percent of expected AB from year to year are lost to injury or other problems. This doesn’t happen evenly, but it happens consistently. There are two ways to handle this. One would be to ignore it and give all players their full measure of AB and stats. The other is to scale the pool of projected stats to the actual stats that will be produced by the pool.

I do the latter, because it gives a more accurate assessment of how groups of players perform on the year. The former looks better when everyone stays healthy, but since they never all do, my method measures as more accurate.
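The arithmetic, worked through on a made-up pool. The 10 percent figure comes from the discussion above; the ten-player pool, the 600 AB and the one lost season are hypothetical.

```python
LOSS_RATE = 0.10  # about 10 percent of expected AB lost each year


def discount(value, loss_rate=LOSS_RATE):
    """Reduce a full-season number by the share of playing time usually lost."""
    return value * (1 - loss_rate)


# A hitter who would project to 43 homers at full playing time.
print(round(discount(43), 1), "homers after the discount")

# Hypothetical pool: ten regulars who would each get 600 AB if healthy.
full_ab = [600] * 10
# Hypothetical outcome: nine stay healthy, one misses the whole year.
actual_ab = [600] * 9 + [0]

full_total = sum(full_ab)
discounted_total = sum(discount(ab) for ab in full_ab)
actual_total = sum(actual_ab)
print(full_total, round(discounted_total), actual_total)
```

The discounted projection is too low for each of the nine healthy players and far too high for the injured one, but it is right about the group, while the full-time projection overshoots the group by the lost 10 percent.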

Peter

Major League Baseball : Rotoman’s Projections

Major League Baseball

MLB.com has been running my player projections for the past five years, usually a set at the end of February and an update just before the start of the season. This year they asked for more categories (doubles, caught stealing, among others) and for a set at the end of January, which I delivered. And then nothing. I was scheduled to deliver an update the first week of March, but in all the busy-ness of things didn’t get to it until last week, when I also finally asked my editor what happened to the first set of projections.

It turns out they’re being used in a game. And now for the first time the MLB.com Rotoman projections are posted at mlb.com, along with bid values for 4×4, 5×5, and mixed leagues. There will be an update March 29, for posterity’s (and late drafters’) sake.

New Version of Patton $ on Disk

Support Ask Rotoman Page

A weekend of bug squishing led to the release of an updated and improved Patton $ on Disk.

The program includes updated projections and bid prices, from me and Alex Patton for 4×4 and from Mike Fenger for 5×5. It is a great program for sorting lists and pricing players in the traditional 4×4 and 5×5 formats. The ease of updating projections and prices, the auction manager with bid values and all that make it useful for smaller mixed formats, too, but the pricing is not adjustable.

There is also an Excel worksheet with all the data available, and text and Word files will be out tonight.

The price: $25.