a baseball stats database and search engine

I received a press release today about enth.com, another database of baseball stats going back to 1871. The interesting things they’ve got going:

They’ll have daily updates with in-season stats from Stats Inc., and

They have developed a natural language query to get at the data.

Which means you can ask “Which player hit the most Home Runs in 2006 and played more than 20 games at shortstop?” and get a list of the top 10. The problem right now is that if you ask the same question but substitute 2003 in the query, you get no answer.
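For comparison, here is roughly what that question looks like as a structured filter, the kind of thing a forms-based system builds for you behind the scenes. This is only a sketch: the SeasonLine record, its field names, and the sample rows are all made up for illustration and have nothing to do with how enth.com actually stores or queries its data.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    # Hypothetical record layout, invented for this example.
    player: str
    year: int
    home_runs: int
    games_at_ss: int


def top_hr_shortstops(lines, year, min_games=20, limit=10):
    """Return the top home run hitters in `year` with more than `min_games` at shortstop."""
    eligible = [s for s in lines if s.year == year and s.games_at_ss > min_games]
    return sorted(eligible, key=lambda s: s.home_runs, reverse=True)[:limit]


# Made-up rows, not real statistics.
sample = [
    SeasonLine("Player A", 2006, 30, 150),
    SeasonLine("Player B", 2006, 35, 12),   # too few games at shortstop
    SeasonLine("Player C", 2006, 22, 98),
    SeasonLine("Player D", 2003, 40, 140),
]

result = [s.player for s in top_hr_shortstops(sample, 2006)]
print(result)
```

Swapping 2006 for 2003 is a one-argument change here, which is part of why I lean toward forms.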

But that’s what beta software is for. Will this prove to be a better approach than those used at baseball-reference.com and baseballmusings.com? I think I’m always going to prefer a forms-based system, but I’m not ruling out the possibility. And it’s always great to see smart people giving us a chance to explore new systems.

Good luck, enth.com.

HitTracker :: Home run tracking and distance measurement

Greg Rybarczyk, an engineer with an interest in the flight of major league baseballs, has created a rather amazing trove of information about last year’s home runs. It includes his special formula for determining the actual (he calls it “true”) distance a ball would have flown if it hadn’t landed above grade in the seats or hit a light standard or the glass wall of a right field dining establishment.

This sort of information is very important when we evaluate other people’s attempts to track the steroid era or the juiced ball era or what have you based on homer distances.

Greg includes his weather and altitude correctors so that other adjustments can be made.
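To get a feel for why the landing spot matters, here is a back-of-the-envelope sketch. It is not Greg’s formula, which I haven’t seen spelled out. It assumes the ball keeps descending in a straight line at a fixed angle from the point where it hit something, and it ignores drag, spin, weather, and altitude entirely. The function name and the sample numbers are invented for illustration.

```python
import math


def estimated_ground_distance(impact_distance_ft, impact_height_ft, descent_angle_deg):
    """Extend a home run to field level along a straight descent line.

    Assumption (not from HitTracker): from the impact point the ball keeps
    falling at a constant angle below horizontal, so the extra horizontal
    travel is height / tan(angle).
    """
    if not 0 < descent_angle_deg < 90:
        raise ValueError("descent angle must be between 0 and 90 degrees")
    extra_ft = impact_height_ft / math.tan(math.radians(descent_angle_deg))
    return impact_distance_ft + extra_ft


# Made-up example: a ball hits the seats 400 feet out, 20 feet above the field,
# while descending at 45 degrees.
distance = estimated_ground_distance(400, 20, 45)
print(round(distance, 1))
```

The shallower the descent, the more distance a ball loses to an upper deck, which is why two homers listed at the same landing distance can be very different shots.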

I’m not really sure how much real value this is going to have in its present form, but I hear that he’s hoping to enter this information for all batted balls in 2007. While that will duplicate at least some of the information that Baseball Info Solutions is keeping, it’s hard to argue that we don’t want multiple sources for what is inevitably less-than-objective data.

We’ll leave it to the next generation to figure out how to get all the data keepers interested in sharing.

Marcel 2007

The World’s Greatest Baseball Performance Projection System

Those of us who spend a lot of time making baseball projections have to tip our cap to Marcel the Projection Monkey. Marcel doesn’t spend that much time, even though time doesn’t mean that much to monkeys.

The point of Marcel the Monkey projections is that most of the information a projection system can contain is available in three-year weighted numbers, with factors that adjust for age.
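A minimal sketch of that idea, for a single rate stat such as home runs per plate appearance. The specific numbers here (5/4/3 season weights, 1200 plate appearances of league average mixed in, a peak age of 29, and the per-year age factors) are the values commonly quoted for Marcel as I remember them, so treat them as assumptions and not as the official recipe. The function name and the sample seasons are made up.

```python
def marcel_rate(seasons, league_rate, age,
                weights=(5, 4, 3), regression_pa=1200,
                peak_age=29, young_adj=0.006, old_adj=0.003):
    """Project a per-plate-appearance rate from up to three seasons.

    seasons: list of (events, plate_appearances), most recent season first.
    All default parameters are assumed values, not taken from the source.
    """
    weighted_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # Regress toward the league rate by adding league-average plate appearances.
    regressed = (weighted_events + regression_pa * league_rate) / (weighted_pa + regression_pa)

    # Simple age factor: a bump for players younger than peak, a penalty after it.
    if age < peak_age:
        factor = 1 + (peak_age - age) * young_adj
    else:
        factor = 1 - (age - peak_age) * old_adj
    return regressed * factor


# Made-up hitter: 30, 25, and 20 homers in 600 plate appearances each, most recent first.
seasons = [(30, 600), (25, 600), (20, 600)]
projected = marcel_rate(seasons, league_rate=0.03, age=29)
print(round(projected * 600, 1))  # projected homers per 600 plate appearances
```

That is the whole monkey: weight, regress, adjust for age.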

Those of us who (usually) make better projections than Marcel have to be humbled by how close he comes to besting us.

Should be: The Indispensable Baseball Musings

Baseball Musings

DAVID PINTO WROTE: “Update: Jason Marquis is allowed to take a beating for the second time this year. He gives up two hits in the sixth before he comes out of the game. Just to finish his night off, the bullpen allows the runner they inherited from Jason to score. He’s charged with 12 runs. He allowed 14 against the White Sox earlier this season. Almost 30% of the runs Marquis allowed this year came in those two games.”

Pinto has created a baseball news site with fantasy relevance and excellent data tools, and it’s all free. Unless you do the right thing and pony up some cash, if you feel the way I do. I sent money last year and, I’m not bragging, it wasn’t really enough. So I’m sending more this year.

Highly recommended.

As for Marquis, he’s killing me. Or Tony LaRussa is. I’d been riding the matchups the last couple of weeks (since the last time he was left in to take a beating) and it’s worked out well, so I didn’t see the spot to dump him. Mercy.

Midseason Fantasy Prices

MLB.com Fantasy

As usual, my midseason fantasy prices (single and mixed leagues) are up at mlb.com. Even if you don’t play by the exact rules you can make use of them by comparing the preseason prices (guesses) and the actual thing. Most surprising to me is just how valueless Dontrelle Willis’s season has been so far, but pitching prices swing wildly when they run hot or cold.

Baseball Buzz Bot!


I don’t think I’ve given proper respect to Ballbug, an excellent news feed aggregator. Ballbug collects the big baseball stories of the day from the mainstream media, and augments them with a healthy collection of related blog entries. It sometimes takes a little longer than I would like for the latest news to cycle to the top of the page during the day, but it’s a great place to start your daily baseball reading. Highly recommended.

David Appelman: Pujols’ hot spots

SI.com – MLB  Wednesday May 17, 2006 12:26PM

When I was a boy perhaps the most influential thing I read was the issue of Sports Illustrated excerpting Ted Williams’ book about hitting, The Science of Hitting. Most notable was a chart that showed his batting average when the ball was thrown in each spot in the strike zone and out.

I’m a little embarrassed now that I have no idea how the data for that chart was compiled and whether it was even real. Collecting such data in the early 60s was a lot harder than it is today. David Appelman is one of a growing number of baseball analysts who are drawing on the ever expanding trove of data Baseball Info Solutions has been collecting, and his Fangraphs.com site has been linked to here before.

These hitter charts are of interest, of course, but it seems to me that they tell the wrong half of the story. Player performance isn’t a constant, and wouldn’t it be really interesting to be able to see the distribution of pitches when Adrian Beltre was going bad and compare it to when he was going good?

The other thing that should be noted is that BIS derives most if not all of its data off of television broadcasts. While I trust that a reporter’s mark showing where the ball crossed the plate will be sort of accurate (and I believe the company employs multiple reporters for each game), there are plenty of reasons to suspect that they won’t be pinpoint. And if the analysis is meant to show scintillating differences in performance based on pitch location (remember that the camera distance and angle is different in every ballpark), the noise of subjective judgement is likely to wipe out the little differences.

This isn’t to derogate Appelman’s work, or to impugn the value of what BIS is doing. But it is important to remember that better and more finely grained data isn’t necessarily objective data. Enjoy these excellent visuals, and imagine what they tell us about these hitters, but don’t imagine this is the end. In some ways it is just the beginning.

Games played streak ends after Matsui breaks wrist

ESPN.com – MLB

One of the reasons I went after Hideki Matsui this year in the American Dream League (AL only) is because he’s been so consistent. Reliable. No more. Which got me thinking about a couple of attempts to gauge reliability that have surfaced in recent years.

One of these is Sig Mejdal’s Injury Projections. Mejdal lists percentages of chance for players to get hurt. His Top 10 Hitters Most Likely to Get Hurt (published last November) was: Griffey (just getting off DL), Jordan (not yet), Cliff Floyd (playing like it), Gary Sheffield (on DL), Rondell White (playing like it), Sammy Sosa (retired), Reggie Sanders (gets off DL tomorrow), Jose Valentin (29 AB so far), Alomar Jr (back spasms and shoulder pain because he couldn’t play every fifth day), Geoff Jenkins (okay so far).

On the pitching side: Kerry Wood (like fish in a barrel), Orlando Hernandez (DL), Wade Miller (DL), Carl Pavano (DL), Jaret Wright (only 16 IP so far), Oscar Villarreal (healthy), Randy Wolf (DL, out for season), Matt Mantei (DL), Rudy Seanez (ineffective, but pitching), Brad Penny (sharp).

That’s a lot of hits so far, but I’m not sure how useful that is.

Ron Shandler gave Matsui an 88 reliability score, on a scale of 100, reflecting his consistent performance and health over the past three years. Ken Griffey, on the other hand, scored a 7. Nomar? 22. All will have spent time on the DL this spring.