Saturday, June 25, 2011

Near Miss

You may have heard that Asteroid 2011 MD will pass by Earth on Monday morning. And by pass by, we mean within 8,000 miles.

That's pretty close. How close? Well, have a look at these animations:

Asteroid 2011 MD Flyby

Sunday, June 12, 2011

An Update on the Average Major League Hitter

This is the promised update to my table of pretty good averages for everyday baseball players at the major league level. As before, I consider an everyday ball player to be one who was eligible for a league batting title in a given year, which from 1996-2010 means that he made at least 502 plate appearances (3.1 per scheduled game over a 162-game season).
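For anyone who would rather see the eligibility rule as code, here is a minimal Python sketch. It assumes plate appearances are the sum AB + BB + HBP + SH + SF, and that the cutoff is 3.1 times the number of scheduled games, rounded to the nearest whole number. The function names are mine, made up for illustration, and the rounding step is an assumption rather than a quote from the rule book.

```python
def plate_appearances(ab, bb, hbp, sh, sf):
    """Plate appearances as at-bats plus walks, hit-by-pitch,
    sacrifice hits, and sacrifice flies."""
    return ab + bb + hbp + sh + sf


def qualifying_pa(scheduled_games, pa_per_game=3.1):
    """Cutoff for batting-title eligibility.

    Assumption: the product is rounded to the nearest whole number.
    """
    return round(scheduled_games * pa_per_game)


def is_eligible(pa, scheduled_games=162):
    """True if a player with `pa` plate appearances meets the cutoff."""
    return pa >= qualifying_pa(scheduled_games)


if __name__ == "__main__":
    # Hypothetical stat line, not a real player.
    pa = plate_appearances(ab=500, bb=40, hbp=3, sh=2, sf=5)
    print(pa, qualifying_pa(162), is_eligible(pa))
```

Note the `>=` in `is_eligible`: a player with exactly the cutoff number of plate appearances still qualifies.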

The raw data for this study was taken from the MySQL database thoughtfully provided by the folks at Baseball-Databank.org. As outlined in yesterday's post, I used MySQL to pull out the appropriate data, this time using the command:

select b.yearID as Year, m.nameLast as Last, m.nameFirst as First,
b.teamID as TEAM, b.G, b.AB+b.BB+b.HBP+b.SH+b.SF as PA, b.AB, b.R, b.H, b.2B,
b.3B, b.HR, b.RBI, b.SB, b.CS, b.BB, b.SO, b.IBB, b.HBP, b.SH, b.SF, b.GIDP
from Batting b inner join Master m on b.playerID=m.playerID
where b.yearID>1995 and
-- "at least 502 plate appearances", so use >= rather than >
b.AB+b.BB+b.HBP+b.SH+b.SF >= 502
order by b.yearID ASC, m.nameLast, m.nameFirst;

to get a list of all players from 1996 on who were eligible for a batting title. I then used mysql-query-browser to export all of the data into a spreadsheet, and there computed all of the averages and standard deviations. All the calculations are the same as in my original post, except:

  • I added the batting data for the 2009 and 2010 seasons.
  • My original post frakked up David Smyth's Base Runs statistic. I used the right formula (the second one on the page), but miscalculated total bases by forgetting that doubles, triples, and home runs are already counted as hits. So the numbers found in my earlier study are too high.
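  • I dropped 1995 from the study this time because only 144 games were scheduled for each team that year, so the batting eligibility criterion was 144 × 3.1 = 446.4, or about 446 plate appearances, rather than 502. Handling a second cutoff was more work than I wanted to do. Just lazy.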
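The total bases slip is easy to make, so here is a small Python sketch of the bookkeeping. It only covers total bases, not the Base Runs formula itself, and the function names and the sample stat line are hypothetical. The "overcounted" version is one plausible way to make this kind of mistake, not necessarily the exact one in my earlier spreadsheet.

```python
def total_bases(h, doubles, triples, hr):
    """Total bases from the usual batting columns.

    H already counts one base for every hit, so doubles, triples,
    and home runs only contribute their extra bases.
    """
    return h + doubles + 2 * triples + 3 * hr


def total_bases_from_singles(h, doubles, triples, hr):
    """Same quantity, computed the long way as a cross-check."""
    singles = h - doubles - triples - hr
    return singles + 2 * doubles + 3 * triples + 4 * hr


def total_bases_overcounted(h, doubles, triples, hr):
    """What you get if you forget that extra-base hits are inside H."""
    return h + 2 * doubles + 3 * triples + 4 * hr


if __name__ == "__main__":
    # Hypothetical season: 180 hits, 35 doubles, 5 triples, 25 home runs.
    line = dict(h=180, doubles=35, triples=5, hr=25)
    print(total_bases(**line))              # 300
    print(total_bases_from_singles(**line)) # 300
    print(total_bases_overcounted(**line))  # 365
```

The overcount is exactly the number of extra-base hits (here 35 + 5 + 25 = 65), which is why every number that depends on total bases comes out too high.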

Saturday, June 11, 2011

Take Me Out to the SQualL Game

As some of you know, I have a moderate interest in baseball and baseball statistics. However, I'm not a database programmer, and the little bit of DB manipulation I've looked at has left me hopelessly confused.

Several years ago I did buy a copy of Baseball Hacks, but I never really got started with it. Last weekend I had some time on my hands and started playing around with it. Turns out you actually have to try to do something with a program in order to learn it (who'da thunk?). And it also turns out that Baseball Hacks, even though it's an O'Reilly book, is pretty heavily oriented toward Windows/Microsoft Access, although it does have substantial hints for Linux and Mac users, not to mention an online collection of scripts from the book (ZIP file), along with some other stuff.

So what should we do? As a first shot, how about updating my table of averages for several modern baseball statistics to include 2009 and 2010? I did the previous tables by doing some judicious editing of the Batting.csv file in Sean Lahman's baseball database, but the folks at Baseball-Databank.org have all of the data packaged neatly into a MySQL database (ZIP file), so we'll use that.

So as a start, we're going to

  • Install the appropriate parts of the MySQL database program in Ubuntu,
  • Set it up to read the database file,
  • Find the eligible batters for 2009 and 2010,
  • Get their batting data into a spreadsheet,
  • Find the appropriate averages, runs created, etc., and
  • Add the results to the appropriate tables.

That should be enough for one day. It's going to be a long journey, though, so when you've got some time join us after the break.