Friday, February 22, 2013

Pitcher's In-Game Peak

With the new season right around the corner, I can't manage to keep my brain from wandering towards baseball. So - after downloading 2012 play-by-play data from Retrosheet - I'm ready to tackle an idea I've toyed with for some time: pitchers' in-game peaks.

In an article I wrote about a year and a half ago I looked into the ability of some pitchers to start a game as strongly as one can: with a no-hitter. Rather than look at elite pitchers who start games exceptionally strong, I wanted to analyze at what point in the game each individual pitcher peaks. The idea of a pitcher being a strong starter or a strong finisher is certainly not a new one and oftentimes an announcer will make mention of a starting pitcher "hitting his stride" at some point in the middle innings. I wanted to develop a way to quantify this effect and to play around with the results.

This is the first of (probably) several posts exploring this effect. For starters, we'll just look at the 2012 season and include all pitchers who started at least 20 games. To find the peak point in each game, I thought 7 consecutive batters sounded about right. That is roughly 2 innings, which fits how the phrase "hitting his stride" is typically used. So for every game a pitcher pitched, the stats for every 7-batter window (batters 1-7, 2-8, 3-9 and so on) were calculated and then added together across the entire season. As an example, Justin Verlander's breakdown looks like this:

Batters | Stats
1-7 | 55.2 IP, 48 H, 18 R, 17 ER, 18 BB, 52 SO, 2.75 ERA, 1.186 WHIP, .230/.288/.378
2-8 | 58.1 IP, 42 H, 15 R, 14 ER, 17 BB, 59 SO, 2.16 ERA, 1.011 WHIP, .200/.258/.286
3-9 | 59.0 IP, 42 H, 15 R, 14 ER, 15 BB, 65 SO, 2.14 ERA, 0.966 WHIP, .197/.248/.277
4-10 | 59.0 IP, 44 H, 17 R, 16 ER, 13 BB, 67 SO, 2.44 ERA, 0.966 WHIP, .204/.248/.301
5-11 | 60.1 IP, 40 H, 14 R, 13 ER, 11 BB, 68 SO, 1.94 ERA, 0.845 WHIP, .185/.227/.264
6-12 | 58.2 IP, 45 H, 19 R, 19 ER, 12 BB, 63 SO, 2.91 ERA, 0.972 WHIP, .208/.253/.338
7-13 | 59.0 IP, 45 H, 19 R, 19 ER, 11 BB, 64 SO, 2.90 ERA, 0.949 WHIP, .207/.249/.341
8-14 | 57.2 IP, 49 H, 22 R, 22 ER, 12 BB, 62 SO, 3.43 ERA, 1.058 WHIP, .227/.271/.389
9-15 | 56.1 IP, 53 H, 23 R, 22 ER, 12 BB, 59 SO, 3.51 ERA, 1.154 WHIP, .247/.288/.409
10-16 | 55.1 IP, 59 H, 25 R, 24 ER, 10 BB, 55 SO, 3.90 ERA, 1.247 WHIP, .271/.304/.440
11-17 | 55.2 IP, 56 H, 21 R, 20 ER, 12 BB, 55 SO, 3.23 ERA, 1.222 WHIP, .259/.300/.398
12-18 | 56.1 IP, 58 H, 23 R, 22 ER, 12 BB, 50 SO, 3.51 ERA, 1.243 WHIP, .266/.303/.422
13-19 | 57.2 IP, 51 H, 20 R, 19 ER, 13 BB, 51 SO, 2.97 ERA, 1.110 WHIP, .236/.281/.338
14-20 | 55.1 IP, 56 H, 21 R, 18 ER, 14 BB, 48 SO, 2.93 ERA, 1.265 WHIP, .260/.307/.367
15-21 | 57.0 IP, 50 H, 18 R, 14 ER, 15 BB, 49 SO, 2.21 ERA, 1.140 WHIP, .235/.290/.305
16-22 | 57.1 IP, 47 H, 19 R, 15 ER, 17 BB, 47 SO, 2.35 ERA, 1.116 WHIP, .222/.286/.297
17-23 | 58.2 IP, 43 H, 21 R, 17 ER, 16 BB, 48 SO, 2.61 ERA, 1.006 WHIP, .202/.264/.296
18-24 | 58.2 IP, 43 H, 24 R, 20 ER, 15 BB, 47 SO, 3.07 ERA, 0.989 WHIP, .201/.260/.327
19-25 | 57.0 IP, 41 H, 22 R, 17 ER, 15 BB, 49 SO, 2.68 ERA, 0.982 WHIP, .194/.254/.308
20-26 | 56.2 IP, 42 H, 17 R, 12 ER, 13 BB, 51 SO, 1.91 ERA, 0.971 WHIP, .201/.251/.301
21-27 | 57.0 IP, 35 H, 14 R, 11 ER, 11 BB, 54 SO, 1.74 ERA, 0.807 WHIP, .173/.223/.267
22-28 | 53.1 IP, 36 H, 11 R, 9 ER, 10 BB, 55 SO, 1.52 ERA, 0.863 WHIP, .185/.228/.277
23-29 | 48.2 IP, 35 H, 12 R, 9 ER, 8 BB, 53 SO, 1.66 ERA, 0.884 WHIP, .192/.230/.280

Besides demonstrating in yet another way that Justin Verlander is a pretty good pitcher, his results are very interesting in that his performance to start and end each game is much stronger than his mid-game showing. Is this a trend for him year-to-year? No idea... at least not yet; seems like a good question to investigate down the road. As a quick note on where to stop the analysis for each pitcher: I found the last batter that a pitcher made it to in at least half of his starts and used this as his endpoint. Verlander's last batter analyzed was #29, indicating that he pitched to at least 29 batters in at least half of his starts but to at least 30 batters in fewer than half of them. This seemed like a good cut-off.
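The windowing and cut-off rules described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data layout (each start as a list of per-batter stat dictionaries), the function names, and the choice to count partial windows when a pitcher leaves mid-window are all hypothetical and not taken from the original program.

```python
from collections import defaultdict


def last_batter_cutoff(batters_faced_per_start):
    """Highest batter number the pitcher reached in at least half of his starts."""
    n_starts = len(batters_faced_per_start)
    cutoff = 0
    for k in range(1, max(batters_faced_per_start) + 1):
        reached = sum(1 for faced in batters_faced_per_start if faced >= k)
        if 2 * reached < n_starts:
            break  # fewer than half of the starts got this far
        cutoff = k
    return cutoff


def window_totals(starts, window=7):
    """Season totals for every `window`-batter stretch up to the cut-off.

    `starts` is a list of games; each game is a list of per-batter dicts of
    counting stats, e.g. {"H": 1, "SO": 0}. Assumption: a game that ends
    partway through a window still contributes the batters it has.
    """
    cutoff = last_batter_cutoff([len(game) for game in starts])
    totals = {}
    for first in range(1, cutoff - window + 2):
        last = first + window - 1
        agg = defaultdict(int)
        for game in starts:
            for batter in game[first - 1:last]:
                for stat, value in batter.items():
                    agg[stat] += value
        totals[(first, last)] = dict(agg)
    return totals
```

Each key in the result is a (first batter, last batter) pair such as (1, 7) or (23, 29), matching the rows in the Verlander breakdown.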

Now that we have the 7-batter summaries for each pitcher, we can find their best 7-batter range. "Best," of course, is highly subjective and depends on what statistic(s) we use. I toyed with using Runs Created but settled on FIP (Fielding Independent Pitching) as the single number to use. The best 7-batter range, then, is the one with the best FIP. FIP is an estimate of what a pitcher's ERA should have been based on only those things that a pitcher has been proven to have control over: home runs, strikeouts and walks. While imperfect, I settled on it over raw ERA since we're dealing with such small sample sets and also because at the end of outings, ERA becomes skewed by a good or bad bullpen and I wanted a number to isolate the starting pitcher's performance only.
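For reference, here is a minimal sketch of the standard FIP formula. The league constant (set to 3.10 here) changes from season to season and is an assumption, as are the function names; the helper converts baseball's innings notation, where "55.2" means 55 and two-thirds innings. Home run and hit-by-pitch counts are not shown in the Verlander table, so no values for him are computed here.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert innings notation ('55.2' = 55 2/3 innings) to a real number."""
    whole, _, thirds = ip_notation.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def fip(hr, bb, hbp, so, innings, constant=3.10):
    """Fielding Independent Pitching.

    `constant` scales FIP to league ERA; 3.10 is a placeholder, not the
    value used in the post.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / innings + constant
```

Because each 7-batter window covers only about 55 to 60 innings over a full season, a single extra home run moves the window's FIP by roughly 0.2 runs, which is worth keeping in mind when reading the tables.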

Without further delay, I'll present the results. I intend to refine this method a bit and explore historical results (with many thanks to Retrosheet's tireless work, we're looking at a good chunk of play-by-play data all the way back to 1945 now... huzzah!) in the future but this will serve as a good introduction. In the results tables below, you will find:
- The pitcher's name
- Their peak batter range
- How many runs better (using FIP) this pitcher was during their peak than over the season as a whole
- How many total starts that pitcher made
- The last batter analyzed for that pitcher, i.e., the last batter the pitcher made it to in at least half of his starts.

Early Starters (Best stretch = Batters 1 to 7)
Pitcher | Peak Batters | Runs Better | Starts | Last Batter
Tommy Hunter | 1-7 | 1.50 | 20 | 26
Jon Lester | 1-7 | 1.28 | 33 | 27
R.A. Dickey | 1-7 | 1.24 | 33 | 28
Matt Moore | 1-7 | 1.08 | 31 | 25
Marco Estrada | 1-7 | 1.08 | 23 | 24

Strong Closers (Best stretch = Last 7 batters faced)
Pitcher | Peak Batters | Runs Better | Starts | Last Batter
Kyle Kendrick | 20-26 | 1.34 | 25 | 26
Jose Quintana | 20-26 | 0.77 | 22 | 26


In addition to each pitcher's peak period, I also determined his valley period: the 7-batter range in which he struggled the most.
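Picking the peak and valley from the per-window FIP values can be sketched like this. The sign convention (positive "runs better", negative "runs worse", both measured against season FIP) is inferred from the tables; the function name and the sample numbers are hypothetical.

```python
def peak_and_valley(window_fip, season_fip):
    """Return the best and worst windows with their difference from season FIP.

    `window_fip` maps (first_batter, last_batter) -> FIP for that window.
    Lower FIP is better, so the peak is the minimum and the valley the maximum.
    """
    best = min(window_fip, key=window_fip.get)
    worst = max(window_fip, key=window_fip.get)
    return {
        "peak": (best, round(season_fip - window_fip[best], 2)),
        "valley": (worst, round(season_fip - window_fip[worst], 2)),
    }


# Made-up FIP values, purely for illustration.
example = peak_and_valley({(1, 7): 4.50, (2, 8): 3.00, (3, 9): 3.80}, season_fip=3.60)
print(example)
```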

Slow Starters (Worst stretch = Batters 1 to 7)
Pitcher | Worst Batters | Runs Worse | Starts | Last Batter
Kyle Kendrick | 1-7 | -2.31 | 25 | 26
Jeff Francis | 1-7 | -1.96 | 24 | 21
Yovani Gallardo | 1-7 | -1.36 | 33 | 27
Jose Quintana | 1-7 | -1.24 | 22 | 26
Barry Zito | 1-7 | -1.22 | 32 | 26
James McDonald | 1-7 | -1.15 | 29 | 24
Cliff Lee | 1-7 | -1.06 | 30 | 29
Homer Bailey | 1-7 | -1.01 | 33 | 28
Gavin Floyd | 1-7 | -1.00 | 29 | 26
Brandon Morrow | 1-7 | -1.00 | 21 | 26
Josh Johnson | 1-7 | -0.90 | 31 | 26
Doug Fister | 1-7 | -0.83 | 26 | 26
Jaime Garcia | 1-7 | -0.79 | 20 | 27
Kevin Correia | 1-7 | -0.75 | 28 | 23
Alex Cobb | 1-7 | -0.58 | 23 | 26


An intriguing (to me) idea I have here is to use this data as a way of judging a manager's skill in pulling a pitcher at the right time. That Zach McAllister was over three and a half runs per game worse during the last seven batters he faced probably indicates that Terry Francona's hook should be a little quicker than Manny Acta's was.

Bad Finishers (Worst stretch = Last 7 Batters Faced)
Pitcher | Worst Batters | Runs Worse | Starts | Last Batter
Zach McAllister | 19-25 | -3.62 | 22 | 25
Chris Young | 19-25 | -2.72 | 20 | 25
Luis Mendoza | 21-27 | -2.24 | 25 | 27
Trevor Cahill | 20-26 | -1.68 | 32 | 26
Felix Doubront | 19-25 | -1.58 | 29 | 25
Marco Estrada | 18-24 | -1.48 | 23 | 24
Derek Lowe | 21-27 | -1.45 | 21 | 27
Joe Blanton | 21-27 | -1.43 | 30 | 27
R.A. Dickey | 22-28 | -1.34 | 33 | 28
Jordan Lyles | 19-25 | -1.34 | 25 | 25
Rick Porcello | 20-26 | -1.19 | 31 | 26
Wei-Yin Chen | 20-26 | -1.04 | 32 | 26
Bruce Chen | 19-25 | -0.96 | 34 | 25
Jason Marquis | 20-26 | -0.96 | 22 | 26
Scott Diamond | 21-27 | -0.86 | 27 | 27
Gio Gonzalez | 19-25 | -0.82 | 32 | 25
Michael Fiers | 19-25 | -0.73 | 22 | 25
Kevin Millwood | 19-25 | -0.58 | 28 | 25
Jeremy Hellickson | 18-24 | -0.57 | 31 | 24
Blake Beavan | 19-25 | -0.39 | 26 | 25

All pitchers, 2012:
Pitcher | Peak Batters | Runs Better | Worst Batters | Runs Worse | Starts | Last Batter
Henderson Alvarez | 5-11 | 1.15 | 19-25 | -0.80 | 31 | 26
Bronson Arroyo | 3-9 | 0.65 | 10-16 | -1.28 | 32 | 27
Homer Bailey | 6-12 | 1.08 | 1-7 | -1.01 | 33 | 28
Blake Beavan | 14-20 | 1.93 | 19-25 | -0.39 | 26 | 25
Josh Beckett | 2-8 | 1.12 | 10-16 | -1.34 | 28 | 27
Erik Bedard | 3-9 | 0.90 | 10-16 | -1.77 | 24 | 24
Chad Billingsley | 9-15 | 1.05 | 2-8 | -0.74 | 25 | 26
Joe Blanton | 14-20 | 1.29 | 21-27 | -1.43 | 30 | 27
Clay Buchholz | 22-28 | 0.98 | 2-8 | -1.56 | 29 | 29
Mark Buehrle | 14-20 | 0.51 | 19-25 | -1.10 | 31 | 27
Madison Bumgarner | 5-11 | 0.92 | 20-26 | -1.25 | 32 | 27
A.J. Burnett | 13-19 | 0.93 | 20-26 | -1.47 | 31 | 28
Trevor Cahill | 5-11 | 1.71 | 20-26 | -1.68 | 32 | 26
Matt Cain | 17-23 | 0.81 | 21-27 | -1.10 | 32 | 28
Chris Capuano | 6-12 | 0.78 | 19-25 | -1.32 | 33 | 26
Wei-Yin Chen | 5-11 | 0.96 | 20-26 | -1.04 | 32 | 26
Bruce Chen | 14-20 | 1.38 | 19-25 | -0.96 | 34 | 25
Alex Cobb | 14-20 | 0.77 | 1-7 | -0.58 | 23 | 26
Bartolo Colon | 15-21 | 0.70 | 9-15 | -1.11 | 24 | 28
Kevin Correia | 7-13 | 1.57 | 1-7 | -0.75 | 28 | 23
Johnny Cueto | 20-26 | 0.44 | 13-19 | -0.78 | 33 | 27
Yu Darvish | 3-9 | 0.94 | 10-16 | -1.05 | 29 | 28
Ryan Dempster | 3-9 | 1.43 | 10-16 | -1.63 | 28 | 27
Ross Detwiler | 16-22 | 0.71 | 10-16 | -0.55 | 27 | 24
Scott Diamond | 6-12 | 0.94 | 21-27 | -0.86 | 27 | 27
R.A. Dickey | 1-7 | 1.24 | 22-28 | -1.34 | 33 | 28
Felix Doubront | 5-11 | 0.99 | 19-25 | -1.58 | 29 | 25
Nathan Eovaldi | 15-21 | 1.44 | 10-16 | -0.98 | 22 | 25
Marco Estrada | 1-7 | 1.08 | 18-24 | -1.48 | 23 | 24
Scott Feldman | 12-18 | 0.98 | 4-10 | -2.10 | 21 | 25
Michael Fiers | 4-10 | 0.69 | 19-25 | -0.73 | 22 | 25
Doug Fister | 17-23 | 0.59 | 1-7 | -0.83 | 26 | 26
Gavin Floyd | 14-20 | 0.84 | 1-7 | -1.00 | 29 | 26
Jeff Francis | 14-20 | 1.77 | 1-7 | -1.96 | 24 | 21
Yovani Gallardo | 16-22 | 2.08 | 1-7 | -1.36 | 33 | 27
Jaime Garcia | 17-23 | 0.70 | 1-7 | -0.79 | 20 | 27
Gio Gonzalez | 15-21 | 0.85 | 19-25 | -0.82 | 32 | 25
Zack Greinke | 15-21 | 0.82 | 20-26 | -0.67 | 34 | 27
Jeremy Guthrie | 18-24 | 1.93 | 11-17 | -2.98 | 29 | 26
Roy Halladay | 6-12 | 0.85 | 20-26 | -1.97 | 25 | 27
Cole Hamels | 5-11 | 0.55 | 16-22 | -1.01 | 31 | 28
Jason Hammel | 15-21 | 0.69 | 8-14 | -0.47 | 20 | 24
Tommy Hanson | 5-11 | 1.27 | 10-16 | -1.51 | 31 | 24
J.A. Happ | 14-20 | 1.15 | 2-8 | -1.15 | 24 | 24
Aaron Harang | 5-11 | 0.91 | 11-17 | -1.08 | 31 | 26
Dan Haren | 10-16 | 0.75 | 3-9 | -0.91 | 30 | 25
Lucas Harrell | 4-10 | 0.91 | 10-16 | -1.36 | 32 | 26
Matt Harrison | 3-9 | 0.67 | 21-27 | -0.99 | 32 | 28
Jeremy Hellickson | 5-11 | 0.42 | 18-24 | -0.57 | 31 | 24
Felix Hernandez | 5-11 | 0.72 | 15-21 | -0.52 | 33 | 29
Luke Hochevar | 14-20 | 0.89 | 7-13 | -1.55 | 32 | 26
Derek Holland | 2-8 | 1.46 | 20-26 | -2.14 | 27 | 28
Tim Hudson | 3-9 | 0.88 | 20-26 | -1.62 | 28 | 27
Phil Hughes | 4-10 | 1.39 | 9-15 | -1.59 | 32 | 27
Tommy Hunter | 1-7 | 1.50 | 13-19 | -3.35 | 20 | 26
Edwin Jackson | 4-10 | 1.03 | 11-17 | -1.11 | 31 | 27
Ubaldo Jimenez | 4-10 | 1.17 | 11-17 | -0.86 | 31 | 26
Josh Johnson | 18-24 | 0.84 | 1-7 | -0.90 | 31 | 26
Kyle Kendrick | 20-26 | 1.34 | 1-7 | -2.31 | 25 | 26
Ian Kennedy | 3-9 | 1.03 | 10-16 | -1.59 | 33 | 28
Clayton Kershaw | 4-10 | 0.98 | 11-17 | -1.38 | 33 | 28
Hiroki Kuroda | 14-20 | 0.87 | 20-26 | -1.15 | 33 | 27
Mat Latos | 5-11 | 1.52 | 19-25 | -1.03 | 33 | 26
Mike Leake | 4-10 | 1.26 | 11-17 | -1.52 | 30 | 27
Cliff Lee | 14-20 | 1.01 | 1-7 | -1.06 | 30 | 29
Jon Lester | 1-7 | 1.28 | 11-17 | -2.38 | 33 | 27
Tim Lincecum | 15-21 | 0.68 | 2-8 | -0.84 | 33 | 26
Francisco Liriano | 7-13 | 1.13 | 3-9 | -0.94 | 28 | 23
Kyle Lohse | 15-21 | 1.03 | 11-17 | -0.59 | 33 | 26
Derek Lowe | 5-11 | 1.40 | 21-27 | -1.45 | 21 | 27
Jordan Lyles | 15-21 | 1.23 | 19-25 | -1.34 | 25 | 25
Lance Lynn | 4-10 | 1.34 | 10-16 | -1.17 | 29 | 25
Paul Maholm | 3-9 | 0.79 | 11-17 | -1.58 | 31 | 26
Shaun Marcum | 15-21 | 0.90 | 2-8 | -1.16 | 21 | 26
Jason Marquis | 5-11 | 1.87 | 20-26 | -0.96 | 22 | 26
Justin Masterson | 2-8 | 1.47 | 12-18 | -1.56 | 34 | 27
Zach McAllister | 11-17 | 1.89 | 19-25 | -3.62 | 22 | 25
James McDonald | 5-11 | 1.26 | 1-7 | -1.15 | 29 | 24
Luis Mendoza | 6-12 | 0.85 | 21-27 | -2.24 | 25 | 27
Wade Miley | 5-11 | 1.33 | 16-22 | -1.61 | 29 | 27
Kevin Millwood | 5-11 | 0.65 | 19-25 | -0.58 | 28 | 25
Tom Milone | 14-20 | 0.31 | 11-17 | -1.28 | 31 | 26
Mike Minor | 5-11 | 1.26 | 13-19 | -1.22 | 30 | 24
Matt Moore | 1-7 | 1.08 | 10-16 | -1.46 | 31 | 25
Brandon Morrow | 11-17 | 1.43 | 1-7 | -1.00 | 21 | 26
Jonathon Niese | 15-21 | 1.02 | 2-8 | -0.94 | 30 | 27
Ricky Nolasco | 5-11 | 0.75 | 10-16 | -0.55 | 31 | 27
Bud Norris | 6-12 | 0.79 | 13-19 | -1.39 | 29 | 26
Ivan Nova | 5-11 | 0.87 | 17-23 | -0.98 | 28 | 27
Jarrod Parker | 11-17 | 0.57 | 5-11 | -0.87 | 29 | 27
Jake Peavy | 14-20 | 1.02 | 21-27 | -1.64 | 32 | 28
Drew Pomeranz | 5-11 | 2.20 | 11-17 | -3.30 | 22 | 19
Rick Porcello | 5-11 | 1.07 | 20-26 | -1.19 | 31 | 26
David Price | 7-13 | 0.42 | 16-22 | -0.95 | 31 | 27
Jose Quintana | 20-26 | 0.77 | 1-7 | -1.24 | 22 | 26
Clayton Richard | 8-14 | 1.88 | 19-25 | -2.05 | 33 | 28
Wandy Rodriguez | 18-24 | 1.20 | 2-8 | -2.32 | 33 | 27
Ricky Romero | 9-15 | 1.26 | 3-9 | -1.06 | 32 | 27
C.C. Sabathia | 2-8 | 0.67 | 9-15 | -1.17 | 28 | 30
Chris Sale | 7-13 | 0.89 | 19-25 | -1.85 | 29 | 27
Jeff Samardzija | 17-23 | 1.09 | 10-16 | -1.35 | 28 | 26
Anibal Sanchez | 5-11 | 1.06 | 18-24 | -0.98 | 31 | 27
Ervin Santana | 14-20 | 1.70 | 8-14 | -1.43 | 30 | 27
Johan Santana | 7-13 | 2.02 | 10-16 | -0.76 | 21 | 24
Joe Saunders | 3-9 | 1.59 | 11-17 | -3.09 | 28 | 27
Max Scherzer | 12-18 | 0.29 | 16-22 | -1.07 | 32 | 26
James Shields | 22-28 | 0.67 | 16-22 | -1.12 | 33 | 29
Stephen Strasburg | 3-9 | 0.56 | 10-16 | -1.12 | 28 | 24
Jason Vargas | 5-11 | 1.57 | 16-22 | -1.42 | 33 | 27
Justin Verlander | 3-9 | 1.26 | 10-16 | -0.90 | 33 | 29
Ryan Vogelsong | 17-23 | 0.58 | 10-16 | -0.84 | 31 | 26
Edinson Volquez | 15-21 | 0.58 | 11-17 | -0.69 | 32 | 26
Chris Volstad | 16-22 | 1.49 | 10-16 | -1.19 | 21 | 25
Adam Wainwright | 6-12 | 1.06 | 11-17 | -0.75 | 32 | 26
Jered Weaver | 7-13 | 0.97 | 11-17 | -0.89 | 30 | 27
Jake Westbrook | 6-12 | 1.26 | 17-23 | -1.02 | 28 | 27
Alex White | 14-20 | 0.52 | 11-17 | -0.75 | 20 | 21
C.J. Wilson | 7-13 | 0.66 | 12-18 | -1.05 | 34 | 27
Randy Wolf | 7-13 | 1.00 | 19-25 | -1.18 | 26 | 26
Travis Wood | 3-9 | 0.85 | 10-16 | -1.33 | 26 | 25
Vance Worley | 4-10 | 2.27 | 14-20 | -1.78 | 23 | 27
Chris Young | 5-11 | 2.11 | 19-25 | -2.72 | 20 | 25
Carlos Zambrano | 16-22 | 0.51 | 19-25 | -1.29 | 20 | 26
Jordan Zimmermann | 4-10 | 0.62 | 16-22 | -0.86 | 32 | 26
Barry Zito | 17-23 | 1.00 | 1-7 | -1.22 | 32 | 26

Tuesday, June 12, 2012

Wasted HRs

During a recent Tigers broadcast, the on-air conversation turned towards what has become a popular topic: possible explanations for Detroit's struggles. A graphic was shown indicating the percentage of Tigers home runs that were solo shots, with the implication that it was more than usual. I had no basis for what was a good or bad percentage so I decided to explore it a bit, resulting in this post. I'll start off using this year's Tigers team as an example but will move on to all-time rankings.

As of June 11th, the Tigers have hit 60 home runs in 2012. Forty of these were solo, 15 came with one man on, five came with two on and they have yet to hit a grand slam. In order to compare this percentage historically, I decided to use two different measures: the percentage of all home runs that are solo and the average number of runs scored on each HR.
- Solo HR %: 66.7%
- Runs Per HR: 1.42
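As a quick check on those two figures, here is a small sketch that derives them from the home run breakdown. The function name and dictionary keys are my own labels, not part of any original program.

```python
def hr_profile(solo, one_on, two_on, grand_slams):
    """Solo HR share and runs per HR from a breakdown by runners on base."""
    total_hr = solo + one_on + two_on + grand_slams
    runs_on_hr = solo + 2 * one_on + 3 * two_on + 4 * grand_slams
    return {
        "total_hr": total_hr,
        "runs_on_hr": runs_on_hr,
        "solo_pct": solo / total_hr,
        "runs_per_hr": runs_on_hr / total_hr,
    }


tigers = hr_profile(solo=40, one_on=15, two_on=5, grand_slams=0)
print(f"{tigers['solo_pct']:.1%}, {tigers['runs_per_hr']:.2f}")  # 66.7%, 1.42
```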

The next step was to go through all game logs from 1919-2011 to see what kind of numbers we'd expect a team to have. The following graph gives an idea of the trend over time. The data were smoothed by taking the three-year average of each year in order to eliminate some of the year-to-year noise. As we'd expect, Runs per HR and Solo HR % are very closely related:


The Tigers' 2012 numbers (66.7% solo, 1.42 runs/HR) are much worse than we'd expect from the 2009-2011 MLB three-year average (58.2% solo, 1.58 runs/HR). But how much impact has this had on their run scoring? For the purpose of this article, we're not trying to determine why a team under-performs but to what extent. I would guess that the why could be explained by some combination of team-wide OBP, lineup construction and pure luck but we'll leave this question for another day.

To determine how the Tigers' poor HR performance has impacted total run scoring, we'll first calculate the number of runs they've scored on HRs:

Detroit 2012 Runs on HRs = 40 + (15*2) + (5*3) = 85 Runs

Now, we'll compare this to how many runs we'd expect them to have scored based on the three-year AL average performance of 1.57 runs per HR.

Detroit 2012 Expected Runs on HR = 60 * 1.57 = 94.4 Runs
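The comparison can be sketched as below; the function name is hypothetical. One caveat: multiplying 60 by the rounded 1.57 gives 94.2 rather than 94.4, so the figure in the post presumably comes from an unrounded league average. That explanation is an assumption, not something the post states.

```python
def hr_runs_vs_expected(total_hr, actual_runs_on_hr, league_runs_per_hr):
    """Expected runs on HR at the league rate, and actual minus expected."""
    expected = total_hr * league_runs_per_hr
    return expected, actual_runs_on_hr - expected


# Uses the rounded league rate of 1.57 quoted in the text.
expected, diff = hr_runs_vs_expected(60, 85, 1.57)
print(f"expected {expected:.1f}, difference {diff:+.1f}")  # expected 94.2, difference -9.2
```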

So Detroit's HRs have under-performed in 2012 by 9.4 runs. We can go further and convert these runs into wins by using baseball's Pythagorean theorem, which estimates how many games a team should win based on runs scored and runs given up.

Detroit Expected Record with Actual Runs Scored (263) = 28.3 - 31.7
Detroit Expected Record with Expected Runs on HR (272.4) = 29.3 - 30.7


Therefore, we would have expected the Tigers to win one full game more had they scored the expected number of runs on HRs.
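For anyone who wants to reproduce the runs-to-wins conversion, here is a minimal sketch of the Pythagorean expectation. The exponent of 2 is the classic form; the post does not say which exponent was used or how many runs Detroit had allowed, so both `exponent` and the `runs_allowed` value in the usage lines are assumptions chosen only for illustration.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins over `games` given runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return games * scored / (scored + allowed)


# Placeholder runs-allowed total, NOT a figure taken from the post.
runs_allowed = 278
actual = pythagorean_wins(263, runs_allowed, games=60)
with_expected_hr_runs = pythagorean_wins(272.4, runs_allowed, games=60)
print(f"{actual:.1f} wins vs. {with_expected_hr_runs:.1f} wins")
```

Other exponents (values near 1.8 are also in common use) shift the result by a few hundredths of a win over 60 games.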

We could summarize the Tigers' 2012 HR performance thusly:
HR Runs vs. Expected (60 games): -9.4
HR Runs vs. Expected (projected full season): -25.4
HR Wins vs. Expected (60 games): -1.0
HR Wins vs. Expected (projected full season): -2.7
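The full-season figures are a straight pro-rating of the 60-game numbers to 162 games, which can be sketched as follows (the function name is hypothetical):

```python
def scale_to_season(value, games_played, season_games=162):
    """Pro-rate a cumulative total to a full season."""
    return value * season_games / games_played


print(round(scale_to_season(-9.4, 60), 1))  # -25.4 runs
print(round(scale_to_season(-1.0, 60), 1))  # -2.7 wins
```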


Now, having defined our two metrics, let's look at historic results to see how Detroit compares. I took the process described above and applied it to every team's season from 1919-2011. Each team was compared to its three-year league average (AL and NL calculated separately). The +/- runs listed are the actual values while the +/- wins have been scaled to a 162-game season to put each team on a level playing field. I have listed the best 10 and worst 10 teams all-time by each measure. Since our +/- metric is cumulative, the top 10 list is dominated by recent teams (more HR = higher/lower cumulative values) so I have also provided the top 10 by rate.

If you'd prefer to explore the numbers yourself, you can find all team seasons in spreadsheet form here.

How do the 2012 Tigers compare?
- Runs/HR of 1.42 is bad enough to rank 34th worst all-time and worst since the 2001 New York Mets.
- Solo HR % of 66.7% is 55th worst all-time and worst since the 2005 Washington Nationals.
- Run Differential at current pace of -25.4 would rank 9th worst all-time and worst since the 2009 Texas Rangers.
- Win Differential at current pace of -2.7 would rank 10th worst all-time and worst since the 2006 Devil Rays.

So while there are many factors contributing to the Tigers' struggles, wasting home runs by hitting them without any baserunners is a not-insignificant one.

Worst Runs/HR

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Brooklyn Dodgers | 1921 | 1.29 | 77.97% | 59 | -19.98 | -2.24 | -0.339
Cincinnati Reds | 1924 | 1.33 | 75.00% | 36 | -11.20 | -1.27 | -0.311
Chicago White Sox | 1943 | 1.33 | 72.73% | 33 | -8.19 | -1.02 | -0.248
Chicago White Sox | 1932 | 1.33 | 69.44% | 36 | -11.66 | -1.22 | -0.324
Boston Red Sox | 1929 | 1.36 | 71.43% | 28 | -9.16 | -1.04 | -0.327
Boston Red Sox | 1927 | 1.36 | 71.43% | 28 | -8.60 | -0.95 | -0.307
Pittsburgh Pirates | 1957 | 1.36 | 67.39% | 92 | -17.90 | -2.15 | -0.195
Houston Astros | 1980 | 1.36 | 72.00% | 75 | -16.24 | -1.84 | -0.217
Philadelphia Phillies | 1944 | 1.36 | 70.91% | 55 | -16.32 | -2.07 | -0.297
St. Louis Cardinals | 1991 | 1.37 | 69.12% | 68 | -14.45 | -1.62 | -0.212


Best Runs/HR

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Washington Senators | 1935 | 1.97 | 34.38% | 32 | +9.44 | +0.91 | +0.295
Pittsburgh Pirates | 1943 | 1.95 | 38.10% | 42 | +13.90 | +1.56 | +0.331
Detroit Tigers | 1949 | 1.92 | 42.05% | 88 | +21.69 | +2.24 | +0.246
Cincinnati Reds | 1931 | 1.90 | 33.33% | 21 | +5.23 | +0.62 | +0.249
Milwaukee Brewers | 1995 | 1.90 | 43.75% | 128 | +35.02 | +3.80 | +0.274
Cleveland Indians | 1926 | 1.89 | 29.63% | 27 | +6.52 | +0.67 | +0.241
Cleveland Indians | 1923 | 1.88 | 44.07% | 59 | +14.32 | +1.30 | +0.243
Boston Red Sox | 1919 | 1.88 | 42.42% | 33 | +7.64 | +1.01 | +0.231
Brooklyn Dodgers | 1945 | 1.88 | 38.60% | 57 | +12.62 | +1.25 | +0.221
Boston Red Sox | 1938 | 1.87 | 46.94% | 98 | +15.26 | +1.37 | +0.156


Worst Solo HR %

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Brooklyn Dodgers | 1921 | 1.29 | 77.97% | 59 | -19.98 | -2.24 | -0.339
Boston Red Sox | 1920 | 1.41 | 77.27% | 22 | -5.24 | -0.60 | -0.238
Cincinnati Reds | 1924 | 1.33 | 75.00% | 36 | -11.20 | -1.27 | -0.311
Philadelphia Athletics | 1943 | 1.46 | 73.08% | 26 | -3.12 | -0.39 | -0.120
Chicago White Sox | 1943 | 1.33 | 72.73% | 33 | -8.19 | -1.02 | -0.248
Montreal Expos | 1998 | 1.39 | 72.11% | 147 | -28.43 | -3.12 | -0.193
Houston Astros | 1980 | 1.36 | 72.00% | 75 | -16.24 | -1.84 | -0.217
Boston Red Sox | 1927 | 1.36 | 71.43% | 28 | -8.60 | -0.95 | -0.307
Boston Red Sox | 1929 | 1.36 | 71.43% | 28 | -9.16 | -1.04 | -0.327
Philadelphia Phillies | 1944 | 1.36 | 70.91% | 55 | -16.32 | -2.07 | -0.297


Best Solo HR %

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Cleveland Indians | 1926 | 1.89 | 29.63% | 27 | +6.52 | +0.67 | +0.241
Cincinnati Reds | 1931 | 1.90 | 33.33% | 21 | +5.23 | +0.62 | +0.249
Washington Senators | 1935 | 1.97 | 34.38% | 32 | +9.44 | +0.91 | +0.295
Detroit Tigers | 1933 | 1.81 | 35.09% | 57 | +7.28 | +0.78 | +0.128
Washington Senators | 1939 | 1.73 | 36.36% | 44 | +1.72 | +0.19 | +0.039
Chicago White Sox | 1931 | 1.81 | 37.04% | 27 | +3.77 | +0.38 | +0.139
Chicago White Sox | 1939 | 1.72 | 37.50% | 64 | +1.96 | +0.20 | +0.031
Pittsburgh Pirates | 1920 | 1.69 | 37.50% | 16 | +1.21 | +0.16 | +0.075
Pittsburgh Pirates | 1943 | 1.95 | 38.10% | 42 | +13.90 | +1.56 | +0.331
Pittsburgh Pirates | 1936 | 1.82 | 38.33% | 60 | +9.90 | +0.97 | +0.165


Worst Performances by Run Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Texas Rangers | 2001 | 1.46 | 65.45% | 246 | -34.61 | -3.08 | -0.141
Cleveland Indians | 1987 | 1.41 | 69.52% | 187 | -30.99 | -3.00 | -0.166
Tampa Bay Devil Rays | 2006 | 1.46 | 61.58% | 190 | -30.34 | -3.15 | -0.160
Kansas City Athletics | 1957 | 1.41 | 68.67% | 166 | -29.67 | -3.59 | -0.179
Montreal Expos | 1998 | 1.39 | 72.11% | 147 | -28.43 | -3.12 | -0.193
Toronto Blue Jays | 1998 | 1.48 | 63.80% | 221 | -27.77 | -2.63 | -0.126
Texas Rangers | 2009 | 1.47 | 64.73% | 224 | -26.57 | -2.59 | -0.119
New York Mets | 2001 | 1.41 | 68.03% | 147 | -25.57 | -2.87 | -0.174
Toronto Blue Jays | 1991 | 1.42 | 63.91% | 133 | -24.96 | -2.67 | -0.188
Minnesota Twins | 2001 | 1.45 | 68.29% | 164 | -24.74 | -2.45 | -0.151


Best Performances by Run Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Cleveland Indians | 1999 | 1.83 | 43.54% | 209 | +45.72 | +3.78 | +0.219
Milwaukee Brewers | 1995 | 1.90 | 43.75% | 128 | +35.02 | +3.80 | +0.274
Seattle Mariners | 1996 | 1.76 | 50.20% | 245 | +34.38 | +2.90 | +0.140
New York Yankees | 1962 | 1.74 | 47.24% | 199 | +31.78 | +3.04 | +0.160
Minnesota Twins | 1970 | 1.78 | 44.44% | 153 | +31.54 | +3.21 | +0.206
Seattle Mariners | 2000 | 1.76 | 50.51% | 198 | +29.92 | +2.66 | +0.151
St. Louis Cardinals | 2000 | 1.71 | 54.04% | 235 | +29.39 | +2.66 | +0.125
Anaheim Angels | 2004 | 1.78 | 46.91% | 162 | +29.36 | +2.79 | +0.181
San Francisco Giants | 1998 | 1.77 | 47.83% | 161 | +29.34 | +2.75 | +0.182
Minnesota Twins | 1962 | 1.74 | 47.57% | 185 | +27.95 | +2.75 | +0.151


Worst Performances by Win Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Kansas City Athletics | 1957 | 1.41 | 68.67% | 166 | -29.67 | -3.59 | -0.179
Tampa Bay Devil Rays | 2006 | 1.46 | 61.58% | 190 | -30.34 | -3.15 | -0.160
Baltimore Orioles | 1994 | 1.45 | 64.75% | 139 | -24.15 | -3.12 | -0.174
Montreal Expos | 1998 | 1.39 | 72.11% | 147 | -28.43 | -3.12 | -0.193
Texas Rangers | 2001 | 1.46 | 65.45% | 246 | -34.61 | -3.08 | -0.141
New York Giants | 1956 | 1.39 | 66.21% | 145 | -24.24 | -3.07 | -0.167
Cleveland Indians | 1987 | 1.41 | 69.52% | 187 | -30.99 | -3.00 | -0.166
Chicago Cubs | 1994 | 1.40 | 67.89% | 109 | -19.36 | -2.88 | -0.178
New York Mets | 2001 | 1.41 | 68.03% | 147 | -25.57 | -2.87 | -0.174
Toronto Blue Jays | 1991 | 1.42 | 63.91% | 133 | -24.96 | -2.67 | -0.188


Best Performances by Win Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Milwaukee Brewers | 1995 | 1.90 | 43.75% | 128 | +35.02 | +3.80 | +0.274
Cleveland Indians | 1999 | 1.83 | 43.54% | 209 | +45.72 | +3.78 | +0.219
Minnesota Twins | 1970 | 1.78 | 44.44% | 153 | +31.54 | +3.21 | +0.206
New York Yankees | 1962 | 1.74 | 47.24% | 199 | +31.78 | +3.04 | +0.160
San Diego Padres | 1989 | 1.78 | 50.00% | 120 | +25.52 | +2.96 | +0.213
Seattle Mariners | 1996 | 1.76 | 50.20% | 245 | +34.38 | +2.90 | +0.140
Cleveland Indians | 2002 | 1.72 | 53.13% | 192 | +27.38 | +2.82 | +0.143
Anaheim Angels | 2004 | 1.78 | 46.91% | 162 | +29.36 | +2.79 | +0.181
San Francisco Giants | 1998 | 1.77 | 47.83% | 161 | +29.34 | +2.75 | +0.182
Minnesota Twins | 1962 | 1.74 | 47.57% | 185 | +27.95 | +2.75 | +0.151


Worst Performances by Runs/HR Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Brooklyn Dodgers | 1921 | 1.29 | 77.97% | 59 | -19.98 | -2.24 | -0.339
Boston Red Sox | 1929 | 1.36 | 71.43% | 28 | -9.16 | -1.04 | -0.327
Chicago White Sox | 1932 | 1.33 | 69.44% | 36 | -11.66 | -1.22 | -0.324
Cincinnati Reds | 1924 | 1.33 | 75.00% | 36 | -11.20 | -1.27 | -0.311
Boston Red Sox | 1927 | 1.36 | 71.43% | 28 | -8.60 | -0.95 | -0.307
Philadelphia Phillies | 1944 | 1.36 | 70.91% | 55 | -16.32 | -2.07 | -0.297
Washington Senators | 1931 | 1.39 | 69.39% | 49 | -14.09 | -1.29 | -0.288
Chicago White Sox | 1943 | 1.33 | 72.73% | 33 | -8.19 | -1.02 | -0.248
Boston Braves | 1922 | 1.41 | 68.75% | 32 | -7.75 | -0.87 | -0.242
Cleveland Indians | 1942 | 1.38 | 68.00% | 50 | -11.92 | -1.44 | -0.238


Best Performances by Runs/HR Diff

Team | Year | Runs/HR | Solo HR% | Total HRs | Run Diff | Win Diff | Runs/HR Diff
Pittsburgh Pirates | 1943 | 1.95 | 38.10% | 42 | +13.90 | +1.56 | +0.331
Washington Senators | 1935 | 1.97 | 34.38% | 32 | +9.44 | +0.91 | +0.295
Minnesota Twins | 1976 | 1.86 | 40.74% | 81 | +23.04 | +2.40 | +0.284
Milwaukee Brewers | 1995 | 1.90 | 43.75% | 128 | +35.02 | +3.80 | +0.274
Cincinnati Reds | 1931 | 1.90 | 33.33% | 21 | +5.23 | +0.62 | +0.249
Detroit Tigers | 1949 | 1.92 | 42.05% | 88 | +21.69 | +2.24 | +0.246
Cleveland Indians | 1923 | 1.88 | 44.07% | 59 | +14.32 | +1.30 | +0.243
Cleveland Indians | 1926 | 1.89 | 29.63% | 27 | +6.52 | +0.67 | +0.241
Boston Red Sox | 1919 | 1.88 | 42.42% | 33 | +7.64 | +1.01 | +0.231
Pittsburgh Pirates | 1962 | 1.81 | 43.52% | 108 | +24.24 | +2.60 | +0.224

Wednesday, May 9, 2012

Lopsided Batter/Pitcher Matchups

Over at The Hardball Times I recently explored the idea of lopsided matchups between batters and pitchers: those instances where a batter performed worse than expected against a particular pitcher over the course of their careers. It is a 4-part series with each part containing the top 5 lopsided matchups for a decade. I wanted to give a more complete list of the matchups, so I'm posting the complete Top 500 (1956-2011) here. One of the readers at THT commented that the process is flawed since it doesn't incorporate righty/lefty splits. I believe that my original process is still the best (you can read my thoughts - lucky you - in the comments section of the first article) but have provided links to the results when controlling for handedness as a comparison.

Method / 1956-1969

1990-2011

  • Will be posted here after article is run at THT

Tuesday, April 24, 2012

High Heat Stats: DOA, Continued


On Sunday, birtelcom at High Heat Stats wrote an article trying to come up with a way to better measure the skill that RBI attempts to measure and called this new metric "DOA" (Driven-in Over Average). I wrote a program to generate this "DOA" for every season dating back to 1956 and posted the data.

In the interest of furthering this idea, numerous suggestions were made on how to better the metric. This post contains five different data sets incorporating these suggestions to help determine the best course.

The following changes were made to the original process and used in the generation of all five of the following data sets:

  • Originally there were 8 base runner states used to determine RBI expectation. These were split into those situations with 2 outs and those with fewer than 2 outs to create 16 states to better approximate RBI expectation. Initially the plan was to only introduce this distinction with a man on third, but after looking at the data, I determined there to be a significant difference in expectation with a man on second as well - due to the increased frequency of scoring on a single with 2 outs - so I split all situations by outs to arrive at 16 total states.
  • All plate appearances ending in an intentional walk or hit by pitch were removed from the data set. This means that they were not used to determine the baseline RBI expectation nor were they used in the calculation of each player's DOA.
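The two changes above can be sketched roughly as follows. The record layout (`event`, `runners`, `outs`, `rbi`) and the function names are hypothetical, and DOA is taken here to be actual RBI minus expected RBI, which is how I read "Driven-in Over Average"; this is an illustration, not the program used to build the data sets.

```python
from collections import defaultdict

EXCLUDED_EVENTS = {"IBB", "HBP"}  # dropped from the baseline and from player totals


def state_key(on_first, on_second, on_third, outs):
    """One of 16 states: 8 base-runner combinations x (2 outs / fewer than 2)."""
    return (bool(on_first), bool(on_second), bool(on_third), outs == 2)


def rbi_expectation(plate_appearances):
    """Average RBI per plate appearance in each state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [RBI, plate appearances]
    for pa in plate_appearances:
        if pa["event"] in EXCLUDED_EVENTS:
            continue
        key = state_key(*pa["runners"], pa["outs"])
        totals[key][0] += pa["rbi"]
        totals[key][1] += 1
    return {key: rbi / count for key, (rbi, count) in totals.items()}


def doa(player_pas, expectation):
    """Actual RBI minus the RBI an average batter would expect in the same states."""
    actual = 0
    expected = 0.0
    for pa in player_pas:
        if pa["event"] in EXCLUDED_EVENTS:
            continue
        actual += pa["rbi"]
        expected += expectation[state_key(*pa["runners"], pa["outs"])]
    return actual - expected
```

Treating "2 outs" as a single boolean keeps the state count at exactly 16 while still capturing the higher chance of scoring from second on a two-out single.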

With some disagreement on whether unintentional walks should be included in the data set, I have provided both cases below. In those instances marked "UIBB: No", unintentional walks were not used to determine the baseline RBI expectation nor were they used to calculate each player's DOA.

There was also some disagreement on whether to include the RBI a player receives from their own run when hitting a HR. The argument against would be that the RBI would have occurred regardless of situation and thus is unimportant in determining how well a batter hits with a chance to drive in runs. The argument for would be that if we're measuring the ability to drive in runs, why would we remove the very best way of driving them in? Regardless of your position, both cases are listed below. In those marked "HR RBI: No", the single RBI from the batter is removed both in determining the baseline RBI expectation and in calculating each player's DOA. In the case of a grand slam, the batter still receives credit for 3 RBI, just not the 1 he gets for driving in himself.
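The "HR RBI: No" adjustment amounts to subtracting one RBI from every home run before anything else is computed. A minimal sketch, with a hypothetical function name and flag:

```python
def credited_rbi(rbi, is_home_run, count_hr_rbi=True):
    """RBI credited for one plate appearance under the "HR RBI" setting."""
    if is_home_run and not count_hr_rbi:
        return rbi - 1  # drop the run the batter scored himself
    return rbi


print(credited_rbi(4, is_home_run=True, count_hr_rbi=False))  # grand slam -> 3
print(credited_rbi(1, is_home_run=True, count_hr_rbi=False))  # solo HR -> 0
```

The same adjusted value would have to be used both when building the baseline expectation and when totalling each player's RBI, so the two stay comparable.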

In addition to the above changes, I also listed one further data set that has a restriction on the generation of the baseline RBI expectation. In calculating the expectation, only those batters with at least 200 plate appearances in the season will be used. This will remove NL pitchers from the equation and eliminate the slight skew they introduce. I wasn't sure if this was worth investigating too much so I've only provided one example to get an idea of the difference involved.

To aid comparison, you can check out this spreadsheet which lists the RBI expectation for each of the 16 states for the 5 data sets listed below. This was generated using every plate appearance between 1956-2011.

A few further notes on data:
  • I've added a "DOA %" column which is calculated simply as [Actual RBI/Expected RBI] to give a rate version of DOA where 1 is average.
  • The second row of each spreadsheet gives the RBI expectation for each of the 16 states. In the yearly spreadsheets this is calculated from that year only. In the "Cumulative Situations" spreadsheets, this is calculated as the average across all plate appearances from 1956-2011.
  • In the "Cumulative Situations" spreadsheets, even though the multi-year average is shown, each player's DOA was calculated by treating every season independently and using the yearly RBI expectations... the multi-year average is presented as a guide but was not used in any calculations.
  • Due to the nature of the metric, play-by-play data is necessary to calculate DOA. I set 1956 as my starting year but Retrosheet is missing some play-by-play data so there are some plate appearances missing from our data. For this reason, you will notice that players from the 1950s and 1960s have RBI numbers that differ slightly from their official totals; poor Hank Aaron lost 62 RBI to lost play-by-play data but can comfort himself with the knowledge that he is still the all-time DOA leader. To give an idea of the amount of play-by-play data that we're working with: In 1956 we have 95.7% of all PAs, in 1957, 97.2% and in only one year after that do we have less than 99% (98.4% in 1968). Starting in 1974, we have 100% of play-by-play data for every season.
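To make the relationship between the two columns concrete, here is a tiny sketch with made-up totals. It assumes DOA is the difference between actual and expected RBI, while DOA % is the ratio described in the notes above; the function name is hypothetical.

```python
def doa_summary(actual_rbi, expected_rbi):
    """Counting version (DOA) and rate version (DOA %) of the same comparison."""
    return {
        "doa": actual_rbi - expected_rbi,
        "doa_pct": actual_rbi / expected_rbi,  # 1.0 means exactly average
    }


# Hypothetical season: 110 actual RBI against 100 expected.
print(doa_summary(110, 100))
```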

There you go... I think I'm done talking. Enjoy the numbers!

UIBB: Yes  |  HR RBI: Yes
Yearly Data: 1956-1959  |  1960-1969  |  1970-1979  |  1980-1989  |  1990-1999  |  2000-2011
Player Careers: Cumulative Situations  |  By Age


UIBB: Yes  |  HR RBI: Yes  |  Restrict Baseline to Batters with at least 200 PA
Yearly Data: 1956-1959  |  1960-1969  |  1970-1979  |  1980-1989  |  1990-1999  |  2000-2011
Player Careers: Cumulative Situations  |  By Age

--------------------------------------------------

UIBB: No  |  HR RBI: No
Yearly Data: 1956-1959  |  1960-1969  |  1970-1979  |  1980-1989  |  1990-1999  |  2000-2011
Player Careers: Cumulative Situations  |  By Age

--------------------------------------------------

UIBB: No  |  HR RBI: Yes
Yearly Data: 1956-1959  |  1960-1969  |  1970-1979  |  1980-1989  |  1990-1999  |  2000-2011
Player Careers: Cumulative Situations  |  By Age

--------------------------------------------------

UIBB: Yes  |  HR RBI: No
Yearly Data: 1956-1959  |  1960-1969  |  1970-1979  |  1980-1989  |  1990-1999  |  2000-2011
Player Careers: Cumulative Situations  |  By Age