By Bill James

April 6, 2020

OK, I am ready to move forward here, I think, although the course ahead of us is not exactly what I had originally planned, and also I am not really happy with it. I should warn you at the beginning that I am still working on this, so it is entirely possible that later this week I’m going to throw all of this (today’s work) away and give you a different formula. You’re warned.

But for what it is worth, I’m going to give you a formula for Runs Prevented here (RP), a different formula from the one we have been using, and it won’t really make sense, but it will make sense later, I hope. ER in the formula below is the expected run context—that is, the league average of Runs per 9 Innings, modified by the Park Adjustment. Runs Prevented is:

RP = ([4.46 – ER) /2 + 4.46} + ER] /9 * innings – Runs Allowed by the team

I may have messed up that formula somewhere; I know what I meant to say but I’m not sure it is stated exactly right. Also, I would guess that some of you could simplify that formula so that its calculation is easier; I was never good at that. Back in the late 1970s, when Dallas Adams and I were exchanging letters about issues of this nature, I would send him a formula like that and he would strip it down to half as many elements.

Anyway, 4.46 is the average number of runs per nine innings allowed by a team over time; the true figure is actually 4.48, but 4.46 works better in the formula for some reason. So you take the average number of runs per game, 4.46, and subtract the number of expected runs per nine innings that this team should allow, which can give you a positive or a negative number. If the team plays in a high run context (think: steroid era), then this will give you a negative number, leading to fewer runs prevented; if they play in a low run context like the 1960s, it will give you a positive number, leading to more runs prevented.

To this you add the 4.46 back in, and you add the team’s expected runs allowed per nine innings. You divide that by 9, and multiply by the team’s innings pitched. The result is the "ceiling" against which we measure the team’s defensive components. That’s the new zero-value defense line.
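For reference, here is one reading of the formula with the brackets balanced, following the verbal description above. It is a sketch of what the steps seem to say, not a confirmed restatement. The symbols IP (the team's innings pitched) and RA (runs allowed by the team) are shorthand introduced here for illustration.

```latex
\begin{align*}
\mathrm{RP} &= \left[\frac{4.46 - \mathrm{ER}}{2} + 4.46 + \mathrm{ER}\right] \times \frac{\mathrm{IP}}{9} - \mathrm{RA} \\
            &= \left(6.69 + \frac{\mathrm{ER}}{2}\right) \times \frac{\mathrm{IP}}{9} - \mathrm{RA}
\end{align*}
```

The second line just collects terms: 4.46/2 + 4.46 = 6.69, and ER - ER/2 = ER/2.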

This process gives you a number of potential runs allowed for each team which is around 9 per game, actually around 8.92 per game, but somewhere in there. This number will be HIGHER than 9 in a high-run context, and LOWER than 9 in a low-run context, but not as much so as the zero-competence lines that we had before. The fact that it is not as much so as the line that we had before means that low-run teams will get more credit for pitching and defense than they did before, and high-run teams will get less.

I’ll do the 2019 Pittsburgh Pirates for illustration. The Pirates pitched 1,440 innings. For the National League as a whole there were 11,449 runs allowed by pitchers in 21,732.2 innings. At that rate, we would expect the Pirates to allow 759 runs. Their Park Factor was 101, which creates a Park Adjustment of 1.0046, which increases this to 762 runs, or 4.76 runs per nine innings. That is ER in the formula above; expected runs allowed per nine innings.
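The arithmetic in that paragraph can be sketched in a few lines of Python. This is only an illustration: the function name and constants are made up for the example, "21,732.2" is assumed to be baseball notation for 21,732 and two-thirds innings, and the Park Adjustment of 1.0046 is taken as given rather than derived from the Park Factor.

```python
# Figures quoted in the article for the 2019 National League and the Pirates.
LEAGUE_RUNS = 11_449
LEAGUE_INNINGS = 21_732 + 2 / 3   # assumption: "21,732.2" means 21,732 and two-thirds innings
TEAM_INNINGS = 1_440
PARK_ADJUSTMENT = 1.0046          # taken as given from the article


def expected_runs(league_runs, league_innings, team_innings, park_adjustment=1.0):
    """Runs a team would allow at the league rate, scaled by a park adjustment."""
    league_rate_per_inning = league_runs / league_innings
    return league_rate_per_inning * team_innings * park_adjustment


neutral = expected_runs(LEAGUE_RUNS, LEAGUE_INNINGS, TEAM_INNINGS)
adjusted = expected_runs(LEAGUE_RUNS, LEAGUE_INNINGS, TEAM_INNINGS, PARK_ADJUSTMENT)
er_per_nine = adjusted / TEAM_INNINGS * 9   # this is ER in the formula

print(round(neutral))         # 759
print(round(adjusted))        # 762
print(round(er_per_nine, 2))  # 4.76
```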

That is higher than 4.46, so when you subtract that from 4.46 you get a negative number, -.30. Divide that by 2; you’ve got -.15. Add that to 4.46, you have 4.31, and add the run context (4.76), you have 9.07. What we are saying is that if the 2019 Pittsburgh Pirates had no competent pitching at all, and no competent defensive players at all, they would have allowed 9.07 runs per nine innings, or 1,451 on the season.

While the Pirates were not a good team and were not a good defensive team, they did not allow 1,451 runs. They allowed 911 runs. That means that they PREVENTED 540 runs.
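The Pirates example can be checked with a short Python sketch. The function names are invented for illustration, and the grouping of terms follows the step-by-step description in the text rather than the formula as typed; ER is entered as the rounded 4.76 used in the walk-through.

```python
BASELINE = 4.46  # long-run average runs per nine innings, as used in the formula


def zero_competence_line(er):
    """Runs per nine innings a team would allow with no competent pitching or defense."""
    return (BASELINE - er) / 2 + BASELINE + er


def runs_prevented(er, innings, runs_allowed):
    """Season ceiling (zero-competence line scaled to the team's innings) minus actual runs allowed."""
    ceiling = zero_competence_line(er) / 9 * innings
    return ceiling - runs_allowed


line = zero_competence_line(4.76)
ceiling = line / 9 * 1440
rp = runs_prevented(4.76, 1440, 911)

print(round(line, 2))   # 9.07
print(round(ceiling))   # 1451
print(round(rp))        # 540
```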

If you add up all of the Run Prevention elements that we have calculated before, they do not add up to 540 runs. They add up to 506 runs. However, we’re 7% low on the estimates, in a global sense. That means that we still have other "errors" to fix, other discrepancies to address. Once we fix those things, the Pirates’ Runs Saved should creep upward as other teams do, so these two numbers should converge on a common point.

That’s what we need; we need for two different numbers, derived from different sources, to converge on a common point. The two numbers are:

1) An estimate of the Team’s Runs Prevented derived from their League and Park Run Context, and

2) An estimate of the Team’s Runs Prevented derived from the 11 categories of Run Prevention that we have been studying for the last two or three weeks, however long it has been.

If we can get those two things to converge on a common point for teams and leagues across the course of 120 years, then we know that we are measuring something real, something which has an external reality. In essence, we are predicting how many runs the team will allow, based on their strikeouts, their DER, Fielding Percentage, Double Plays, their Walks and Wild Pitches, etc. But also, and crucially, we are predicting that *through the pathway of how many they WOULD have allowed if they had no competent pitching or defense*. If there wasn’t some external reality to it, some grand pattern to it, then why would one set of numbers predict the other? And that enables us to measure that pathway, measure how many runs they WOULD have allowed without any pitching or any defense—and the fact that we can do that, or will be able to do that, enables us to assess the value of each of those run prevention elements. Make sense?

Don’t tell me if it doesn’t; I don’t really care. The Pirates in the old system, that I was using two or three days ago, would have had a zero-competence line of 9.52 runs per game, which is twice their expected runs allowed per game (4.76). We have reduced that from 9.52 to 9.07, thus reducing the number of runs that we credit them with preventing. I’m not happy with the new formula; I expected to be able to do better than that, and I worked on it for dozens of hours trying to find a better process, but I never could. I’d work three hours on a new process, then I would test it, and it would turn up with the same error range as the last one, so I’d start over. I wrote this article on Saturday, and then wasted all of Sunday on unsuccessful efforts to find a better formula. Almost all of the most serious errors are either (a) Federal League teams, or (b) Washington Senators teams of the 1901-1910 era.
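To see how the old and new zero-competence lines differ, here is a rough Python comparison. The old line is taken to be twice the expected runs per nine innings, as described above; the low-run and high-run contexts (3.50 and 5.50) are hypothetical values chosen only to show the direction of the change, and the function names are made up for the example.

```python
BASELINE = 4.46  # long-run average runs per nine innings, as used in the formula


def old_line(er):
    """Old zero-competence line: twice the expected runs per nine innings."""
    return 2 * er


def new_line(er):
    """New zero-competence line, grouped as the article's walk-through describes."""
    return (BASELINE - er) / 2 + BASELINE + er


contexts = [
    ("low-run context (hypothetical)", 3.50),
    ("2019 Pirates", 4.76),
    ("high-run context (hypothetical)", 5.50),
]

for label, er in contexts:
    print(f"{label}: old {old_line(er):.2f}, new {new_line(er):.2f}")
```

For the Pirates this reproduces the drop from 9.52 to 9.07. In the hypothetical low-run context the new line sits above the old one, and in the high-run context it sits below, which matches the point that low-run teams get more credit under the new formula and high-run teams get less.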


## COMMENTS (13 Comments, most recent shown first)

KaiserD2

Enthusiasm makes the world go round.

DK

7:50 AM Apr 8th

bjames

"My last comment is this: you are using the park-adjusted league average of runs allowed in the formula. I take that as evidence that you have been forced to conclude, consciously or unconsciously, that measuring against average is the only sensible thing to do, even though you have explicitly opposed doing so many times, including in this series. As one of those who has always believed in measuring against average, I am gratified."

I appreciate the enthusiasm of your misunderstanding.

10:53 PM Apr 7th

KaiserD2

jgf, you're right, it doesn't matter when you do the division, I was wrong about that one.

DK

3:10 PM Apr 7th

jgf704

David Kaiser wrote:

"Second comment: Instead of dividing by 9*innings, don't you want to multiply by innings/9?"

Actually, what Bill wrote:

/9 * innings

would be interpreted as "divide by 9, then multiply by innings", so that part is fine.

2:43 PM Apr 7th

KaiserD2

Dear Bill:

I'm trying to play Dallas Adams here. I remember the passage from one of the self-published abstracts when you realized that the Pythagorean formula and the Log5 formula were in fact the same formula.

Like Frank D, I noticed that your parentheses/brackets didn't match. You wrote:

RP = ([4.46 – ER) /2 + 4.46} + ER] /9 * innings – Runs Allowed by the team

I'm guessing that you meant:

RP = [(4.46-ER)/2 + 4.46+ ER]/9*innings - Runs Allowed by the team.

Now doing the same simplification that Frank D did, the sum within the brackets can also simply be written 6.69 +ER/2. I do not in the least understand why that would be a meaningful number.

Second comment: Instead of dividing by 9*innings, don't you want to multiply by innings/9? That would convert a per-game number to a number for the whole season.

My last comment is this: you are using the park-adjusted league average of runs allowed in the formula. I take that as evidence that you have been forced to conclude, consciously or unconsciously, that measuring against average is the only sensible thing to do, even though you have explicitly opposed doing so many times, including in this series. As one of those who has always believed in measuring against average, I am gratified.

David Kaiser

8:17 AM Apr 7th

FrankD

I fell for it .... I should have realized when you didn't integrate around the poles. The Nazis did and still lost. Plus you didn't look at baseball in time from minus infinity to plus infinity, that would have shown that on average, baseball didn't exist.

10:32 PM Apr 6th

FrankD

I can't follow your equation with the mismatched symbols for the brackets nor the too many facing one way. Are the brackets supposed to be the same?

10:24 PM Apr 6th

Brian

Senators in Washington DC have never given an honest accounting of their errors or the impact of those errors.

10:14 PM Apr 6th

CharlesSaeger

Between the era and the records, I think the issue the Senators have is the same as the Federal League: not really Major League Baseball.

9:02 PM Apr 6th

jgf704

Another way to write

(BASE - ENV)/2 + BASE + ENV

is

2*BASE+ (ENV - BASE)/2

with BASE= 4.46, and ENV is the league runs adjusted to the team's park. I like this way because it shows the baseline as 2*BASE, and this number is adjusted by an amount equal to (ENV-BASE)/2.

8:16 PM Apr 6th

jrickert

The ([4.46 – ER) /2 + 4.46} + ER] can be simplified to (13.38+ER)/2, where the 13.38 is 3*4.46.

7:05 PM Apr 6th

Brian

I believe that the formula works the same if you double the expected runs and then go 1/4 of the way from 8.92 to that number. 9.07 is 1/4 of the way from 8.92 to 9.52 in your example. I tried it for a couple of other numbers and it worked for those.

6:49 PM Apr 6th

evanecurb

I wonder what it is about the Washington Senators of the 1900-1910 era that makes their errors stand out?

3:17 PM Apr 6th