Modeling Playoff Appearances in Major League Baseball
Major League Baseball consists of two leagues: the American, with 14 teams, and the National, with 16 teams. Beginning with the 1998 season the American League has had three divisions: the East with 5 teams, the Central with 5 teams, and the West with 4 teams. The National League uses the same geographical names and has 5 teams in the East, 6 teams in the Central, and 5 teams in the West. Each season four teams from each league make the playoffs: the three division champions and the wildcard team (the team with the best winning percentage among the non-champions). Each league therefore fills 40 playoff spots per decade, so each National League team has an expected value of 40/16 = 2.5 playoff appearances per decade, while each American League team has an expected value of 40/14, or about 2.86, appearances per decade. Table 1 lists the number of playoff appearances for each of the 30 major league teams in the decade consisting of the seasons from 2000 through 2009, inclusive.
Table 1.
Playoff Appearances by Team, 2000-2009
| National League Team | Appearances | American League Team | Appearances |
|---|---|---|---|
| | 7 | | 9 |
| | 6 | | 6 |
| | 4 | | 6 |
| | 3 | | 5 |
| | 3 | | 5 |
| | 3 | | 3 |
| | 3 | | 2 |
| | 3 | | 2 |
| | 2 | | 1 |
| | 2 | | 1 |
| | 2 | | 0 |
| | 1 | | 0 |
| | 1 | | 0 |
| | 0 | | 0 |
| | 0 | | |
| | 0 | | |
I devised spreadsheet models based on random number selections to model the process of selecting four playoff teams from each league each year. I wanted to see if the distribution over a decade would look something like the above, or if the actual occurrence was unusual. Should we expect a team to have 9 appearances in one decade? Should we expect even 7 appearances for one team? How unusual is it that the American League (with two fewer teams) had one more team not make any appearance?
I used the period 1998 to 2009 as the base. Over this time the American League had 14 teams and the National League had 16 teams in their current divisional alignment and each year both leagues had their division champions and one wildcard team in the playoffs. Of the 66 division winners in the seasons 1998 through 2008 inclusive, 33, exactly half, won their division the next season. The wildcard experience varied in the two leagues. Of the 11 teams which received the National League wildcard in the seasons 1999 through 2009 inclusive, 8 had not made the playoffs the year before, one had been a division winner, and two had been the wildcard team the year before. In the same time span four of the 11 American League wildcard teams had been the wildcard team the year before and three had been a division winner the year before. Therefore I wrote programs which chose the National League wildcard and the American League wildcard in different ways.
The assumption underlying the Excel program which chooses the four National League playoff teams is that the division winner from the previous season has probability .5 of winning the division again, while each of the other teams in the division has probability .1 (Central Division) or .125 (East and West Divisions). After the three division winners have been selected, each of the 13 remaining teams is assigned probability 1/13 of being named the wildcard team. (During the period 1999 through 2009 the East Division supplied three wildcard teams while the other two divisions produced four each.)
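The National League rules above can be sketched in a few lines of Python. This is only an illustration of the stated probabilities, not the original Excel program; the team labels (`E1`, `C1`, and so on) and the function names are hypothetical placeholders.

```python
import random

# Placeholder team labels (assumption): the sketch only needs division sizes.
NL_DIVISIONS = {
    "East": ["E1", "E2", "E3", "E4", "E5"],
    "Central": ["C1", "C2", "C3", "C4", "C5", "C6"],
    "West": ["W1", "W2", "W3", "W4", "W5"],
}


def pick_division_winner(teams, previous_winner, rng):
    """Return this season's division winner.

    The previous winner repeats with probability 0.5. Otherwise one of the
    other teams is chosen uniformly, which gives each of them 0.5/4 = 0.125
    in a five-team division and 0.5/5 = 0.1 in a six-team division.
    """
    if rng.random() < 0.5:
        return previous_winner
    others = [team for team in teams if team != previous_winner]
    return rng.choice(others)


def simulate_nl_season(previous_winners, rng):
    """Return (winners keyed by division, wildcard team) for one season."""
    winners = {
        division: pick_division_winner(teams, previous_winners[division], rng)
        for division, teams in NL_DIVISIONS.items()
    }
    champions = set(winners.values())
    remaining = [
        team
        for teams in NL_DIVISIONS.values()
        for team in teams
        if team not in champions
    ]
    # 13 teams remain, so a uniform choice gives each probability 1/13.
    wildcard = rng.choice(remaining)
    return winners, wildcard


rng = random.Random(2000)  # arbitrary seed, for repeatability only
winners, wildcard = simulate_nl_season(
    {"East": "E1", "Central": "C1", "West": "W1"}, rng
)
```

Feeding each season's `winners` back in as the next season's `previous_winners` gives a chain of seasons of the kind the model needs.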
I wrote the program which chooses the four American League playoff teams so that each previous division winner had probability .5 of winning the division again. The other teams had probabilities .125 (East and Central) or 1/6 (West). For the wildcard spot, the wildcard team from the year before had initial probability 4/9 of winning it again (assuming it had not won its division this year), a division winner from the previous year had initial probability .2 of being named the wildcard (again, assuming it was not a division winner this year), and every other team had initial probability 4/97. (These ratios match the occurrences of the period 1999 through 2009 inclusive. In those seasons the wildcard was won by four of the nine teams which had been the wildcard the previous season and did not win a division title, by three of the fifteen teams which had won a division title the previous season but failed to repeat, and, in the remaining four seasons, by one of a total of 97 teams which had not made the playoffs the previous season.) Because these initial probabilities need not sum to one, the program chose a random number between 0 and their total, which in effect re-weighted them to sum to one.
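The American League wildcard step, with its three classes of initial probability and the draw between 0 and their total, might look like the following in Python. The function names and the team labels in the usage lines are assumptions made for illustration; only the three weights come from the text.

```python
import random

WEIGHT_PREV_WILDCARD = 4 / 9     # 4 of 9 returning wildcard teams
WEIGHT_PREV_DIV_WINNER = 3 / 15  # 3 of 15 division winners that failed to repeat
WEIGHT_OTHER = 4 / 97            # 4 of 97 teams outside the previous playoffs


def wildcard_weights(candidates, prev_wildcard, prev_division_winners):
    """Map each candidate (a team that did not win its division) to its weight."""
    weights = {}
    for team in candidates:
        if team == prev_wildcard:
            weights[team] = WEIGHT_PREV_WILDCARD
        elif team in prev_division_winners:
            weights[team] = WEIGHT_PREV_DIV_WINNER
        else:
            weights[team] = WEIGHT_OTHER
    return weights


def wildcard_probabilities(candidates, prev_wildcard, prev_division_winners):
    """Re-weight the initial values so that they sum to one."""
    weights = wildcard_weights(candidates, prev_wildcard, prev_division_winners)
    total = sum(weights.values())
    return {team: weight / total for team, weight in weights.items()}


def pick_wildcard(candidates, prev_wildcard, prev_division_winners, rng):
    """Draw a number between 0 and the total weight and walk the running sum."""
    weights = wildcard_weights(candidates, prev_wildcard, prev_division_winners)
    draw = rng.uniform(0, sum(weights.values()))
    running = 0.0
    for team, weight in weights.items():
        running += weight
        if draw <= running:
            return team
    return candidates[-1]  # guard against floating-point round-off


# Hypothetical season: last year's wildcard "WC" and one dethroned division
# winner "DW" are among the 11 teams that did not win a division this year.
candidates = ["WC", "DW"] + [f"T{i}" for i in range(9)]
probabilities = wildcard_probabilities(candidates, "WC", {"DW"})
choice = pick_wildcard(candidates, "WC", {"DW"}, random.Random(1999))
```

Drawing against the unnormalized total and dividing each weight by the total are two views of the same distribution, which is why no separate normalization is needed before the draw.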
For each league I used the initial values as they would occur for the 2000 season; that is, the initial probabilities were based on the results of the 1999 season.
Each program had space for 240 seasons of playoff choices, or 24 decades. I ran each program 42 times so each league has 1008 decades of results.
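The bookkeeping for such a run can be sketched as follows: each decade is ten simulated seasons, the per-team counts are sorted from most to fewest, and the sorted lists are averaged position by position. The season step here is a deliberately simple stand-in (four teams drawn uniformly at random), not the model described above, and all names are hypothetical.

```python
import random

SEASONS_PER_DECADE = 10
PLAYOFF_SPOTS = 4


def simulate_decade(n_teams, rng):
    """Placeholder model (assumption): four distinct teams chosen uniformly."""
    counts = [0] * n_teams
    for _ in range(SEASONS_PER_DECADE):
        for team in rng.sample(range(n_teams), PLAYOFF_SPOTS):
            counts[team] += 1
    return counts


def ordered_averages(decades):
    """Average the k-th largest count across decades, for every rank k."""
    ranked = [sorted(counts, reverse=True) for counts in decades]
    return [sum(column) / len(ranked) for column in zip(*ranked)]


rng = random.Random(1998)  # arbitrary seed, for repeatability only
decades = [simulate_decade(14, rng) for _ in range(24 * 42)]  # 1008 decades
averages = ordered_averages(decades)
```

Whatever season model is plugged in, the rank averages must sum to 40, since every decade hands out exactly 40 playoff spots per league.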
Table 2 shows the results for 1008 decades of National League playoff choices. The first column ranks the 16 teams from most appearances to fewest. The second column shows the average over 1008 decades for each rank. For example, the 1008 teams with the most appearances in their decades averaged 6.025794 appearances per decade, the 1008 teams with the second most appearances averaged 4.984127 appearances per decade, and so on. The third column gives the rounded values. The team with the fifth most appearances needs special treatment. Rounding each average to the nearest integer produces a total of just 39 appearances per decade rather than 40, so one value must be rounded up even though its fractional part is below .5. I kept track of the cumulative totals (6.025794, 11.00992, and so on) and arranged the rounding so that the cumulative total was usually correctly rounded. (Following this method strictly produced the anomaly of the team with the fifth most appearances having 3 and the team with the sixth most having 4, so even then adjustments had to be made.) The final column gives the actual results from the 2000-2009 decade.
Table 2.
National League
2000 to 2009 Actual vs. 1008 Decade Average
| Order | Average | Rounded | 2000 to 2009 |
|---|---|---|---|
| 1 | 6.025794 | 6 | 7 |
| 2 | 4.984127 | 5 | 6 |
| 3 | 4.31746 | 4 | 4 |
| 4 | 3.78869 | 4 | 3 |
| 5 | 3.366071 | 4 | 3 |
| 6 | 3.032738 | 3 | 3 |
| 7 | 2.74504 | 3 | 3 |
| 8 | 2.393849 | 2 | 3 |
| 9 | 2.089286 | 2 | 2 |
| 10 | 1.877976 | 2 | 2 |
| 11 | 1.623016 | 2 | 2 |
| 12 | 1.311508 | 1 | 1 |
| 13 | 1.052579 | 1 | 1 |
| 14 | 0.808532 | 1 | 0 |
| 15 | 0.459325 | 0 | 0 |
| 16 | 0.124008 | 0 | 0 |
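The rounding problem described for Table 2 can be checked directly from the Average column. The sketch below compares ordinary rounding with rounding of the cumulative totals; the function names are hypothetical, and the hand adjustments that produced the published Rounded column are not reproduced.

```python
import math

# Average column of Table 2, ranks 1 through 16.
NL_AVERAGES = [
    6.025794, 4.984127, 4.31746, 3.78869, 3.366071, 3.032738,
    2.74504, 2.393849, 2.089286, 1.877976, 1.623016, 1.311508,
    1.052579, 0.808532, 0.459325, 0.124008,
]


def round_half_up(value):
    """Round to the nearest integer, with halves going up."""
    return math.floor(value + 0.5)


def cumulative_rounding(values):
    """Round each running total, then take successive differences."""
    rounded, running, previous = [], 0.0, 0
    for value in values:
        running += value
        target = round_half_up(running)
        rounded.append(target - previous)
        previous = target
    return rounded


usual = [round_half_up(value) for value in NL_AVERAGES]
by_cumulative = cumulative_rounding(NL_AVERAGES)

print(sum(usual))                          # 39
print(sum(by_cumulative))                  # 40
print(by_cumulative[4], by_cumulative[5])  # 3 4
```

The last line shows the anomaly mentioned above: under strict cumulative rounding the fifth-ranked team gets 3 and the sixth-ranked team gets 4.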
This table gives similar results for the American League.
Table 3.
American League
2000 to 2009 Actual vs. 1008 Decade Average
| Order | Average | Rounded | 2000 to 2009 |
|---|---|---|---|
| 1 | 6.943452 | 7 | 9 |
| 2 | 5.747024 | 6 | 6 |
| 3 | 4.91369 | 5 | 6 |
| 4 | 4.279762 | 4 | 5 |
| 5 | 3.738095 | 4 | 5 |
| 6 | 3.246032 | 3 | 3 |
| 7 | 2.825397 | 3 | 2 |
| 8 | 2.382937 | 2 | 2 |
| 9 | 1.945437 | 2 | 1 |
| 10 | 1.60119 | 2 | 1 |
| 11 | 1.189484 | 1 | 0 |
| 12 | 0.770833 | 1 | 0 |
| 13 | 0.350198 | 0 | 0 |
| 14 | 0.066468 | 0 | 0 |
It is clear that the actual results in the American League for the decade 2000 to 2009 were skewed towards the most successful teams. The five clubs with the most appearances made the playoffs 31 times, while the model averages 26 appearances for the top five clubs. Thus it is not surprising that 4 teams were shut out of the playoffs entirely. Of particular note is that these five teams racked up 8 of the 10 wildcard spots. Contrast this with the National League, in which the five teams with the most appearances had 5 wildcard spots, and that only if one takes Houston (2) and San Francisco (1) as the two teams with 3 appearances; choosing any two of Chicago, Arizona, and Philadelphia instead reduces that to 2 wildcard spots.
For each league, in each of the 1008 decades, I counted the number of teams which never made the playoffs during the decade. The results indicate that it is not unusual for the American League to have more teams missing out than the National League. This may be expected, since the assumptions underlying the two models derive from a decade in which the American League had more teams missing the playoffs.
Table 4.
Number of Decades per Number of Teams Missing Playoffs
| Number of Teams Missing Playoffs | Decades, American League | Decades, National League |
|---|---|---|
| 0 | 67 | 125 |
| 1 | 279 | 334 |
| 2 | 355 | 330 |
| 3 | 215 | 168 |
| 4 | 74 | 46 |
| 5 | 15 | 5 |
| 6 | 3 | 0 |
The American League averaged 2.007 teams missing the playoffs per decade, while the National League averaged 1.693 teams missing per decade.
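These two averages follow from Table 4 as weighted means, which a short Python check reproduces. The variable and function names are hypothetical.

```python
# Table 4: number of decades (out of 1008) with k teams missing, k = 0..6.
TEAMS_MISSING = [0, 1, 2, 3, 4, 5, 6]
AL_DECADES = [67, 279, 355, 215, 74, 15, 3]
NL_DECADES = [125, 334, 330, 168, 46, 5, 0]


def mean_missing(decade_counts):
    """Weighted mean of teams missing, weighted by the number of decades."""
    total_decades = sum(decade_counts)
    weighted = sum(k * n for k, n in zip(TEAMS_MISSING, decade_counts))
    return weighted / total_decades


print(sum(AL_DECADES), sum(NL_DECADES))    # 1008 1008
print(round(mean_missing(AL_DECADES), 3))  # 2.007
print(round(mean_missing(NL_DECADES), 3))  # 1.693
```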
I found 93 decades in which the American League team with the most playoff appearances had exactly 9. I checked the distribution of appearances among the remaining teams to compare against the actual distribution of the 2000-2009 decade.
Table 5.
American League
Comparison of Model Decades With Leading Team Having 9 Appearances
| Order | 2000-2009 Actual | 93 Decades Average | 93 Decades Average, Rounded | Actual Cumulative | 93 Decades Cumulative |
|---|---|---|---|---|---|
| 1 | 9 | 9 | 9 | 9 | 9 |
| 2 | 6 | 6.494624 | 6 | 15 | 15 |
| 3 | 6 | 5.11828 | 5 | 21 | 20 |
| 4 | 5 | 4.225806 | 4 | 26 | 24 |
| 5 | 5 | 3.505376 | 4 | 31 | 28 |
| 6 | 3 | 3.043011 | 3 | 34 | 31 |
| 7 | 2 | 2.505376 | 3 | 36 | 34 |
| 8 | 2 | 2.010753 | 2 | 38 | 36 |
| 9 | 1 | 1.483871 | 1 | 39 | 37 |
| 10 | 1 | 1.150538 | 1 | 40 | 38 |
| 11 | 0 | 0.784946 | 1 | | 39 |
| 12 | 0 | 0.430108 | 1 | | 40 |
| 13 | 0 | 0.225806 | 0 | | |
| 14 | 0 | 0.021505 | 0 | | |
Rounding each of the 93-decade averages to the nearest integer produced a total of only 39 appearances per decade, so once again I had to use some creative rounding. I thought the best approach was to round up the team with the twelfth most appearances. It is noteworthy that in the real decade of 2000 through 2009 the teams with the third, fourth, and fifth most appearances did better than the corresponding teams in the average of the 93 decades. So it wasn’t just that the Yankees took up so many playoff spots; they had help from the other successful teams of this decade.
This linear table gives, for those 93 decades, the number of decades in which a given number of teams never made a playoff appearance.
| Teams missing | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Decades | 1 | 18 | 19 | 27 | 22 | 5 | 1 |
So we see having 4 teams miss the playoffs in a decade is not that unusual given that the top team had 9 appearances. The mean value is 2.75. Rounding this value to 3 we see the decade had just one extra team not make the playoffs. (It is worth noting that with the exception of Texas in 2004 none of Baltimore, Kansas City, Texas, and Toronto came close to making the playoffs in the real decade. And in this case although Texas finished just 3 games out of first place, they still finished in third place.)
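As a rough check on "not that unusual", the share of decades with at least 4 teams shut out can be computed from the counts above and compared with the same share over all 1008 American League decades in Table 4. This is only arithmetic on the published counts; the names are hypothetical.

```python
# Decades with k teams missing the playoffs, k = 0..6.
TOP_TEAM_NINE = [1, 18, 19, 27, 22, 5, 1]      # the 93 decades, top team has 9
ALL_AL_DECADES = [67, 279, 355, 215, 74, 15, 3]  # all 1008 AL decades (Table 4)


def mean_missing(counts):
    """Weighted mean number of teams missing the playoffs."""
    return sum(k * n for k, n in enumerate(counts)) / sum(counts)


def share_at_least(counts, k):
    """Fraction of decades in which at least k teams missed the playoffs."""
    return sum(counts[k:]) / sum(counts)


print(sum(TOP_TEAM_NINE))                          # 93
print(round(mean_missing(TOP_TEAM_NINE), 2))       # 2.75
print(f"{share_at_least(TOP_TEAM_NINE, 4):.1%}")   # 30.1%
print(f"{share_at_least(ALL_AL_DECADES, 4):.1%}")  # 9.1%
```

On these counts, 4 or more teams are shut out in roughly three of every ten decades in which the top team has 9 appearances, against roughly one in eleven across all simulated American League decades.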
The simulation results indicate that having the top team make two more appearances than the overall average for the top team should not have much affected the number of appearances for the next few teams, as Table 6 shows.
Table 6.
American League
Average Appearances Total vs. Average Appearances for Decade, Top Team 9
| Order | Average, All 1008 Decades | Rounded, All 1008 Decades | Rounded, 93 Decades | Average, 93 Decades (Top Team Has 9) |
|---|---|---|---|---|
| 1 | 6.943452 | 7 | 9 | 9 |
| 2 | 5.747024 | 6 | 6 | 6.494624 |
| 3 | 4.91369 | 5 | 5 | 5.11828 |
| 4 | 4.279762 | 4 | 4 | 4.225806 |
| 5 | 3.738095 | 4 | 4 | 3.505376 |
| 6 | 3.246032 | 3 | 3 | 3.043011 |
| 7 | 2.825397 | 3 | 3 | 2.505376 |
| 8 | 2.382937 | 2 | 2 | 2.010753 |
| 9 | 1.945437 | 2 | 1 | 1.483871 |
| 10 | 1.60119 | 2 | 1 | 1.150538 |
| 11 | 1.189484 | 1 | 1 | 0.784946 |
| 12 | 0.770833 | 1 | 1 | 0.430108 |
| 13 | 0.350198 | 0 | 0 | 0.225806 |
| 14 | 0.066468 | 0 | 0 | 0.021505 |
This is another piece of evidence that the most remarkable result of the 2000 to 2009 decade in the American League was the cumulative performance of the top five teams.
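The cumulative performance of the top five teams can be totaled from the unrounded averages in the tables above. The sketch below sums ranks 1 through 5, and ranks 2 through 5 separately, for the actual decade, for all 1008 simulated decades, and for the 93 decades in which the top team had 9 appearances. The names are hypothetical.

```python
# American League, ranks 1 through 5.
ACTUAL_2000_2009 = [9, 6, 6, 5, 5]
ALL_1008_DECADES = [6.943452, 5.747024, 4.91369, 4.279762, 3.738095]
TOP_TEAM_NINE_93 = [9, 6.494624, 5.11828, 4.225806, 3.505376]

for label, values in [
    ("actual 2000-2009", ACTUAL_2000_2009),
    ("all 1008 decades", ALL_1008_DECADES),
    ("93 decades, top team 9", TOP_TEAM_NINE_93),
]:
    top_five = sum(values)       # ranks 1 through 5
    next_four = sum(values[1:])  # ranks 2 through 5
    print(f"{label}: top five {top_five:.2f}, next four {next_four:.2f}")
# actual 2000-2009: top five 31.00, next four 22.00
# all 1008 decades: top five 25.62, next four 18.68
# 93 decades, top team 9: top five 28.34, next four 19.34
```

Even with the leader fixed at 9, the simulated teams ranked second through fifth total about 19.3 appearances, against 22 in the real decade.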
During the decade 2000 through 2009 the National League had a more egalitarian distribution of playoff appearances among its teams than did the American League. This is primarily due to the top 5 AL teams taking so many of the wildcard spots. What is perhaps more interesting is that the simulation results did not duplicate reality, even though the model allotted higher probabilities of obtaining the wildcard to teams which had made the playoffs in the previous season. The next four teams garnered more playoff positions in the real decade than the corresponding teams did in the simulation, even when the simulation was restricted to decades in which the top team had nine playoff appearances.
(The model probably failed to match reality for the AL because slightly more than half of the AL division winners repeated as division champions in the 1999 through 2009 seasons. Although the major league result was exactly half, with 33 of 66 division winners repeating, the percentage was slightly higher in the AL and slightly lower in the NL.)
Dave Trautman
The Citadel