Re: 2012 NCAA Tournament: Division I Bracketology
While we're on the subject, here's some new discussion on KRACH vs. RPI. It's really long though.
------------------------
I believe the RPI is particularly ill-suited for women's ice hockey. Two characteristics of the current women's college game pose problems for the RPI: [1] a low share of interregional games (6 of 37-39 games for a typical WCHA contender), and [2] a Western region that is stronger on average than the Eastern region. Together, these make it difficult for the RPI to measure strength of schedule. Relative to rating systems with firmer statistical foundations, the RPI typically underrates middle-of-the-pack WCHA teams (3rd-6th place) by roughly three or four places in the national standings. That is a nontrivial discrepancy for an 8-team tournament. The RPI also influences the "record vs. teams under consideration" selection criterion, because the RPI is used to determine the 12 teams under consideration.
-----------------------------
Problems with the RPI
To illustrate the problems with the RPI, it's helpful to consider a concrete but simplified example. Suppose you are comparing two teams, call them Team E and Team W, with similar results, except that Team W plays several games against Wisconsin (a team with a 90% win pct.) and Team E plays several games against Boston College (a team with a 70% win pct.). Let's further suppose that Team W has posted a 15% win percentage in its games against Wisconsin, while Team E has posted a 30% win percentage against Boston College. This scenario may approximate what the committee would face in comparing teams like North Dakota and UMD against Northeastern and Boston University this season.
How do you compare these two teams with different schedules? If Wisconsin and Boston College played equally challenging schedules, then the evaluation is simple: an average team should beat Wisconsin 10% of the time and Boston College 30% of the time. Team W beat Wisconsin 15% of the time and is therefore clearly above average, while Team E beat Boston College 30% of the time and is merely average. Does the RPI get this right? No! Team E has a 15-point edge over Team W in win percentage, while Team W has a 20-point edge over Team E in opponents' record. But win percentage gets a 35% weight in the RPI, while opponents' record gets only a 24% weight. Since 15%*35% > 20%*24%, Team E gets the edge, even though Team W is clearly the better team in this example.
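For anyone who wants to check that arithmetic, here's a minimal Python sketch. The weights are the ones quoted above, the team names and percentages are just the hypothetical example (not real season numbers), and the third RPI component is left out because this scenario assumes Wisconsin and BC played equally tough schedules.

[code]
# Simplified two-component RPI comparison for the hypothetical Team W / Team E example.
WP_WEIGHT, OWP_WEIGHT = 0.35, 0.24    # weights quoted in this post

team_w = {"win_pct": 0.15, "opp_win_pct": 0.90}  # 15% vs. a Wisconsin-like opponent
team_e = {"win_pct": 0.30, "opp_win_pct": 0.70}  # 30% vs. a BC-like opponent

def partial_rpi(t):
    return WP_WEIGHT * t["win_pct"] + OWP_WEIGHT * t["opp_win_pct"]

print(partial_rpi(team_w))  # 0.2685
print(partial_rpi(team_e))  # 0.2730 -- Team E comes out ahead on these two terms
[/code]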
The prior paragraph assumed that Wisconsin and Boston College played equally challenging schedules, so we assumed away the impact of the 46%-weighted component of the RPI that considers opponents' opponents' records -- call it the "adjustment for opponents' strength-of-schedule" from here on. Now let's assume that Wisconsin and Boston College have schedules similar to their actual schedules this season. According to the strength-of-schedule measures from both the USCHO RPI and USCHO KRACH, Wisconsin has played either the first- or second-toughest schedule in the country, while BC's is 7th- or 10th-toughest, depending on the measure. Since Wisconsin played a tougher schedule than BC, the case for Team W should be even stronger than in the previous paragraph's example. But does the RPI's consideration of opponents' strength-of-schedule actually favor the team that played Wisconsin? No! Even though Wisconsin played a tougher schedule than BC, all that matters in determining how much credit Team W gets for playing Wisconsin is the record of Wisconsin's opponents -- not the strength-of-schedule measure that places Wisconsin 2nd in the nation. It so happens that BC's opponents actually have a slightly better record than Wisconsin's opponents (see the OPWP column of Rutter's RPI). In conclusion, the RPI's adjustment for opponents' strength-of-schedule actually favors Team E in this example, even though Team W's opponent Wisconsin played the tougher schedule by any other measure.
The problems with the RPI described here are not cherry-picked special cases that rarely occur in reality; they are a significant problem for women's hockey, which has a limited number of interregional games and a lack of parity between conferences. In the extreme case where one region is universally stronger than the other but only a small fraction of games are interregional, the RPI standings are only marginally different from simply merging the standings of each conference -- the strength-of-schedule adjustment would be almost nonexistent. The fact that the RPI's adjustment for opponents' strength-of-schedule actually favors a BC opponent over a Wisconsin opponent (even though Wisconsin clearly has the stronger schedule) shows that this concern matters in practice.
How KRACH solves the RPI's flaws
How does a statistical model like USCHO's KRACH address the problems described above? KRACH assigns a rating to each team such that the expected probability of a team with rating A beating a team with rating B is A/(A+B). The ratings are calculated so that the estimated probabilities of game outcomes most closely match the observed outcomes. It's a simple, transparent system that avoids the RPI's arbitrary weighting, and with it the problems with the RPI described in previous paragraphs. The ratings can then be used to compute a "round-robin win percentage" (RRWP), which estimates what each team's win percentage would be if it played every other team exactly once -- the RRWP is much easier to interpret than the raw KRACH ratings themselves.
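To make that description concrete, here's a rough Python sketch of how such ratings can be fit to a season's results. This is the standard Bradley-Terry iteration that KRACH-type ratings are built on, not USCHO's actual code; the function names are mine, the "average = 100" normalization is just one reasonable convention, and undefeated or winless teams need a correction that this sketch omits.

[code]
import math
from collections import defaultdict

def fit_krach(games, iterations=500):
    """Fit Bradley-Terry style ratings to a list of (winner, loser) results.
    Ties are left out of this simplified sketch, and a team with zero wins
    (or zero losses) would need special handling that is omitted here."""
    teams = {t for game in games for t in game}
    wins = defaultdict(float)      # total wins per team
    n = defaultdict(float)         # games played between each pair of teams
    for w, l in games:
        wins[w] += 1.0
        n[frozenset((w, l))] += 1.0

    k = {t: 1.0 for t in teams}    # start everyone at the same rating
    for _ in range(iterations):
        # update each rating so that (approximately) expected wins = actual wins
        k = {i: wins[i] / sum(n[frozenset((i, j))] / (k[i] + k[j])
                              for j in teams if j != i and n[frozenset((i, j))])
             for i in teams}

    # scale so the (geometric) average rating is 100; USCHO's exact
    # normalization convention may differ -- this is just for readability
    avg = math.exp(sum(math.log(v) for v in k.values()) / len(k))
    return {t: 100.0 * v / avg for t, v in k.items()}

def round_robin_win_pct(krach):
    """RRWP: each team's expected win pct if it played every other team once."""
    teams = list(krach)
    return {i: sum(krach[i] / (krach[i] + krach[j]) for j in teams if j != i)
               / (len(teams) - 1)
            for i in teams}
[/code]

Feed fit_krach a full season's list of (winner, loser) results, and round_robin_win_pct turns the ratings into the RRWP table described above.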
Let's reconsider my first example of an RPI flaw, and assume that Wisconsin beats a league-average team 90% of the time and BC beats a league-average team 70% of the time. The USCHO KRACH league average is normalized to 100, so such a Wisconsin team would have a KRACH rating of 900 and BC would have a KRACH of 233.3. Team E, which beats BC 30% of the time, would have a KRACH of 100 -- precisely the league average. Team W, which beats Wisconsin 15% of the time, would have a KRACH of 158.8, above the league average. KRACH would also predict that Team W would win about 60% of its games against Team E. So KRACH corrects the first fallacy I described.
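Those numbers fall straight out of the A/(A+B) rule: if a team wins with probability p against an opponent rated R, its own rating x must satisfy x/(x+R) = p, so x = p*R/(1-p). A quick check in the same spirit as the sketch above (the helper name is just for illustration):

[code]
def rating_given(opponent_rating, win_prob):
    """Solve x / (x + R) = p for x."""
    return win_prob * opponent_rating / (1.0 - win_prob)

wisconsin = rating_given(100, 0.90)        # 900.0 -- beats an average (100) team 90% of the time
bc        = rating_given(100, 0.70)        # 233.3
team_w    = rating_given(wisconsin, 0.15)  # 158.8
team_e    = rating_given(bc, 0.30)         # 100.0 -- exactly league average
print(team_w / (team_w + team_e))          # ~0.61, i.e. Team W beats Team E about 60% of the time
[/code]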
Now let's consider my critique of the RPI's opponents' strength-of-schedule adjustment. In actuality, Wisconsin wins about 90% of its games while BC wins about 70% of its games. When we adjust these win percentages for strength of schedule, we should see a gap between Wisconsin and BC wider than 20 points, since Wisconsin plays a tougher schedule than BC. Indeed, the KRACH model suggests that if the teams each played a balanced schedule, Wisconsin would win 95% of its games while BC would win 74% of its games (Wisconsin's KRACH rating is 2303 and BC's is 320.5). Team W, which beats Wisconsin 15% of the time, would have a KRACH rating of 406 -- well above the league average -- while Team E, which beats BC 30% of the time, would have a KRACH of 137 -- slightly above league average. Team W would be expected to beat Team E about 75% of the time. So KRACH corrects the second RPI fallacy I described.
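The same x = p*R/(1-p) arithmetic, this time starting from the season KRACH figures quoted above, reproduces these numbers as well:

[code]
wisconsin, bc = 2303.0, 320.5         # season KRACH ratings quoted above
team_w = 0.15 * wisconsin / 0.85      # ~406
team_e = 0.30 * bc / 0.70             # ~137
print(team_w / (team_w + team_e))     # ~0.75 -- Team W beats Team E about 75% of the time
[/code]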
How KRACH does not solve the RPI's flaws
One common misconception is that KRACH would correct the RPI by giving more weight to strength of schedule. As the previous examples demonstrate, the RPI's approach to calculating strength-of-schedule is flawed, and simply giving more weight to a flawed approach does not improve it. What KRACH achieves is a more accurate approach to calculating strength-of-schedule.
The misconception that KRACH fixes the RPI by giving more weight to strength-of-schedule leads to some resistance to its adoption. For example, one head coach I spoke with last year expressed a common concern: that it would be impossible for Eastern schools to compete under KRACH, because WCHA schools would play tougher schedules and Eastern schools couldn't afford to travel to WCHA schools. But a strong KRACH rating is not achieved simply by playing a difficult schedule. North Dakota and UMD are 4th and 5th in KRACH not because they played more games against Wisconsin and Minnesota, but because they were more successful against Wisconsin and Minnesota than anyone else. Meanwhile, a team like Cornell can achieve a KRACH higher than North Dakota's and UMD's, despite never having played Wisconsin or Minnesota, by beating teams of similar quality to the ones that beat North Dakota and UMD.
There is no inherent advantage or disadvantage to playing a tougher schedule under KRACH. One crucial advantage of improving the selection criteria is that it reduces any current or future incentive to game the NCAA criteria through the choice of nonconference opponents.
The Results vs. Teams Under Consideration criterion
Replacing the RPI with KRACH would improve the record vs. teams under consideration criterion by more accurately picking the top 12 teams in the country, but it would not correct the fact that this criterion is typically not adjusted for strength of schedule. My understanding is that the committee may have some discretion in assessing the relative merits of teams' results beyond the simple W-L-T record, but such discretion would be subject to controversy. One simple solution would be to calculate a second KRACH using only the results of games between the top 12 teams from the original KRACH -- a sketch of that two-pass idea is below.
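For what it's worth, that two-pass idea is easy to compute. A rough sketch, reusing the fit_krach() function from the earlier sketch (the function name is mine, and issues like a team going unbeaten within the 12-team subset would need extra care):

[code]
def krach_among_top_teams(games, n=12):
    """Two-pass KRACH: rank everyone on the full season's results, keep the
    top n, then re-fit ratings using only games played between those teams.
    Assumes fit_krach() from the earlier sketch is available."""
    full = fit_krach(games)
    top = set(sorted(full, key=full.get, reverse=True)[:n])
    head_to_head = [(w, l) for (w, l) in games if w in top and l in top]
    return fit_krach(head_to_head)
[/code]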