
KRACH Ratings

Re: KRACH Ratings

The KRACH is indeed transparent if one is willing to break it down... And while no statistical metric is perfect, the KRACH is by far the best I've seen of the very few available.

I'd certainly be interested in looking at a better calculation if anyone knows of one.

I admit that the math is well beyond me, and I have no particular interest in trying to catch up in any meaningful way at this point in my life. My questions invite emotion and may not have simple answers, but I wonder if people here care to have a go:
Is there evidence that any of these systems is a better predictor of outcomes?
If there isn't, why would anyone care?
If there is, can you explain why one is better, using small words so I'll understand?
Other than tradition, is there any reason the NC$$ uses PairWise, assuming there is a demonstrably better system?
I understand women's hockey may be a bit different, but can anyone point to a case where PWR changed the outcome of the tourney? (I know that has opinion all over it, but convince me please.)
EDIT: Maybe the better question would be: can you point to a tourney that would have been substantially different using a better system?
Would it be any easier to demonstrate the systems' accuracy on the men's side?

At one level I don't suppose it matters as much for 1-4, but for 5-12 or so, it might change their season.
 
Re: KRACH Ratings

If there is, can you explain why one is better, using small words so I'll understand?
The Ratings Percentage Index (RPI) is at the heart of the PairWise. RPI is so flawed that it needs a kludge so that a team's rating doesn't drop when it wins a game. No logical model does that.
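
To make that concrete, here's a minimal sketch of the kind of thing that can happen with a plain RPI-style formula. The 0.25/0.50/0.25 weights and the records are invented for illustration; the NCAA's actual weights, exclusions, and bonuses differ.

```python
# A simplified, illustrative RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP.
# (Assumed textbook weights; the NCAA's actual formula, exclusions,
# and bonuses differ.)

def simple_rpi(wp, owp, oowp):
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Before: a team is 9-1, its 10 opponents average a .600 winning
# percentage, and its opponents' opponents sit at .500.
before = simple_rpi(9 / 10, 0.600, 0.500)

# Add one more game: a win over a winless opponent. The team's own
# winning percentage ticks up, but the weak extra opponent drags the
# opponents' winning percentage down by more.
new_wp = 10 / 11
new_owp = (10 * 0.600 + 0.000) / 11
after = simple_rpi(new_wp, new_owp, 0.500)

print(f"RPI before the win: {before:.4f}")   # 0.6500
print(f"RPI after the win:  {after:.4f}")    # 0.6250, lower despite winning
```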
 
Re: KRACH Ratings

I understand women's hockey may be a bit different, but can anyone point to a case where PWR changed the outcome of the tourney?

After the conference tournaments in 2013, KRACH had North Dakota as the #3 team in the country rather than #8, so if the NCAA had used a KRACH-based system rather than an RPI-based one, UND would have hosted a quarterfinal rather than travelling to Minneapolis.
 
Re: KRACH Ratings

After the conference tournaments in 2013, KRACH had North Dakota as the #3 team in the country rather than #8, so if the NCAA had used a KRACH-based system rather than an RPI-based one, UND would have hosted a quarterfinal rather than travelling to Minneapolis.

The problem I see with most of these 'rating' systems [the guy who does 'Myhockey' is similar] is that they put no emphasis on head-to-head results... which is THE MOST efficient way to rank teams... When you have two teams with similar records, and team A beats team B... in my book, team B should never be ranked higher than team A. That's just pure logic, and a "fail" in these types of 'math only' systems.
 
Re: KRACH Ratings

The problem I see with most of these 'rating' systems [the guy who does 'Myhockey' is similar] is that they put no emphasis on head-to-head results... which is THE MOST efficient way to rank teams... When you have two teams with similar records, and team A beats team B... in my book, team B should never be ranked higher than team A. That's just pure logic, and a "fail" in these types of 'math only' systems.

Of course, PWR puts a huge emphasis on head-to-head results. And one math-only system that doesn't privilege head-to-head matchups at all is regular season conference standings. Do you advocate that we change things such that a team that finishes in second place by one game, but that swept the first-place team, should be declared the conference champion? If not, why not?

One problem with your approach is in defining what, exactly, constitutes "similar records." Raw won/loss percentage is clearly inadequate, because teams play different schedules. You have to account for strength of schedule somehow, and that's where all of the math more complicated than calculating won/loss record comes into the picture. The question that's being asked here is, "Which of these systems accounts for strength of schedule best?"

Another problem is that, in college hockey, it's pretty rare for two teams to play each other just once. How often do we find ourselves in a position where we need to make distinctions between two teams that only played each other once? How often do we find ourselves in a position where one team swept another in multiple games and finds itself ranked behind that team? And what does head-to-head record tell us about teams that played multiple times without either team sweeping?

And how do you resolve situations in which three or more teams have played each other, with Team A beating Team B, which beat Team C, which beat Team A? By your logic, you can't rank any of these teams above the others.

You need to ask what it is that your ranking system is trying to accomplish. The two main things they try to do are to rank teams by who has had the best season, or to rank them by which team is most likely to beat the others in a game played in the future. For the latter goal, putting any special weight on head-to-head matchups is a mistake for a system that is trying to rank 30+ teams. A single game, or even two, is too small a sample size to be at all useful. If it were, split two-game series would be a rare thing, because, by winning the first game, the winner would have established that it is, by a significant margin, the better team. If the universe you are trying to model is just two teams, ignoring considerations of how they would do against anyone else, then looking at head-to-head games can be useful, because some teams don't match up well against others, but that isn't much of a ranking system.
 
Re: KRACH Ratings

Of course, PWR puts a huge emphasis on head-to-head results. And one math-only system that doesn't privilege head-to-head matchups at all is regular season conference standings. Do you advocate that we change things such that a team that finishes in second place by one game, but that swept the first-place team, should be declared the conference champion? If not, why not?

One problem with your approach is in defining what, exactly, constitutes "similar records." Raw won/loss percentage is clearly inadequate, because teams play different schedules. You have to account for strength of schedule somehow, and that's where all of the math more complicated than calculating won/loss record comes into the picture. The question that's being asked here is, "Which of these systems accounts for strength of schedule best?"

Another problem is that, in college hockey, it's pretty rare for two teams to play each other just once. How often do we find ourselves in a position where we need to make distinctions between two teams that only played each other once? How often do we find ourselves in a position where one team swept another in multiple games and finds itself ranked behind that team? And what does head-to-head record tell us about teams that played multiple times without either team sweeping?

And how do you resolve situations in which three or more teams have played each other, with Team A beating Team B, which beat Team C, which beat Team A? By your logic, you can't rank any of these teams above the others.

You need to ask what it is that your ranking system is trying to accomplish. The two main things they try to do are to rank teams by who has had the best season, or to rank them by which team is most likely to beat the others in a game played in the future. For the latter goal, putting any special weight on head-to-head matchups is a mistake for a system that is trying to rank 30+ teams. A single game, or even two, is too small a sample size to be at all useful. If it were, split two-game series would be a rare thing, because, by winning the first game, the winner would have established that it is, by a significant margin, the better team. If the universe you are trying to model is just two teams, ignoring considerations of how they would do against anyone else, then looking at head-to-head games can be useful, because some teams don't match up well against others, but that isn't much of a ranking system.


I've seen many, many cases where teams A and B play similar schedules... maybe team A is given a higher rank based on SOS... but then team B beats team A, once or maybe twice... but they are ranked lower. Sorry, but that's just stupid.
 
Re: KRACH Ratings

I've seen many, many cases where teams A and B play similar schedules... maybe team A is given a higher rank based on SOS... but then team B beats team A, once or maybe twice... but they are ranked lower. Sorry, but that's just stupid.

You're right, Lindenwood should be ranked ahead of Northeastern, and North Dakota should be ranked ahead of Wisconsin.
 
Re: KRACH Ratings

Okay, so what is the SoS? The KRACH explanation says that it's the weighted average of the KRACH ratings of your opponents, but my calculations of that come out very different from what is shown either on the USCHO KRACH page or in your version. Using the USCHO page, I come up with an SoS for Minnesota of 386.96 and for Ohio State of 556.70, so KRACH is, at a minimum, opaque in the sense that it's not clear what the weighting on that average is. Further, I'm not sure that just taking a weighted average of your opponents' ratings is a particularly meaningful measure of SoS, given the tendency of KRACH to head towards infinity as a team gets really good.

I've been meaning to respond to this but wanted to give it the detail it deserved.

When you're doing your calculation as described above, you're looking at it linearly, but you can't do that because KRACH is on an odds scale -- that is, the ratios of the ratings are what's important, not the differences. So you can't just average the three numbers and get an SOS.

For example, say UMD (rating of 200) played three teams with ratings of 50 (RPI), 100 (RMU), and 6,000 (BC). Taking an average of those gives you a strength of schedule of 2,050, which is obviously not accurate. They played an amazing team, an average team, and a bad team. Yes, playing an amazing team should have a bigger effect on your SOS than playing a great team, but there are diminishing returns there -- for all intents and purposes, playing a team with a KRACH rating of 6,000 and playing a team with a KRACH rating of 6,000,000 are going to have about the same effect on your SOS.

CHN gives a pretty good FAQ on the calculations of KRACH and SOS here, but yes, the math can be tough to wrap one's head around. Basically, when it says "weighted average," it's taking the average in a way that isn't linear and that accounts for the ratings being on an odds scale.

For that UMD (200) schedule of RPI (50), RMU (100), and BC (6000), the SOS would be calculated as follows:

( [1/(UMD rating + opponent rating) * opponent rating] + the same for opponent #2 + the same for opponent #3 ),
all of this divided by the sum of all of the "weighting factors", i.e. the 1/(UMD rating + opponent rating) terms.

[1/(200+50) * 50] + [1/(200+100) * 100] + [1/(200+6000) * 6000]
= 50/250 + 100/300 + 6000/6200
= 1.501075

Divided by [1/(200+50)] + [1/(200+100)] + [1/(200+6000)]
= 1.501075 / (1/250 + 1/300 + 1/6200)
= 1.501075 / 0.0074946

= 200.28, not the 2,050 we got earlier just by averaging the three numbers.

*Keep in mind this is just a basic example. In order to actually calculate the ratings for yourself and your opponents, and your SOS, you need an actual set of results rather than just saying "well, if a team with a rating of 200 played teams with ratings of 50, 100, and 6,000," because if they played those teams, they would have results against those teams, which would give that team an actual rating (i.e. not 200), which would affect those teams' ratings, and therefore the SOS. It's all recursive. But this example is at least illustrative.
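
If it helps, here's a short sketch of that same weighted-average arithmetic in code, using the same made-up ratings as the example above (UMD at 200 against opponents rated 50, 100, and 6,000):

```python
def krach_sos(team_rating, opponent_ratings):
    """Weighted average of opponents' ratings, where each opponent's
    weight is 1 / (team_rating + opponent_rating), as described above."""
    weights = [1.0 / (team_rating + r) for r in opponent_ratings]
    weighted_sum = sum(w * r for w, r in zip(weights, opponent_ratings))
    return weighted_sum / sum(weights)

# Hypothetical ratings from the example: UMD (200) vs. 50, 100, and 6000.
print(krach_sos(200, [50, 100, 6000]))   # ~200.28
print(sum([50, 100, 6000]) / 3)          # 2050.0, the naive linear average
```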
 
Re: KRACH Ratings

If so, then it shouldn't report SOS at all, because I maintain that piece is flawed. After Minnesota and Ohio State have played four times, it still thinks that the Gophers have played a considerably tougher schedule.

I actually agree with you. I think what it's calling "strength of schedule" is something that's a little deeper than that. It's really something of a multiplicative factor.

When you consider the logical progression of "well, the Gophers appear to have a softer schedule than Ohio State, but UM's SOS is higher, so if they had the same record UM's rating would be higher, and that doesn't make any sense," you have to consider the fact that they don't have the same record. If they had the same record against the same teams, it would affect those teams' ratings, and therefore the "SOS" numbers that KRACH reports, and UM wouldn't have a higher SOS (they would be the same).

It starts to bend your brain. But I do think I agree with you. I don't really think what it's calling "SOS" is really SOS at all, but rather some derivative of it.

When you compare the two schedules of OSU and UM, it really gives you an idea:

They both have played PSU x2, UW x2, ND x2, SCSU x3, MSU x2, BSU x2, and UMD x2. So those cancel out.

The differences are:

OSU has played:
LU x2 (KRACH of 35)
An extra SCSU game (170)
UVM x1 (31)
BU x1 (165)
UM x4 (1,400)

UM has played:
OSU x4 (61)
An extra 2 against MSU (16)
Yale x2 (65)

UM x4 > OSU x4
SCSU+BU > Yale x2
LU x2 > MSU x2

I find it hard to believe that one extra game for Ohio State against a worse team (UVM) actually makes UM's SOS better than OSU's.

So, yeah.

I don't think it takes away from KRACH as a rating method, because the ratings themselves intuitively make so much sense (sum up the expected winning percentages and it equals the number of wins you actually have, exactly -- really, really clever), but I think the number it is reporting for SOS shouldn't really be described as "strength of schedule."
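
To make that "expected wins equal actual wins" property concrete, here's a toy sketch. It's not anyone's official KRACH code, the results are made up, and it ignores ties and the fictitious-team trick real implementations use to handle undefeated or winless teams:

```python
from collections import defaultdict

# Made-up results: (winner, loser) for each game played.
games = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"), ("B", "C"),
]

teams = sorted({t for g in games for t in g})
wins = defaultdict(float)
for winner, _ in games:
    wins[winner] += 1

# Bradley-Terry fixed-point iteration (the model behind KRACH):
#   rating[t] = wins[t] / sum over t's games of 1 / (rating[t] + rating[opponent])
ratings = {t: 1.0 for t in teams}
for _ in range(1000):
    new = {}
    for t in teams:
        denom = sum(
            1.0 / (ratings[t] + ratings[l if w == t else w])
            for w, l in games if t in (w, l)
        )
        new[t] = wins[t] / denom
    # Only the ratios matter, so rescale to keep the numbers from drifting.
    scale = sum(new.values()) / len(new)
    ratings = {t: r / scale for t, r in new.items()}

# The defining property: each team's summed expected winning percentage
# equals its actual number of wins.
for t in teams:
    expected = sum(
        ratings[t] / (ratings[t] + ratings[l if w == t else w])
        for w, l in games if t in (w, l)
    )
    print(t, f"rating={ratings[t]:.3f}", f"expected={expected:.3f}", f"actual={wins[t]:.0f}")
```

Run it and each team's expected-win total matches its actual win count (to within rounding), which is exactly the fixed point the iteration is chasing.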
 
Re: KRACH Ratings

Side note! I added a new tab to the KRACH rankings in my signature that tells you the % chance any team has of beating any other team. It's in the "probabilities" tab.
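
For anyone curious, that head-to-head percentage falls straight out of the ratings. A minimal sketch, with ratings invented purely for illustration:

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B under KRACH: A / (A + B)."""
    return rating_a / (rating_a + rating_b)

# Invented ratings, just to show the shape of the numbers.
print(win_probability(4000.0, 500.0))   # ~0.889
print(win_probability(500.0, 4000.0))   # ~0.111, the complement
```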
 
Re: KRACH Ratings

That explanation of strength of schedule reinforces my point that KRACH is opaque rather than refuting it. Sure, you can dig through the math and a few people will be able to figure it out, but that doesn't help the average fan. This is exacerbated by giving a number that is called "Strength of Schedule" but really isn't. Being transparent requires more than the creators providing an explanation that would allow you, with a significant time investment, to replicate the results. It has to be readily understandable by the people using it or trying to draw information from it. KRACH just doesn't meet that standard.

I agree with you that KRACH is the best of the ranking systems we have. I also suspect that there is no way to structure a rating system such that it both does the job sufficiently and is transparent to the average fan. Rutter requires some knowledge of Bayesian statistics. I really don't know anything about CHODR. Faced with that trade-off, I'd much rather go with the approach that is opaque than with the one that's fatally flawed, but that doesn't mean denying the drawbacks that KRACH has. One of them is that most people are going to look at it and not really grasp where it comes from, though dropping the strength of schedule column, or at least renaming it, ought to be a no-brainer that no one seems to care enough to do.

The real question about KRACH is how well it actually predicts results. Setting up a test of that is conceptually easy, but really time-consuming to actually do. I might get around to it at some point.
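
For what it's worth, the skeleton of such a test is pretty simple; the hard part is assembling the game data and the rating functions. Here's a hypothetical sketch (the `rate` functions and `season_games` list are placeholders, not real data or published code):

```python
from typing import Callable, Dict, List, Tuple

Game = Tuple[str, str]  # (winner, loser), in chronological order

def backtest(season_games: List[Game],
             rate: Callable[[List[Game]], Dict[str, float]],
             warmup: int = 50) -> float:
    """Fraction of games whose actual winner was the team rated higher,
    using only the games played before each game being predicted."""
    correct = 0
    total = 0
    for i in range(warmup, len(season_games)):
        ratings = rate(season_games[:i])        # past games only, no peeking
        winner, loser = season_games[i]
        if winner not in ratings or loser not in ratings:
            continue                            # a team with no rating yet, skip
        if ratings[winner] == ratings[loser]:
            continue                            # no prediction either way
        if ratings[winner] > ratings[loser]:
            correct += 1
        total += 1
    return correct / total if total else float("nan")

# Hypothetical usage: plug in different rating functions and compare.
# krach_accuracy = backtest(season_games, fit_krach)
# rpi_accuracy   = backtest(season_games, compute_rpi)
```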
 
Re: KRACH Ratings

I've seen many, many cases where teams A and B play similar schedules... maybe team A is given a higher rank based on SOS... but then team B beats team A, once or maybe twice... but they are ranked lower. Sorry, but that's just stupid.

No, it really isn't. Each team plays more than one opponent, and allowing just a single game to determine your rankings when you have a wealth of additional data is what is dumb. As I said, if the results of a single head-to-head matchup were really that determinative, we wouldn't see weekend series get split. We have actual evidence on this question, and it does not support your thesis.
 
Re: KRACH Ratings

No, it really isn't. Each team plays more than one opponent, and allowing just a single game to determine your rankings when you have a wealth of additional data is what is dumb. As I said, if the results of a single head-to-head matchup were really that determinative, we wouldn't see weekend series get split. We have actual evidence on this question, and it does not support your thesis.

Yes, actually it is... The system is a very poor predictor of actual results when these teams get to settle it out on the ice, rather than on some nerd's computer. Sorry, but I've seen it happen for years. :(
 
Re: KRACH Ratings

That explanation of strength of schedule reinforces my point that KRACH is opaque rather than refuting it. Sure, you can dig through the math and a few people will be able to figure it out, but that doesn't help the average fan. This is exacerbated by giving a number that is called "Strength of Schedule" but really isn't. Being transparent requires more than the creators providing an explanation that would allow you, with a significant time investment, to replicate the results. It has to be readily understandable by the people using it or trying to draw information from it. KRACH just doesn't meet that standard.

I agree with you that KRACH is the best of the ranking systems we have. I also suspect that there is no way to structure a rating system such that it both does the job sufficiently and is transparent to the average fan. Rutter requires some knowledge of Bayesian statistics. I really don't know anything about CHODR. Faced with that trade-off, I'd much rather go with the approach that is opaque than with the one that's fatally flawed, but that doesn't mean denying the drawbacks that KRACH has. One of them is that most people are going to look at it and not really grasp where it comes from, though dropping the strength of schedule column, or at least renaming it, ought to be a no-brainer that no one seems to care enough to do.

The real question about KRACH is how well it actually predicts results. Setting up a test of that is conceptually easy, but really time-consuming to actually do. I might get around to it at some point.
I hear you. I think, though, that KRACH itself is pretty transparent and intuitive. Strength of schedule, maybe not, but the ratings, definitely. The average fan doesn't have to understand the step-by-step mathematical process used to derive the numbers to understand what it's doing.

It's like calculus -- you don't need to know how to calculate a derivative to understand that you're just trying to find the slope at a point on a curve.
 
Re: KRACH Ratings

It's really something of a multiplicative factor.
I agree; it would be better if they called it something like "Schedule Factor", so people wouldn't use it to conclude, "Team A has played the toughest schedule in the country."
 
Re: KRACH Ratings

For fun, I wanted to see how the KRACH ratings look if you include only games from the 2nd half. I realize there are flaws to this (fewer OOC games, for one), but still.

The result is a ranking pretty close to what you have over the course of the full year with a few exceptions.

1 Boston College 4108.74
2 Minnesota 2918.11
3 Wisconsin 1116.22
4 Clarkson 1002.95
5 Quinnipiac 721.91
6 Northeastern 562.40
7 Princeton 283.01
8 Colgate 278.44
9 Syracuse 220.90 ???
10 Harvard 213.66
11 Mercyhurst 196.16
12 St. Lawrence 188.55
13 North Dakota 180.30
14 Bemidji State 164.45
15 Boston University 151.47
16 Connecticut 108.35
17 Rensselaer 103.38
18 Cornell 102.36
19 St. Cloud State 84.61
20 Minnesota-Duluth 83.64

Look at UConn, too, hanging out around league average.
 
Re: KRACH Ratings

In other news, as further evidence that KRACH's "strength of schedule" label makes no sense, BC's 2nd-half "strength of schedule" is 13th in the country. But if NU had beaten BU in their second-to-last game of the regular season, BC's SOS would have been 3rd.

That makes less than zero sense.
 
Re: KRACH Ratings

KRACH also has the great property that winning will ALWAYS help your rating and losing will ALWAYS hurt your rating. That is, there are no "bad wins" like in RPI.

No matter how bad the team is that you're playing, your expected winning percentage will always be less than 1. So a win will always add more wins to your win total (1) than your chance of winning the game (BC has a 0.997-ish chance of beating Union, for example), increasing your rating.

Same with losing -- losing will always lower your rating. Union losing to BC will always add fewer wins to Union's win total (0) than their chances of winning the game (0.003-ish), lowering their rating.

It also gives you a method to see what the percent chance team A has of beating team B: (Team A)/(Team A + Team B).

It's seriously fantastic and digging into it was one of the most interesting things I've ever done because I'm a raging nerd.

Can you now tackle the D3 women's KRACH ratings? I'm trying to figure out how much is added to the eastern teams' strength of schedule solely for being east of Michigan :)
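
And a tiny numeric sketch of the "a win always helps, a loss always hurts" point quoted above (the ratings are invented; the 0.997 just mirrors the BC/Union example):

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Chance that team A beats team B under KRACH: A / (A + B)."""
    return rating_a / (rating_a + rating_b)

# A huge favorite against a weak opponent (invented ratings).
favorite, underdog = 6000.0, 18.0
p = win_probability(favorite, underdog)
print(f"Expected winning percentage for the favorite: {p:.3f}")  # ~0.997, still < 1

# The gap between the actual result and the expected result is what
# pushes the rating, per the post above.
print(f"Win the game:  {1 - p:+.3f}")   # always positive, so the rating goes up
print(f"Lose the game: {0 - p:+.3f}")   # always negative, so the rating goes down
```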
 