**Patman** · 12-15-2011, 10:48 AM

Re: Statistics Models

Originally posted by Numbers

Patman,
What exactly is Rutter's application of the KRACH model?
Thanks.
Numbers

IIRC, Rutter uses a "hierarchical model"... the general concept is this: say you are flipping coins. You know all the coins are milled somewhat differently but behave similarly to each other, so while their heads rates may take on different values, we have some reason to believe that certain rates are more common than others. We can then flip several coins and use the knowledge from the other coins to sharpen the estimate for each coin by borrowing from a common structure (which we will often give a parametric form... see the beta distribution).
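To make the coin analogy concrete, here is a minimal R sketch of "borrowing from a common structure". The flip counts are made up, and the Beta(10, 10) prior is an illustrative assumption rather than something estimated from data or taken from Rutter's model; a full hierarchical fit would estimate those parameters too.

```r
# Hypothetical data: heads observed in 10 flips of each of five coins.
heads <- c(2, 4, 5, 6, 9)
flips <- rep(10, length(heads))

# Assumed common structure: each coin's heads rate ~ Beta(a, b).
# a = b = 10 is an illustrative choice (prior mean 0.5), not an estimate.
a <- 10
b <- 10
prior.mean <- a / (a + b)

raw.rate    <- heads / flips                  # each coin estimated on its own
shrunk.rate <- (a + heads) / (a + b + flips)  # posterior mean under the beta prior

print(round(cbind(raw.rate, shrunk.rate), 3))
```

Every shrunk estimate sits between the raw rate and the prior mean, so the 9-out-of-10 coin is pulled in the most. That is the "deflationary influence" on extreme values in miniature.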

In this case the concept is the same, but then it's a question of linkage and calculation. I believe Rutter uses the linearized form as explained in the previous post but then entwines a hierarchy in which he assumes each beta_i has a normal distribution with mean zero and variance sigma (but unknown). The idea is similar: you assume a common super-distribution (hyper-distribution, hierarchy, hyper-parameters, and so on) and use it to better couch the estimates... it serves in some cases as a deflationary influence and as such is useful against extreme values.

Rutter, as a Bayesian, also employs what is termed a "prior" on the unknown sigma... this is often referred to as "prior belief", "subjective probability", etc., though there are forms that try to be objective. Even the hierarchy imposed here can be seen as a Bayesian application. The general idea is that you start with a vague initial state and use your learned knowledge (data) to refine that belief.

Anyhoo... there are a lot of other parts that one can argue about or disagree with... but the important notion here is that he ties all the teams together through a common distribution, under the notion that hockey teams should exhibit some amount of spread and that spread can be estimated. He's also making a rough notional call to the "Stein estimation" problem. It's an interesting phenomenon in science (things that are true in 2-D are often not true in 3-D and beyond... I had a professor who said he had a professor who speculated that this is why our universe works)... so while one may argue that hockey teams don't necessarily come from a common pool, there is some utility in assuming they do because it mitigates the overall degree of error.
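For readers who want the structure written out, one common way to express this kind of hierarchy is sketched below. It assumes the logit (Bradley-Terry) form that underlies KRACH, with $K_i = e^{\beta_i}$. The "linearized form" from the earlier post is not reproduced in this part of the thread, and Rutter's actual link function and prior may differ, so treat the notation as illustrative. The post's unknown variance is written $\sigma^2$ here, and $p(\sigma^2)$ stands for an unspecified prior.

$$
\begin{aligned}
\log\frac{P(i \text{ beats } j)}{P(j \text{ beats } i)} &= \beta_i - \beta_j, \\
\beta_i \mid \sigma^2 &\sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n, \\
\sigma^2 &\sim p(\sigma^2).
\end{aligned}
$$

The shared $\mathcal{N}(0, \sigma^2)$ layer is what ties the teams together: a small $\sigma^2$ pulls every $\beta_i$ strongly toward zero, while a large one leaves the ratings close to what the game results alone would give.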

As such it's a bit of a technical approach, but certainly something that I personally see as a reasonable model. Hopefully I've made a somewhat understandable pitch without being too technical.

**Umileated** · 12-15-2011, 12:37 PM

Re: Statistics Models

Originally posted by LynahFan

My MATLAB version is only about 30 lines of code for the actual computation, with maybe another 50 or so for reading in the game results, formatting the output for the screen, etc. Completely trivial.

Feeling generous?

**LynahFan** · 12-15-2011, 01:23 PM

Re: Statistics Models

Originally posted by Umileated

Feeling generous?

Always, but it's on a different computer on another continent. I can post it when I have access again.

**FlagDUDE08** · 12-15-2011, 01:29 PM

Re: Statistics Models

Originally posted by LynahFan

Always, but it's on a different computer on another continent. I can post it when I have access again.

SSH is your friend. Assuming that different computer is on, connected to the network, and you either know the IP or have a DynDNS setup on it...

**Patman** · 12-15-2011, 09:23 PM

Re: Statistics Models

Originally posted by LynahFan

Always, but it's on a different computer on another continent. I can post it when I have access again.

The hardest part, as a statistician, is creating the records and reading them in from file... once they're in array or matrix form it's quite simple...

Code:

# SCRATCH CODE BASED ON R... MIGHT WORK WITH S+
# assume... n.teams is the number of teams, n.iter the number of iterations
# assume... rate.cur is current rating... initialized at rep(100, n.teams)
# assume... rate.new is new rating
# assume... win.vctr is the vector of victory counts V_i
# assume... game.mtx is the matrix of games played... N_ij (zero diagonal)

rate.new <- numeric(n.teams)
for (k in 1:n.iter) {
    for (i in 1:n.teams) {
        # K_i = V_i / sum_j [ N_ij / (K_i + K_j) ]
        rate.new[i] <- win.vctr[i] / sum(game.mtx[i, ] / (rate.cur[i] + rate.cur))
    }
    rate.cur <- rate.new
}
# note... converges very fast... 100 iterations or fewer... convergence test not usually needed
# could probably re-skin with one of the apply functions and make it even smaller
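As a rough usage sketch, the same fixed-point iteration can be wrapped in a function and run on a toy schedule. The function name `krach.iterate`, the three teams, and their results are all made up for illustration. The sketch also assumes every team has at least one win and one loss and that the schedule is connected; otherwise ratings drift to zero or infinity.

```r
# Minimal sketch of the iteration above, wrapped in a (hypothetical) function.
krach.iterate <- function(win.vctr, game.mtx, n.iter = 500) {
  n.teams  <- length(win.vctr)
  rate.cur <- rep(100, n.teams)   # same starting value as the scratch code
  rate.new <- numeric(n.teams)
  for (k in seq_len(n.iter)) {
    for (i in seq_len(n.teams)) {
      rate.new[i] <- win.vctr[i] / sum(game.mtx[i, ] / (rate.cur[i] + rate.cur))
    }
    rate.cur <- rate.new
  }
  names(rate.cur) <- names(win.vctr)
  rate.cur
}

# Made-up results: every pair meets 4 times.
# A beats B 3-1, A beats C 3-1, B beats C 3-1.
game.mtx <- matrix(4, nrow = 3, ncol = 3)
diag(game.mtx) <- 0
win.vctr <- c(A = 6, B = 4, C = 2)

ratings <- krach.iterate(win.vctr, game.mtx)

# At convergence each team's expected wins should match its actual wins.
expected.wins <- sapply(seq_along(ratings), function(i) {
  sum(game.mtx[i, ] * ratings[i] / (ratings[i] + ratings))
})
print(round(ratings, 2))
print(round(expected.wins, 4))
```

The ratings are only meaningful relative to each other, since multiplying all of them by a constant leaves every win probability K_i / (K_i + K_j) unchanged. Comparing expected wins against actual wins is a simple way to confirm the iteration has settled.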

**Umileated** · 12-15-2011, 09:30 PM

Re: Statistics Models

I just got caught up on the NHL 2012-13 realignment. It seems that the biggest complaint of the public is inter-division imbalance w.r.t. playoff structure. I'm inclined to think that having the first 2 rounds take place within your division is a good counter-perk to this grievance.

I'm thinking the imbalance is more of an issue when it comes to draft seeding. With a more isolated schedule, W/L/O records are likely to bring injustice to seeding. The emphasis on the SOS (strength of schedule) component of KRACH makes me think that it might be a suitable tool for more accurate seeding. But, having not played with the model firsthand yet, I wonder if the team schedules will be varied enough, given that your schedule will be just about identical to everyone else's in your division and that every team matches up with every other team at least twice.

What do you guys think? Is there a tool that can do the job better? Will the NHL ever adopt something so "hard to understand?"

**Patman** · 12-15-2011, 09:40 PM

Re: Statistics Models

Originally posted by Numbers

And, generally, I have another question that seems to belong here.

If college hockey wishes to choose its NCAA field by game results only, and KRACH (can we please find a better name? And, how would a statistician really refer to this method?) does so as well as any, how do we deal with the following problem?

Currently, the top of the list is filled with CCHA teams. I mean filled. I don't really have a problem with that, but I have a feeling that the math works out that way because the number of non-conf games is small, so a couple of handfuls of good results elevate the entire league.

Again, I don't really have a problem with that - if you want to use results, then use results. What I wish is that there were a way to smear the benefit a little. Does anyone understand what I mean?

Maybe in short it would be like this: KRACH makes the non-conference results of all the teams in one's league very important, because of the high number of insulated games within conferences. How can we tone that down a little?

Thanks,
Numbers

Intra-conference effects are mostly a result of what is termed "leverage" in statistics... this is a concept from linear modeling that says certain observations, through their predictors (independent variables, etc... depends on which book you're using), will frequently be over-influential on the model. This often happens when only a few observations suitably inform the overall picture. Think of it sort of like the matrix concept of linear dependence. In this case, if there are results that are far from what is most likely, it's because of a few observations that have high influence... non-conference games are that type of thing because there's only so much information about the general position of the conferences in relation to each other. This is also the classic "SEC" effect... it really depends on what's true... the opposite is the "Mount Union effect", or we could dub it the "Boise effect".
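Leverage is easy to see in a plain regression, which is a simpler setting than a ratings model but shows the same mechanism. In the sketch below the data are made up: five clustered points and one far away. The isolated point plays the role of the handful of non-conference games, since it is nearly the only thing telling the fit where the far end of the line should go.

```r
# Hypothetical predictor: five clustered points plus one far from the rest.
x <- c(1, 2, 3, 4, 5, 20)
y <- c(1.1, 1.9, 3.2, 3.9, 5.1, 12.0)   # made-up responses

fit      <- lm(y ~ x)
leverage <- hatvalues(fit)   # diagonal of the hat matrix, one value per observation

print(round(leverage, 3))
```

The hat values always sum to the number of fitted coefficients (two here: intercept and slope), so when one observation takes most of that total, the fit is largely determined by it.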

Anyhoo... how do you deal with the over-influence of a successful conference? Usually you can't. There are certain procedures (hierarchical models as discussed before) which could dampen the values and bring them closer to center. Bayes techniques may impose a different form of dampening but they're incredibly complicated for the sports ranking problem. The real question is "what do you assume?" and whether you can express it in a tractable mathematical form.

The real answer to this... it's a problem (from a math perspective) until the schedules become more open.

**Patman** · 12-15-2011, 09:48 PM

Re: Statistics Models

Originally posted by Umileated

I just got caught up on the NHL 2012-13 realignment. It seems that the biggest complaint of the public is inter-division imbalance w.r.t. playoff structure. I'm inclined to think that having the first 2 rounds take place within your division is a good counter-perk to this grievance.

I'm thinking the imbalance is more of an issue when it comes to draft seeding. With a more isolated schedule, W/L/O records are likely to bring injustice to seeding. The emphasis on the SOS (strength of schedule) component of KRACH makes me think that it might be a suitable tool for more accurate seeding. But, having not played with the model firsthand yet, I wonder if the team schedules will be varied enough, given that your schedule will be just about identical to everyone else's in your division and that every team matches up with every other team at least twice.

What do you guys think? Is there a tool that can do the job better? Will the NHL ever adopt something so "hard to understand?"

Point blank... hockey is the most resistant to quantitative analysis of all the major sports. (Football > Soccer > Baseball > Basketball > ... > *tumble weed* > ... > Hockey, in that order) I would like that to change... for all I know they had people at the last NESSIS at Harvard. In spirit, you are correct, but you're working against the good ol' boys and waving this weird math stuff in their faces.

That being said, my understanding and observation of computer rankings in sports with more regular schedules is that they follow the standings closely but not identically (neglecting the loss point of the NHL)... you'll tend to get some flips and flops. As long as it's not like an AHL schedule it'll be similar enough. Last I heard, the schedules aren't that isolated. Further, if you threw something like KRACH out there you might see people trying to game the system through their schedules or forsaking any sane notion of scheduling.

It's a cute thing to think about... maybe slightly interesting to analyze... but its utility isn't clear. Remember, these are the same people who gave us the loss point because they didn't want anybody feeling bad.

**Umileated** · 12-15-2011, 10:34 PM

Re: Statistics Models

Originally posted by Patman

Point blank... hockey is the most resistant to quantitative analysis of all the major sports. (Football > Soccer > Baseball > Basketball > ... > *tumble weed* > ... > Hockey, in that order) I would like that to change... for all I know they had people at the last NESSIS at Harvard. In spirit, you are correct, but you're working against the good ol' boys and waving this weird math stuff in their faces.

That being said, my understanding and observation of computer rankings in sports with more regular schedules is that they follow the standings closely but not identically (neglecting the loss point of the NHL)... you'll tend to get some flips and flops. As long as it's not like an AHL schedule it'll be similar enough. Last I heard, the schedules aren't that isolated. Further, if you threw something like KRACH out there you might see people trying to game the system through their schedules or forsaking any sane notion of scheduling.

It's a cute thing to think about... maybe slightly interesting to analyze... but its utility isn't clear. Remember, these are the same people who gave us the loss point because they didn't want anybody feeling bad.

My understanding from what I've read on the re-alignment is this:
4 divisions of either 7 or 8 teams.
A given team will play every non-division opponent exactly twice - once at home and once away.
A given team will play every division opponent 6 times (with 7 teams in the division), or 5 or 6 times, alternating annually (with 8 teams in the division).

So the division/non-division schedule weighting will be either 36/46 or 38/44.
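The 36/46 and 38/44 splits follow from the rules above if the league has 30 teams (its size at the time; the post does not state it). The 8-team case also assumes that the alternating 5-6 pattern means four division opponents are played 5 times and three are played 6 times in a given year, which is the split that yields 38. A quick check in R:

```r
n.teams <- 30   # assumption: league size, not stated in the post

# 7-team division: 6 division opponents x 6 games, everyone else twice.
split7 <- c(division = 6 * 6,
            non.division = (n.teams - 7) * 2)

# 8-team division: assumed 4 opponents x 5 games plus 3 opponents x 6 games.
split8 <- c(division = 4 * 5 + 3 * 6,
            non.division = (n.teams - 8) * 2)

print(rbind(split7, split8))
print(c(total7 = sum(split7), total8 = sum(split8)))
```

Both cases add up to an 82-game season, with a bit under half of each team's games played inside its own division.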

**FlagDUDE08** · 12-16-2011, 08:51 AM

Re: Statistics Models

Originally posted by Patman

Point blank... hockey is the most resistant to quantitative analysis of all the major sports. (Football > Soccer > Baseball > Basketball > ... > *tumble weed* > ... > Hockey, in that order) I would like that to change... for all I know they had people at the last NESSIS at Harvard. In spirit, you are correct, but you're working against the good ol' boys and waving this weird math stuff in their faces.

That being said, my understanding and observation of computer rankings in sports with more regular schedules is that they follow the standings closely but not identically (neglecting the loss point of the NHL)... you'll tend to get some flips and flops. As long as it's not like an AHL schedule it'll be similar enough. Last I heard, the schedules aren't that isolated. Further, if you threw something like KRACH out there you might see people trying to game the system through their schedules or forsaking any sane notion of scheduling.

It's a cute thing to think about... maybe slightly interesting to analyze... but its utility isn't clear. Remember, these are the same people who gave us the loss point because they didn't want anybody feeling bad.

Hockey people complain about math... yet our determination of who makes the college national tournament is the only one that is completely objective...

**Umileated** · 12-16-2011, 10:23 AM

Re: Statistics Models

Just found this:

http://www.sportsclubstats.com/2011-...anada/NHL.html

**Patman** · 12-16-2011, 11:37 AM

Re: Statistics Models

Originally posted by FlagDUDE08

Hockey people complain about math... yet our determination of who makes the college national tournament is the only one that is completely objective...

yeah... curious that it evolved like that... part of me wonders if a lot of that has to do with the old east vs. west haziness back pre-widespread internet.

**Ralph Baer** · 12-16-2011, 11:48 AM

Re: Statistics Models

Originally posted by Patman

yeah... curious that it evolved like that... part of me wonders if a lot of that has to do with the old east vs. west haziness back pre-widespread internet.

The only haziness was in the smoke-filled room.

**FlagDUDE08** · 12-16-2011, 11:53 AM

Re: Statistics Models

Originally posted by Ralph Baer

The only haziness was in the smoke-filled room.

Yet they still use it in football and basketball... oh wait, drama sells....
