John t whelan ranking simulator

goblue78 · Nov 19, 2013

Re: John t whelan ranking simulator

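KRACH code in Mata: [wvec is a 59x1 vector of wins+.5ties for each team; games is a 59x59 matrix of games played; gamma, not defined here, is presumably a positive 59x1 vector of starting values]
// starting guess, scaled so the top team is at 100
krach = 100*gamma/max(gamma)
do {
    lastkrach = krach
    // k_i = w_i / sum_j [ n_ij / (k_i + k_j) ]
    krach = wvec :/ rowsum(games :/ ((krach :* J(59,59,1)) :+ krach'))
} while (mreldif(krach, lastkrach)>1e-10)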
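
For anyone without Stata, the same fixed-point iteration can be sketched in plain Python. This is only an illustration under assumptions: the function name krach_iterate, the uniform starting value of 100, and the three-team schedule are made up for the example and do not come from the Mata code above.

```python
def krach_iterate(wins, games, tol=1e-10, max_iter=100000):
    """Iterate k_i = w_i / sum_j [n_ij / (k_i + k_j)] until the ratings settle.

    wins[i]     -- wins + 0.5 * ties for team i
    games[i][j] -- games played between teams i and j (0 on the diagonal)
    """
    n = len(wins)
    k = [100.0] * n  # assumed starting guess; any positive values should do
    for _ in range(max_iter):
        new_k = [
            wins[i] / sum(games[i][j] / (k[i] + k[j]) for j in range(n))
            for i in range(n)
        ]
        # Same stopping rule as Mata's mreldif(): max |new - old| / (|old| + 1)
        change = max(abs(a - b) / (abs(b) + 1.0) for a, b in zip(new_k, k))
        k = new_k
        if change <= tol:
            break
    return k


# Hypothetical double round-robin: A went 3-1, B 2-2, C 1-3.
wins = [3.0, 2.0, 1.0]
games = [[0, 2, 2],
         [2, 0, 2],
         [2, 2, 0]]
ratings = krach_iterate(wins, games)
print([round(r, 2) for r in ratings])
```

At the solution, each team's expected wins against its schedule, sum_j n_ij * k_i/(k_i + k_j), equals its actual wins, which is a handy check on any implementation.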

Here's KRACH adjusted for ties and home ice in Stata:

capture program drop bttieh
program define bttieh
    * lf evaluator: g1 and g2 are the home and road team parameters,
    * theta is the tie parameter, hf is the home-ice effect
    args lnf g1 g2 theta hf
    tempvar num denom
    * log of the common denominator: home win + road win + tie terms
    quietly gen double `denom' = ln(exp(`g1'+`hf') + exp(`g2') + exp(`theta')*((exp(`g1'+`hf')*exp(`g2'))^(.5)))
    * log of the numerator for the observed result (win, loss, tie)
    quietly gen double `num' = `g1'+`hf' if $ML_y1==1
    quietly replace `num' = `g2' if $ML_y1==0
    quietly replace `num' = (`theta' + .5*(`g1'+`hf'+`g2')) if $ML_y1==.5
    quietly replace `lnf' = `num' - `denom'
end
* constraint i makes team i's parameter the same at home and on the road
forvalues i=1/59 {
    constraint define `i' _b[home:`i'.teamid] = _b[away:`i'.oppid]
}
ml model lf bttieh (home: twin = i.teamid,nocons) (away:i.oppid,nocons) (theta: ) (hf: home,nocons),constraints(2-59)
ml max,nocnsr
matrix adjkrach = e(b)

teamid and oppid are index variables for the home team and road team (the assignment is arbitrary on neutral ice); home is an indicator variable equal to 1 when the teamid team is at home rather than on neutral ice; twin gives the result {0, .5, 1} for the team denoted by teamid.

Not nearly as neat, huh?

Numbers · Nov 19, 2013

Can you write the home/away adjustment as a formula? Thanks. I always wonder if the adjustment factor comes out in the math or if you have to guess at it.

Patman · Nov 19, 2013

Numbers said:
Can you write the home/away adjustment as a formula? Thanks. I always wonder if the adjustment factor comes out in the math or if you have to guess at it.

P(home win) = ac/(b+ac)

a and b are the home and road teams' ratings; 'c' is the home factor.

... How to solve the max likelihood is a different question. Usually I skip the iterative approach and go right to the logistic model formulation.
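
To make the shorthand concrete: assuming a and b are the home and road teams' ratings on the multiplicative KRACH scale (an interpretation, since the post does not define them), ac/(b+ac) is the same number as a logistic function of log(a) + log(c) - log(b). A small Python sketch with made-up values; the function names are hypothetical:

```python
import math


def home_win_prob(a, b, c):
    """Bradley-Terry win probability for the home team: a*c / (b + a*c)."""
    return a * c / (b + a * c)


def home_win_prob_logistic(a, b, c):
    """The same probability written as a logistic function of the log-ratings."""
    x = math.log(a) + math.log(c) - math.log(b)
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values: home rating 150, road rating 100, home factor 1.2.
print(home_win_prob(150.0, 100.0, 1.2))
print(home_win_prob_logistic(150.0, 100.0, 1.2))
```

Both lines print the same probability, which is why the problem can be handed to ordinary logistic regression software with team indicators and a home indicator as regressors.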

goblue78 · Nov 19, 2013

Sure (sorta):
Let the home team have KRACH parameter k1 and the road team have KRACH parameter k2. We will also define a tie parameter (theta) and a home ice parameter h.
For simplicity in the solving, they all run theoretically from -infinity to infinity, but we exponentiate when calculating the probabilities. So:

Pr(Home Win) = exp(k1 + h)/(exp(k1+h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
Pr(Home Tie) = exp(theta)*sqrt(exp(kh + h)*exp(kv))/(exp(k1+h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
Pr(Home Loss) = 1 - Pr(Home Win) - Pr(Home Tie)
Pr(Neutral Win) = exp(k1)/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
Pr(Neutral Tie) = exp(theta)*sqrt(exp(k1)*exp(k2))/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
Pr(Neutral Loss) = 1 - Pr(Neutral Win) - Pr(Neutral Tie)
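
As a rough sketch, the six formulas collapse into one small Python function if kh/kv are read as the same parameters as k1/k2 and the neutral-ice case is treated as h = 0. The function name and the parameter values below are made up for illustration.

```python
import math


def outcome_probs(k1, k2, theta, h=0.0):
    """Return (win, tie, loss) probabilities for the team with parameter k1.

    Leave h at 0 for a neutral-ice game; otherwise k1 is the home team.
    """
    home = math.exp(k1 + h)
    road = math.exp(k2)
    tie = math.exp(theta) * math.sqrt(home * road)
    total = home + road + tie
    return home / total, tie / total, road / total


# Hypothetical parameter values, chosen only for illustration.
print(outcome_probs(0.5, 0.0, -1.0, h=0.2))  # home game
print(outcome_probs(0.5, 0.0, -1.0))         # neutral ice
```

The three numbers always add to 1, so the "Loss" lines never need their own formula.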

Exactly one of these equations applies to each game, yielding pg for that game. Take the log of pg, making lpg. Note that lpg = f(k1,k2,theta,h).

Now sum across all games and we get sum(lpg) = f(k1,...,k59,theta,h), for a total of 61 parameters.

Maximize sum(lpg) with respect to these 61 parameters (i.e., 61 equations in 61 unknowns, with an adding-up constraint because the ki are only determined up to an additive constant, so exp(ki) only up to a multiplicative constant) and you're done!

Once you get the estimates of k1...k59, theta and h, predict any particular game using the equations above.

Unfortunately, there is no iterative technique to get you there when you add h, although one author has provided an iterative technique for when all you want to do is estimate theta and you assume h=0, i.e., no generic home ice advantage.
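
A minimal Python sketch of the quantity being maximized, assuming games are stored as (home team, road team, played on home ice?, result) tuples. That layout, the function names, and all the numbers are invented for the example. Handing the function to a numerical optimizer is left out.

```python
import math


def game_log_prob(k_home, k_road, theta, h, result):
    """log(pg) for one game; result is 1 (home win), 0.5 (tie) or 0 (home loss)."""
    a = k_home + h
    tie_term = theta + 0.5 * (a + k_road)  # log of exp(theta)*sqrt(exp(a)*exp(k_road))
    log_denom = math.log(math.exp(a) + math.exp(k_road) + math.exp(tie_term))
    if result == 1:
        log_num = a
    elif result == 0:
        log_num = k_road
    else:
        log_num = tie_term
    return log_num - log_denom


def total_log_likelihood(k, theta, h, games):
    """Sum of log(pg) over all games; h only applies to games on home ice."""
    return sum(
        game_log_prob(k[i], k[j], theta, h if on_home_ice else 0.0, result)
        for i, j, on_home_ice, result in games
    )


# Hypothetical three-team schedule; the last game is on neutral ice.
games = [(0, 1, True, 1), (1, 2, True, 0.5), (2, 0, True, 0), (0, 2, False, 1)]
k = [0.4, 0.0, -0.3]
print(total_log_likelihood(k, -1.0, 0.2, games))

# Adding the same constant to every k leaves the result unchanged,
# which is why one normalizing constraint is needed.
shifted = [x + 5.0 for x in k]
print(total_log_likelihood(shifted, -1.0, 0.2, games))
```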

Patman · Nov 19, 2013

These are the days I wish we could embed TeX because that's **** near unreadable

goblue78 · Nov 19, 2013

What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.

Numbers · Nov 19, 2013

I think I see what is happening. But let me ask a question about this expression: sqrt(exp(kh+h)*exp(kv)). Do the subscripts h and v mean 'home' and 'visitor'? So, the probabilities are now "Result wanted"/"Sum of all possible results". Is that right?

And, (Prob of Home Loss)=(Prob of Vis Win) = exp(k2)/{Big ugly denominator}.

Now, the next question is: what does "pg" mean? And, I understand that ln(pg) is a function of all 4 variables.

Then, sum ln(pg) over all games nationwide, right?

Then, why do you want to maximize sum ln(pg)? And, I sort of get how that gives you 61 equations.

Then, a theoretical question: The 'theta' and 'h' that come out are 'empirical' values, right? So they are really "For this season, with this number of games, this is the home advantage and prob of a tie." And, then for predictive work, you assume the same h and theta apply to the next game, right?

goblue78 · Nov 19, 2013

1. Yeah, I switched from kv and kh to k1 and k2, but failed to do so consistently.
2. pg is the probability of seeing the result you saw in that particular game. So if it's a home win, it's whatever that probability was, etc.
Then pg1 is the probability of the result for game 1, pg2 is the probability of the result for game 2, etc.
So the probability of seeing every result you saw is pg1*pg2*pg3*...*pgn, where n is the total number of games. That's a likelihood. Now taking the log turns the multiplication into addition: log(pg1*pg2*...*pgn) = log(pg1) + ... + log(pgn), or just sum(lpg).
You maximize the sum of the logs to maximize the likelihood of seeing the particular results you actually saw. Since log is an increasing function, the parameters that maximize one also maximize the other.
theta and h are empirical values, just like k1...k59 are. So yes, you assume they are constant enough to use for predictive purposes.
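
A tiny numeric illustration in Python of the product-versus-sum step, using three made-up per-game probabilities:

```python
import math

# Hypothetical probabilities pg1, pg2, pg3 of the results actually observed.
pg = [0.55, 0.20, 0.40]

likelihood = math.prod(pg)               # pg1 * pg2 * pg3
sum_lpg = sum(math.log(p) for p in pg)   # log(pg1) + log(pg2) + log(pg3)

# The log of the product and the sum of the logs agree (up to rounding).
print(math.log(likelihood), sum_lpg)
```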

Patman · Nov 19, 2013

goblue78 said:
What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.

Which is why I like working from the conceptual and then say the resulting ugly math is a consequence of the concept

Numbers · Nov 20, 2013

goblue78 said:
What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.

Thanks Blue for the link. Like I say, I don't have lots of higher education like you all do, but given a few read-throughs, I can actually wrap my mind around that a little.

I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.

RHamilton · Nov 20, 2013

I "finished" updates to my interpretation of the RPI. As background, my implementation is written in PHP and primarily serves to power my exhaustive PWR predictor, which I won't be firing up until championship weekend, though it could possibly do some Monte Carlo a couple weekends earlier. I'm hoping to develop it further to aid prognosticators in finding corner-cases and understanding how things can shake out. Again, it's only really useful for the last weekend of league championship games.

But, might as well get the key parts done early. I've also included fairly in-depth breakdowns of how the RPI is formed for each team -- let me know if you spot any mistakes or would like to see any other components in further detail.

http://pwr.reillyhamilton.com/pwr.html

It agrees with JimDahl's RPI for all teams except Mankato (and there's only a .0002 difference there, seems to have been rounding on one of our parts when determining negative wins). Haven't taken a close look at why it differs from FlagDude's, as I'm not sure what "stage" of the calculations is listed in the GUI, i.e., are OWP and OOWP before or after negative impact wins have been removed?

It's also doing PWR, but I haven't scrutinized that closely, so I'm not confident it's correct. It was accurate (compared to USCHO, CHN, and JimDahl/SiouxSports) last year, so I imagine it shouldn't be far off this year, as the only changes were removal of the TUC comparison and .5000 RPI qualifier. It's also November...

Currently not updating automatically (but current through today's games); I may implement a caching layer and/or cron-job in a couple days.

By the way, I find all the different ways that we all think to be fascinating. I'm completely lost by some of the "actual math" going on here; I think much more iteratively. I do volunteer to help a scraping / data acquisition effort if it would be helpful to the greater simulator cause, as I love the idea and I'm already working on a bunch of collegehockeystats.net scraping for RPI TV's titles and graphics package.

goblue78 · Nov 20, 2013

Numbers said:
I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.

Yep. If you maximize likelihood with the tie term dropped and h constrained to 0 (so exp(h)=1), then you end up getting exactly the same result as the iterative technique. Indeed, that's just equation (1) in Hunter, really, and the equivalent iterative technique in Games and Wins alone is Hunter's equation (3). So you can solve that problem either way, by maximum likelihood or iteratively using the matrix equilibrium condition, and you get the same result.

There is a proof that the iterative matrix technique always converges so long as the games-played matrix is rich enough. The reason you have to use the maximum likelihood technique when theta and h have to be estimated as well is that there is no such promise of convergence: the iterative technique can blow up without ever solving. As Hunter has confirmed to me in correspondence, it might be possible to find a so-called majorizing minimizing (MM) expression that allows a matrix-like calculation on the theta-h problem, but it would require someone to put in the work of finding that MM relationship -- it's a little like integration; there may be a trick that allows you to solve the problem, but it's not clear what trick you need. But the maximum likelihood technique I outlined above always works -- as long as there are no undefeated teams or winless teams.

FlagDUDE08 · Nov 20, 2013

RHamilton said:
I "finished" updates to my interpretation of the RPI. As background, my implementation is written in PHP and primarily serves to power my exhaustive PWR predictor, which I won't be firing up until championship weekend, though it could possibly do some Monte Carlo a couple weekends earlier. I'm hoping to develop it further to aid prognosticators in finding corner-cases and understanding how things can shake out. Again, it's only really useful for the last weekend of league championship games.

But, might as well get the key parts done early. I've also included fairly in-depth breakdowns of how the RPI is formed for each team -- let me know if you spot any mistakes or would like to see any other components in further detail.

http://pwr.reillyhamilton.com/pwr.html

It agrees with JimDahl's RPI for all teams except Mankato (and there's only a .0002 difference there, seems to have been rounding on one of our parts when determining negative wins). Haven't taken a close look at why it differs from FlagDude's, as I'm not sure what "stage" of the calculations is listed in the GUI, i.e., are OWP and OOWP before or after negative impact wins have been removed?

It's also doing PWR, but I haven't scrutinized that closely, so I'm not confident it's correct. It was accurate (compared to USCHO, CHN, and JimDahl/SiouxSports) last year, so I imagine it shouldn't be far off this year, as the only changes were removal of the TUC comparison and .5000 RPI qualifier. It's also November...

Currently not updating automatically (but current through today's games); I may implement a caching layer and/or cron-job in a couple days.

By the way, I find all the different ways that we all think to be fascinating. I'm completely lost by some of the "actual math" going on here; I think much more iteratively. I do volunteer to help a scraping / data acquisition effort if it would be helpful to the greater simulator cause, as I love the idea and I'm already working on a bunch of collegehockeystats.net scraping for RPI TV's titles and graphics package.

The stuff that is listed is after the games have been removed. Take a look in the command line (assuming you run it from there and not the exe) to determine which specific games have been removed. Do we still disagree there?

FlagDUDE08 · Nov 21, 2013

Apologies for not previously releasing, as time was spent with FlagDUDETTE. Here's what we have, as of games ending 20 November:

5.00 Minnesota
4.75 St. Cloud State
4.50 Quinnipiac
4.25 Providence
4.00 Boston College
3.75 Michigan
3.50 Ferris State
3.25 Wisconsin
3.00 LSSU
2.75 Miami
2.50 Notre Dame
2.25 Bowling Green
2.00 Cornell
1.75 Clarkson
1.50 Northern Michigan
1.25 North Dakota
1.00 New Hampshire
0.75 Minnesota State Mankato
0.50 Union
0.25 UMASS Lowell

And the tournament field:

Minnesota
St. Cloud State
Providence
Quinnipiac

Boston College
Michigan
Ferris State
Wisconsin

Miami
LSSU
Notre Dame
Bowling Green

Cornell
Clarkson
North Dakota
AHA Champ (37 - Air Force)

Ralph Baer · Nov 21, 2013

FlagDUDE08 said:
And the tournament field:

Minnesota
St. Cloud State
Providence
Quinnipiac

Boston College
Michigan
Ferris State
Wisconsin

Miami
LSSU
Notre Dame
Bowling Green

Cornell
Clarkson
North Dakota
AHA Champ (37 - Air Force)

Interesting that currently each of the five non-AHA leagues has three representatives.

The Exiled One · Nov 22, 2013

FlagDUDE08 said:
And the tournament field:

North Dakota

Did they get rid of the Wisconsin rule? If not, North Dakota doesn't currently qualify for an at-large bid.

Numbers · Nov 22, 2013

GoBlue,

I have another question about this KRACH with ties and home/away. The mathematical formulation uses exp(k1+h), for example. I assume that is because it makes the math come out: you can take the logs and then it's addition rather than multiplication. However, it seems to me that the k(i) that comes out of that calculation is not the k(i) that comes out of Whelan's example of how to calculate KRACH here and on CHN.
It seems to me that the k(i) all need to be exponentiated to get the numbers that are equivalent to the KRACH ratings in Whelan's formulation.

Is that correct?

Thanks again.

FlagDUDE08 · Nov 22, 2013

The Exiled One said:
Did they get rid of the Wisconsin rule? If not, North Dakota doesn't currently qualify for an at-large bid.

The what rule?

Numbers · Nov 22, 2013

FlagDUDE08 said:
The what rule?

Wisconsin rule. That's the one prohibiting teams with less than .500 records from an at-large place in the tournament.

FlagDUDE08 · Nov 22, 2013

Numbers said:
Wisconsin rule. That's the one prohibiting teams with less than .500 records from an at-large place in the tournament.

Do autobids have any impact on that? Alabama Huntsville made the tournament a few years ago with a sub-.500 record, but they had the CHA autobid. Also, I seem to remember about 6 or 7 years ago UVM had a team that almost didn't make the Hockey East playoffs but ended up making some noise in the national tournament.
