John t whelan ranking simulator


  • goblue78
    replied
    Re: John t whelan ranking simulator

    Originally posted by Numbers View Post
    I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.
    Yep. If you maximize the likelihood with the tie coefficient constrained to 0 (no ties) and h constrained to 0 (so exp(h)=1), then you get exactly the same result as the iterative technique. Indeed, that's just equation (1) in Hunter, really, and the equivalent iterative technique in Games and Wins alone is Hunter's equation (3). So you can solve that problem either way, by maximum likelihood or iteratively using the matrix equilibrium condition, and you get the same result.

    There is a proof that the iterative matrix technique always converges so long as the games-played matrix is rich enough. The reason you have to use the maximum likelihood technique when theta and h have to be estimated as well is that the iterative technique carries no such promise of convergence there... it can blow up without ever solving. As Hunter has confirmed to me in correspondence, it might be possible to find a so-called MM (majorize-minimize) expression that allows a matrix-like calculation on the theta-h problem, but it would require someone to put in the work of finding that MM relationship -- it's a little like integration; there may be a trick that allows you to solve the problem, but it's not clear what trick you need. But the maximum likelihood technique I outlined above always works -- as long as there are no undefeated teams or winless teams.
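To make the "either way" claim concrete, here is a minimal Python sketch (not the Stata/Mata code from the thread) that fits the no-tie, no-home-ice model both ways on a made-up three-team schedule. The schedule, the function names, the step size and the iteration counts are all assumptions chosen for illustration; the maximum likelihood side uses plain gradient ascent, which is just one of many ways to maximize.

```python
import math

# Hypothetical 3-team schedule: wins[i][j] = times team i beat team j (no ties).
wins = [[0, 2, 1],
        [0, 0, 2],
        [1, 0, 0]]
n = len(wins)
games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
total_wins = [sum(row) for row in wins]


def iterative_ratings(iterations=2000):
    """Fixed-point iteration K_i = W_i / sum_j N_ij / (K_i + K_j)."""
    k = [100.0] * n
    for _ in range(iterations):
        k = [total_wins[i] / sum(games[i][j] / (k[i] + k[j])
                                 for j in range(n) if j != i)
             for i in range(n)]
        top = max(k)
        k = [100.0 * x / top for x in k]  # ratings are only defined up to scale
    return k


def max_likelihood_ratings(iterations=2000, step=0.2):
    """Gradient ascent on the log-likelihood in log-ratings g_i = ln K_i."""
    g = [0.0] * n
    for _ in range(iterations):
        # d(logL)/d(g_i) = actual wins of team i - expected wins of team i
        grad = [total_wins[i] - sum(games[i][j] * math.exp(g[i])
                                    / (math.exp(g[i]) + math.exp(g[j]))
                                    for j in range(n) if j != i)
                for i in range(n)]
        g = [gi + step * d for gi, d in zip(g, grad)]
    top = max(g)
    return [100.0 * math.exp(gi - top) for gi in g]  # same scale: best team = 100


by_iteration = iterative_ratings()
by_likelihood = max_likelihood_ratings()
```

On this toy schedule the two lists should agree to many decimal places, because the point where the gradient vanishes (actual wins equal expected wins for every team) is the same condition the iteration settles on.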



  • RHamilton
    replied
    Re: John t whelan ranking simulator

    I "finished" updates to my interpretation of the RPI. As background, my implementation is written in PHP and primarily serves to power my exhaustive PWR predictor, which I won't be firing up until championship weekend, though it could possibly do some Monte Carlo a couple of weekends earlier. I'm hoping to develop it further to aid prognosticators in finding corner cases and understanding how things can shake out. Again, it's only really useful for the last weekend of league championship games.

    But, might as well get the key parts done early. I've also included fairly in-depth breakdowns of how the RPI is formed for each team -- let me know if you spot any mistakes or would like to see any other components in further detail.

    http://pwr.reillyhamilton.com/pwr.html

    It agrees with JimDahl's RPI for all teams except Mankato (and there's only a .0002 difference there; it seems to be rounding on one of our parts when determining negative wins). I haven't taken a close look at why it differs from FlagDude's, as I'm not sure what "stage" of the calculations is listed in the GUI, i.e. are OWP and OOWP before or after negative impact wins have been removed?

    It's also doing PWR, but I haven't scrutinized that closely, so I'm not confident it's correct. It was accurate (compared to USCHO, CHN, and JimDahl/SiouxSports) last year, so I imagine it shouldn't be far off this year, as the only changes were removal of the TUC comparison and .5000 RPI qualifier. It's also November...

    Currently not updating automatically (but current through today's games); I may implement a caching layer and/or cron-job in a couple days.





    By the way, I find all the different ways that we all think to be fascinating. I'm completely lost by some of the "actual math" going on here; I think much more iteratively. I do volunteer to help a scraping / data acquisition effort if it would be helpful to the greater simulator cause, as I love the idea and I'm already working on a bunch of collegehockeystats.net scraping for RPI TV's titles and graphics package.



  • Numbers
    replied
    Re: John t whelan ranking simulator

    Originally posted by goblue78 View Post
    What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

    For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.
    Thanks Blue for the link. Like I say, I don't have lots of higher education like you all do, but given a few read-throughs, I can actually wrap my mind around that a little.

    I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.



  • Patman
    replied
    Originally posted by goblue78 View Post
    What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

    For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.
    Which is why I like working from the concept and then saying that the resulting ugly math is a consequence of the concept.



  • goblue78
    replied
    Re: John t whelan ranking simulator

    1. Yeah, I switched from kv and kh to k1 and k2, but failed to do so consistently.
    2. pg is the probability of seeing the result you saw in that particular game. So if it's a home win, it's whatever that probability was, etc.
    pg1 is the probability of the result for game 1, pg2 is the probability of the result for game 2, etc.
    So the probability of seeing every result you saw is pg1*pg2*pg3*...*pgn, where n is the total number of games. That's a likelihood. Now taking the log turns the multiplications into additions: log(pg1*pg2*...*pgn) = log(pg1) + ... + log(pgn), or just sum(lpg).
    You maximize the sum of the logs to maximize the likelihood of seeing the particular results you actually saw.
    theta and h are empirical values, just like k1...k59 are. So yes, you assume they are constant enough to use for predictive purposes.
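A small Python illustration of the product-versus-sum point, with made-up per-game probabilities (the numbers and variable names are assumptions, not values from any real season). It also shows one practical reason to work with the sum of logs: over a full season the raw product gets too small for floating point.

```python
import math

# Hypothetical probabilities of the results that actually happened in four games.
pg = [0.55, 0.20, 0.35, 0.60]

likelihood = math.prod(pg)               # pg1 * pg2 * ... * pgn
sum_lpg = sum(math.log(p) for p in pg)   # log(pg1) + ... + log(pgn)
same = math.isclose(math.log(likelihood), sum_lpg)  # equal up to rounding

# A hypothetical 1200-game season where every observed result had probability 0.4:
season = [0.4] * 1200
season_product = math.prod(season)               # underflows to 0.0
season_sum = sum(math.log(p) for p in season)    # stays a finite negative number
```

Because log is increasing, whatever parameter values maximize the sum of logs also maximize the product, so nothing is lost by switching.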
    Last edited by goblue78; 11-19-2013, 05:20 PM.



  • Numbers
    replied
    Re: John t whelan ranking simulator

    I think I see what is happening. But let me ask a question about this expression: sqrt(exp(kh+h)*exp(kv)). Do h and v mean 'home' and 'visitor'? So, the probabilities are now "Result wanted"/"Sum of all possible results". Is that right?

    And, (Prob of Home Loss)=(Prob of Vis Win) = exp(k2)/{Big ugly denominator}.

    Now, the next question is: what does "pg" mean? And, I understand that ln(pg) is a function of all 4 variables.

    Then, sum ln(pg) over all games nationwide, right?

    Then, why do you want to maximize sum ln(pg)? And, I sort of get how that gives you 61 equations.

    Then, a theoretical question: The 'theta' and 'h' that come out are 'empirical' values, right? So they are really "For this season, with this number of games, this is the home advantage and prob of a tie." And, then for predictive work, you assume the same h and theta apply to the next game, right?



  • goblue78
    replied
    Re: John t whelan ranking simulator

    What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

    For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.
    Last edited by goblue78; 11-19-2013, 04:27 PM.



  • Patman
    replied
    Originally posted by goblue78 View Post
    Sure (sorta):
    Let the home team have KRACH parameter k1 and the road team have KRACH parameter k2. We will also define a tie parameter (theta) and a home ice parameter h.
    For simplicity in the solving, they all run theoretically from -infinity to infinity, but we exponentiate when calculating the probabilities. So:

    Pr(Home Win) = exp(k1 + h)/(exp(k1 + h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
    Pr(Home Tie) = exp(theta)*sqrt(exp(kh + h)*exp(kv))/(exp(k1 + h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
    Pr(Home Loss) = 1 - Pr(Home Win) - Pr(Home Tie)
    Pr(Neutral Win) = exp(k1)/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
    Pr(Neutral Tie) = exp(theta)*sqrt(exp(k1)*exp(k2))/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
    Pr(Neutral Loss) = 1 - Pr(Neutral Win) - Pr(Neutral Tie)

    Exactly one of these equations applies to each game, yielding pg for that game. Take the log of pg, making lpg. Note that lpg=f(k1,k2,theta,h)

    Now summing across all games, we get sum(lpg) = f(k1,...k59,theta,h) for a total of 61 parameters.

    Maximize sum(lpg) with respect to these 61 parameters (i.e. 61 equations in 61 unknowns, with an adding-up constraint because exp(ki) is only determined up to a multiplicative constant) and you're done!

    Once you get the estimates of k1...k59, theta and h, predict any particular game using the equations above.

    Unfortunately, there is no iterative technique to get you there when you add h, although one author has provided an iterative technique when all you want to do is estimate theta and you assume h=0, i.e. no generic home ice advantage.
    These are the days I wish we could embed TeX because that's **** near unreadable



  • goblue78
    replied
    Re: John t whelan ranking simulator

    Sure (sorta):
    Let the home team have KRACH parameter k1 and the road team have KRACH parameter k2. We will also define a tie parameter (theta) and a home ice parameter h.
    For simplicity in the solving, they all run theoretically from -infinity to infinity, but we exponentiate when calculating the probabilities. So:

    Pr(Home Win) = exp(k1 + h)/(exp(k1 + h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
    Pr(Home Tie) = exp(theta)*sqrt(exp(kh + h)*exp(kv))/(exp(k1 + h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
    Pr(Home Loss) = 1 - Pr(Home Win) - Pr(Home Tie)
    Pr(Neutral Win) = exp(k1)/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
    Pr(Neutral Tie) = exp(theta)*sqrt(exp(k1)*exp(k2))/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
    Pr(Neutral Loss) = 1 - Pr(Neutral Win) - Pr(Neutral Tie)

    Exactly one of these equations applies to each game, yielding pg for that game. Take the log of pg, making lpg. Note that lpg=f(k1,k2,theta,h)

    Now summing across all games, we get sum(lpg) = f(k1,...k59,theta,h) for a total of 61 parameters.

    Maximize sum(lpg) with respect to these 61 parameters (i.e. 61 equations in 61 unknowns, with an adding-up constraint because exp(ki) is only determined up to a multiplicative constant) and you're done!

    Once you get the estimates of k1...k59, theta and h, predict any particular game using the equations above.

    Unfortunately, there is no iterative technique to get you there when you add h, although one author has provided an iterative technique when all you want to do is estimate theta and you assume h=0, i.e. no generic home ice advantage.
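The probabilities and the sum(lpg) objective described in this post can be sketched in Python roughly as follows. This is an illustration under assumptions, not the poster's implementation: the function names, the tuple layout for a game, and the example ratings are all made up here, and kh/kv in the post are read as k1/k2. Only the objective is shown; handing it to a numerical optimizer is left out.

```python
import math


def result_probs(k1, k2, theta, h, neutral=False):
    """Return (P(win), P(tie), P(loss)) for the home (or first-listed) team."""
    home = math.exp(k1 + (0.0 if neutral else h))  # home ice only applies off neutral ice
    away = math.exp(k2)
    tie = math.exp(theta) * math.sqrt(home * away)
    denom = home + away + tie
    return home / denom, tie / denom, away / denom


def sum_lpg(games, k, theta, h):
    """Sum of log(pg) over games given as (team1, team2, result, neutral).

    result is 1, 0.5 or 0 from team1's point of view.
    """
    total = 0.0
    for team1, team2, result, neutral in games:
        p_win, p_tie, p_loss = result_probs(k[team1], k[team2], theta, h, neutral)
        pg = {1: p_win, 0.5: p_tie, 0: p_loss}[result]  # probability of what happened
        total += math.log(pg)
    return total


# Hypothetical ratings and results, purely for illustration.
k = {"A": 0.4, "B": 0.0}
games = [("A", "B", 1, False), ("B", "A", 0.5, False), ("A", "B", 0, True)]
objective = sum_lpg(games, k, theta=-1.0, h=0.2)
```

Writing the loss probability as exp(k2) over the common denominator is the same as 1 - Pr(Win) - Pr(Tie), since the three numerators add up to the denominator.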



  • Patman
    replied
    Originally posted by Numbers View Post
    Can you write the home/away adjusted as a formula? Thanks. I always wonder if the adjustment factor comes out in the math or if you have to guess at it.
    ac/(b+ac)

    'c' is home factor

    ... How to solve the max likelihood is a different question. Usually I skip the iterative and go right to the logistic model formulation.
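As a quick illustration of the formula in this post, here is a tiny Python helper, assuming a is the home team's rating, b the visitor's rating, and c a multiplicative home factor (c = 1 meaning no home advantage). The function name and the example values are made up for illustration.

```python
def home_win_probability(a, b, c=1.0):
    """ac / (b + ac): chance the home team wins, ignoring ties."""
    return a * c / (b + a * c)


even_neutral = home_win_probability(100.0, 100.0)        # equal teams, no home factor
even_at_home = home_win_probability(100.0, 100.0, 1.2)   # equal teams, assumed c = 1.2
```

With equal ratings and c = 1 the home team is a coin flip; any c above 1 tilts the odds toward the home side.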



  • Numbers
    replied
    Re: John t whelan ranking simulator

    Originally posted by goblue78 View Post
    KRACH code in MATA: [wvec is a 59x1 matrix of wins+.5ties for each team;games is a 59x59 matrix of games played]
    krach = 100*gamma/max(gamma)
    do {
    lastkrach = krach
    krach = wvec :/ rowsum(games :/ ((krach :* J(59,59,1)) :+ krach'))
    } while (mreldif(krach, lastkrach)>1e-10)

    Here's KRACH adjusted for ties and home ice in STATA:

    capture program drop bttieh
    program define bttieh
    args lnf g1 g2 theta hf
    tempvar num denom
    quietly gen double `denom' = ln(exp(`g1'+`hf') + exp(`g2') + exp(`theta')*((exp(`g1'+`hf')*exp(`g2'))^(.5)))
    quietly gen double `num' = `g1'+`hf' if $ML_y1==1
    quietly replace `num' = `g2' if $ML_y1==0
    quietly replace `num' = (`theta' + .5*(`g1'+`hf'+`g2')) if $ML_y1==.5
    quietly replace `lnf' = `num' - `denom'
    end
    forvalues i=1/59 {
    constraint define `i' _b[home:`i'.teamid] = _b[away:`i'.oppid]
    }
    ml model lf bttieh (home: twin = i.teamid,nocons) (away:i.oppid,nocons) (theta: ) (hf: home,nocons),constraints(2-59)
    ml max,nocnsr
    matrix adjkrach = e(b)

    teamid and oppid are index variables for the home team and road team (arbitrary in the case of neutral ice), home is an indicator variable for the home team not on neutral ice, twin is a vector giving the result {0,.5,1} for the team denoted by teamid

    Not nearly as neat, huh?
    Can you write the home/away adjusted as a formula? Thanks. I always wonder if the adjustment factor comes out in the math or if you have to guess at it.



  • goblue78
    replied
    Re: John t whelan ranking simulator

    KRACH code in MATA: [wvec is a 59x1 matrix of wins+.5ties for each team;games is a 59x59 matrix of games played]
    krach = 100*gamma/max(gamma)
    do {
    lastkrach = krach
    krach = wvec :/ rowsum(games :/ ((krach :* J(59,59,1)) :+ krach'))
    } while (mreldif(krach, lastkrach)>1e-10)

    Here's KRACH adjusted for ties and home ice in STATA:

    capture program drop bttieh
    program define bttieh
    args lnf g1 g2 theta hf
    tempvar num denom
    quietly gen double `denom' = ln(exp(`g1'+`hf') + exp(`g2') + exp(`theta')*((exp(`g1'+`hf')*exp(`g2'))^(.5)))
    quietly gen double `num' = `g1'+`hf' if $ML_y1==1
    quietly replace `num' = `g2' if $ML_y1==0
    quietly replace `num' = (`theta' + .5*(`g1'+`hf'+`g2')) if $ML_y1==.5
    quietly replace `lnf' = `num' - `denom'
    end
    forvalues i=1/59 {
    constraint define `i' _b[home:`i'.teamid] = _b[away:`i'.oppid]
    }
    ml model lf bttieh (home: twin = i.teamid,nocons) (away:i.oppid,nocons) (theta: ) (hf: home,nocons),constraints(2-59)
    ml max,nocnsr
    matrix adjkrach = e(b)

    teamid and oppid are index variables for the home team and road team (arbitrary in the case of neutral ice), home is an indicator variable for the home team not on neutral ice, twin is a vector giving the result {0,.5,1} for the team denoted by teamid

    Not nearly as neat, huh?
    Last edited by goblue78; 11-19-2013, 03:08 PM.



  • Patman
    replied
    Originally posted by Numbers View Post
    From Whelan's explanation on the CHN site:

    K(i) = V(i) / [∑(j) N(ij)/(K(i)+K(j))]

    K(i) = KRACH rating of team i
    V(i) = # of victories of team i, regardless of who they faced
    N(ij) = Number of games team i played against opponent j

    Obviously, since K(i) is on both sides, this is recursive. Plug in 100 for everyone on the right sides, recalculate all 59 Ratings. Plug those in again.... Continue until nothing changes anymore (usually 20 iterations gets close enough)
    Yeah, I only run 100 because it's so **** fast... In reality, 10 does it well enough.



  • Patman
    replied
    Originally posted by FlagDUDE08 View Post
    I am the same where it's impractical for me to do such a predictor. Mostly because I'd then have to change my entire format to include what league, whether a game was a league game or not, and so on. I'm sure I could do it, but I fear everything would just get larger and larger.
    For me it's just more confusing... I don't have the ability to bear down on things anymore. The way I would do it would be to define the type of tournament and types of results, etc.

    The more things get formalized the closer we will get.

    Honestly, if I could just leave myself to model development... Still too many steps ahead I'm afraid



  • Numbers
    replied
    Re: John t whelan ranking simulator

    Originally posted by FlagDUDE08 View Post
    I have not tried KRACH, mostly because I do not know what the formula is.
    From Whelan's explanation on the CHN site:

    K(i) = V(i) / [∑(j) N(ij)/(K(i)+K(j))]

    K(i) = KRACH rating of team i
    V(i) = # of victories of team i, regardless of who they faced
    N(ij) = Number of games team i played against opponent j

    Obviously, since K(i) is on both sides, this is recursive. Plug in 100 for everyone on the right sides, recalculate all 59 Ratings. Plug those in again.... Continue until nothing changes anymore (usually 20 iterations gets close enough)
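The plug-in-and-repeat recipe above can be sketched in Python like this. It is a minimal illustration, not Whelan's or anyone else's actual code: the function name, the stopping tolerance, the rescaling of the top team to 100 after each pass, and the tiny example schedules are assumptions. It also assumes every team has at least some wins, since a winless team's rating collapses to zero.

```python
def krach(victories, games, start=100.0, tol=1e-8, max_iter=1000):
    """Return (ratings, passes used) for K(i) = V(i) / sum_j N(ij) / (K(i) + K(j)).

    victories[i] is team i's win total (ties could be counted as half a win);
    games[i][j] is the number of games between teams i and j.
    """
    n = len(victories)
    k = [start] * n                      # plug in 100 for everyone
    for passes in range(1, max_iter + 1):
        new_k = []
        for i in range(n):
            denom = sum(games[i][j] / (k[i] + k[j]) for j in range(n) if games[i][j])
            new_k.append(victories[i] / denom)
        top = max(new_k)
        new_k = [100.0 * x / top for x in new_k]   # ratings are only defined up to scale
        change = max(abs(a - b) for a, b in zip(new_k, k))
        k = new_k
        if change < tol:                 # nothing changes anymore
            return k, passes
    return k, max_iter


# Two hypothetical teams that met four times, the first winning three.
ratings, passes = krach([3, 1], [[0, 4], [4, 0]])

# A hypothetical three-team round robin with two games per pairing.
ratings3, passes3 = krach([3, 2, 1], [[0, 2, 2], [2, 0, 2], [2, 2, 0]])
```

In the two-team case the ratings settle at a 3:1 ratio, matching the 3-1 record. A useful sanity check on any larger schedule is that each team's expected wins, the sum over opponents of N(ij)*K(i)/(K(i)+K(j)), reproduce its actual V(i).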
