
John t whelan ranking simulator


  • Re: John t whelan ranking simulator

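    KRACH code in Mata: [wvec is a 59x1 matrix of wins+.5ties for each team; games is a 59x59 matrix of games played]
    // gamma is assumed to be a 59x1 vector of positive starting values defined earlier; scale it so the top team is 100
    krach = 100*gamma/max(gamma)
    do {
    lastkrach = krach
    // element (i,j) of the denominator matrix is krach[i] + krach[j]
    krach = wvec :/ rowsum(games :/ ((krach :* J(59,59,1)) :+ krach'))
    } while (mreldif(krach, lastkrach)>1e-10)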
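
For readers without Stata, the same fixed-point iteration can be sketched in plain Python. This is only an illustration, not goblue78's code: the function name `krach`, the flat starting vector, the per-pass rescaling, and the 3-team sample data are all assumptions.

```python
def krach(wins, games, tol=1e-10, max_iter=10000):
    """Iterate k_i = w_i / sum_j( g_ij / (k_i + k_j) ) until it stops moving.

    wins[i]     = wins + 0.5 * ties for team i
    games[i][j] = games played between teams i and j (0 on the diagonal)
    """
    n = len(wins)
    k = [100.0] * n  # any positive starting vector will do
    for _ in range(max_iter):
        new = []
        for i in range(n):
            denom = sum(games[i][j] / (k[i] + k[j]) for j in range(n) if games[i][j])
            new.append(wins[i] / denom)
        # Rescale so the top team is 100; ratings are only defined up to a constant.
        top = max(new)
        new = [100.0 * x / top for x in new]
        # Same stopping rule as Mata's mreldif(): max |a - b| / (|b| + 1).
        if max(abs(a - b) / (abs(b) + 1.0) for a, b in zip(new, k)) < tol:
            return new
        k = new
    raise RuntimeError("KRACH iteration did not converge")


# Hypothetical 3-team round robin, 4 games per pairing.
games = [[0, 4, 4],
         [4, 0, 4],
         [4, 4, 0]]
wins = [5.0, 4.0, 3.0]  # wins + 0.5 * ties; sums to the 12 games played
ratings = krach(wins, games)
```

At the fixed point each team's expected wins, sum_j g_ij * k_i / (k_i + k_j), match its actual wins, which is a handy check on any implementation.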

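    Here's KRACH adjusted for ties and home ice in Stata:

    capture program drop bttieh
    program define bttieh
    // lnf: per-game log-likelihood; g1, g2: home and road team strengths; theta: tie parameter; hf: home-ice term
    args lnf g1 g2 theta hf
    tempvar num denom
    // log of the summed weights of the three outcomes (home win, road win, tie)
    quietly gen double `denom' = ln(exp(`g1'+`hf') + exp(`g2') + exp(`theta')*((exp(`g1'+`hf')*exp(`g2'))^(.5)))
    // log of the weight of the outcome actually observed
    quietly gen double `num' = `g1'+`hf' if $ML_y1==1
    quietly replace `num' = `g2' if $ML_y1==0
    quietly replace `num' = (`theta' + .5*(`g1'+`hf'+`g2')) if $ML_y1==.5
    quietly replace `lnf' = `num' - `denom'
    end
    // force each team's coefficient as home team to equal its coefficient as road team
    forvalues i=1/59 {
    constraint define `i' _b[home:`i'.teamid] = _b[away:`i'.oppid]
    }
    // only constraints 2-59 are applied (presumably because team 1 is the omitted base level)
    ml model lf bttieh (home: twin = i.teamid,nocons) (away:i.oppid,nocons) (theta: ) (hf: home,nocons),constraints(2-59)
    ml max,nocnsr
    matrix adjkrach = e(b)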

    teamid and oppid are index variables for the home team and road team (the assignment is arbitrary on neutral ice); home is an indicator variable equal to 1 when the teamid team is playing on home ice rather than neutral ice; twin gives the result {0, .5, 1} for the team denoted by teamid.

    Not nearly as neat, huh?
    Last edited by goblue78; 11-19-2013, 02:08 PM.



    • Re: John t whelan ranking simulator

      Originally posted by goblue78
      Here's KRACH adjusted for ties and home ice in Stata: [...]
      Can you write the home/away-adjusted version as a formula? Thanks. I always wonder whether the adjustment factor comes out of the math or whether you have to guess at it.



      • Originally posted by Numbers
        Can you write the home/away-adjusted version as a formula? Thanks. I always wonder whether the adjustment factor comes out of the math or whether you have to guess at it.
        Pr(home win) = ac/(b+ac)

        where 'a' and 'b' are the home and road teams' ratings and 'c' is the home factor.

        ... How to solve the max likelihood is a different question. Usually I skip the iterative approach and go straight to the logistic model formulation.
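
The link to a logistic model is that ac/(b+ac) is the logistic function of log(a) - log(b) + log(c). A minimal Python sketch of that identity follows; it assumes 'a' is the home team's rating and 'b' the road team's, and the function names and sample numbers are invented for illustration.

```python
import math

def home_win_prob(a, b, c):
    """Bradley-Terry win probability for the home team: a*c / (b + a*c)."""
    return a * c / (b + a * c)

def home_win_prob_logistic(a, b, c):
    """The same probability written as a logistic function of log-ratings."""
    x = math.log(a) - math.log(b) + math.log(c)
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical ratings: home team 200, road team 100, home factor 1.5.
p_direct = home_win_prob(200.0, 100.0, 1.5)           # 300 / 400 = 0.75
p_logit = home_win_prob_logistic(200.0, 100.0, 1.5)   # same value up to rounding
```

Because the log-ratings enter linearly, standard logistic regression software can estimate them with one dummy variable per team plus a home indicator.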
        BS UML '04, PhD UConn '09

        Jerseys I would like to have:
        Skating Friar Jersey
        AIC Yellowjacket Jersey w/ Yellowjacket logo on front
        UAF Jersey w/ Polar Bear on Front
        Army Black Knight logo jersey


        NCAA Men's Division 1 Simulation Primer



        • Re: John t whelan ranking simulator

          Sure (sorta):
          Let the home team have KRACH parameter k1 and the road team have KRACH parameter k2. We will also define a tie parameter (theta) and a home ice parameter h.
          For simplicity in the solving, they all run theoretically from -infinity to infinity, but we exponentiate when calculating the probabilities. So:

          Pr(Home Win) = exp(k1 + h)/(exp(k1+h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
          Pr(Home Tie) = exp(theta)*sqrt(exp(kh + h)*exp(kv))/(exp(k1+h) + exp(k2) + exp(theta)*sqrt(exp(kh + h)*exp(kv)))
          Pr(Home Loss) = 1 - Pr(Home Win) - Pr(Home Tie)
          Pr(Neutral Win) = exp(k1)/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
          Pr(Neutral Tie) = exp(theta)*sqrt(exp(k1)*exp(k2))/(exp(k1) + exp(k2) + exp(theta)*sqrt(exp(k1)*exp(k2)))
          Pr(Neutral Loss) = 1 - Pr(Neutral Win) - Pr(Neutral Tie)
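
The six probabilities above fold into one small function, since the neutral-ice case is just h = 0. A minimal Python sketch, reading kh and kv as k1 and k2; the function name `outcome_probs` and the sample values are made up for illustration and are not taken from the Stata code.

```python
import math

def outcome_probs(k1, k2, theta, h=0.0):
    """Return (P(win), P(tie), P(loss)) for the team with parameter k1.

    k1, k2 : log-scale KRACH parameters (home/first team, road/second team)
    theta  : log-scale tie parameter
    h      : log-scale home-ice parameter; leave at 0.0 for a neutral-site game
    """
    win = math.exp(k1 + h)
    loss = math.exp(k2)
    tie = math.exp(theta) * math.sqrt(win * loss)
    total = win + loss + tie   # the shared denominator
    return win / total, tie / total, loss / total

# Hypothetical values, chosen only to show the shape of the result.
p_win, p_tie, p_loss = outcome_probs(k1=0.4, k2=0.0, theta=-1.0, h=0.2)
```

Written this way, the loss probability is exp(k2) over the same denominator, so the three outcomes sum to 1 by construction.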

          Exactly one of these equations applies to each game, yielding pg for that game. Take the log of pg, making lpg. Note that lpg = f(k1,k2,theta,h).

          Now sum across all games, and we get sum(lpg) = f(k1,...,k59,theta,h), for a total of 61 parameters.

          Maximize sum(lpg) with respect to these 61 parameters (i.e. 61 equations in 61 unknowns, with an adding-up constraint because the exp(ki) are only determined up to a common multiplicative constant) and you're done!
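
The objective sum(lpg) can be written out as a short Python function that any general-purpose optimizer could then maximize. This is a sketch under assumptions: the tuple layout of `gamelist`, the function names, and the sample schedule are hypothetical, and the maximization step itself is left out.

```python
import math

def log_prob(result, k_home, k_road, theta, h):
    """Log-probability of one game's result (1 = home win, 0.5 = tie, 0 = home loss)."""
    a = k_home + h                 # log-strength of the home side
    b = k_road                     # log-strength of the road side
    tie = theta + 0.5 * (a + b)    # log of exp(theta) * sqrt(exp(a) * exp(b))
    log_denom = math.log(math.exp(a) + math.exp(b) + math.exp(tie))
    numer = {1: a, 0.5: tie, 0: b}[result]
    return numer - log_denom

def log_likelihood(gamelist, k, theta, h):
    """Sum of log-probabilities over all games.

    gamelist: iterable of (home_index, road_index, result, on_home_ice) tuples,
    where on_home_ice is False for neutral-site games.
    """
    return sum(
        log_prob(result, k[i], k[j], theta, h if on_home_ice else 0.0)
        for i, j, result, on_home_ice in gamelist
    )

# Hypothetical three-game schedule between two teams.
sample_games = [(0, 1, 1, True), (1, 0, 0.5, True), (0, 1, 0, False)]
ll = log_likelihood(sample_games, k=[0.3, 0.0], theta=-1.0, h=0.2)
```

Adding the same constant to every entry of `k` leaves `ll` unchanged, which is the indeterminacy that the adding-up constraint removes.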

          Once you get the estimates of k1...k59, theta and h, predict any particular game using the equations above.

          Unfortunately, there is no iterative technique to get you there when you add h, although one author has provided an iterative technique when all you want to do is estimate theta and you assume h=0, i.e. no generic home ice advantage.



          • Originally posted by goblue78
            Sure (sorta):
            Let the home team have KRACH parameter k1 and the road team have KRACH parameter k2. We will also define a tie parameter (theta) and a home ice parameter h [...]
            These are the days I wish we could embed TeX because that's **** near unreadable


            • Re: John t whelan ranking simulator

              What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

              For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley-Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.
              Last edited by goblue78; 11-19-2013, 03:27 PM.



              • Re: John t whelan ranking simulator

                I think I see what is happening. But let me ask a question about this expression: sqrt(exp(kh+h)*exp(kv)). Do h and v mean 'home' and 'visitor'? So, the probabilities are now "Result wanted"/"Sum of all possible results". Is that right?

                And, (Prob of Home Loss)=(Prob of Vis Win) = exp(k2)/{Big ugly denominator}.

                Now, the next question is: what does "pg" mean? And, I understand that ln(pg) is a function of all 4 variables.

                Then, sum ln(pg) over all games nationwide, right?

                Then, why do you want to maximize sum ln(pg)? And, I sort of get how that gives you 61 equations.

                Then, a theoretical question: The 'theta' and 'h' that come out are 'empirical' values, right? So they are really "For this season, with this number of games, this is the home advantage and prob of a tie." And, then for predictive work, you assume the same h and theta apply to the next game, right?



                • Re: John t whelan ranking simulator

                  1. Yeah, I switched from kv and kh to k1 and k2, but failed to do so consistently.
                  2. pg is the probability of seeing the result you saw in that particular game. So if it's a home win, it's whatever that probability was, etc.
                  Then pg1 is the probability of the result for game 1, pg2 is the probability of the result for game 2, etc.
                  So, treating the games as independent, the probability of seeing every result you saw is pg1*pg2*pg3*...*pgn, where n is the total number of games. That's a likelihood. Now taking the log turns the multiplication into addition: log(pg1*pg2*...*pgn) = log(pg1) + ... + log(pgn), or just sum(lpg).
                  You maximize the sum of the logs to maximize the likelihood of seeing the particular results you actually saw.
                  theta and h are empirical values, just like k1...k59 are. So yes, you assume they are constant enough to use for predictive purposes.
                  Last edited by goblue78; 11-19-2013, 04:20 PM.
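
A tiny numeric illustration of the product-versus-sum point, in Python. The per-game probabilities and the 1200-game season are invented numbers; only the identity log(pg1*...*pgn) = sum(lpg) comes from the post.

```python
import math

# Hypothetical per-game probabilities pg1..pg4 of the results actually observed.
pg = [0.55, 0.20, 0.62, 0.48]

likelihood = math.prod(pg)                 # pg1 * pg2 * ... * pgn
sum_lpg = sum(math.log(p) for p in pg)     # log(pg1) + ... + log(pgn)

# The two agree: log(likelihood) equals the sum of the logs, up to rounding.
gap = abs(math.log(likelihood) - sum_lpg)

# A practical reason to work with logs: over a long season the raw product
# becomes too small for a float, while the sum of logs stays well behaved.
season = [0.4] * 1200
raw_product = math.prod(season)            # underflows to 0.0
season_sum = sum(math.log(p) for p in season)
```

Since log is increasing, whatever parameters maximize the sum of logs also maximize the product.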



                  • Originally posted by goblue78
                    What Patman wrote as c is h in my formulation, but the math is the same. And yes, it affects things a lot. Adding theta (the parameter which creates the probability of a tie) makes it even worse.

                    For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley-Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. Note that the relevant iterative algorithm on page 391 is actually wrong, as Hunter has confirmed with me. But with a bunch of algebra one can fix it. Anybody really interested can write me for details.
                    Which is why I like to work from the concept and then say the resulting ugly math is a consequence of the concept.


                    • Re: John t whelan ranking simulator

                      Originally posted by goblue78
                      For those willing to stick with it, the relevant reference here is David Hunter, "MM Algorithms for Generalized Bradley-Terry Models," Annals of Statistics, 2004, Vol. 32, No. 1, pp. 384-406. [...]
                      Thanks Blue for the link. Like I say, I don't have lots of higher education like you all do, but given a few read-throughs, I can actually wrap my mind around that a little.

                      I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.



                      • Re: John t whelan ranking simulator

                        I "finished" updates to my interpretation of the RPI. As background, my implementation is written in PHP and primarily serves to power my exhaustive PWR predictor, which I won't be firing up until championship weekend, though it could possibly do some Monte Carlo a couple of weekends earlier. I'm hoping to develop it further to aid prognosticators in finding corner cases and understanding how things can shake out. Again, it's only really useful for the last weekend of league championship games.

                        But, might as well get the key parts done early. I've also included fairly in-depth breakdowns of how the RPI is formed for each team -- let me know if you spot any mistakes or would like to see any other components in further detail.

                        http://pwr.reillyhamilton.com/pwr.html

                        It agrees with JimDahl's RPI for all teams except Mankato (and there's only a .0002 difference there, which seems to come from rounding on one of our parts when determining negative wins). I haven't taken a close look at why it differs from FlagDude's, as I'm not sure what "stage" of the calculations is listed in the GUI, i.e. are OWP and OOWP before or after negative impact wins have been removed?

                        It's also doing PWR, but I haven't scrutinized that closely, so I'm not confident it's correct. It was accurate (compared to USCHO, CHN, and JimDahl/SiouxSports) last year, so I imagine it shouldn't be far off this year, as the only changes were removal of the TUC comparison and .5000 RPI qualifier. It's also November...

                        Currently not updating automatically (but current through today's games); I may implement a caching layer and/or cron-job in a couple days.





                        By the way, I find all the different ways that we all think to be fascinating. I'm completely lost by some of the "actual math" going on here; I think much more iteratively. I do volunteer to help a scraping / data acquisition effort if it would be helpful to the greater simulator cause, as I love the idea and I'm already working on a bunch of collegehockeystats.net scraping for RPI TV's titles and graphics package.
                        RPI Class of 2012
                        Visit rpitv.org to watch almost every RPI Hockey home game LIVE, as well as a huge collection of on demand games from this season and seasons past, all for FREE!



                        • Re: John t whelan ranking simulator

                          Originally posted by Numbers
                          I have one more theoretical question: The way KRACH is normally calculated - with the wins and games matrices and the iterations - could you instead do that a different way - as Hunter writes, except with h=1 and Theta=0 (no chance of a tie)? Thanks.
                          Yep. If you maximize likelihood with no tie term (Hunter's theta = 0, which is exp(theta) = 0 in my parameterization) and h constrained to 0 (so exp(h)=1), counting each tie as half a win as in wvec, then you end up getting exactly the same result as the iterative technique. Indeed, that's just equation (1) in Hunter, really, and the equivalent iterative technique in games and wins alone is Hunter's equation (3). So you can solve that problem either way, by maximum likelihood or iteratively using the matrix equilibrium condition, and you get the same result.

                          There is a proof that the iterative matrix technique always converges so long as the games played matrix is rich enough. When theta and h have to be estimated as well, there is no such promise of convergence (the iterative technique can blow up without ever solving), which is why you have to use the maximum likelihood technique there. As Hunter has confirmed to me in correspondence, it might be possible to find a so-called MM (minorize-maximize) expression that allows a matrix-like calculation on the theta-h problem, but it would require someone to put in the work of finding that MM relationship. It's a little like integration: there may be a trick that allows you to solve the problem, but it's not clear what trick you need. But the maximum likelihood technique I outlined above always works, as long as there are no undefeated teams or winless teams.
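
The equivalence of the two routes can be checked numerically: run the iteration, then confirm that the log-likelihood's gradient is zero there. A Python sketch on a hypothetical 3-team schedule, with no tie or home-ice terms and ties counted as half wins; the data and function names are assumptions for illustration.

```python
import math

# Hypothetical 3-team round robin: games[i][j] games between i and j,
# wins[i] = wins + 0.5 * ties.
games = [[0, 4, 4], [4, 0, 4], [4, 4, 0]]
wins = [5.0, 4.0, 3.0]
n = len(wins)

def log_lik(k):
    """Bradley-Terry log-likelihood: sum_i w_i*ln(k_i) - sum_{i<j} g_ij*ln(k_i + k_j)."""
    ll = sum(w * math.log(ki) for w, ki in zip(wins, k))
    for i in range(n):
        for j in range(i + 1, n):
            ll -= games[i][j] * math.log(k[i] + k[j])
    return ll

def score(k):
    """Derivative of log_lik with respect to ln(k_i): actual minus expected wins."""
    return [wins[i] - sum(games[i][j] * k[i] / (k[i] + k[j]) for j in range(n))
            for i in range(n)]

# Iterative route: repeat k_i = w_i / sum_j( g_ij / (k_i + k_j) ).
k = [1.0] * n
for _ in range(2000):
    k = [wins[i] / sum(games[i][j] / (k[i] + k[j]) for j in range(n) if j != i)
         for i in range(n)]

gradient = score(k)   # every entry is ~0 at the fixed point
best = log_lik(k)
# Nudging any single rating by 5% should lower the likelihood.
nudged = [log_lik([ki * (1.05 if i == t else 1.0) for i, ki in enumerate(k)])
          for t in range(n)]
```

A zero gradient is exactly the first-order condition a maximum likelihood routine solves, so both approaches land on the same ratings up to the arbitrary scale.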



                          • Re: John t whelan ranking simulator

                            Originally posted by RHamilton
                            [...] Haven't taken a close look at why it differs from FlagDude's, as I'm not sure what "stage" of the calculations is listed in the GUI, i.e. are OWP and OOWP before or after negative impact wins have been removed? [...]
                            The stuff that is listed is after the games have been removed. Take a look in the command line (assuming you run it from there and not the exe) to determine which specific games have been removed. Do we still disagree there?



                            • Re: John t whelan ranking simulator

                              Apologies for not previously releasing, as time was spent with FlagDUDETTE. Here's what we have, as of games ending 20 November:

                              5.00 Minnesota
                              4.75 St. Cloud State
                              4.50 Quinnipiac
                              4.25 Providence
                              4.00 Boston College
                              3.75 Michigan
                              3.50 Ferris State
                              3.25 Wisconsin
                              3.00 LSSU
                              2.75 Miami
                              2.50 Notre Dame
                              2.25 Bowling Green
                              2.00 Cornell
                              1.75 Clarkson
                              1.50 Northern Michigan
                              1.25 North Dakota
                              1.00 New Hampshire
                              0.75 Minnesota State Mankato
                              0.50 Union
                              0.25 UMASS Lowell

                              And the tournament field:

                              Minnesota
                              St. Cloud State
                              Providence
                              Quinnipiac

                              Boston College
                              Michigan
                              Ferris State
                              Wisconsin

                              Miami
                              LSSU
                              Notre Dame
                              Bowling Green

                              Cornell
                              Clarkson
                              North Dakota
                              AHA Champ (37 - Air Force)



                              • Re: John t whelan ranking simulator

                                Originally posted by FlagDUDE08
                                And the tournament field: [...]
                                Interesting that currently each of the five non-AHA leagues has three representatives.

                                Let's Go 'Tute!

                                Maxed out at 2,147,483,647 at 10:00 AM EDT 9/17/07.

                                2012 Poser Of The Year
