Pairwise Analysts, Please Come Help Us 2013


  • #31
    Originally posted by JimDahl View Post
    I can simulate everything except across the conference tournament break. So, for now I simulate up to the end of the regular season, then once the tournaments are seeded I can simulate to the end of the conference tournaments.

    If there's interest in any particular output/results, or even some new type of interactivity so you can explore the data yourself, let me know and I'll see what I can do. I'll probably get my first post of the season up this week.

    Basically, there are three kinds of variably scheduled games in college hockey:
    * Best of three series
    * Tournaments where games are played by winners/losers of other games
    * Conference tournaments. After the first round, they all fall into the 2nd bullet above. In the play-in rounds they depend on standings and may involve ranking and reordering the bracket. Each is different, which is a little annoying; but the real killer is the CCHA which uses shootout results that I don't have in my games database.
    Jim, let's talk after I get something written up... Figuring out the playoffs is mostly a game of efficiently programming tie-breaking rules.

    I am not a computer scientist nor a true programmer (and I really get sick of needing to be a jack of all trades.)

    I think if the work can be parceled in some way then something useful can be done.
    BS UML '04, PhD UConn '09

    Jerseys I would like to have:
    Skating Friar Jersey
    AIC Yellowjacket Jersey w/ Yellowjacket logo on front
    UAF Jersey w/ Polar Bear on Front
    Army Black Knight logo jersey


    NCAA Men's Division 1 Simulation Primer



    • #32
      Re: Pairwise Analysts, Please Come Help Us 2013

      Originally posted by Patman View Post
      Jim, let's talk after I get something written up... Figuring out the playoffs is mostly a game of efficiently programming tie-breaking rules.

      I am not a computer scientist nor a true programmer (and I really get sick of needing to be a jack of all trades.)

      I think if the work can be parceled in some way then something useful can be done.
      Jim and Pat,

      Thanks for all you fellows bring to discussions like this. Can I ask a few questions? I seem to remember from some discussion last year that it is impossible to do a full-up "odds" considering every possibility, because there are too many games, and the PWR would have to be calculated at the end of each combination of games.

      So, if that's right, it seems to me that Jim does what is called a series of "Monte Carlo" runs. Now, if I understand that right, that means he uses some kind of comparison between each team that gives odds of winning each game (KRACH would work nice here), and uses some random number generator or other programming tool to 'pick' the winner of each game, in accord with those odds. Do that for every game, add up the PWR at the end. Then repeat. Do it a bunch of times (1000? I think), and you can say the "odds" of UND getting a #1 seed are (insert number here in %). Does that seem right?
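A minimal sketch of the procedure described in this post, in Python. Everything specific here is an assumption for illustration only: the three teams, their ratings and records, and especially `rank_teams`, which ranks by total wins as a stand-in for the real PWR calculation. It is not Jim's actual code, just the general shape of a Monte Carlo run.

```python
import random
from collections import Counter

def win_prob(q_a, q_b):
    """Bradley-Terry (KRACH-style) probability that team A beats team B."""
    return q_a / (q_a + q_b)

def simulate_once(ratings, wins, remaining, rng):
    """Play out every remaining game once and return the final win totals."""
    final = dict(wins)
    for a, b in remaining:
        winner = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
        final[winner] += 1
    return final

def rank_teams(final_wins):
    """Placeholder for the PWR: rank by total wins, ties broken by name."""
    return sorted(final_wins, key=lambda t: (-final_wins[t], t))

def rank_odds(ratings, wins, remaining, runs=1000, seed=0):
    """Estimate P(team finishes at rank r) from repeated simulated seasons."""
    rng = random.Random(seed)
    tallies = {team: Counter() for team in ratings}
    for _ in range(runs):
        order = rank_teams(simulate_once(ratings, wins, remaining, rng))
        for position, team in enumerate(order, start=1):
            tallies[team][position] += 1
    return {t: {r: n / runs for r, n in c.items()} for t, c in tallies.items()}

# Made-up ratings, records, and schedule (not real data).
ratings = {"A": 300.0, "B": 200.0, "C": 100.0}
wins = {"A": 10, "B": 10, "C": 9}
remaining = [("A", "B"), ("B", "C"), ("A", "C")]

odds = rank_odds(ratings, wins, remaining)
for team, dist in odds.items():
    print(team, dict(sorted(dist.items())))
```

Swapping `rank_teams` for a real PWR calculation is where nearly all of the work would go; the simulation loop itself stays this small.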

      If so, that is a nice tool to have available to those of us curious about this matter. Because, those odds will fill the rest of the games. And, in the end, we will see some things changing from the present PWR that we might not guess at ahead of time.

      For example, I suspect that Lowell will come out lower than they are right now, for reasons discussed above. BU perhaps, too. So, that "Monte Carlo" odds calculator will help us learn who is in a more advantaged situation, and who in a less advantaged situation, than we can see just from the PWR numbers now (I mean the # of comparisons won).

      Thanks again fellows. My natural interest in these things is why I named myself "Numbers."



      • #33
        Re: Pairwise Analysts, Please Come Help Us 2013

        Originally posted by Numbers View Post
        So, if that's right, it seems to me that Jim does what is called a series of "Monte Carlo" runs. Now, if I understand that right, that means he uses some kind of comparison between each team that gives odds of winning each game (KRACH would work nice here), and uses some random number generator or other programming tool to 'pick' the winner of each game, in accord with those odds. Do that for every game, add up the PWR at the end. Then repeat. Do it a bunch of times (1000? I think), and you can say the "odds" of UND getting a #1 seed are (insert number here in %). Does that seem right?

        If so, that is a nice tool to have available to those of us curious about this matter. Because, those odds will fill the rest of the games. And, in the end, we will see some things changing from the present PWR that we might not guess at ahead of time.

        For example, I suspect that Lowell will come out lower than they are right now, for reasons discussed above. BU perhaps, too. So, that "Monte Carlo" odds calculator will help us learn who is in a more advantaged situation, and who in a less advantaged situation, than we can see just from the PWR numbers now (I mean the # of comparisons won).
        Yeah, that's essentially what I do. The products I currently produce from the Monte Carlo runs are area charts of the probabilities of each PWR ranking for each team, based on how many games that team wins. For example, at this time of year I post area charts like these; then, as the tournament approaches, full tables of the remaining possible outcomes.
        Last edited by JimDahl; 01-15-2013, 08:29 AM.



        • #34
          Re: Pairwise Analysts, Please Come Help Us 2013

          A Monte Carlo simulation of the remainder of the season (with PWR calculations at the end of each simulated season) is fairly simple to do, but there are, it seems to me, two big problems (and one small) with it. First, KRACH doesn't predict ties, or at least I don't see any simple way to get it to do so, and ties are a nontrivial component of PWR. In addition, the KRACH predictions should at least be modified for home ice, although that's fairly simple to do as long as you have a good way to make the estimate. Second, we would be fixing KRACH at today's level which really introduces a whole new set of uncertainties. It makes no sense (theoretically or in terms of computer effort) to dynamically update KRACH for pseudodata, but you will find yourself in the position of making predictions that you'd never make in real life, where teams with great records (in the pseudodata) are getting thumped by lesser teams because of lucky runs. This problem is in some ways philosophical rather than practical, but it's a big one.

          One other fairly sizeable problem is programming the playoff rules in every conference, which is really a pain.
          Last edited by goblue78; 01-15-2013, 08:49 AM.



          • #35
            Re: Pairwise Analysts, Please Come Help Us 2013

            Originally posted by goblue78 View Post
            A Monte Carlo simulation of the remainder of the season (with PWR calculations at the end of each simulated season) is fairly simple to do, but there are, it seems to me, two big problems (and one small) with it. First, KRACH doesn't predict ties, or at least I don't see any simple way to get it to do so, and ties are a nontrivial component of PWR. In addition, the KRACH predictions should at least be modified for home ice, although that's fairly simple to do as long as you have a good way to make the estimate. Second, we would be fixing KRACH at today's level which really introduces a whole new set of uncertainties. It makes no sense (theoretically or in terms of computer effort) to dynamically update KRACH for pseudodata, but you will find yourself in the position of making predictions that you'd never make in real life, where teams with great records (in the pseudodata) are getting thumped by lesser teams because of lucky runs. This problem is in some ways philosophical rather than practical, but it's a big one.

            One other fairly sizeable problem is programming the playoff rules in every conference, which is really a pain.
            GoBlue,
            I totally understand what you are saying. I think there are a couple of things to mention here.

            1) What is the goal? Is it to get the 'best' possible prediction of the rest of the season? Some might have interest in that. If that is what we want, then, yes, all the things you write about are very valid concerns.

            2) However, my interest is not in 'predicting.' Past years have left me the impression that predicting is impossible. Someone will go on a run of Wins or Losses that we don't anticipate. One year recently, the #1 team as of the beginning of January failed to even make the field. So, my interest is more like this:

            2a) The PWR is really basically RPI, with a few tweaks thrown in. Let's predict that everyone's RPI basically stays the same. Then, what effect will the TUC records and the ComOpp records have on individual comparisons? Remember, if Team A has a better RPI than Team B, they can still lose the compare if they lose both TUC and ComOpp. Therefore, the TUC and ComOpp records are like 'hidden' features.

            2b) In some cases (Quinnipiac is a good example right now), a team's 2nd half schedule does not have as much Schedule Strength as the 1st half. That means that their record will need to be better to keep their RPI up to the same level. Can we make note of that, too?

            These are the kind of things I am interested in.
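To make the "hidden features" point in 2a concrete, here is a rough Python sketch of a single pairwise comparison. It follows the criteria as they are discussed in this thread: one point each for RPI, common opponents, and TUC record (the last counted only once both teams have 10 TUC games), with RPI breaking a tied comparison. Counting one point per head-to-head win is my assumption, and the two teams and all of their numbers are made up.

```python
def compare(a, b, h2h_a=0, h2h_b=0, min_tuc_games=10):
    """Return the name of the comparison winner between team dicts a and b."""
    # Assumption: each head-to-head win is worth one comparison point.
    points = {a["name"]: h2h_a, b["name"]: h2h_b}

    def award(value_a, value_b):
        # Give one point to whichever team is better on this criterion.
        if value_a > value_b:
            points[a["name"]] += 1
        elif value_b > value_a:
            points[b["name"]] += 1

    award(a["rpi"], b["rpi"])
    award(a["comopp"], b["comopp"])
    # The TUC criterion only counts once both teams have enough TUC games.
    if min(a["tuc_games"], b["tuc_games"]) >= min_tuc_games:
        award(a["tuc_pct"], b["tuc_pct"])

    if points[a["name"]] != points[b["name"]]:
        return max(points, key=points.get)
    # A tied comparison is broken by RPI.
    return a["name"] if a["rpi"] >= b["rpi"] else b["name"]

# Hypothetical teams: A has the better RPI, B the better TUC and ComOpp.
team_a = {"name": "A", "rpi": 0.560, "tuc_pct": 0.450, "tuc_games": 12, "comopp": 1.5}
team_b = {"name": "B", "rpi": 0.550, "tuc_pct": 0.600, "tuc_games": 11, "comopp": 2.0}

print(compare(team_a, team_b))          # B takes TUC and ComOpp, 2-1
team_b_few_tuc = dict(team_b, tuc_games=8)
print(compare(team_a, team_b_few_tuc))  # TUC not counted: 1-1, RPI decides
```

With TUC in play the better-RPI team loses the comparison; drop team B below 10 TUC games and the same numbers flip back to team A.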

            When it is Conference Tourney time, then it becomes really important. It still is impossible to make full up predictions, but it becomes possible at that point to analyze specific comparisons. And, some of those shed real light on the final field.

            I hope all that is clear.

            And, it is why I am interested in what Jim and Pat have in mind, because those Monte Carlo runs are basically like, "Let's play the rest of the season and keep everyone's RPI the same. Then, how does the PWR come out?"
            Last edited by Numbers; 01-15-2013, 09:54 AM.



            • #36
              Re: Pairwise Analysts, Please Come Help Us 2013

              By the way, everyone.

              I am having trouble connecting to Whelan's site this morning - the one that goes slack.net........


              Does anyone else know something about that?

              Thanks



              • #37
                Re: Pairwise Analysts, Please Come Help Us 2013

                Originally posted by goblue78 View Post
                A Monte Carlo simulation of the remainder of the season (with PWR calculations at the end of each simulated season) is fairly simple to do, but there are, it seems to me, two big problems (and one small) with it. First, KRACH doesn't predict ties, or at least I don't see any simple way to get it to do so, and ties are a nontrivial component of PWR. In addition, the KRACH predictions should at least be modified for home ice, although that's fairly simple to do as long as you have a good way to make the estimate.
                There are adaptations of Bradley-Terry (which is known by college hockey fans as KRACH) that account for ties or home-ice advantage. I haven't seen them used in conjunction, but it shouldn't be overly difficult to account for. There was a paper published by three Taiwanese professors that details the additional factors used for either adaptation (PDF).

                For the home-ice advantage, the probability is shown by:
                Code:
                                      Θq1
                                   -------- if T1 is home
                                   Θq1 + q2
                P(T1 beats T2) = {
                                      q1
                                   -------- if T2 is home
                                   q1 + Θq2
                q1 = B-T Ranking of T1
                q2 = B-T Ranking of T2
                Θ>0 = strength of home-field advantage
                The logs of the probabilities of the observed game results are summed, and the KRACH and theta values are adjusted to maximize that sum, i.e. a maximum-likelihood fit (this can be done by Excel Solver or by someone with coding experience in a different language, such as R).
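As a hedged illustration of the home-ice formula and the log-probability sum, here is a small Python sketch. The function names, the `(home, away, home_won)` game format, and the sample ratings are all assumptions made for this example; ties and neutral-site games are ignored, and the actual maximization step (Solver, R, or otherwise) is left out.

```python
import math

def p_t1_beats_t2(q1, q2, theta, t1_home):
    """Home-ice-adjusted Bradley-Terry probability that T1 beats T2."""
    if t1_home:
        return theta * q1 / (theta * q1 + q2)
    return q1 / (q1 + theta * q2)

def log_likelihood(games, ratings, theta):
    """Sum of log-probabilities of the observed results.

    games: iterable of (home_team, away_team, home_won) tuples.
    """
    total = 0.0
    for home, away, home_won in games:
        p_home = p_t1_beats_t2(ratings[home], ratings[away], theta, t1_home=True)
        total += math.log(p_home if home_won else 1.0 - p_home)
    return total

# Made-up ratings and results, purely for illustration.
ratings = {"X": 200.0, "Y": 100.0}
games = [("X", "Y", True), ("Y", "X", False), ("Y", "X", True)]

for theta in (1.0, 1.2, 1.5):
    print(theta, round(log_likelihood(games, ratings, theta), 4))
```

A fit would search over theta and the ratings together for the largest value of `log_likelihood`; the loop at the end only evaluates it at a few thetas.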

                The problem I have with this method is it seems to apply the same home-ice advantage to each team, which is obviously false. I thought I had seen another paper discussing ways to introduce home-field advantage to Bradley-Terry, but I can't find it right now.

                To include the possibility of ties in Bradley-Terry:
                Code:
                                    q1
                P(T1 beats T2) = --------
                                 q1 + Θq2
                
                                    q2
                P(T2 beats T1) = --------
                                 Θq1 + q2
                
                                  (Θ^2 - 1)(q1)(q2)
                P(T1 ties T2) = --------------------
                                (q1 + Θq2)(Θq1 + q2)
                q1 = B-T Ranking of T1
                q2 = B-T Ranking of T2
                Θ>1 = "threshold" parameter within which teams can be considered equal aka a tie occurs
                The same thing is done here: we take the log of the probability of each game's result and adjust the KRACH and theta values to maximize the sum.
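As a quick sanity check on the three tie-model formulas (my own algebra, not something taken from the paper), the probabilities sum to 1 over a common denominator, and the tie probability for two equally rated teams depends only on Θ:

```latex
\begin{aligned}
P(\text{T1 wins}) + P(\text{T2 wins}) + P(\text{tie})
  &= \frac{q_1(\Theta q_1 + q_2) + q_2(q_1 + \Theta q_2) + (\Theta^2 - 1)\,q_1 q_2}
          {(q_1 + \Theta q_2)(\Theta q_1 + q_2)} \\
  &= \frac{\Theta q_1^2 + (\Theta^2 + 1)\,q_1 q_2 + \Theta q_2^2}
          {\Theta q_1^2 + (\Theta^2 + 1)\,q_1 q_2 + \Theta q_2^2} = 1, \\[4pt]
P(\text{tie} \mid q_1 = q_2)
  &= \frac{\Theta^2 - 1}{(1 + \Theta)^2} = \frac{\Theta - 1}{\Theta + 1}.
\end{aligned}
```

So Θ = 1 means ties never happen, and the tie rate between equal teams grows as Θ increases.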

                I have been calculating teams' ratings based on the tie-adjusted Bradley-Terry system, and here are their ratings, for comparison (Θ = 1.43):
                Code:
                Boston College         600.23
                New Hampshire          586.81
                Quinnipiac             461.19
                Minnesota              459.13
                Notre Dame             417.67
                Boston University      387.60
                Denver                 322.94
                North Dakota           294.09
                Dartmouth              256.29
                Yale                   252.97
                Miami                  219.21
                Nebraska-Omaha         216.85
                UMass Lowell           209.39
                Western Michigan       208.80
                Minnesota State        195.70
                St. Cloud State        179.53
                Cornell                153.99
                Colgate                150.95
                Wisconsin              144.62
                Northern Michigan      141.57
                Providence             133.96
                Colorado College       131.42
                Lake Superior          128.97
                Robert Morris          122.56
                Minnesota Duluth       120.11
                Ohio State             120.06
                Union                  119.94
                Niagara                117.32
                Ferris State           108.67
                Alaska                 106.87
                Massachusetts          103.18
                Rensselaer              89.64
                Merrimack               87.57
                Michigan Tech           87.34
                Princeton               83.70
                Vermont                 81.27
                Bemidji State           79.79
                Holy Cross              77.24
                Michigan State          76.62
                Harvard                 76.16
                Bowling Green           72.58
                St. Lawrence            67.70
                Brown                   63.15
                Michigan                57.40
                Northeastern            56.93
                Mercyhurst              53.39
                Alaska Anchorage        52.53
                Maine                   49.83
                Connecticut             44.90
                Air Force               41.93
                Clarkson                39.21
                Canisius                38.93
                Bentley                 33.75
                RIT                     24.93
                Army                    24.91
                Penn State              22.51
                American International  12.25
                Alabama-Huntsville       7.03
                Sacred Heart             2.66
                Using these adjusted ratings, a team has a 17.6% chance of tying a team with an identical rating, 15.8% against a team with a KRACH two times its own, 13.1% against a team with three times its own, etc. For example, a UMass v UAA game would no longer have an (approximately) 66.7 / 33.3% win split. It would now have a 57.9 / 26.3 / 15.8% W/L/T split for UMass.
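For anyone who wants to plug these numbers in, here is a short Python sketch of the tie-adjusted formulas, plus one way a simulation could draw a W/L/T result from them. The function names and the sampling step are my assumptions, not part of the post. It uses the rounded Θ = 1.43 and the UMass and Alaska Anchorage ratings from the table, so the printed percentages may differ from the quoted ones in the last digit.

```python
import random

THETA = 1.43  # rounded value quoted in the post

def wlt_probs(q1, q2, theta=THETA):
    """Return (P(T1 wins), P(T2 wins), P(tie)) under the tie-adjusted model."""
    p_win = q1 / (q1 + theta * q2)
    p_loss = q2 / (theta * q1 + q2)
    p_tie = (theta ** 2 - 1) * q1 * q2 / ((q1 + theta * q2) * (theta * q1 + q2))
    return p_win, p_loss, p_tie

def sample_result(q1, q2, rng, theta=THETA):
    """Draw 'W', 'L' or 'T' for T1 according to the model probabilities."""
    p_win, p_loss, _ = wlt_probs(q1, q2, theta)
    u = rng.random()
    if u < p_win:
        return "W"
    if u < p_win + p_loss:
        return "L"
    return "T"

umass, uaa = 103.18, 52.53  # ratings from the table above
w, l, t = wlt_probs(umass, uaa)
print(f"UMass W/L/T: {w:.1%} / {l:.1%} / {t:.1%}")

rng = random.Random(2013)  # fixed seed so the draw is repeatable
results = [sample_result(umass, uaa, rng) for _ in range(10)]
print(results)
```

In a Monte Carlo run, `sample_result` would take the place of a plain win/loss coin flip for each remaining game.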

                Surprisingly, the estimated tie rate is actually pretty accurate. There have been 83 ties so far this season and using the theta of 1.43, if you add up the probability of a tie in every game that has been played so far, the total is 83.30.
                Originally posted by goblue78 View Post
                Second, we would be fixing KRACH at today's level which really introduces a whole new set of uncertainties. It makes no sense (theoretically or in terms of computer effort) to dynamically update KRACH for pseudodata, but you will find yourself in the position of making predictions that you'd never make in real life, where teams with great records (in the pseudodata) are getting thumped by lesser teams because of lucky runs.
                Even if you are updating KRACH after every weekend or so with the pseudodata, it's still going to underpredict the upset runs that teams will go on while riding a hot goalie.
                Originally posted by goblue78 View Post
                One other fairly sizeable problem is programming the playoff rules in every conference, which is really a pain.
                Yes it is. Especially since the AHA doesn't really publish their info, HEA's guidelines could be interpreted in three different ways, and the CCHA uses shootouts.

                I know that was way more math intensive than most people were looking for, but hopefully it was useful to some.
                Last edited by burgie12; 01-15-2013, 10:41 AM.
                Go Red!!

                National Champions: 1954, 1985, 201x

                Houston Field House, Cheel Arena, Agganis Arena, Magness Arena, Ritter Arena, Messa Rink, Matthews Arena, Von Braun Center, Lynah Rink, Starr Rink, Appleton Arena, Dwyer Arena, Buffalo State Ice Arena, Kelley Rink (also Verizon Center (DC), Herb Brooks Arena, Fenway Park (Frozen Fenway I), Times Union Center, DCU Center, Blue Cross Arena)



                • #38
                  Re: Pairwise Analysts, Please Come Help Us 2013

                  burgie12: Nice work. As I said, the home ice multiplier is a fairly simple procedure. And the tie methodology is quite intuitive so long as the 1.43 is right. I'll need to think a minute or two about the best method of including both in a unified way, but that looks pretty easy.

                  Numbers: if all you really want to know is whether RPI under- or overstates your chances, you mostly just need to look and see how the RPI column and PWR rating differ. For example, right now, they line up for the first 10 teams, and they line up for the first 14 except for a reversal of 11 and 12. That's not always the case by any means, but it shows how close RPI is to PWR, at least until every team has 10 TUC games. Overcoming an RPI disadvantage really requires both a COpp and a TUC advantage in most cases where the teams haven't played head-to-head, or where the head-to-head results are inconclusive. While there is something to be said for BU's disadvantage going forward (which to date hasn't cost them a single spot, assuming their tie with NoDak is broken by RPI), it shows that even a massive TUC disadvantage will rarely cost a good RPI team more than a place or two. It can have a bigger effect for lower teams since the RPIs are much closer together -- but nobody cares, because teams with RPIs below about 14 or so aren't going to the dance anyway unless they are conference winners.



                  • #39
                    Originally posted by Numbers View Post
                    Jim and Pat,

                    Thanks for all you fellows bring to discussions like this. Can I ask a few questions? I seem to remember from some discussion last year that it is impossible to do a full-up "odds" considering every possibility, because there are too many games, and the PWR would have to be calculated at the end of each combination of games.

                    So, if that's right, it seems to me that Jim does what is called a series of "Monte Carlo" runs. Now, if I understand that right, that means he uses some kind of comparison between each team that gives odds of winning each game (KRACH would work nice here), and uses some random number generator or other programming tool to 'pick' the winner of each game, in accord with those odds. Do that for every game, add up the PWR at the end. Then repeat. Do it a bunch of times (1000? I think), and you can say the "odds" of UND getting a #1 seed are (insert number here in %). Does that seem right?

                    If so, that is a nice tool to have available to those of us curious about this matter. Because, those odds will fill the rest of the games. And, in the end, we will see some things changing from the present PWR that we might not guess at ahead of time.

                    For example, I suspect that Lowell will come out lower than they are right now, for reasons discussed above. BU perhaps, too. So, that "Monte Carlo" odds calculator will help us learn who is in a more advantaged situation, and who in a less advantaged situation, than we can see just from the PWR numbers now (I mean the # of comparisons won).

                    Thanks again fellows. My natural interest in these things is why I named myself "Numbers."
                    Monte Carlo is a method employed when more direct means of estimation are untenable. Since the PWR is a function of an entire season, the resulting non-MC estimator would be ridiculous and not worth the time. In that sense, Monte Carlo is an estimation strategy.

                    Personally, I think it could shed some light on what does and doesn't influence the procedure. Give me a few nights. I started on a doc late last night after a UConn alum event. Given the way I work I'll have quite a bit of detail, but not nearly as much as if it were a technical spec. I want to get things out there.

                    Until then, there is some measure of reading from formula but there is a bit of tea-leaf reading... I'd like to take the tea leaves out, myself.


                    • #40
                      Originally posted by burgie12 View Post
                      There are adaptations of Bradley-Terry (which is known by college hockey fans as KRACH) that account for ties or home-ice advantage. I haven't seen them used in conjunction, but it shouldn't be overly difficult to account for. There was a paper published by three Taiwanese professors that details the additional factors used for either adaptation (PDF).

                      This topic is a bit over-bludgeoned in the stats community. Rutter uses a Bayesian hierarchical model on the B-T formulation (or rather its logistic regression form). One could choose other link functions... A former Michigan stat student used normals, and other links are available. I'd be curious about non-parametric links, as the differences between link choices are somewhat esoteric (and I stumped a student in a PhD defense on this despite her working on link functions... It was her 2nd PhD, I have no remorse.)

                      Aside from that, you also have latent variable models that envision more of a "tug-of-war," with the tie being some zone in the middle (Albyn? Jones does this for soccer.). This is before you get to various score-based models.

                      The fact of the matter is that the options never end, and I have a few ideas of my own I'd like to see. I am aware of one paper that used nonparametrics to address football prediction.

                      It doesn't really end.


                      • #41
                        Re: Pairwise Analysts, Please Come Help Us 2013

                        Numbers: Maybe this is an example of the sort of thing you're thinking about. Let me know if I'm on the right track. Look at Yale-NoDak. Currently, NoDak leads in RPI (by a smidge) and also has an edge in Common Opponents. Neither has 10 TUC games. This gives NoDak a 2-0 edge. But the Common Opponent edge will be determined by the two games left between NoDak and DU. Yale's number is done at 2 (1-0 vs DU, 1-0 vs. CC, 0-1 vs HC). NoDak's final ComOpp rating will be the sum of 2-0 vs. HC [1], 2-2 vs CC [0.5], and whatever rating comes out of their current 1-0-1 with DU combined with the next two games. Unless they lose both of them, or tie one and lose one, they still have the common opponent matchup won.

                        The important thing, though, is that if you give them two DU losses, holding everything else constant (which is just for discussion's sake), they'd probably fall behind Yale in RPI anyway. And clearly the TUC comparison could go either way, since both teams have a bunch of TUC games left and both are around the .500 TUC level. So even though there are possibilities here, I suspect that RPI will win out in the end.

                        (I am reminded that NoDak could face CC and/or DU again in the playoffs, which makes this even more complicated.)
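The Common Opponents arithmetic in this post can be checked with a short sketch. It assumes the rating is the sum of winning percentages against each common opponent, with a tie counted as half a win, which is what the bracketed [1] and [0.5] values imply. The function names are made up for illustration, and only a few of the possible outcomes of the two remaining DU games are listed.

```python
from fractions import Fraction

def win_pct(wins, losses, ties=0):
    """Winning percentage with a tie counted as half a win."""
    games = wins + losses + ties
    return Fraction(2 * wins + ties, 2 * games)

def common_opp_rating(records):
    """Sum of per-opponent winning percentages; records maps opponent -> (W, L, T)."""
    return sum(win_pct(*record) for record in records.values())

# Yale's common-opponent games are finished.
yale = {"DU": (1, 0, 0), "CC": (1, 0, 0), "HC": (0, 1, 0)}
yale_rating = common_opp_rating(yale)

# NoDak is 1-0-1 against DU so far; each entry is the DU record after two more games.
scenarios = {
    "lose both": (1, 2, 1),
    "tie one, lose one": (1, 1, 2),
    "win one, lose one": (2, 1, 1),
    "win both": (3, 0, 1),
}

results = {}
for label, du_record in scenarios.items():
    nodak = {"HC": (2, 0, 0), "CC": (2, 2, 0), "DU": du_record}
    rating = common_opp_rating(nodak)
    results[label] = rating
    if rating > yale_rating:
        verdict = "NoDak"
    elif rating < yale_rating:
        verdict = "Yale"
    else:
        verdict = "even"
    print(f"{label:18s} NoDak {float(rating):.3f} vs Yale {float(yale_rating):.3f} -> {verdict}")
```

Under that assumption NoDak only fails to win the criterion outright in the first two scenarios, which matches the claim above: two losses drop them to 1.875 against Yale's 2, and a tie plus a loss leaves the two teams level at 2.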
                        Last edited by goblue78; 01-15-2013, 12:51 PM.

                        Comment


                        • #42
                          Re: Pairwise Analysts, Please Come Help Us 2013

                          Originally posted by burgie12 View Post
                          Surprisingly, the estimated tie rate is actually pretty accurate. There have been 83 ties so far this season and using the theta of 1.43, if you add up the probability of a tie in every game that has been played so far, the total is 83.30.
                          Why is this surprising? Didn't you "fit" the 1.43 parameter based on the fact that there have been, in fact, 83 ties? If you let your iteration run longer (to get more decimals on the 1.43), I would expect the prediction of the number of ties to be *exactly* 83.0 ties.
                          If you don't change the world today, how can it be any better tomorrow?

                          Comment


                          • #43
                            Re: Pairwise Analysts, Please Come Help Us 2013

                            Originally posted by goblue78 View Post
                            So even though there are possibilities here, I suspect that RPI will win out in the end.
                            I didn't know we had that good of a shot...

                            Comment


                            • #44
                              Re: Pairwise Analysts, Please Come Help Us 2013

                              Originally posted by goblue78 View Post
                              burgie12: Nice work. As I said, the home ice multiplier is a fairly simple procedure. And the tie methodology is quite intuitive so long as the 1.43 is right.
                              1.43 is right for this season, but it isn't a constant value. At the end of last season, it was 1.36. If you were to calculate the B-T Ratings for the EPL, it's 2.15.
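To give a feel for what those theta values mean, here is a sketch that assumes ties are handled with the Rao-Kupper extension of Bradley-Terry. That choice is an assumption for illustration; the exact tie formula is not spelled out in this part of the thread. Here r_i and r_j are the two teams' ratings.

```latex
P(i \text{ beats } j) = \frac{r_i}{r_i + \theta r_j}, \qquad
P(\text{tie}) = \frac{(\theta^2 - 1)\, r_i r_j}{(r_i + \theta r_j)(r_j + \theta r_i)}

r_i = r_j \;\Rightarrow\; P(\text{tie}) = \frac{\theta - 1}{\theta + 1}
```

If that form applies, two evenly matched teams would tie about 17.7% of the time at theta = 1.43, about 15.3% at 1.36, and about 36.5% at 2.15, with the tie probability shrinking as the ratings move apart.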

                              Originally posted by goblue78 View Post
                              numbers: if all you really want to know is whether RPI under- or overstates your chances, you mostly just need to look and see how the RPI column and PWR rating differ. For example, right now, they line up for the first 10 teams, and they line up for the first 14 except for a reversal of 11 and 12.
                              That's only because there are so few teams that have played ten games against TUCs. That's a function of how early in the season it is, not the fact that RPI is actually accurate compared to the PWR. Yes, I know you know that, I'm just making sure that it's stated plainly.
                              Originally posted by goblue78 View Post
                              Overcoming an RPI disadvantage really requires both a COpp and a TUC advantage in every case where the teams haven't played head-to-head, or where the head-to-head results are inconclusive. While there is something to be said for BU's disadvantage going forward (which to date hasn't cost them a single spot, assuming their tie with NoDak is broken by RPI) because Dartmouth and North Dakota haven't played enough TUC games, not because their RPI is holding them up. As I said above, they will drop two comparison wins just by playing out the rest of the regular season and could easily drop four. It shows that even a massive TUC disadvantage will rarely cost you more than a place or two for a good RPI team.
                              The rest of my response is in-line with the rest of your post.
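The point about needing both COpp and TUC to overcome an RPI deficit can be sketched as a single pairwise comparison. This assumes the criteria discussed in the thread (RPI, Common Opponents, record vs. TUCs with a 10-game minimum for both teams, head-to-head) and that a tied comparison goes to the better RPI. Awarding one point per head-to-head win is also an assumption, and the `Team` class and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    rpi: float
    copp: float        # common-opponents rating against this particular opponent
    tuc_pct: float     # winning percentage vs. teams under consideration
    tuc_games: int     # games played vs. teams under consideration
    h2h_wins: int = 0  # wins over this particular opponent

def comparison_winner(a: Team, b: Team, min_tuc_games: int = 10) -> str:
    """Return the name of the team that wins the comparison."""
    points = {a.name: 0, b.name: 0}

    def award(x: float, y: float) -> None:
        # One point to whichever team is better on this criterion; none if level.
        if x > y:
            points[a.name] += 1
        elif y > x:
            points[b.name] += 1

    award(a.rpi, b.rpi)
    award(a.copp, b.copp)
    # TUC only counts when both teams have reached the minimum number of games.
    if a.tuc_games >= min_tuc_games and b.tuc_games >= min_tuc_games:
        award(a.tuc_pct, b.tuc_pct)
    # Head-to-head: assumed to be worth one point per win.
    points[a.name] += a.h2h_wins
    points[b.name] += b.h2h_wins

    if points[a.name] != points[b.name]:
        return max(points, key=points.get)
    # A level comparison is broken by RPI.
    return a.name if a.rpi >= b.rpi else b.name

high = Team("HighRPI", rpi=0.560, copp=1.5, tuc_pct=0.450, tuc_games=12)
low_both = Team("LowRPI", rpi=0.540, copp=2.0, tuc_pct=0.600, tuc_games=12)
low_copp_only = Team("LowRPI", rpi=0.540, copp=2.0, tuc_pct=0.600, tuc_games=8)

print(comparison_winner(high, low_both))       # LowRPI wins 2-1 on COpp and TUC
print(comparison_winner(high, low_copp_only))  # TUC not counted, 1-1, RPI breaks the tie
```

With no head-to-head games, the lower-RPI team takes the comparison only when it wins both of the other criteria; winning just one leaves the comparison level and RPI decides it.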
                              Originally posted by Patman View Post
                              This topic is a bit over-bludgeoned in the stats community. Rutter uses a Bayesian hierarchical model on the B-T formulation (or rather its logistic regression form). One could choose other link functions... A former Mich stat student used normals, and other links are available. I'd be curious about non-parametric links, since the differences between link choices are somewhat esoteric (and I stumped a student in a PhD defense on this despite her working on link functions... It was her 2nd PhD, I have no remorse.)

                              Aside from that you also have latent variable models that envision more of a "tug-of-war" with the tie being some zone in the middle (Albyn? Jones does this for soccer.). This is before you get to various score-based models.

                              Fact of the matter is, the options never end and I have a few ideas of my own I'd like to see. I am aware of one paper that used nonparametrics to address football prediction.

                              It doesn't really end.
                              Not being a mathematician... *woosh* Rutter uses the properties of a normal distribution to calculate his model. I like it, especially since it can provide a definitive difference between teams at opposite ends of the spectrum without pushing ratings into the thousands, but its ability to predict ties is a bit off.
                              Go Red!!

                              National Champions: 1954, 1985, 201x

                              Houston Field House, Cheel Arena, Agganis Arena, Magness Arena, Ritter Arena, Messa Rink, Matthews Arena, Von Braun Center, Lynah Rink, Starr Rink, Appleton Arena, Dwyer Arena, Buffalo State Ice Arena, Kelley Rink (also Verizon Center (DC), Herb Brooks Arena, Fenway Park (Frozen Fenway I), Times Union Center, DCU Center, Blue Cross Arena)

                              Comment


                              • #45
                                Re: Pairwise Analysts, Please Come Help Us 2013

                                Originally posted by LynahFan View Post
                                Why is this surprising? Didn't you "fit" the 1.43 parameter based on the fact that there have been, in fact, 83 ties? If you let your iteration run longer (to get more decimals on the 1.43), I would expect the prediction of the number of ties to be *exactly* 83.0 ties.
                                It's out to as many decimal places as Excel Solver will allow (1.42611606730253 to be exact). Solver is maximizing the sum of the logs of the probabilities of the games having occurred as they did, given the ratings that the teams end up with. When I add a constraint setting Sum(P(ties)) equal to the number of ties, it freaks out, saying that the solution is infeasible. And I'm surprised because its predicted number of ties was closer than that of the Mease / Rutter Rankings (83.32 predicted), which I thought would be much more accurate in keeping the ties constraint reasonable.

                                ETA: Well, that and the fact that when I was originally calculating the B-T total number of predicted ties, I was doing it wrong and I was off by 15%. So, when I actually calculated it properly, to see it come so close to accurate, I was a bit surprised.
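The two quantities being compared here, the log-likelihood that Solver maximizes and the summed tie probabilities, can be sketched as follows. This assumes the Rao-Kupper tie extension of Bradley-Terry, which is a guess at the formulation and not confirmed by the thread. It ignores any home-ice multiplier, and the ratings and games are made-up placeholders.

```python
import math

def outcome_probs(r_a, r_b, theta):
    """Assumed Rao-Kupper form: returns (P(A wins), P(tie), P(B wins))."""
    p_a = r_a / (r_a + theta * r_b)
    p_b = r_b / (r_b + theta * r_a)
    return p_a, 1.0 - p_a - p_b, p_b

def expected_ties(games, ratings, theta):
    """Sum of the tie probability over every game played."""
    return sum(outcome_probs(ratings[a], ratings[b], theta)[1] for a, b, _ in games)

def log_likelihood(games, ratings, theta):
    """Sum of the logs of the probabilities of the results that actually occurred."""
    index = {"A": 0, "T": 1, "B": 2}
    return sum(
        math.log(outcome_probs(ratings[a], ratings[b], theta)[index[result]])
        for a, b, result in games
    )

# Hypothetical ratings and results: (team A, team B, "A" / "T" / "B").
ratings = {"Red": 200.0, "Blue": 100.0, "Green": 100.0}
games = [
    ("Red", "Blue", "A"),
    ("Blue", "Green", "T"),
    ("Green", "Red", "B"),
    ("Red", "Blue", "T"),
]
theta = 1.43

print("log-likelihood:", round(log_likelihood(games, ratings, theta), 4))
print("expected ties: ", round(expected_ties(games, ratings, theta), 4))
print("actual ties:   ", sum(1 for _, _, result in games if result == "T"))
```

A fit would adjust the ratings and theta to maximize `log_likelihood`. As far as I can tell, under this assumed form the fitted `expected_ties` is not forced to equal the observed count exactly, whereas a Davidson-style tie term would force that at the maximum.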
                                Last edited by burgie12; 01-15-2013, 02:17 PM. Reason: ETA

                                Comment
