Statistics Models

  • Statistics Models

    I recently started playing around with the CHODR model crafted by Robin Lock over at SLU. Here's his page on the topic:
    http://it.stlawu.edu/~chodr/

    I also have been tinkering with a variant that isolates either home or away data for a given team depending on the matchup.

    Who out there has worked with it before? What other models do you guys like? There has to be some other dork out there.
    Hit Somebody!

  • #2
    Re: Statistics Models

    CHODR is a Poisson regression... he doesn't consider any results that go into OT, and I believe he discounts ENGs. I've done home advantage for every team, but I don't really know that it provides a substantive benefit. I've also used an offset for time-of-match... I think at some point, however, I had to adjust it to fit the "first one who scores wins" paradigm more accurately... I forget exactly.

    To me, the Poisson model is more or less the best of the standard forms at explaining hockey. Univariate distributions rarely have any useful bivariate (correlative) forms... overdispersed Poisson (negative binomial) models are usually a waste of time. I believe that goal differential IS better at explaining strength in hockey than wins and losses... granted score gives you more information... but I believe that GF-GA is a bit more predictive... but you can make pros and cons... pro: Boston v. Vancouver SCF... con: TB making the CF.
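For anyone who wants to see the moving parts, here is a minimal sketch of a multiplicative Poisson model for goals, fit by alternating updates. It is not Robin Lock's actual CHODR code: the function names, the sample scores, and the simplifications (no home-ice term, no OT or ENG handling) are all assumptions made for illustration. Expected goals for team a against team b are modeled as offense[a] * defense[b], and each pass rescales the factors so fitted goals match observed goals.

```python
def fit_offense_defense(games, iterations=2000):
    """Fit E[goals by a vs b] = offense[a] * defense[b] by alternating updates.

    games: list of (team_a, team_b, goals_a, goals_b) tuples (made-up format).
    Returns (offense, defense); a larger defense factor means a leakier team.
    """
    teams = sorted({t for a, b, _, _ in games for t in (a, b)})
    goals_for = {t: 0 for t in teams}
    goals_against = {t: 0 for t in teams}
    opponents = {t: [] for t in teams}  # one entry per game played
    for a, b, ga, gb in games:
        goals_for[a] += ga
        goals_against[a] += gb
        goals_for[b] += gb
        goals_against[b] += ga
        opponents[a].append(b)
        opponents[b].append(a)

    offense = {t: 1.0 for t in teams}
    defense = {t: 1.0 for t in teams}
    for _ in range(iterations):
        # Poisson likelihood equations: fitted goals for/against equal observed totals.
        offense = {t: goals_for[t] / sum(defense[o] for o in opponents[t]) for t in teams}
        defense = {t: goals_against[t] / sum(offense[o] for o in opponents[t]) for t in teams}
    return offense, defense


def expected_goals_for(offense, defense, games, team):
    """Total goals the fitted model expects `team` to have scored over `games`."""
    total = 0.0
    for a, b, _, _ in games:
        if team == a:
            total += offense[a] * defense[b]
        elif team == b:
            total += offense[b] * defense[a]
    return total


# Invented scores, purely for illustration.
sample_scores = [
    ("A", "B", 3, 2), ("B", "C", 4, 1), ("C", "A", 2, 2),
    ("A", "B", 1, 3), ("B", "C", 2, 3), ("C", "A", 0, 4),
]
offense, defense = fit_offense_defense(sample_scores)
print(offense["A"] * defense["B"])  # expected goals for A in a game against B
```

The offense and defense factors are only determined up to a shared scale (multiply one set by a constant and divide the other), so only products like offense[a] * defense[b] are meaningful.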

    The inherent mathematical problem of the ranking problem is the fixing contrast... sum to zero, whatever one calls it... that every team is compared in a 1 and -1 fashion. This means that it's hard to use more exotic methods. I've been coming around to Rutter's application of the logit (KRACH) model... the idea that you have a hierarchy imposed that denotes some measure of closeness as a means of controlling against large estimates. Anything beyond the generalized linear model class is hard to deal with... hierarchy is a ***** to deal with when having to face issues of 1 and -1 contrasts. So, I have a few models I'd love to employ (I think Bayes is ideal for ranking sports teams in some ways... anything from the "objective" class)... I just don't believe the mathematics involved is favorable (in many plausible models).

    Dr. Joyce

    edit: I only partially apologize if I just went over everybody's heads... I've been thinking about this for awhile and i'm not really game for playing towards the layman right now... gotta go out and drink and stuff
    Last edited by Patman; 12-09-2011, 07:15 PM.
    BS UML '04, PhD UConn '09

    Jerseys I would like to have:
    Skating Friar Jersey
    AIC Yellowjacket Jersey w/ Yellowjacket logo on front
    UAF Jersey w/ Polar Bear on Front
    Army Black Knight logo jersey


    NCAA Men's Division 1 Simulation Primer



    • #3
      Re: Statistics Models

      Originally posted by Patman
      CHODR is a Poisson regression... he doesn't consider any results that go into OT, and I believe he discounts ENGs. I've done home advantage for every team, but I don't really know that it provides a substantive benefit. I've also used an offset for time-of-match... I think at some point, however, I had to adjust it to fit the "first one who scores wins" paradigm more accurately... I forget exactly.

      To me, the Poisson model is more or less the best of the standard forms at explaining hockey. Univariate distributions rarely have any useful bivariate (correlative) forms... overdispersed Poisson (negative binomial) models are usually a waste of time. I believe that goal differential IS better at explaining strength in hockey than wins and losses... granted score gives you more information... but I believe that GF-GA is a bit more predictive... but you can make pros and cons... pro: Boston v. Vancouver SCF... con: TB making the CF.

      The inherent mathematical problem of the ranking problem is the fixing contrast... sum to zero, whatever one calls it... that every team is compared in a 1 and -1 fashion. This means that it's hard to use more exotic methods. I've been coming around to Rutter's application of the logit (KRACH) model... the idea that you have a hierarchy imposed that denotes some measure of closeness as a means of controlling against large estimates. Anything beyond the generalized linear model class is hard to deal with... hierarchy is a ***** to deal with when having to face issues of 1 and -1 contrasts. So, I have a few models I'd love to employ (I think Bayes is ideal for ranking sports teams in some ways... anything from the "objective" class)... I just don't believe the mathematics involved is favorable (in many plausible models).

      Dr. Joyce

      edit: I only partially apologize if I just went over everybody's heads... I've been thinking about this for awhile and i'm not really game for playing towards the layman right now... gotta go out and drink and stuff

      Don't apologize for the technobabble - this was the sort of response I was hoping to receive.

      As far as the ENG, I believe I read somewhere in the site (or perhaps emails) that he'd like to discount them, but data on when they occur is "fuzzy."

      I'm planning on tinkering with the KRACH in the near future. I had set out originally to re-engineer KRACH, but got distracted along the way for a few reasons.
      For one, CHODR seemed far easier to program - I've set it up in a simple Excel file. Secondly, I also like the appeal of measuring goals for/against rather than win records. CHODR seemed more immediately useful in looking at a specific upcoming matchup (read: just how much is UNH going to disappoint me tonight?). I constructed the home/away scenario model because the "home ice advantage" scalar that is used in traditional CHODR seemed a bit too generalized for me. The home/away model does away with this scalar, but also takes more time to converge on a useful prediction.
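To make the "specific upcoming matchup" use concrete, here is a rough sketch of how a pair of expected-goals figures turns into win/tie/loss chances, assuming each side's goals are independent Poisson counts. The function names and the example rates (3.2 and 2.4 goals) are invented for illustration and do not come from CHODR itself.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def matchup_probabilities(lam_a, lam_b, max_goals=20):
    """Return (P(A wins), P(tie), P(B wins)) after regulation.

    Assumes independent Poisson scoring and truncates at max_goals per team,
    which leaves out a negligible sliver of probability at hockey scoring rates.
    """
    win = tie = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                tie += p
            else:
                loss += p
    return win, tie, loss


# Hypothetical expected goals for tonight's game.
win, tie, loss = matchup_probabilities(3.2, 2.4)
print(f"win {win:.3f}  tie {tie:.3f}  loss {loss:.3f}")
```

A home/away variant would only change where the two rates come from (home-only or road-only data for each team); the conversion to probabilities stays the same.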

      Unfortunately, I was far too entertained by linear algebra when statistics came around so I'm making up for it now with this project. I also haven't taken advantage of any formal Matlab training - UNH finally created a course in the program in my last semester, and the work I did do with it never really summed up to a cohesive understanding for me (entirely my fault). This was one of the motivators for me in playing around here.

      What are your thoughts on some sort of goals for/against variant on the KRACH or at least a ranking model that considers such data? To me, the zero-sum nature of win/loss data makes any ranking model based upon it just a degree or two of interpretation removed from the win/loss record itself - particularly with leagues that place a strong emphasis on intra-conference schedules.
      Hit Somebody!



      • #4
        Re: Statistics Models

        Hi... been away for awhile but just got back and saw your post. I agree with Patman that some sort of Poisson regression is probably the best way to go for a number of reasons. I also agree that handling OT is one of the trickiest problems under a Poisson model of goals. I experimented with this a lot last year but haven't had the time this year to follow through. Search back for some of my posts last year and PM me if you want more info about what I did. As another useful reference, take a look at hockeyanalytics.com and in particular his pdf called "Poisson Toolbox."

        Edit: Relevant thread here
        Last edited by goblue78; 12-10-2011, 10:15 AM.



        • #5
          Re: Statistics Models

          Originally posted by Umileated
          Don't apologize for the technobabble - this was the sort of response I was hoping to receive.

          As far as the ENG, I believe I read somewhere in the site (or perhaps emails) that he'd like to discount them, but data on when they occur is "fuzzy."

          I'm planning on tinkering with the KRACH in the near future. I had set out originally to re-engineer KRACH, but got distracted along the way for a few reasons.
          For one, CHODR seemed far easier to program - I've set it up in a simple Excel file. Secondly, I also like the appeal of measuring goals for/against rather than win records. CHODR seemed more immediately useful in looking at a specific upcoming matchup (read: just how much is UNH going to disappoint me tonight?). I constructed the home/away scenario model because the "home ice advantage" scalar that is used in traditional CHODR seemed a bit too generalized for me. The home/away model does away with this scalar, but also takes more time to converge on a useful prediction.

          Unfortunately, I was far too entertained by linear algebra when statistics came around so I'm making up for it now with this project. I also haven't taken advantage of any formal Matlab training - UNH finally created a course in the program in my last semester, and the work I did do with it never really summed up to a cohesive understanding for me (entirely my fault). This was one of the motivators for me in playing around here.

          What are your thoughts on some sort of goals for/against variant on the KRACH or at least a ranking model that considers such data? To me, the zero-sum nature of win/loss data makes any ranking model based upon it just a degree or two of interpretation removed from the win/loss record itself - particularly with leagues that place a strong emphasis on intra-conference schedules.
          I think anything that tries to combine the two is well-intentioned but will inevitably fail. I suppose it's a chicken-and-egg thing... did you win because you scored a lot and gave up less, or did you score a lot and give up less because you won?

          There are some more complicated things out there (I SWEAR I saw a Journal of Quantitative Analysis in Sports article that tried some sort of semi-parametric score model for college football... extremely ambitious, IMO). Some of the BCS stuff, their non-"win/loss" stuff (as in, not their current BCS method), employs what is called a "game-point function"... f(GF,GA)-->p in [0,1]. The idea is that the score imparts some information on the strength of the win, which implies what the underlying chance of winning was... I think it's an ill-posed concept, but it gets away from the all-or-nothing {0,1} dynamic and does get you somewhere further than before... there's always a loss of information when projecting down to two figures... just as there is when projecting down to two counting numbers (after all, a better analysis would consider the players' actions and try to measure talent and fatigue... I've heard of some statistical models for tanks... but those are insanely implausible... all statistical models are a philosophic approximation of the underlying truth).

          KRACH is something that's very simple to implement... I think I've done it in C at some point... it's nearly trivial in R (R is slower but is a whiz with multi-dimensional objects... still slower but easy to write)... if you intend to pursue statistics and/or data analytics then R (it's free) is something you ought to start playing around with casually. If I had my druthers (and I don't... and won't... and we don't have the money) I'd hire a very strong R programmer (with a math or stat masters) tomorrow.

          I've been more interested in applications... rankings are "great"... but I still dream of something more like Baseball Prospectus playoff predictions. It'd take an ambitious amount of computation work (everybody has their own playoffs, tie-breakers, in-season tournaments, those rules, etc., etc., etc.) that, really, would only be doable by a college student with a good handle on computation and more time than sense (which in hindsight is better spent drinking and chasing girls). The information is fun because it can show you some simple things... but an ordinal ranking is almost the same kind of navel gazing, just with more rigor behind it. Learning what the information implies is much more work but much more insightful. Quickest things I've learned... 3-2 scores are the most likely in hockey (right now, as long as GPG is generally below 3)... somebody ran a score prediction contest... every answer I gave was either 3-2 or 4-1 despite a heavy bonus given for predicting shutouts... I was in 2nd place only a month or two into the season before it was shut down. I also suspect you can pull your goalie MUCH sooner (say like 5 minutes or more) and still have an advantage... but nobody's going to test that, this is only based on the Poisson model assumption, and it doesn't account for the defending (and leading) team getting better at 6 v 5.
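The 3-2 observation can be checked with a few lines, assuming independent Poisson scoring for each side. The rates 2.8 and 2.6 are made up to represent two teams averaging a bit under 3 goals per game, and the function names are hypothetical. With tied scores set aside, 3-2 comes out on top; if ties are allowed, 2-2 is the single most likely score at these rates.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_table(lam_a, lam_b, max_goals=10):
    """Probability of every (goals_a, goals_b) pair up to max_goals each."""
    return {
        (a, b): poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
        for a in range(max_goals + 1)
        for b in range(max_goals + 1)
    }


def most_likely_score(lam_a, lam_b, decisive_only=True):
    """Most probable score line; optionally ignore tied scores."""
    table = score_table(lam_a, lam_b)
    if decisive_only:
        table = {score: p for score, p in table.items() if score[0] != score[1]}
    return max(table, key=table.get)


print(most_likely_score(2.8, 2.6))                       # (3, 2)
print(most_likely_score(2.8, 2.6, decisive_only=False))  # (2, 2)
```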
          Last edited by Patman; 12-10-2011, 10:03 AM.


          • #6
            Re: Statistics Models

            Originally posted by goblue78
            Hi... been away for awhile but just got back and saw your post. I agree with Patman that some sort of Poisson regression is probably the best way to go for a number of reasons. I also agree that handling OT is one of the trickiest problems under a Poisson model of goals. I experimented with this a lot last year but haven't had the time this year to follow through. Search back for some of my posts last year and PM me if you want more info about what I did. As another useful reference, take a look at hockeyanalytics.com and in particular his pdf called "Poisson Toolbox."

            Edit: Relevant thread here
            I'm looking over your paper that started the older thread. The introduction sounds a lot like what I wanted to try and get at. I'll have comments in a day or two, some of them might even wax academic.
            Hit Somebody!



            • #7
              Re: Statistics Models

              Lies, damned lies, and statistics.
              a legend and an out of work bum look a lot alike, daddy.



              • #8
                Re: Statistics Models

                Originally posted by Red Cows
                I'm not sure I agree with your 2nd sentence.

                When Bill James was still writing Baseball Abstract, he did an entire chapter one year on the significance of how you won, and, in particular, by what margin, that postulated that it was very indicative of what kind of team you are/have. Good teams win by large margins. Bad ones don't. That was the gist of it all. This same article completely pooh-poohed 1 run wins as pretty much meaningless (despite how much you always hear about them in MLB), over the course of baseball history, and he took a look at all of it to formulate that opinion.

                It looks like the writer here came to some of the same conclusions that James did, for college hockey, although I readily admit we are talking two entirely different sports here. The parallels to what he said in Baseball Abstract are interesting, though.
                Pulled this from goblue's relevant thread below. Anyone have an idea on the year of the article?
                Hit Somebody!



                • #9
                  Re: Statistics Models

                  I suspect 82 or 83, which are the really ancient Abstracts that I no longer have...



                  • #10
                    Re: Statistics Models

                    Originally posted by Patman
                    I also suspect you can pull your goalie MUCH sooner (say like 5 minutes or more) and still have an advantage... but nobody's going to test that,
                    You haven't been watching RPI, have you?

                    Let's Go 'Tute!

                    Maxed out at 2,147,483,647 at 10:00 AM EDT 9/17/07.

                    2012 Poser Of The Year



                    • #11
                      Re: Statistics Models

                      Originally posted by Patman
                      KRACH is something that's very simple to implement... I think I've done it in C at some point... it's nearly trivial in R (R is slower but is a whiz with multi-dimensional objects... still slower but easy to write)...
                      My MATLAB version is only about 30 lines of code for the actual computation, with maybe another 50 or so for reading in the game results, formatting the output for the screen, etc. Completely trivial.
                      If you don't change the world today, how can it be any better tomorrow?



                      • #12
                        Hi everybody. No college degree for this guy who likes numbers (that's a different story). But, I understand the KRACH model, and like it lots (it's interesting to implement for NHL, too). How do you adjust it for home/road? Because it would seem like you have to guess what should be the 'benefit' to being the home team. Or else, you compute a total average over all of hockey for that, and then use that. But, I don't think the advantage for the home team is the same in every barn?
                        Oh, and as far as code, I don't understand all that, but I have an NHL Excel file set up and am simply using
                        K(i) = v(i) / SUM(over j){1/(K(i) + K(j))} and then iterating manually.

                        Well, actually, I am using OpenOffice Calc rather than excel.
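For those who would rather not iterate by hand in a spreadsheet, here is a minimal Python sketch of the same fixed-point iteration. It assumes v(i) is team i's win count and that the sum runs over every game team i played (one term per game); ties are ignored, and the function names and sample results are invented. It only has a finite answer when the teams are linked through wins and losses, so an unbeaten or winless team would break it.

```python
from collections import defaultdict


def krach_ratings(games, max_iter=10000, tol=1e-12):
    """Iterate K(i) = wins(i) / sum over games of 1 / (K(i) + K(opponent)).

    games: list of (winner, loser) tuples (made-up format, no ties).
    """
    teams = sorted({t for g in games for t in g})
    wins = defaultdict(int)
    opponents = defaultdict(list)  # one entry per game played
    for winner, loser in games:
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    k = {t: 1.0 for t in teams}
    for _ in range(max_iter):
        new_k = {
            t: wins[t] / sum(1.0 / (k[t] + k[o]) for o in opponents[t])
            for t in teams
        }
        # Ratings are only defined up to a common scale, so pin the mean at 1.
        mean = sum(new_k.values()) / len(teams)
        new_k = {t: v / mean for t, v in new_k.items()}
        converged = max(abs(new_k[t] - k[t]) for t in teams) < tol
        k = new_k
        if converged:
            break
    return k


def expected_wins(k, games, team):
    """Wins the ratings predict for `team` over the games it actually played."""
    total = 0.0
    for winner, loser in games:
        if team in (winner, loser):
            opp = loser if team == winner else winner
            total += k[team] / (k[team] + k[opp])
    return total


# Invented results, purely for illustration.
sample_games = [
    ("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"),
    ("C", "A"), ("A", "C"), ("C", "B"), ("B", "C"),
]
ratings = krach_ratings(sample_games)
for team in sorted(ratings):
    print(team, round(ratings[team], 4), round(expected_wins(ratings, sample_games, team), 4))
```

At convergence each team's expected wins equal its actual wins, which is a handy check that a spreadsheet version has been iterated far enough.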

                        Thanks,
                        NUMBERS



                        • #13
                          Originally posted by Patman
                          I've been coming around to Rutter's application of the logit (KRACH) model... the idea that you have a hierarchy imposed that denotes some measure of closeness as a means of controlling against large estimates. Anything beyond the generalized linear model class is hard to deal with... hierarchy is a ***** to deal with when having to face issues of 1 and -1 contrasts. So, I have a few models I'd love to employ (I think Bayes is ideal for ranking sports teams in some ways... anything from the "objective" class)... I just don't believe the mathematics involved is favorable (in many plausible models).

                          Dr. Joyce

                          edit: I only partially apologize if I just went over everybody's heads... I've been thinking about this for awhile and i'm not really game for playing towards the layman right now... gotta go out and drink and stuff
                          Patman,
                          What exactly is Rutter's application of the KRACH model?
                          Thanks.
                          Numbers



                          • #14
                            And, generally, I have another question that seems to belong here.

                            If college hockey wishes to choose its NCAA field by game results only, and KRACH (can we please find a better name? And, how would a statistician really refer to this method?) does so as well as any, how do we deal with the following problem?

                            Currently, the top of the list is filled with CCHA teams. I mean filled. I don't really have a problem with that, but I have a feeling that the math works out that way because the number of non-conf games is small, so a couple of handfuls of good results elevate the entire league.

                            Again, I don't really have a problem with that - if you want to use results, then use results. What I wish was that there were a way to smear the benefit a little. Does anyone understand what I mean?

                            Maybe in short it would be like this: KRACH makes the non-conference results of all the teams in one's league to be very important, because of the high number of insulated games within conferences. How can we tone that down a little?

                            Thanks,
                            Numbers



                            • #15
                              Re: Statistics Models

                              The quick (and disappointing) answer is that you can't. That is to say, if you did it wouldn't look like the form for KRACH.

                              KRACH works nicely because it's somewhat more of a fundamental form and it can be restated through simple sums. Sadly, most of statistics does not build up this way... this is where we get into the generalized linear model. We map the parameter onto the real line, fit a linear function through that transformation, and then calculate the maximum likelihood... but that's no longer as neat as p_i/(p_i+p_j).

                              KRACH can be restated in regression form by letting exp(c)/(1+exp(c))=p where c=beta_i-beta_j. A suitable parameterization of the beta terms plus a nice constraint gives us the result. Run that maximum likelihood calculation (which we can do easily by Newton-Raphson, as the log-likelihood is concave in beta) and then the relationship between that and the KRACH results is a*exp(beta_i).
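Spelled out, the algebra behind that restatement is short. In this sketch K_i stands for team i's KRACH rating and a for an arbitrary positive scale constant; the symbol K_i is notation assumed here, not something fixed by the posts above.

```latex
\[
p_{ij}
  = \frac{e^{\beta_i-\beta_j}}{1+e^{\beta_i-\beta_j}}
  = \frac{e^{\beta_i}}{e^{\beta_i}+e^{\beta_j}}
  = \frac{K_i}{K_i+K_j},
\qquad K_i = a\,e^{\beta_i},\quad a>0 .
\]
```

The constant a cancels in the ratio, so a constraint such as the betas summing to zero only picks the scale of the ratings and does not change any win probability.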

                              If we wanted to adjust for home and away, what we'd do is use c=beta_i-beta_j+home*I... I is 1 if the winning team is the home team and -1 if they are the away team (and usually zero for venues considered to be neutral). Sadly, we cannot re-establish this cleanly as multiples of this or that... (I say this, but I'm not 100% confident... it'd have to be something like p_i/(p_i+h*p_j)). On the other hand, as a direct comparison of teams we can still use a*exp(beta_i).
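Here is a rough sketch of fitting that home-adjusted logit model. It swaps Newton-Raphson for plain gradient ascent to keep the code short, writes each game from the home team's side (so the indicator is always +1 and neutral sites are left out), and uses an invented schedule; all names are hypothetical. Exponentiating the fit, the home team's win probability is h*K_home/(h*K_home + K_away) with h = exp(eta) and K = exp(beta), so under these assumptions a multiplicative form does seem to exist.

```python
import math


def fit_logit_ratings(games, lr=0.05, max_iter=200_000, tol=1e-10):
    """Fit c = beta[home] - beta[away] + eta by gradient ascent on the log-likelihood.

    games: list of (home, away, home_won) with home_won in {0, 1} (made-up format).
    Returns (beta, eta) with the betas centered to sum to zero.
    """
    teams = sorted({t for h, a, _ in games for t in (h, a)})
    beta = {t: 0.0 for t in teams}
    eta = 0.0  # home-ice term on the logit scale
    for _ in range(max_iter):
        grad_beta = {t: 0.0 for t in teams}
        grad_eta = 0.0
        for home, away, home_won in games:
            p = 1.0 / (1.0 + math.exp(-(beta[home] - beta[away] + eta)))
            resid = home_won - p  # observed minus expected
            grad_beta[home] += resid
            grad_beta[away] -= resid
            grad_eta += resid
        for t in teams:
            beta[t] += lr * grad_beta[t]
        eta += lr * grad_eta
        mean = sum(beta.values()) / len(teams)  # sum-to-zero constraint
        for t in teams:
            beta[t] -= mean
        if max(abs(grad_eta), *(abs(g) for g in grad_beta.values())) < tol:
            break
    return beta, eta


def home_win_probability(beta, eta, home, away):
    """Fitted chance that `home` beats `away` on home ice."""
    return 1.0 / (1.0 + math.exp(-(beta[home] - beta[away] + eta)))


# Invented schedule: (home team, away team, 1 if the home team won).
sample_schedule = [
    ("A", "B", 1), ("A", "B", 0), ("B", "A", 1), ("B", "C", 1), ("C", "B", 1),
    ("C", "A", 0), ("A", "C", 1), ("C", "A", 1), ("B", "C", 0), ("A", "B", 1),
]
fitted_beta, fitted_eta = fit_logit_ratings(sample_schedule)
print({t: round(math.exp(b), 3) for t, b in fitted_beta.items()})  # ratings, scale a = 1
print(round(math.exp(fitted_eta), 3))  # multiplicative home factor h
```

A single shared eta is the "one number for all of hockey" approach; a per-barn advantage would need a separate eta for each home team and a lot more games to estimate it.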