Re: A new ranking system for college hockey
There is a story, perhaps apocryphal, of a grade-school student who nearly predicted a perfect NCAA men's division one basketball tournament bracket. His method? He had the team mascots face off in a fantasy challenge and picked the winners that way (you know, a Jayhawk would peck out the eyes of a Husky but it would get trapped by a Pioneer, that kind of thing).
On a more serious note, I'm not sure if I can articulate my question in words instead of math...isn't it the case that most statistical models have a likelihood, a margin of error, and a time frame? Wouldn't the number of playoff games in the tournament make a difference?
In other words, wouldn't there be a significant difference in the effectiveness of your model in a best-of-seven series (all those references to Bill James earlier) compared to a one-and-done tournament? (Like a basketball game in which a referee has trouble counting to five?)
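To put a rough number on the series-versus-single-game question, here is a minimal sketch. It assumes (my assumption, not anything from the paper) that games are independent, that the better team has the same win probability p every night, and that there are no ties. The function name `series_win_prob` is made up for illustration.

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Chance that a team with single-game win probability p takes a series
    that ends when one side reaches wins_needed wins (4 = best of seven)."""
    q = 1.0 - p
    # The series winner always takes the final game; before that they have
    # wins_needed - 1 wins and k losses, arranged in any order.
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
               for k in range(wins_needed))

# Hypothetical single-game edges, purely for illustration.
for p in (0.55, 0.60, 0.70):
    print(f"one game: {p:.2f}   best of seven: {series_win_prob(p):.3f}")
```

Under those assumptions a 60% single-game favorite wins a best-of-seven about 71% of the time, so the longer format does reward the better team more often than one-and-done, but it is still far from a lock.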
Picking an NCAA pool takes a bit of knowledge and a bit of luck... the Facebook NCAA pools are great for that... some girl at some D1 school goes 32/32 and it's "oh, wow, I am so awesome"... now, doing this consistently over the course of the year is another matter. You know what the (generally) statistically optimal thing to do is? Advance the higher seeds nearly without question... every once in a while you'll have a low-balled low seed (Winthrop a couple of years ago... College of Charleston, IIRC, 12 years back or so) but that #2 is more likely to be knocked out by somebody than that #1 nearly every time... note, more likely.... #11 George Mason and #11 Villanova speak to the reliability of this statement.
A few of you may recall that there was a "guess the score" website that I'll bet got a C&D on gambling grounds... a third of the way through the season I was sitting in 2nd place out of everybody... and I am nearly certain I would have won. Why? Well, when most people guess the score they go with their hearts... pick upsets... 5-1, 2-0, 1-0... etc. See, while certain scores were prioritized (shutouts), they turn out to be a lot less likely than the reward for correctly predicting one would justify... though this wasn't totally the case. So, what I did for the first couple of weeks was pick the winner and guess 3-2... why? Because that's the most likely result... technically 3-3 is, but with how ties get handled (overtime) that result ends up a lot less likely... IIRC you may have gotten points for wins as well... having wins, ties, and score all in the formulation means it's better to go for wins and losses... who cares if your gut says 1-1... the law of large numbers means I can give every game a 3-2 guess... and I'll do better in the long haul.
I think I had a Poisson model in place along the lines of Robin Lock's CHODR... I had adjusted for overtime length (hell, I can code that quick now given data)... Robin Lock does not adjust for overtime length (IIRC) and excludes ENGs (empty-net goals) from the model.
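For anyone who wants to see why "pick the winner, guess 3-2" beats going with your heart, here is a bare-bones sketch. It treats each team's goals as independent Poisson counts with made-up rates (3.2 for the favorite, 2.6 for the underdog, both hypothetical), and it ignores overtime, ties, and empty-netters, so it is a simplification and not the model described above. The function names are mine.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k goals) for a Poisson scoring rate lam."""
    return exp(-lam) * lam**k / factorial(k)

def score_table(lam_fav, lam_dog, max_goals=10):
    """Probability of every (favorite, underdog) final score up to max_goals,
    assuming the two goal counts are independent."""
    return {(i, j): poisson_pmf(i, lam_fav) * poisson_pmf(j, lam_dog)
            for i in range(max_goals + 1)
            for j in range(max_goals + 1)}

def most_likely_score(lam_fav, lam_dog, max_goals=10):
    """Single most probable final score under the independence assumption."""
    table = score_table(lam_fav, lam_dog, max_goals)
    return max(table, key=table.get)

# Hypothetical scoring rates, chosen only for illustration.
LAM_FAV, LAM_DOG = 3.2, 2.6
table = score_table(LAM_FAV, LAM_DOG)
print("most likely score:", most_likely_score(LAM_FAV, LAM_DOG))
# Compare the boring guess with the "gut feeling" guesses from the post.
for guess in [(3, 2), (1, 0), (2, 0), (5, 1)]:
    print(guess, round(table[guess], 3))
```

With these rates the top cell is 3-2, and the shutout and blowout guesses each carry noticeably less probability, which is the whole point: unless the bonus for a shutout is big enough to make up the gap, the dull guess wins over a season.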
Too bad, I missed my opportunity on a hockey jersey.
Anyhow... I'll try to look into the model... my brain is so fried from work that, while I'll want to read it, I'm not sure how much I really want to extract, carp, and snipe right now... because I am a detailed, opinionated bastard.
---
All that being said, I believe Poisson score models have strong predictive power in ice hockey... one could argue about the applicability of the Poisson, effects of extra-man, PP, etc., etc., etc. but it's about as useful as anything else. That being said, I tend to look at GF-GA differences for measures of strength... I can plug in certain Poisson parameters for goal scoring rates and how little separation you actually get would amaze you... you can be demonstrably better and still lose a game 10-20% of the time... I've drawn entire NHL seasons based on the Poisson distribution and I've seen teams fluctuate by 20-30 points from draw to draw... the only consistent things were the top teams, and even those sometimes missed the playoffs in the sim. So, for as much as it is predictive, on any given day things happen. Now, in college the talent is a lot more separable... but the behavior is the behavior.
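Here is a rough sketch of that kind of experiment, not the actual code behind the numbers above. Everything specific in it is an assumption for illustration: independent Poisson goals, hypothetical scoring rates, an 82-game schedule, and a simple 2 points for a win and 1 for a tie with no overtime.

```python
import math
import random

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(lam_for, lam_against, max_goals=30):
    """Return (win, tie, loss) probabilities for one regulation game."""
    win = tie = loss = 0.0
    for gf in range(max_goals + 1):
        for ga in range(max_goals + 1):
            p = poisson_pmf(gf, lam_for) * poisson_pmf(ga, lam_against)
            if gf > ga:
                win += p
            elif gf == ga:
                tie += p
            else:
                loss += p
    return win, tie, loss

def poisson_draw(rng, lam):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def season_points(rng, lam_for, lam_against, games=82):
    """Simulate one season; 2 points per win, 1 per tie (assumed format)."""
    points = 0
    for _ in range(games):
        gf = poisson_draw(rng, lam_for)
        ga = poisson_draw(rng, lam_against)
        if gf > ga:
            points += 2
        elif gf == ga:
            points += 1
    return points

# A team that scores twice as often as its opponent (hypothetical rates).
win, tie, loss = outcome_probs(4.0, 2.0)
print(f"win {win:.3f}  tie {tie:.3f}  loss {loss:.3f}")

# Ten independent draws of the same slightly-above-average team.
rng = random.Random(2024)
draws = [season_points(rng, 3.0, 2.8) for _ in range(10)]
print(draws, "spread:", max(draws) - min(draws))
```

The first print gives the chance that a clearly better team still drops a single game, and the second shows how far one team's point total can wander from draw to draw with nothing changing but luck.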
edit: at this point, having done a lot of "small area estimation" at work... if I were to punt out a model... seeing that I understand Mr. Rutter's work a bit more (setting up a random effects model on the Bradley-Terry formulation)... I'd see if I could set up something similar on the score model.... of course I have no real desire to put myself through such a model build when there's meager reward as a non-academic who wants to do things on the weekend.... I'd set up some sort of possibly correlated random effects on the offensive/defensive parameters, etc. This all goes into my hypothetical dream world Pairwise Simulator that would sim out the remainder of the season, complete with accurate tie-breaks.
edit #2: I have only skimmed the paper... I don't have the desire to cross all the t's and dot the i's... that being said... if there's a special "in conference" effect, the biggest issue I would have is that I would want to see the necessity for the term tested (classical fashion, frequentist paradigm likelihood ratio test)... if only because one does have a null model of sorts that can be applied.
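To show what I mean by testing the term, here is a toy likelihood ratio test. It is NOT the paper's model: I'm swapping in a plain Poisson goal-rate model where the null says one scoring rate for all games and the alternative lets in-conference games have their own rate. The goal counts are made up, the function names are mine, and the p-value leans on the usual large-sample chi-square approximation with 1 degree of freedom.

```python
from math import erfc, lgamma, log, sqrt

def poisson_loglik(goals, lam):
    """Poisson log-likelihood of a list of goal counts (requires lam > 0)."""
    return sum(k * log(lam) - lam - lgamma(k + 1) for k in goals)

def lrt_conference_effect(in_conf_goals, non_conf_goals):
    """Likelihood ratio test of 'one rate' vs 'separate in-conference rate'.

    Assumes both samples contain at least one goal so the MLEs are positive.
    Returns (test statistic, approximate p-value).
    """
    pooled = list(in_conf_goals) + list(non_conf_goals)
    # The Poisson MLE of the rate is just the sample mean.
    lam_null = sum(pooled) / len(pooled)
    lam_in = sum(in_conf_goals) / len(in_conf_goals)
    lam_out = sum(non_conf_goals) / len(non_conf_goals)
    ll_null = poisson_loglik(pooled, lam_null)
    ll_alt = (poisson_loglik(in_conf_goals, lam_in)
              + poisson_loglik(non_conf_goals, lam_out))
    # Clamp at zero to guard against tiny negative rounding error.
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Made-up goal counts, purely to exercise the function.
in_conf = [5, 6, 7, 5, 6, 8]
non_conf = [1, 2, 1, 0, 2, 1]
stat, p_value = lrt_conference_effect(in_conf, non_conf)
print(f"LR statistic: {stat:.2f}   approx p-value: {p_value:.2g}")
```

If the statistic is small, the extra term isn't buying anything over the null model, which is the check I'd want to see before keeping it. Testing a random-effect variance would need more care than this, since the parameter sits on the boundary under the null.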
I don't want to critique every part of your paper... I remember being interested in this since high school through undergrad and into grad school, with some really iffy models used from other people's code... but the biggest issue with the BCS score-based formulas is more that people are aware of them and of how to manipulate them... think of some of the comments from Jurassic Park made by Goldblum's character... the method is also part of the system (this also affected quantitative investing, as demonstrated by MIT's Andrew Lo)... if the football people were ignorant of the rankings and the advantages then there wouldn't be an outright problem... the flaw isn't completely in the methodology... but then again it is... it's harder to cheat a KRACH model during the game (though scheduling could be argued). The problem in the end was the gaming of the system and not the impact of the model in itself.