John t whelan ranking simulator

FlagDUDE08 replied

11-19-2013, 10:18 AM
Re: John t whelan ranking simulator

Originally posted by Patman View Post

I wish we were using the same code base... In-season tournaments and tie-breakers have been what has stopped me. But I will admit it's the in-season tournaments, because I don't know how I want to weave it in and the rest

I am in the same position: it's impractical for me to do such a predictor, mostly because I'd then have to change my entire format to include what league, whether a game was a league game or not, and so on. I'm sure I could do it, but I fear everything would just get larger and larger.
Patman replied

11-19-2013, 10:13 AM
Originally posted by goblue78 View Post

My previous code was in fact a simulator. It simulates a season (including in-season and post-season tournaments) and calculates the NCAA field in under 1/7th of a second per simulation. The one thing it didn't do was implement the tie-breaking rules for conference playoffs. It just randomly seeded teams with tied conference records. Anyway, when I get around to it, I'll have to implement the new pairwise. By the time I do that, all the in-season tournaments will be over, so the simulation should be really fast.

Basic KRACH code is dead-simple, but implementing a home-road differential and a tie probability requires a maximum likelihood routine. Still not difficult, but a lot of the elegance goes away.

Interesting. I had thought about this and made my code recursive, though it was quite rare in simulations after the season was up that you needed more than one pass. (As I recall, it was something like one season in 30.) I guess I don't understand how it could not be recursive. If you don't make it recursive, can't a team protest that there was a game left in the games that counted to calculate RPI that lowered its rating?

I wish we were using the same code base... In-season tournaments and tie-breakers have been what has stopped me. But I will admit it's the in-season tournaments, because I don't know how I want to weave it in and the rest
goblue78 replied

11-19-2013, 09:52 AM
Re: John t whelan ranking simulator

Originally posted by Patman View Post

I still say the end goal is a simulator

My previous code was in fact a simulator. It simulates a season (including in-season and post-season tournaments) and calculates the NCAA field in under 1/7th of a second per simulation. The one thing it didn't do was implement the tie-breaking rules for conference playoffs. It just randomly seeded teams with tied conference records. Anyway, when I get around to it, I'll have to implement the new pairwise. By the time I do that, all the in-season tournaments will be over, so the simulation should be really fast.

Basic KRACH code is dead-simple, but implementing a home-road differential and a tie probability requires a maximum likelihood routine. Still not difficult, but a lot of the elegance goes away.

Originally posted by JimDahl View Post

A point of disagreement in the past has been whether this process is recursive. I've always been pretty convinced that they calculate RPI once, then drop all the games that make it go up if you drop them. Others have wondered if you then need to make another pass to see if the new, higher RPI, has pushed any new games into "adverse" territory (repeating until you don't find any). That matters a lot more this time of year than later, so I'm not sure we've ever had a conclusive test come tournament time.

Interesting. I had thought about this and made my code recursive, though it was quite rare in simulations after the season was up that you needed more than one pass. (As I recall, it was something like one season in 30.) I guess I don't understand how it could not be recursive. If you don't make it recursive, can't a team protest that there was a game left in the games that counted to calculate RPI that lowered its rating?
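A minimal sketch of the kind of season simulation described above, not goblue78's actual Stata/Mata code. It assumes each remaining game is decided by the usual KRACH-style win probability K_a / (K_a + K_b), ignores ties and home/road effects, and breaks standings ties at random (the one shortcut the post mentions). The function names, the ratings, and the schedule are all hypothetical.

```python
import random


def simulate_standings(ratings, schedule, rng):
    """Play out one set of games and return (seeding order, win totals).

    ratings:  {team: KRACH-style rating}, higher is better (hypothetical input)
    schedule: list of (team_a, team_b) games still to be played
    rng:      a random.Random instance, so runs are reproducible
    """
    wins = {team: 0 for team in ratings}
    for a, b in schedule:
        # Bradley-Terry win probability for team a; no ties, no home edge.
        p_a = ratings[a] / (ratings[a] + ratings[b])
        winner = a if rng.random() < p_a else b
        wins[winner] += 1
    # Teams level on wins are ordered by a random draw, standing in for
    # the real conference tie-breaking rules.
    order = sorted(wins, key=lambda t: (wins[t], rng.random()), reverse=True)
    return order, wins


def top_seed_odds(ratings, schedule, n_sims=10000, seed=0):
    """Estimate how often each team finishes first over n_sims simulations."""
    rng = random.Random(seed)
    firsts = {team: 0 for team in ratings}
    for _ in range(n_sims):
        order, _ = simulate_standings(ratings, schedule, rng)
        firsts[order[0]] += 1
    return {team: count / n_sims for team, count in firsts.items()}
```

Something like `top_seed_odds({"A": 300, "B": 100, "C": 50}, [("A", "B"), ("A", "C"), ("B", "C")] * 2)` would give each team's estimated chance of finishing first in a made-up three-team double round-robin. Adding a tie probability or a home-road differential would change the per-game draw, which is where the maximum likelihood fit mentioned above comes in.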
JimDahl replied

11-19-2013, 09:51 AM
Re: John t whelan ranking simulator

Originally posted by Patman View Post

If we are talking about a major website, unlikely. I mostly meant amongst ourselves. In theory, if one wanted to grab direct data, then web scraping might be the best... though painful.

One alternative would be to ask collegehockeystats to do a dump file for us with the most relevant summary (game data) info. But I don't know under whose auspices they produce game information.

I think use of collegehockeystats is the key. I currently scrape a combination of sites, which gets the data sooner (USCHO and CHN often post earlier) and helps me catch errors (they do sometimes post bad scores), but does require some manual poking to fix things now and then. I don't think any of you would want to rely on that data (nor would I on yours) because you'd occasionally be waiting for me to notice, care about, and fix such a problem.

If you wanted it to be a truly automated, trusted source, I think a high quality scraper/translator for collegehockeystats into a machine-readable input file is the way to go.
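To make the "machine-readable input file" idea concrete, here is one possible shape for it, offered only as a sketch. The CSV column names (`date`, `home`, `home_score`, `away`, `away_score`, `conference`) are an invented format, not anything collegehockeystats, USCHO, or CHN publishes, and the sample rows are made-up placeholder games. The `conference` flag is there because a predictor would need to know which games count toward league standings.

```python
import csv
import io

# Hypothetical sample in the assumed format; teams and scores are placeholders.
SAMPLE = (
    "date,home,home_score,away,away_score,conference\n"
    "2013-11-15,Team A,3,Team B,2,1\n"
    "2013-11-16,Team B,1,Team A,1,1\n"
    "2013-11-16,Team C,0,Team A,4,0\n"
)


def load_games(text):
    """Parse the assumed CSV layout into a list of game dictionaries."""
    games = []
    for row in csv.DictReader(io.StringIO(text)):
        games.append({
            "date": row["date"],
            "home": row["home"],
            "away": row["away"],
            "home_score": int(row["home_score"]),
            "away_score": int(row["away_score"]),
            "conference": row["conference"] == "1",
        })
    return games


def win_matrix(games):
    """Return (teams, wins) where wins[i][j] counts team i's wins over team j.

    A tie adds 0.5 to both teams, matching the 1.0/0.0/0.5 convention.
    """
    teams = sorted({g["home"] for g in games} | {g["away"] for g in games})
    index = {team: i for i, team in enumerate(teams)}
    wins = [[0.0] * len(teams) for _ in teams]
    for g in games:
        h, a = index[g["home"]], index[g["away"]]
        if g["home_score"] > g["away_score"]:
            wins[h][a] += 1.0
        elif g["home_score"] < g["away_score"]:
            wins[a][h] += 1.0
        else:
            wins[h][a] += 0.5
            wins[a][h] += 0.5
    return teams, wins
```

Calling `win_matrix(load_games(SAMPLE))` returns the team list and the matrix that a KRACH or RPI script would start from. A scraper/translator would then only need to emit this one file, whatever its source.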
FlagDUDE08 replied

11-19-2013, 09:43 AM
Re: John t whelan ranking simulator

Originally posted by LynahFan View Post

Absolutely. When I wrote my KRACH script (is there any hockey fan who hasn't at least tried this?) in MATLAB, it was <100 lines of code, and the majority of that just had to do with reading the input file and stuffing the information into the win matrix, as you say. The actual "calculation" itself is like 10 lines of code - that simplicity is one of the aesthetic beauties of KRACH (in addition to its functional beauty).

A standard input format would be great, but you'd probably need all of the major sites (USCHO, CHN, etc) to come together to agree on it, and I'm not sure they'd be motivated enough to bother.

KRACH, I would assume, is much easier to do than PWR. My SLOC is a few thousand, but the majority of this code is for display purposes; I would say there is only a couple hundred SLOC that actually involves math. I have not tried KRACH, mostly because I do not know what the formula is.

Standard input would be nice, but I agree that not many would bother. I know my input is entirely based upon the Google Docs spreadsheet that I put together over the summer that has the entire country's schedule.
Patman replied

11-19-2013, 09:32 AM
Originally posted by LynahFan View Post

Absolutely. When I wrote my KRACH script (is there any hockey fan who hasn't at least tried this?) in MATLAB, it was <100 lines of code, and the majority of that just had to do with reading the input file and stuffing the information into the win matrix, as you say. The actual "calculation" itself is like 10 lines of code - that simplicity is one of the aesthetic beauties of KRACH (in addition to its functional beauty).

A standard input format would be great, but you'd probably need all of the major sites (USCHO, CHN, etc) to come together to agree on it, and I'm not sure they'd be motivated enough to bother.

If we are talking about a major website, unlikely. I mostly meant amongst ourselves. In theory, if one wanted to grab direct data, then web scraping might be the best... though painful.

One alternative would be to ask collegehockeystats to do a dump file for us with the most relevant summary (game data) info. But I don't know under whose auspices they produce game information.
LynahFan replied

11-19-2013, 09:12 AM
Re: John t whelan ranking simulator

Originally posted by Patman View Post

I still say the end goal is a simulator

Edit: I am not using a modular executable language... I don't know the differences in the -oriented but what I do is use a thing that primarily uses C as a platform. I suppose it's possible to treat it as a script but not without installing software.

For me KRACH is deadly simple once you purée the data into a win matrix and game matrix. I've posted the code for that before.

I'll say the big thing is if we can adopt a data input standard that will go a long way.

Absolutely. When I wrote my KRACH script (is there any hockey fan who hasn't at least tried this?) in MATLAB, it was <100 lines of code, and the majority of that just had to do with reading the input file and stuffing the information into the win matrix, as you say. The actual "calculation" itself is like 10 lines of code - that simplicity is one of the aesthetic beauties of KRACH (in addition to its functional beauty).

A standard input format would be great, but you'd probably need all of the major sites (USCHO, CHN, etc) to come together to agree on it, and I'm not sure they'd be motivated enough to bother.
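For reference, the "10 lines" can be sketched roughly as below. This is not LynahFan's MATLAB or Patman's code; it is the commonly described KRACH fixed-point iteration, K_i = W_i / sum_j( N_ij / (K_i + K_j) ), where W_i is team i's wins (ties as half a win) and N_ij is the number of games between i and j. The function name, the starting value, and the choice to rescale the ratings to an average of 100 are assumptions made for the example. It also assumes every team has played at least one game and that no team is unbeaten or winless against a disconnected schedule, since the ratings are not finite in that case.

```python
def krach(wins, max_iter=1000, tol=1e-10):
    """Iterate KRACH ratings from a win matrix.

    wins[i][j] is the number of wins by team i over team j, with each
    tie counted as 0.5 for both teams and zeros on the diagonal.
    """
    n = len(wins)
    # Game matrix: how many times i and j have met.
    games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
    total_wins = [sum(row) for row in wins]
    k = [100.0] * n
    for _ in range(max_iter):
        new_k = []
        for i in range(n):
            # Raises ZeroDivisionError if team i has no games at all.
            denom = sum(games[i][j] / (k[i] + k[j]) for j in range(n) if j != i)
            new_k.append(total_wins[i] / denom)
        # Ratings are only defined up to a scale factor; pin the mean at 100.
        scale = 100.0 * n / sum(new_k)
        new_k = [x * scale for x in new_k]
        converged = max(abs(a - b) for a, b in zip(new_k, k)) < tol
        k = new_k
        if converged:
            break
    return k
```

At the fixed point each team's expected wins, the sum over opponents of N_ij * K_i / (K_i + K_j), equals its actual wins, which is a handy check that two people's scripts agree before comparing anything harder like RPI.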
Patman replied

11-19-2013, 08:41 AM
Re: John t whelan ranking simulator

Originally posted by JimDahl View Post

I agree, I interpreted OWP and OOWP as being straight (not with the home/away weightings). So, my OWP and OOWP calculations are essentially unchanged from previous years.

A point of disagreement in the past has been whether this process is recursive. I've always been pretty convinced that they calculate RPI once, then drop all the games that make it go up if you drop them. Others have wondered if you then need to make another pass to see if the new, higher RPI, has pushed any new games into "adverse" territory (repeating until you don't find any). That matters a lot more this time of year than later, so I'm not sure we've ever had a conclusive test come tournament time.

I would think what we've seen to date would imply it isn't... should be simple enough to test on previous data... if anything upsets the seeding apple cart then it'll show that it isn't recursive.
JimDahl replied

11-19-2013, 08:27 AM
Re: John t whelan ranking simulator

Originally posted by FlagDUDE08 View Post

Sorry for being unclear with the second point. I meant in terms of OWP and OOWP. The sources I have say not to take weighting into account, and to do a straight 1.0/0.0/0.5 for each game.

I agree, I interpreted OWP and OOWP as being straight (not with the home/away weightings). So, my OWP and OOWP calculations are essentially unchanged from previous years.

One thing I did notice with calculations, at least between RHamilton and myself, is that we had different games to remove for various teams. I wonder if this is the case for us.

A point of disagreement in the past has been whether this process is recursive. I've always been pretty convinced that they calculate RPI once, then drop all the games that make it go up if you drop them. Others have wondered if you then need to make another pass to see if the new, higher RPI, has pushed any new games into "adverse" territory (repeating until you don't find any). That matters a lot more this time of year than later, so I'm not sure we've ever had a conclusive test come tournament time.

Last edited by JimDahl; 11-19-2013, 08:29 AM.
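The single-pass versus recursive question can be illustrated with a small sketch. It rests on one simplifying assumption: if WP, OWP, and OOWP are all per-game averages, a team's RPI is the average of a per-game value, so dropping a win raises RPI exactly when that win's value is below the current RPI. The per-game values below are invented numbers, and `drop_adverse_wins` is a hypothetical helper, not anyone's production code or a statement of what the committee does.

```python
def average(values):
    return sum(values) / len(values)


def drop_adverse_wins(game_values, is_win, recursive):
    """Drop wins that pull RPI down; return (final RPI, passes that dropped games).

    game_values[i]: game i's per-game RPI contribution (assumed input)
    is_win[i]:      True if game i was a win (only wins may be dropped)
    recursive:      False = calculate once and drop; True = repeat until
                    no remaining win sits below the updated RPI
    """
    kept = list(range(len(game_values)))
    passes = 0
    while True:
        current = average([game_values[i] for i in kept])
        adverse = {i for i in kept if is_win[i] and game_values[i] < current}
        if not adverse:
            break
        kept = [i for i in kept if i not in adverse]
        passes += 1
        if not recursive:
            break
    return average([game_values[i] for i in kept]), passes


# Made-up example: three wins followed by two losses.
values = [0.80, 0.56, 0.40, 0.50, 0.45]
won = [True, True, True, False, False]
single = drop_adverse_wins(values, won, recursive=False)
repeated = drop_adverse_wins(values, won, recursive=True)
```

In this example the first pass drops only the 0.40 win, which lifts the average enough that the 0.56 win becomes adverse on a second pass. The two conventions therefore produce different ratings, which is the kind of discrepancy that could be checked against previous seasons' data.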
FlagDUDE08 replied

11-19-2013, 08:14 AM
Re: John t whelan ranking simulator

Originally posted by JimDahl View Post

If the only differences are in teams with dropped wins, this is probably the difference. Our analysis of how to calculate Minnesota a page or two back was otherwise identical.

I'm not sure what this means.

I'm assuming this is unchanged from the past -- OOWP is simply the average of the OWP's for each opponent (so does not include games against each opponent).

I'm again assuming unchanged from the past -- average of records.

Sorry for being unclear with the second point. I meant in terms of OWP and OOWP. The sources I have say not to take weighting into account, and to do a straight 1.0/0.0/0.5 for each game.

One thing I did notice with calculations, at least between RHamilton and myself, is that we had different games to remove for various teams. I wonder if this is the case for us.
JimDahl replied

11-19-2013, 07:56 AM
Re: John t whelan ranking simulator

Originally posted by FlagDUDE08 View Post

As for Jim's and my differences, we've already discussed a disagreement in Quality Wins Bonus.

If the only differences are in teams with dropped wins, this is probably the difference. Our analysis of how to calculate Minnesota a page or two back was otherwise identical.

I will also ask Jim if he is taking weighting into account on RatingsPI

I'm not sure what this means.

One other factor that could be making a difference is how OOWP is calculated, specifically whether games involving the team in question should be counted. Some sources say yes, others say no.

I'm assuming this is unchanged from the past -- OOWP is simply the average of the OWP's for each opponent (so does not include games against each opponent).

One other thing that could cause an issue is specifically how OWP and OOWP are calculated. Do you take a cumulative record, or do you take the average of each team's records?

I'm again assuming unchanged from the past -- average of records.
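One reading of the answers above, written out as a sketch so the conventions can be compared line by line. The assumptions, none of them confirmed in the thread as the official method: results are a straight 1.0/0.5/0.0 per game with no home/away weighting; OWP is the average of each opponent's record with games against the team in question removed; an opponent is counted once per game played; an opponent with no other games is skipped; and OOWP is the plain average of the opponents' OWPs. The function and variable names are hypothetical.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def owp_oowp(games):
    """Compute OWP and OOWP for every team.

    games: list of (team_a, team_b, points_a), points_a being 1.0 for a
    win by team_a, 0.5 for a tie, 0.0 for a loss.
    """
    schedule = defaultdict(list)  # team -> [(opponent, points earned)]
    for a, b, pts in games:
        schedule[a].append((b, pts))
        schedule[b].append((a, 1.0 - pts))

    def record_excluding(team, excluded):
        # Winning percentage of `team` in games not involving `excluded`.
        pts = [p for opp, p in schedule[team] if opp != excluded]
        return mean(pts) if pts else None

    owp = {}
    for team, sched in schedule.items():
        records = [record_excluding(opp, team) for opp, _ in sched]
        # Average of the opponents' records, not one cumulative record.
        owp[team] = mean(r for r in records if r is not None)

    oowp = {team: mean(owp[opp] for opp, _ in sched)
            for team, sched in schedule.items()}
    return owp, oowp
```

Switching to the cumulative-record convention would mean pooling all opponents' points and games before dividing, which gives a different number whenever opponents have played unequal numbers of games. That makes it a plausible source of small disagreements between tables.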
FlagDUDE08 replied

11-19-2013, 06:20 AM
Re: John t whelan ranking simulator

Originally posted by Numbers View Post

Flag, Jim Dahl,

Just a note. Your RPI tables do not agree with each other, and neither agrees with what is posted on USCHO or CHN (those 2 have the same).

USCHO and CHN aren't going to agree; neither of those sites has correctly taken into account the new calculation rules, and I know USCHO will not until January. As for Jim's and my differences, we've already discussed a disagreement in Quality Wins Bonus. I will also ask Jim if he is taking weighting into account on RatingsPI, because it shouldn't be happening. One other factor that could be making a difference is how OOWP is calculated, specifically whether games involving the team in question should be counted. Some sources say yes, others say no. When I was using USCHO as a model very early in the season with teams that had a RatingsPI that was easy to calculate, I found the answer to be no.

One other thing that could cause an issue is specifically how OWP and OOWP are calculated. Do you take a cumulative record, or do you take the average of each team's records?
Numbers replied

11-18-2013, 11:23 PM
Re: John t whelan ranking simulator

Flag, Jim Dahl,

Just a note. Your RPI tables do not agree with each other, and neither agrees with what is posted on USCHO or CHN (those 2 have the same).
Patman replied

11-18-2013, 10:44 PM
Originally posted by goblue78 View Post

This is all interesting, and thanks, Flagdude, for your service here, but I'm spiritually with Patman. Sometime around New Year's I'll reprogram my previous lightning-fast Stata/Mata code and do this for real. (It also does KRACH and home/road/tie adjusted KRACH.) But I like the Java App because at least it will tell me if I'm matching your results. As I've done before, I'm happy to make that code available to everyone similarly obsessed. And making the results available on a dynamic basis is a real service, FD. Thanks.

I still say the end goal is a simulator

Edit: I am not using a modular executable language... I don't know the differences in the -oriented but what I do is use a thing that primarily uses C as a platform. I suppose it's possible to treat it as a script but not without installing software.

For me KRACH is deadly simple once you purée the data into a win matrix and game matrix. I've posted the code for that before.

I'll say the big thing is if we can adopt a data input standard that will go a long way.

Last edited by Patman; 11-18-2013, 10:51 PM.