Still Eeyore
New member
Re: Minnesota Gophers 2018-2019
You're doing the same thing that TTT did with his GRANT ratings: you make this statement but you don't provide any support for why this would be true. Grant expects me to believe that a team that wins 2-1 is superior to one that wins 5-3 or 7-4. If you want to tell me that this explanation is based on math or logic, then you have to support your ratio theory, not just throw it out there as if it is a given.
This is one of the earliest and most basic findings in all of sports analytics. It underlies the Pythagorean method that Bill James developed in the early 1980s, though Allan Roth was probably doing some of the same things when he worked for Branch Rickey three decades earlier. The method has been refined substantially since then, and the math has gotten more complicated with the dynamic non-linear models that have supplanted the original James formula.
There are two basic reasons why a goal ratio (non-linear) model is a better estimator than a goal differential (linear) model. The first is that, if you just compute the correlation between the estimator and winning percentage, you get a higher R^2 and smaller residuals with a ratio. The differences aren't huge, with an R^2 greater than .9 even for the linear methods, but they are significant. The second reason is that linear models break down badly at the extremes. If a team has a goal differential greater than half of the average number of goals scored per team (with some variation on the exact value depending upon the specifics of the model), a goal differential model is going to predict that it wins more than 100% of its games. In practice, goal differential works almost as well as goal ratio if the team you're looking at has a winning percentage between .300 and .700. So, it works fine for major league baseball. For women's college hockey, that limited range is a big problem; note that the two teams we're specifically looking at have winning percentages close to .900.
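To make the second point concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the function names are made up, the exponent of 2 is the original James value rather than anything fitted to hockey, the linear model is simply the tangent line to the ratio formula at a league-average team, and the goal totals are invented rather than real team data. Under those assumptions the linear estimate passes 100% once the differential exceeds 2 * league average / exponent; a different setup moves that threshold.

```python
def pythagorean_win_pct(goals_for, goals_against, exponent=2.0):
    """Ratio-style (non-linear) estimate: GF^x / (GF^x + GA^x)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def linear_win_pct(goals_for, goals_against, league_avg_goals, exponent=2.0):
    """Differential-style (linear) estimate.

    Assumption: the slope is the tangent to the ratio formula at a team
    that scores and allows the league average, i.e. exponent / (4 * avg).
    """
    slope = exponent / (4.0 * league_avg_goals)
    return 0.5 + slope * (goals_for - goals_against)


if __name__ == "__main__":
    league_avg = 100  # hypothetical goals per team over a season

    # Invented teams: one near the middle, one dominant.
    teams = {"middling": (110, 90), "dominant": (170, 50)}

    for name, (gf, ga) in teams.items():
        ratio_est = pythagorean_win_pct(gf, ga)
        linear_est = linear_win_pct(gf, ga, league_avg)
        print(f"{name}: ratio={ratio_est:.3f} linear={linear_est:.3f}")
```

With these made-up numbers the two estimates nearly agree for the middling team (about .599 versus .600), while for the dominant team the ratio estimate stays near .920 and the linear estimate comes out at 1.100, which is not a possible winning percentage.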
Beyond that, it ought to be obvious just from my original post that goal differential isn't to be trusted in this case. If that's what you go with, then you need to believe that Minnesota is not only better than Wisconsin this year, but a lot better. I don't think that matches the world as we've observed it.
Some references:
This uses soccer, but explains some of the concepts: https://thetopflight.com/2014/05/20/understanding-relationship-between-goals-points/
This shows more of the math, and is hockey related: https://www.hockeyanalytics.com/Research_files/Win_Probabilities.pdf
Edit: When I say, "The differences aren't huge, with an R^2 greater than .9 even for the linear methods, but they are significant," I should emphasize that the correlation is that high in the context of the NHL or MLB, where goal differentials and winning percentages cluster much more in the middle zone where a linear model is more robust. There is every reason to think that the R^2 would be much lower if you used goal differential to look at a league with more variance in the data, like Division I women's hockey.