https://www.threads.net/@praxiscatmu...Dvxg1diu4l07cw
A long thread about why polls (and aggregators) are problematic and no longer good indicators.
A bit of background: back in the 90s, polls would get a 20-30% response rate. Now they struggle to get 1%, which makes them unreliable (and cell phones are a large part of the problem). On top of that, almost all polling is now opt-in, which is biased and very unreliable.
I encourage everyone to read it.
Every decade or so there's a topic on the interwebs I actually know.
This guy is giving very good information, and I am glad to see it. However, as dark as his picture is, the reality is actually worse than he suggests, because he leaves the impression that the parties' top-secret in-house polling data is better. And that in turn leaves the impression that if only we could use that methodology, we could fix the problem.
But it isn't "better" in the most important way, and we can't fix it. The problem lies not in the math of sampling but in an assumption that has to hold before the math can be applied in the first place.
For the reasons he gives, there may be nebulous biases which render this polling, with its self-selected responses, literally worthless. The samples aren't less representative; they are unknowably representative. All the statistical analysis behind polling measures like the margin of error is predicated on a sample drawn blind to any non-normalizable non-representativeness. At these response rates, and with all the extenuating circumstances that gate who responds, we can never know whether that condition holds, which means we are not justified in using the sample at all.
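To see why that assumption matters so much, here's a minimal sketch of the textbook margin-of-error calculation (the generic formula, not any particular pollster's method). Notice that the random-sampling assumption never appears as an input: it's a precondition of the formula, not a parameter you can adjust.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Textbook MOE for a proportion at 95% confidence.

    Valid ONLY if the n respondents are a random draw from the
    population -- which is exactly the assumption at issue.
    """
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll at 50/50:
print(f"{margin_of_error(0.5, 1000):.3f}")  # ~0.031, i.e. +/- 3.1 points
```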
Let me give a very clean example. Say you have a population where the only stable historical demographic split in voter preference runs along gender; every other characteristic in this very weird population is a wash. Say the population is 50% female but the sample is only 25% female. As long as the sample size is sufficient under the rules of statistics, that's not a problem -- the sample is normalizable: you just count every sampled woman twice. A sample can actually be wildly out of whack and still be fine once normalized (see the sketch below). What the sample does need is for every crosstab of respondents to be representative of its crosstab cluster in the population.
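Here's a minimal sketch of that reweighting, with all the numbers invented for illustration -- the standard post-stratification move, the "count every sampled woman twice" step made explicit:

```python
# Post-stratification: reweight a skewed-but-normalizable sample.
# Population is 50% female; our (hypothetical) sample is 25% female.

sample = {"F": 250, "M": 750}            # respondents by gender
population_share = {"F": 0.5, "M": 0.5}

n = sum(sample.values())
# Weight = (population share) / (sample share). Women get weight 2.0,
# men get weight 2/3 -- every crosstab is pulled back to its true size.
weights = {g: population_share[g] / (sample[g] / n) for g in sample}

# Suppose 60% of sampled women and 40% of sampled men back candidate A:
support = {"F": 0.6, "M": 0.4}
weighted = sum(weights[g] * sample[g] * support[g] for g in sample) / n
print(f"weighted support: {weighted:.0%}")  # 50%, vs. 45% unweighted
```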
So far, so good. That's valid polling.
Now let's add a complication: response rates are very small, and we have good reason to expect they correlate with unknown and therefore non-normalizable characteristics. In effect, each respondent is "skewed," but this time we don't know how -- so if the skew is bad, we literally can't fix it. Such a sample can be massively larger than the minimum size called for by sampling theory and still not be representative. In the worst case it isn't inexact, it's worthless: you can't rebalance the crosstabs with weighting coefficients, and you literally can't use the results at all. It's garbage data. But because 90% of poli-sci and even stats grads understand only the math, not the theory that justifies applying the math in the first place, they keep reporting the results and just bump up the MOE. That's malpractice.
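Here's a toy simulation of that failure mode -- the mechanism and every number are invented. True support is exactly 50/50, but willingness to respond secretly correlates with preference. The sample is enormous, the reported MOE is tiny, and the estimate is still flatly wrong; nothing in the observed data flags the problem, so no weighting coefficient can repair it:

```python
import math
import random

random.seed(0)
N = 1_000_000          # huge frame -- way past any minimum-n requirement

# Ground truth: the electorate is exactly 50% for candidate A.
# Hidden mechanism (unknowable to the pollster): A-supporters answer
# the phone 1.5% of the time, B-supporters only 0.5% of the time.
responses = []
for _ in range(N):
    prefers_a = random.random() < 0.5
    response_rate = 0.015 if prefers_a else 0.005
    if random.random() < response_rate:
        responses.append(prefers_a)

p_hat = sum(responses) / len(responses)
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / len(responses))
print(f"n = {len(responses)}, estimate = {p_hat:.1%} +/- {moe:.1%}")
# Prints roughly: estimate = 75.0% +/- 0.8% -- confidently, hugely wrong.
# The true value (50%) isn't anywhere near the reported interval, and
# nothing visible in the collected data flags the problem.
```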
What they should say is something like:
"In poll X, Harris got 50% and Dump got 46% of the sample votes. If the sample is usable, the MOE is +/- 5%, making this a statistical tie. However, there is a
significant likelihood of unknown magnitude that the data is unreliable and should be ignored."
When I post the 538 aggregates, that's my assumption. Note that aggregating more polls doesn't reduce that second source of uncertainty, because we have no way of knowing whether all the samples are biased in the same direction. We literally can never know. Even if the numbers match exactly on election day, it still might just have been a fluke.
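On that aggregation point, same toy setup as above (all numbers invented): average a hundred polls that share one unknown bias, and the average converges confidently on the wrong number. More polls shrink the sampling noise; they never touch the shared bias:

```python
import random
import statistics

random.seed(1)
TRUE_SUPPORT = 0.50
SHARED_BIAS = 0.04     # every poll's methodology skews +4 pts (unknowable)

def one_poll(n: int = 1000) -> float:
    """One poll: sampling noise around a commonly-biased target."""
    target = TRUE_SUPPORT + SHARED_BIAS
    return sum(random.random() < target for _ in range(n)) / n

polls = [one_poll() for _ in range(100)]
print(f"average of 100 polls: {statistics.mean(polls):.1%}")
# ~54%: the noise averaged away, the shared 4-point error did not --
# and from the polls alone you can't tell a biased 54% from a true 54%.
```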