We're really locked into like 90% of our design, but at this stage there are two things that I really agonize over:
--weighting on self-reported race
--everything having to do with early voting data
Weighting on self-reported race is a ... 100% standard methodological decision that's employed by appx. 99% of pollsters.
But as I look at our data in NC/SC/FL/GA, where the voter reg form includes a question about race, I'm left with *a lot* of hesitation about using it everywhere
Here, I'm really comparing three things:
--the race our respondents gave when they registered to vote
--the race our respondents gave us in our survey in a census-like question
--race in the census / CPS-based estimates
Let's take GA, because it's recent and you can see the data on the GA SOS website yourself.
Our RVs by voter file race: W54, B30, H3, O5, Unknown 9
Our RVs by self-report race: W54, B28, H6, O6, DK5
So the same people answer the question differently in different settings
Now let's compare to the CPS voter reg. estimate for RVs in 2018. Here, it's white, non-Hispanic 61, Black 33 (any race), Hisp 3.
Here's our RVs by self-report race again, but here dropping DK for a more direct comp:
White 60, Black 30 (any race for comp), Hisp 6.
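Roughly, the comparison I'm doing looks like this — a toy sketch with made-up rows and hypothetical `vf_race` / `self_race` columns, not our actual pipeline:

```python
import pandas as pd

# Toy respondent-level data: one row per RV, with race as coded on the
# voter file and as self-reported in the survey (values illustrative only).
df = pd.DataFrame({
    "vf_race":   ["W", "B", "B", "H", "W", "O", "Unknown"],
    "self_race": ["W", "B", "W", "H", "H", "O", "DK"],
})

# Distribution of the same respondents under each race variable.
vf_shares   = df["vf_race"].value_counts(normalize=True)
self_shares = df["self_race"].value_counts(normalize=True)
print(pd.concat({"voter_file": vf_shares, "self_report": self_shares}, axis=1))

# Drop DK before comparing self-report to census-style estimates,
# as in the White 60 / Black 30 / Hispanic 6 numbers above.
no_dk = df.loc[df["self_race"] != "DK", "self_race"]
print(no_dk.value_counts(normalize=True))
```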
This pattern exists everywhere in the South (SC, GA, NC, FL) where we can weight off of voter file-based race:
--More self-rpt Hispanic voters than Census (but right amount by voter file var)
--Fewer self-rpt Black voters than Census (but right amount by voter file var)
What does this mean in practice? Well, if we weighted on the *census* variable in GA, as most pollsters do or as we often would in other states, we would have had Biden ahead.
But 32% of our LVs would have been black on the voter file, which is just factually wrong.
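Mechanically, that's just single-margin post-stratification: weight self-reported race to a census-style target, then check what the voter-file race distribution looks like afterward. A rough sketch, with made-up rows and made-up targets:

```python
import pandas as pd

# Toy sample where each row carries both race variables.
# In the real data these would come from the merged survey + voter file.
df = pd.DataFrame({
    "self_race": ["W", "W", "B", "B", "B", "H", "W", "O"],
    "vf_race":   ["W", "W", "B", "B", "W", "H", "Unknown", "O"],
})

# Census-style targets for self-reported race (hypothetical shares).
targets = {"W": 0.60, "B": 0.32, "H": 0.04, "O": 0.04}

# Single-margin post-stratification: weight = target share / sample share.
sample_shares = df["self_race"].value_counts(normalize=True)
df["weight"] = df["self_race"].map(lambda r: targets[r] / sample_shares[r])

# The check that matters here: what does the *voter file* race distribution
# look like after weighting self-report to the census-style targets?
weighted_vf = df.groupby("vf_race")["weight"].sum() / df["weight"].sum()
print(weighted_vf)
```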
This raises (at least) a few possibilities:
--our poll is biased v. IRL pop. We get too many Hispanics, for ex.
--our design/format leads to bias v census. Here, we have right respondents but they respond differently in this setting
--CPS turnout way overshoots black turnout/reg
I can think of evidence for all three, to some extent or another.
--We know design/questions affect answers (see voter file v. our poll!)
--The Hisp. question could be subject to acquiescence bias
--We know response rates vary by race
--We know the CPS is high on southern turnout
On the question of whether our *respondents* are biased, one test is to look at the respondents who said they were black on their voter registration form, but not in our survey. They still vote overwhelmingly Democratic, which to me says our sample is probably unbiased
And interestingly, the people who were white on the voter file, but *didn't* say they were white were even *more* for Trump than those who did call themselves white. So that also throws a wrench in the idea that we're getting the wrong people or something
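The check itself is just a filter-and-tabulate on the discordant cases — a toy sketch with a hypothetical `vote` column, not real results:

```python
import pandas as pd

# Hypothetical respondent-level data with both race codings and vote choice.
df = pd.DataFrame({
    "vf_race":   ["B", "B", "B", "W", "W", "W", "B", "W"],
    "self_race": ["B", "W", "DK", "W", "B", "O", "B", "W"],
    "vote":      ["Biden", "Biden", "Biden", "Trump", "Trump",
                  "Trump", "Biden", "Biden"],
})

# Respondents coded Black on the voter file who did not self-report Black.
discordant_black = df[(df["vf_race"] == "B") & (df["self_race"] != "B")]
print(discordant_black["vote"].value_counts(normalize=True))

# Respondents coded white on the file who did not self-report white.
discordant_white = df[(df["vf_race"] == "W") & (df["self_race"] != "W")]
print(discordant_white["vote"].value_counts(normalize=True))
```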
I wasn't clear: it is out of the question. 30% of GA RVs are black when they registered. That's a *fact*, and you can look it up on GA SOS website. If we weighted on the Census, then our RVs would be 32% black when they registered. That means it's wrong
https://twitter.com/uconnfan2021/status/1319239233375186944?s=20
Now, extend this problem to the rest of the US, where voters don't say their race when they register.
Our voter file based race variables elsewhere are weaker: they're modeled, not self-reported
But if we weight to census targets, will we be biased toward Dems--and not know it?
In a state with a lot of Hispanic voters, it might be the exact opposite! If there's acquiescence bias toward Hispanic ID in our surveys, and we weight it *down* to the census, then we'd have too few Hispanics!
I've been struggling with this question all year, and my inclination, though it may be controversial, is to avoid weighting on self-reported race unless it just falls well beyond plausibility, including in states without self-reported race on the file.
If you look at our last US national survey, you'll note it's 11% black and 13% Hispanic. That's a result of this. If we had weighted to the census, it would have been 12.5 Black, 10.5 Hispanic or whatever, and Biden would have inched up to a 10 point lead (though gaining <1 pt)
I am not sure this is the right choice. But I think the hardest, coldest fact I have is that we *would* have been biased toward too many voter-file black voters in SC/GA/NC/FL and too few Hispanics if we weighted on self-report. I'd expect that nationwide if we had the same data.
And indeed, that's what our poll shows: basically the same pattern and of a similar magnitude to our SC/NC/FL/GA data, when comparing our self-report to the census-based target.
What will be tough: if we run into a state where the gap is quite a bit larger.
And.... I'm only looking at partial data this AM, but I am looking at a state where the gap is a bit larger right now lol, and I'm hoping it shrinks when all of the interviews are completed.
Maybe I'll try to make some kind of self-report-based estimate that models our post-voter-file-weighted self-reported race in GA/FL/NC/SC as a function of the census-based estimates, and fall back on that as an alternative to just weighting on the census-based estimate
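One rough way to do that — purely a sketch, with invented state-level numbers — is to fit a simple linear mapping from the census-based share to our post-voter-file-weighted self-report share in the four states, then apply it where we have no file race:

```python
import numpy as np

# Hypothetical state-level inputs for GA/FL/NC/SC: census-based Black share
# vs. the self-reported Black share in our polls after voter-file weighting.
census_black      = np.array([0.33, 0.14, 0.22, 0.27])   # illustrative values
self_report_black = np.array([0.30, 0.12, 0.20, 0.24])   # illustrative values

# Fit a simple linear mapping from census-based share to self-report share.
slope, intercept = np.polyfit(census_black, self_report_black, deg=1)

# In a state without voter-file race, translate the census-based estimate
# into a self-report-scale weighting target instead of using it directly.
census_target_new_state = 0.20
adjusted_target = slope * census_target_new_state + intercept
print(round(adjusted_target, 3))
```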