I have been corresponding with the authors of the well-known Santa Clara County COVID-19 preprint, and I am alarmed at their sloppy behavior. The confidence interval calculation in their preprint made demonstrable math errors - *not* just questionable methodological choices.
Everyone makes mistakes, but the record must be corrected ASAP. I emailed them on Saturday morning asking them to do so. In the three days since, they have corrected nothing, yet a subset of them has released a new study without saying how the analysis was done this time.
Given the critically important and time-sensitive policy decisions being made now, if the authors are still pressing their case in the media using possibly incorrect calculations, then I feel I should make my criticism public too.
The errors are not debatable and can be seen in these two screenshots of the supplement: 0.0034, the standard error meant to measure uncertainty about prevalence pi, is not the square root of 0.039 (which is about 0.197), and the variance of a binomial estimate of a proportion depends on the sample size.
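A quick way to check both points is sketched below. It assumes 0.039 is the value whose square root was supposed to give the standard error, and it borrows the raw counts quoted later in this thread (50 positives out of 3330) purely as an illustration. The helper name `binomial_proportion_se` is mine, and this does not reproduce the preprint's weighted calculation.

```python
import math


def binomial_proportion_se(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# Figures quoted in this thread from the supplement.
quoted_variance = 0.039
quoted_se = 0.0034
implied_se = math.sqrt(quoted_variance)

# Illustration only: unweighted counts from this thread, not the authors' method.
positives, n = 50, 3330
p_hat = positives / n

print(f"sqrt({quoted_variance}) = {implied_se:.4f}, quoted SE = {quoted_se}")
print(f"unweighted SE at n = {n}: {binomial_proportion_se(p_hat, n):.5f}")
print(f"same proportion at n = {4 * n}: {binomial_proportion_se(p_hat, 4 * n):.5f}")
```

Quadrupling the sample size halves the standard error, so any variance formula for a proportion that leaves out n cannot be right.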
I can't redo the whole calculation myself because parts were not described anywhere, but I have low confidence that those parts were done correctly; if not, the corrected confidence interval for prevalence in Santa Clara County might well stretch all the way to include zero.
The authors said by email that they used a built-in Stata function and aren't sure themselves how the software used the input weights. I suspect they misapplied that function (too complicated to tweet why) but I don't know Stata well enough to be sure; it seems neither do they.
Trevor Hastie, Steve Goodman, @robtibshirani and I have been asking for more information about their analysis and their data. The authors have been gracious and reasonably responsive; they say they are redoing the stats in a second draft and will share data if possible.
I was satisfied to wait for the second draft, which is supposedly imminent and which the authors assured us will include a detailed description and code for a different confidence interval method based on bootstrapping.
However, I'm flabbergasted that yesterday afternoon, a group including several of the same authors circulated a press release describing new results for a similar study in LA County, without any accompanying technical report and before correcting the Santa Clara County preprint.
I can only surmise that the numbers in the LAC press release are either still using the same muddled calculations, or using an unspecified new method that hasn't been described publicly and is very different from the one they described in the SCC preprint.
Whichever is the case, before journalists publicize any more results from this group, they should know that the confidence intervals reported in both studies have no known statistical provenance as of now. The calculations are not questionable; they are either wrong or unknown.
What should we believe while we wait for a defensible analysis from the authors? In my opinion, the analysis suggested by @graduatedescent using Fisher's exact test should be treated as authoritative until the authors are ready to give a competing account. https://bit.ly/2XR0pdL 
@graduatedescent shows the SCC data are too noisy even to rule out the possibility that all the positives are false positives. Simply put, the difference between 50 heads in 3330 flips (SCC residents) and 2 heads in 371 flips (negative controls*) isn't statistically significant.
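For anyone who wants to check the coin-flip comparison themselves, here is a minimal pure-Python sketch of a two-sided Fisher exact test on the counts above. The function name `fisher_exact_two_sided` is mine, and this is my reading of the comparison, not necessarily the exact analysis in the linked post.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)


# Rows: SCC residents, negative controls. Columns: positive, negative.
p_value = fisher_exact_two_sided(50, 3330 - 50, 2, 371 - 2)
print(f"two-sided p-value: {p_value:.3f}")
```

The p-value comes out above the conventional 0.05 threshold, which is what "isn't statistically significant" means here.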
The authors have demographic information they have not yet shared, so it's conceivable a more refined analysis will pin down the prevalence more precisely. My point is that right now, as far as I know, no such analysis exists.
Note that beyond the formal statistical analysis there are other good reasons to be skeptical of the study, which have been pointed out publicly by @graduatedescent, @nataliexdean, @StatModeling, and many others.
Thanks also to @jjcherian for his perceptive tweets about the paper that piqued my interest in the first place. https://twitter.com/jjcherian/status/1251272333177880576?s=20
*The "negative controls" are blood samples from people who were known not to have been infected with COVID-19.

The supplement I refer to can be found at https://www.medrxiv.org/content/medrxiv/suppl/2020/04/17/2020.04.14.20062463.DC1/2020.04.14.20062463-1.pdf
You can follow @wfithian.