A (longish) post-publication review of the most recent observational study of COVID-19 patients treated with (hydroxy)chloroquine.

This is a retrospective analysis of an international registry of >96,000 hospitalised patients across 6 continents. (1/n)
The major finding is that (hydroxy)chloroquine (HCQ/CQ), with or without a macrolide antibiotic, is associated with ~30% excess mortality and a very large increase in cardiac events (what kind is not specified, e.g. torsades de pointes, JT or QRS prolongation). (2/n)
How did they get these results?

The primary analysis was a Cox proportional hazards model, adjusting for all the usual things (age, sex, BMI, co-morbidities, disease severity, etc.).
Patients had to have 1 positive test and to have received HCQ/CQ treatment within 48 hours of admission. (3/n)
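(For anyone less familiar with this setup, here is a minimal sketch of that kind of covariate-adjusted Cox model, in Python with lifelines. The file and column names are hypothetical and purely illustrative, not from the paper.)

```python
# Minimal sketch of a covariate-adjusted Cox proportional hazards model,
# the kind of analysis described in the paper.
# The input file and column names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("registry.csv")  # one row per patient (hypothetical file)

# treatment flag plus baseline covariates (assumed already numerically encoded)
covariates = ["hcq_or_cq", "age", "sex", "bmi",
              "diabetes", "cardiac_disease", "qsofa_high", "spo2_low"]

cph = CoxPHFitter()
cph.fit(df[["time_to_event", "died"] + covariates],
        duration_col="time_to_event", event_col="died")
cph.print_summary()  # exp(coef) on hcq_or_cq = adjusted hazard ratio
```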
First reaction: 30% excess mortality is a lot!! Given that most hospitals are reporting 10-20% mortality in hospitalised patients, you would only need a few hundred patients to detect that kind of effect (the RECOVERY trial has recruited >10,000)
https://www.recoverytrial.net/  (4/n)
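A rough back-of-the-envelope power calculation lands in that ballpark (my assumptions, not the paper's: ~15% control-arm mortality, a 30% relative increase, 80% power, two-sided alpha of 0.05):

```python
# Back-of-the-envelope sample size to detect a ~30% relative increase in mortality.
# Assumptions (mine): 15% baseline mortality, 80% power, two-sided alpha 0.05.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

p_control = 0.15
p_treated = 0.15 * 1.3                            # ~19.5% under the reported excess
h = proportion_effectsize(p_treated, p_control)   # Cohen's h
n_per_arm = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                         power=0.8, alternative="two-sided")
print(round(n_per_arm))  # on the order of a few hundred patients per arm
```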
Obvious answer: confounding bias!

Why do some patients get HCQ/CQ and others not? The decision is made by the treating physician under "compassionate use" (since there is no evidence that it works). So that probably means: sicker/worsening patient -> give the drug just in case (5/n)
So the adjustment for disease severity is really crucial (probably much more so than for co-morbidities, since the analysis is already conditioned on being "in hospital").

The authors use two single-timepoint variables: oxygen saturation (SpO2) and the qSOFA score. In addition, they dichotomise both (why??) (6/n)
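Why dichotomising matters: here is a toy simulation (entirely mine, not from the paper) in which a continuous severity score drives both treatment and death, the drug does nothing, and adjusting only for a binarised version of severity still shows an apparent "harm":

```python
# Toy simulation (mine, not from the paper): dichotomising a continuous
# confounder leaves residual confounding in the adjusted treatment effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000
severity = rng.normal(size=n)                                    # continuous severity score
treated = rng.binomial(1, 1 / (1 + np.exp(-(severity - 0.5))))   # sicker -> more likely treated
p_death = 1 / (1 + np.exp(-(-2 + 1.5 * severity)))               # true drug effect is zero
died = rng.binomial(1, p_death)
severity_high = (severity > 0).astype(int)                       # the dichotomised version

for label, sev in [("dichotomised", severity_high), ("continuous", severity)]:
    X = sm.add_constant(np.column_stack([treated, sev]))
    fit = sm.Logit(died, X).fit(disp=False)
    print(label, "adjusted OR for treatment:", round(float(np.exp(fit.params[1])), 2))
# Adjusting for the dichotomised score gives OR > 1 (spurious harm);
# adjusting for the continuous score gives OR ~ 1.
```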
The NEJM reported a very similar analysis:
https://www.nejm.org/doi/full/10.1056/NEJMoa2012410

They see much greater severity in the HCQ group, as measured by the PaO2:FiO2 ratio. This measurement shows much greater imbalance than SpO2 (oxygen saturation) (7/n)
The NEJM paper sees no effect at all of HCQ on mortality!

So, what about qSOFA? This is a severity score developed for sepsis, and it is proving to be a poor risk stratifier for COVID-19:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7167215/

And this can be seen by looking at Table 2: the differences in qSOFA/SpO2 between the groups are not huge. (8/n)
Also, it's important to note that both qSOFA and SpO2 are single-timepoint measurements. Disease progression (remember, patients who got treatment up to 48 hours after admission were included) will likely play a role in the decision to treat. This is not accounted for. (9/n)
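To make that concrete, another toy simulation (again mine, not from the paper): if deterioration during the first 48 hours drives both the decision to treat and death, a model that only adjusts for admission-time severity will show "harm" even for a drug with zero effect.

```python
# Toy simulation (mine): unmeasured deterioration after admission drives both
# treatment and death; adjusting only for admission severity -> spurious harm.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
baseline = rng.normal(size=n)       # severity at admission (measured, adjusted for)
worsening = rng.normal(size=n)      # change over the first 48 h (not adjusted for)
treated = rng.binomial(1, 1 / (1 + np.exp(-(baseline + 2 * worsening))))
p_death = 1 / (1 + np.exp(-(-2 + baseline + 2 * worsening)))    # drug effect is zero
died = rng.binomial(1, p_death)

X = sm.add_constant(np.column_stack([treated, baseline]))       # no 'worsening' term
fit = sm.Logit(died, X).fit(disp=False)
print("OR for treatment, adjusted for admission severity only:",
      round(float(np.exp(fit.params[1])), 2))                   # well above 1
```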
So a quick conclusion is that they have inadequately adjusted for disease severity, which is what is driving treatment allocation.

And what is annoying is that they haven't made much effort to characterise what is driving the bias. Some ideas are as follows: (10/n)
1/ Is there a dose effect? There is huge variation in the daily doses given (mean daily CQ dose is 735 mg, SD 300 mg).

2/ Do they see JT or QRS prolongation? (Surely they must have ECG data.)

3/ How are the proportions of treated/untreated patients distributed across hospitals? (11/n)
This brings me to another point: there is hardly any detail about the participating hospitals, or even the participating countries!
The data come from the "Surgical Outcomes Collaborative", which appears to be a company (https://surgicaloutcomes.com/). The second author is its CEO. (12/n)
There are in fact only 4 authors! And no acknowledgements. Given n=96,000 and 6 continents involved, this must be a large outlier in medical research.
Do the hospitals/countries know that their data are being used?
What about patient consent and ethics? (13/n)
Well, apparently "The data collection and analyses are deemed exempt from ethics review". Errr... who deemed this exactly? God almighty himself?

Conclusion: limited adjustment for confounding; unknown data provenance; no data-sharing statement; no consent/ethics. Nice! (14/n)
I would be interested to know what the causal inference and epi people think about this, e.g. @EpiEllie @LucyStats @mlipsitch

This report has already had a big impact on randomised studies.
https://bit.ly/2ytVW6o 

So whether it actually means anything is very important. (15/n)
@EricTopol has been very vocal about the interpretation of this study, saying "It’s one thing not to have benefit, but this shows distinct harm".

Does it? Can we reliably infer causality?

"Big data" does not get rid of systematic bias.
@TheLancet @richardhorton1 @MRMehraMD