This story deserves to be told and read in its own right, of course. But I'm also sharing it because it reminds me why I'm uncomfortable with how a lot of data scientists are handling data from this pandemic.
A lot of us, myself included, have taken our data skills and curiosity and applied them to visualizing and sometimes forecasting the pandemic's course. No harm in that, and it's understandable that you want to try to do *something* useful in the face of all this dread.
I think it's much more problematic, though, when you start sharing that analysis and discussing its implications if you don't really understand the data-generating process and how that's shaping what you think you're seeing.
For example, a credulous look at the data would lead you to conclude that Brazil and Mexico are doing surprisingly well so far, given their population sizes and infrastructure. But, as this NYT analysis shows, that would be wrong.
These data deficiencies and distortions exist in every field I've ever studied, including armed conflict, atrocities, and protest. When I work on those topics, I know enough to know that those problems make it difficult to compare across cases and over time, so I proceed with much caution.
I have also interacted with people less familiar with data on those topics who pursue or suggest analyses that ignore those issues, and who wonder why I haven't done some obvious (to them) thing. It can be frustrating, and the results could be dangerously misleading.
So, I assume that epidemiologists must feel the same way about all these charts and forecasts and comparisons flying around right now, and I try not to add to the pile.
In sum, I guess my unsolicited advice to data-science colleagues would be this: by all means, explore and visualize and forecast away. But maybe don't broadcast the output, and leave the inferences to the domain experts.