Having one of those mornings where you realize that it's sometimes a lot more work to be a good scientist/analyst than a bad one.

(Explanation coming...)
Processing some source data that could just be tabulated and summarized with no one the wiser, even though it includes some obviously impossible data points, e.g. dates that occurred before the study began, double entries, things of that nature.
Not exactly an original observation here, but when we talk about issues with stats/data analysis done by non-experts, this is often just as big an issue as (or a bigger one than) whether they used one of those dumb flow diagrams to pick which analysis to run.
It would be *so* easy to just blow right past the meticulous double-checking for duplicate entries and impossible dates, and go straight to running summary stats and models. And I'm guessing that's often what happens. There's almost no way that's ever actually picked up later.
I'm not sure what to do about this other than tell people "do careful checks of the source data, and of the cleaning and processing steps, en route to creating your final analysis dataset." But please, if you analyze data, do this.
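
To make that a little more concrete, here's a minimal sketch of the kind of checks I mean, in Python/pandas. The file name, the column names (patient_id, visit_date), and the study start date are all made-up placeholders, not anything from the actual dataset:

```python
import pandas as pd

# Hypothetical source file and column names, for illustration only
df = pd.read_csv("source_data.csv", parse_dates=["visit_date"])
study_start = pd.Timestamp("2020-01-01")  # assumed study start date

# Possible double entries: fully duplicated rows, or repeated IDs
dup_rows = df[df.duplicated(keep=False)]
dup_ids = df[df.duplicated(subset=["patient_id"], keep=False)]

# Obviously impossible dates: visits recorded before the study began
bad_dates = df[df["visit_date"] < study_start]

# Look at these *before* building the final analysis dataset
print(f"{len(dup_rows)} fully duplicated rows")
print(f"{len(dup_ids)} rows sharing a patient_id")
print(f"{len(bad_dates)} visits dated before study start")
```

The point isn't this exact code, it's that checks like these take five minutes and live in the script, so they get re-run every time the source data changes.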