An example of reproducibility checking when data is restricted-access (or "reproducibility checking is hard, but never impossible").
I want to highlight a recent article in the @AEAjournals Economic Policy:
Leung, Pauline, and Christopher O'Leary. 2020. "Unemployment Insurance and Means-Tested Program Interactions: Evidence from Administrative Data." https://doi.org/10.1257/pol.20170262
This manuscript was among our earliest reproducibility checks, assigned to us in April 2019. It is an excellent example of a well-documented and, as it turns out, reproducible article, even though the data are confidential and restricted-access! (One caveat: see the end.)
The data underlying the analysis are administrative data from various state agencies in Michigan, and it would have been difficult and time-consuming for us replicators to request access (not that we don't sometimes try - stay tuned!).
We reached out to the authors and asked whether they would be able to add a 3rd-party to their research protocol - either a member of our team or somebody at their institution @Upjohninstitute who was not involved in the original research.
The authors connected us with a research associate at their institution who had legal access to the data. Arms-length: the 3rd-party replicator followed only written instructions that are publicly available, as per our protocol https://aeadataeditor.github.io/aea-de-guidance/protocol-3rd-party-replication.html
In fact, all information and code came from the researchers' replication archive at openICPSR, which is now available at https://doi.org/10.3886/E109704V1. However, the replicator had access prior to publication of the archive, and what you see now is not what they had access to.
That's because the replicator identified and suggested multiple (small, but meaningful) edits to README and programs, to make the replication easier for subsequent replicators.
They also identified various (minor) numerical discrepancies, typically at the 3rd decimal place, that we tracked down to differences between versions of the Stata package `rdrobust` https://sites.google.com/site/rdpackages/rdrobust
After discussions with `rdrobust` authors and manuscript authors, the replication package now contains the exact version of `rdrobust` used for the analysis, even though a more current (and corrected) version exists.
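A minimal sketch of how discrepancies like these might be flagged, assuming the published and reproduced estimates have each been collected into a dictionary keyed by a label. The function name, labels, tolerance, and numbers are all hypothetical and for illustration only; this is not the procedure the replicator actually used.

```python
def flag_discrepancies(published, reproduced, abs_tol=5e-4):
    """Return labels whose reproduced value is missing or differs
    from the published value by more than abs_tol."""
    flagged = []
    for label, pub_value in published.items():
        rep_value = reproduced.get(label)
        if rep_value is None or abs(pub_value - rep_value) > abs_tol:
            flagged.append(label)
    return flagged


# Hypothetical numbers, not taken from the article.
published = {"estimate_a": 0.123, "estimate_b": 0.456}
reproduced = {"estimate_a": 0.123, "estimate_b": 0.458}

# estimate_b differs at the 3rd decimal place, so it is flagged.
print(flag_discrepancies(published, reproduced))
```

A tolerance of half a unit in the 3rd decimal place is only one possible choice; a stricter or looser threshold may suit tables reported at a different precision.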
Overall, the report received from the 3rd party replicator was useful confirmation that the code that anybody can see would work well on the data that few people can see, and materially improved the replication package.
Take-away message: While we cannot run timely reproducibility checks for every paper that uses restricted-access data, there are still many opportunities to do so, and we will always attempt it.
In fact, since that paper was reproduced, we have been able to make similar attempts at reproduction for many other papers. We have signed NDAs or DUAs for various datasets to obtain copies ourselves, or worked with others. In the past month alone, that covered about 10 such papers.
Not all attempts to access restricted data succeed, but expect to see more such verifications in the future!
The caveat: We were still refining our various checks and guidance documents. The authors did an excellent job citing various data sources (in the online appendix https://www.aeaweb.org/doi/10.1257/pol.20170262.appx), but we did not ask for a data citation for the core confidential data. That would be different today.
You can follow @AeaData.