You know what is kind of funny but gets swept under the rug a lot? Bob LaLonde, in his influential 1986 AER evaluation of the NSW program, took an RCT, dropped the experimental controls, and replaced them with nonexperimental controls from the CPS and PSID, both representative surveys. 1/n
He evaluated numerous econometric approaches that were popular at the time. The actual return to the program using the randomization is an ATE of +$800. So he knows the ground truth: for an estimator to be unbiased, it would need to find a similar number, right? 2/n
Well, interestingly, he mainly tests three standard approaches that are still used to this day: DDD, DD, and regression. And guess what he finds for DD? The worst results. DD is garbage in his study, which is kind of interesting. 3/n
It’s not that we should be surprised, bc obviously the unobservables are so bad that any estimator would be severely biased. My point is different: it’s actually DD that is so bad. In other words, the single most popular quasi-experimental design TODAY. 4/n
Anyway, it’s kind of interesting bc Dehejia and Wahba show propensity score matching improves things even when it only conditions on the covariates in the data. And you can imagine how CIA gets questioned, and so we run to DD and poop on propensity scores. 5/n
Yet I don’t know if a lot of people notice the title of columns 6-9: it’s DD he’s using. And DD does badly. But it’s propensity scores (not in this paper) that do well. They do well where DD did badly. But it’s bc we are worried about selection that we use DD and not scores. 6/n
Kind of funny, right? Now @agoodmanbacon shows that two-way fixed effects estimation of a DD design does badly. And Callaway and @pedrohcgs have an estimator that does great. But guess what’s under the hood in that estimator? Propensity scores. Lol. 7/n
It’s all kind of funny when you think about it. But bc these papers are spaced out, and bc Petra Todd, Jeff Smith, and others critique Dehejia and Wahba, there is a broad skepticism about CIA among economists, which is weird given that the same skepticism doesn’t always hold for DD. 8/n
I don’t know, it’s kind of funny. It’s a weird evolution: DD does badly, scores do well, credibility revolution, DD popular again using TWFE. I’m going to try @pedrohcgs’s DD estimator on the 2002 sample from Dehejia and Wahba’s REStat paper, bc I don’t know: did it also do well? 9/n
Maybe it also does well. I don’t know. I always noticed it, but I didn’t notice notice it. @dlmillimet, am I getting it wrong? 10/n
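
For anyone who wants to see the benchmarking logic from the thread in code, here is a minimal sketch on made-up data. It is not LaLonde's data, specification, or results. Every number in it (the earnings levels, the growth rates, the noise) is an assumption chosen for illustration, and the function names `diff_in_means`, `dd_2x2`, and `simulate` are hypothetical. The idea: randomization gives you the ground truth, then you swap in a survey-style comparison group whose untreated earnings growth differs from the trainees' and see how far a 2x2 DD lands from that truth.

```python
import numpy as np


def diff_in_means(y_treated, y_control):
    """Experimental benchmark: post-period difference in means."""
    return float(np.mean(y_treated) - np.mean(y_control))


def dd_2x2(pre_t, post_t, pre_c, post_c):
    """2x2 DD: change for the treated minus change for the comparison group."""
    change_t = np.mean(post_t) - np.mean(pre_t)
    change_c = np.mean(post_c) - np.mean(pre_c)
    return float(change_t - change_c)


def simulate(n=5000, true_effect=800.0, seed=0):
    """Synthetic illustration only; all parameters below are assumptions."""
    rng = np.random.default_rng(seed)

    # Trainees and randomized controls come from the same population,
    # so absent treatment both groups grow by 1000 on average.
    pre_t = 3000 + rng.normal(0, 1000, n)
    pre_rc = 3000 + rng.normal(0, 1000, n)
    post_t = pre_t + 1000 + true_effect + rng.normal(0, 1000, n)
    post_rc = pre_rc + 1000 + rng.normal(0, 1000, n)

    # Survey-style comparison group: higher earnings level AND faster
    # growth (2500), so parallel trends fails by construction.
    pre_sc = 12000 + rng.normal(0, 1000, n)
    post_sc = pre_sc + 2500 + rng.normal(0, 1000, n)

    return {
        "experimental": diff_in_means(post_t, post_rc),
        "dd_random_controls": dd_2x2(pre_t, post_t, pre_rc, post_rc),
        "dd_survey_controls": dd_2x2(pre_t, post_t, pre_sc, post_sc),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.0f}")
```

By construction, the experimental contrast and the DD with randomized controls should both land near the assumed +800, while the DD with the survey-style group is off by the gap in untreated growth (1000 minus 2500). That only shows the mechanics of comparing an estimator to an experimental benchmark. It says nothing about why DD did badly in the actual study.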
You can follow @causalinf.