Before I jump in, just going to add a bit of credibility by introducing @akelleh :)
(who all of this content is stolen from)
- Chief Data Scientist for Research at @BarclaysIB
- Teaches Causal Inference for Data Science at @DSI_Columbia
/2
Confounding is something that is well understood by experienced Data Scientists.

So are the dangers of selection bias.
- If you condition on something during training but not in prod, bad things can happen

And they know correlation !=> causation
/3
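To make the selection-bias point concrete, here is a minimal Python sketch (numpy only; the variable names and the S > 1 threshold are made up for illustration). X and Z are generated independently, and S depends on both. Keeping only the rows where S is large, as if S decided which rows made it into your training data, manufactures a negative correlation between X and Z that doesn't exist in the full population.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Z are generated independently: there is no causal link between them
x = rng.normal(size=n)
z = rng.normal(size=n)
# S depends on both (a "collider"), e.g. whether a row ended up in the training data
s = x + z + rng.normal(size=n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


full_corr = corr(x, z)  # population-level correlation, close to 0

# "Conditioning" on S by filtering rows, like a biased training set would
selected = s > 1.0
selected_corr = corr(x[selected], z[selected])  # clearly negative

print(f"corr(X, Z) on all rows:      {full_corr:+.3f}")
print(f"corr(X, Z) on rows with S>1: {selected_corr:+.3f}")
```

A model trained only on the selected rows would learn that X and Z move in opposite directions, and that "relationship" disappears in prod where no such filter is applied.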
But if you asked them to understand how X affects Z using only observational data, they might collect all possible confounding variables (Y) and put them, along with X, into a linear model to explain how they affect Z.

Sound reasonable?
What if X -> Y -> Z?
Then Y is a mediator, not a confounder: controlling for it blocks X's only path to Z
=> the coefficient on X -> 0.
/4
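A quick simulation of the X -> Y -> Z case (a hypothetical linear chain with coefficients 2 and 3, chosen only for illustration). Regressing Z on X alone recovers the total effect (about 2 × 3 = 6); adding the mediator Y as a "control" drives the coefficient on X to roughly 0, even though X really does cause Z.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Chain X -> Y -> Z: X affects Z only through Y (illustrative coefficients)
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = 3.0 * y + rng.normal(size=n)


def ols(target, *features):
    """Least-squares coefficients, intercept first, via numpy.linalg.lstsq."""
    design = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


coef_x_only = ols(z, x)[1]       # total effect of X on Z, should be near 6
coef_x_with_y = ols(z, x, y)[1]  # Y blocks the only path, so this is near 0

print(f"Z ~ X     : coefficient on X = {coef_x_only:.3f}")
print(f"Z ~ X + Y : coefficient on X = {coef_x_with_y:.3f}")
```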
So there's obviously a bit more to understanding how X affects Z.

The rest of this thread will focus on how exactly to do that robustly (or as robustly as possible with the observational data that you can get your hands on :) ).
/5
1. Understand Causality:
- You need to understand how the phenomenon above can be corrected for using the "Back-Door Adjustment"
- Here is a thread to help: https://twitter.com/parker_brydon/status/1209114528845316097 covering this great post “A Technical Primer On Causality” by @akelleh https://link.medium.com/8TvoYDLEE2 
/6
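For reference, the back-door adjustment for a discrete adjustment set Y is P(Z | do(X = x)) = Σ_y P(Z | X = x, Y = y) P(Y = y). Below is a minimal pandas sketch of that formula on simulated binary data, assuming a single discrete confounder column; `backdoor_mean` is a hypothetical helper, not something taken from the linked post. The naive difference in means picks up the confounding through Y, while the adjusted estimate should land near the true effect of 0.1 built into the simulation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200_000

# Y is a binary confounder: it raises both the chance of treatment X and of outcome Z
y = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(y == 1, 0.8, 0.2))
# True effect of X on P(Z = 1) is +0.1 at every level of Y
z = rng.binomial(1, 0.2 + 0.1 * x + 0.5 * y)
df = pd.DataFrame({"x": x, "y": y, "z": z})


def backdoor_mean(df, treatment, outcome, adjust, value):
    """E[outcome | do(treatment=value)] = sum over y of E[outcome | treatment=value, y] * P(y).

    Assumes `adjust` is a single discrete column that satisfies the back-door criterion.
    """
    p_y = df[adjust].value_counts(normalize=True)  # P(Y = y) over the whole population
    cond_mean = df[df[treatment] == value].groupby(adjust)[outcome].mean()  # E[Z | X=value, Y=y]
    return float((cond_mean * p_y).sum())  # the two Series align on the values of Y


adjusted = backdoor_mean(df, "x", "z", "y", 1) - backdoor_mean(df, "x", "z", "y", 0)
naive = df.loc[df["x"] == 1, "z"].mean() - df.loc[df["x"] == 0, "z"].mean()

print(f"naive difference in means: {naive:.3f}")
print(f"back-door adjusted effect: {adjusted:.3f}")
```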
2. Understand How To Identify Causal Structure in Your Data:

Now you know how to correct for the variables that satisfy the "back-door criterion"
But you still need to find those variables

Thread: https://twitter.com/parker_brydon/status/1312505624983343104 (also covering a great post from @akelleh :) )
/7
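One common building block for learning structure from data (used by constraint-based methods such as the PC algorithm) is testing conditional independence. I'm not claiming this is exactly the approach in the linked post; it's a sketch of the idea, assuming linear-Gaussian data so that a partial correlation of roughly 0 can stand in for a proper independence test. A chain X -> Y -> Z and a collider X -> Y <- Z leave opposite fingerprints: in the chain, X and Z are correlated but become independent given Y, while in the collider they are independent but become correlated given Y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing `given` out of each.

    Only a valid independence check under linear-Gaussian assumptions.
    """
    design = np.column_stack([np.ones(len(given)), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return corr(res_a, res_b)


# Chain X -> Y -> Z: X and Z are dependent, but independent given Y
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
chain = (corr(x, z), partial_corr(x, z, y))

# Collider X -> Y <- Z: X and Z are independent, but dependent given Y
x2 = rng.normal(size=n)
z2 = rng.normal(size=n)
y2 = x2 + z2 + rng.normal(size=n)
collider = (corr(x2, z2), partial_corr(x2, z2, y2))

print(f"chain:    corr = {chain[0]:+.3f}, partial corr given Y = {chain[1]:+.3f}")
print(f"collider: corr = {collider[0]:+.3f}, partial corr given Y = {collider[1]:+.3f}")
```

Seeing which pattern holds tells you whether Y belongs in your adjustment set (a confounder or nothing) or must be kept out of it (a mediator or collider).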
3. Understand How To Implement:

Now you just need to know how to leverage the "Back-Door Adjustment" in practice with those variables

Thread: https://twitter.com/parker_brydon/status/1209136670706143232 covering this great post "Causal Inference With pandas.DataFrames" by @akelleh
https://link.medium.com/64UeJw7KE2 
/8
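The linked post covers its own tooling; as a stand-in, here is a minimal sketch using plain pandas and numpy with a swapped-in technique: linear-regression adjustment, which implements the back-door adjustment only if the linear model is correctly specified (as it is in this simulated data). `effect_of` is a hypothetical helper, Y is assumed to be a valid back-door adjustment set, and the true effect of 1.5 is built into the simulation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 100_000

# Simulated data: Y confounds X and Z; the true effect of X on Z is 1.5
y = rng.normal(size=n)
x = 0.8 * y + rng.normal(size=n)
z = 1.5 * x + 2.0 * y + rng.normal(size=n)
df = pd.DataFrame({"x": x, "y": y, "z": z})


def effect_of(df, treatment, outcome, adjust=()):
    """Coefficient on `treatment` from an OLS of `outcome` on treatment + adjustment set."""
    cols = [treatment, *adjust]
    design = np.column_stack([np.ones(len(df)), df[cols].to_numpy()])
    coef, *_ = np.linalg.lstsq(design, df[outcome].to_numpy(), rcond=None)
    return float(coef[1])  # coef[0] is the intercept


naive = effect_of(df, "x", "z")             # back-door path X <- Y -> Z left open
adjusted = effect_of(df, "x", "z", ["y"])   # back-door path blocked by adjusting for Y

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

Dropping `y` from the adjustment set leaves the back-door path X <- Y -> Z open, which is exactly the bias the previous two steps are meant to catch.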
@akelleh I'd also love your thoughts on whether I'm missing anything in any of these that you think is critical for a Data Scientist to understand the relationship between X and Z using observational data.
You can follow @parker_brydon.