Thread by @parker_brydon: Causality For Data Scientists

Causality For Data Scientists

This will be a thread highlighting when and how Data Scientists can effectively leverage #causality.

Inspired by and directly referencing @akelleh's excellent content on Causal #DataScience.
https://medium.com/causal-data-science/causal-data-science-721ed63a4027

/1

Before I jump in, just going to add a bit of credibility by introducing @akelleh :)
(which is who all of this content is stolen from)
- Chief Data Scientist for Research at @BarclaysIB
- Teaches Causal Inference for Data Science at @DSI_Columbia
/2

Confounding is something that is well understood by experienced Data Scientists.

So are the dangers of selection bias.
- If you're conditioning on something when training, then not conditioning on that in prod, bad things can happen

And they know correlation !=> causation
/3

But if you asked them to understand how X affects Z using only observational data, they might collect all possible confounding variables (Y) and put them, along with X, into a linear model to explain how each one affects Z.

Sound reasonable?
What if Y is actually a mediator, i.e. X -> Y -> Z?
=> controlling for Y blocks the path, so the coefficient on X -> 0, even though X does affect Z.
/4
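
To make the X -> Y -> Z point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the coefficients (2.0 and 3.0), the Gaussian noise, and the small `ols` helper are made up and do not come from the referenced posts. It fits Z ~ X and Z ~ X + Y on the same data and compares the coefficient on X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Chain X -> Y -> Z: X affects Z only through Y (coefficients are invented).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = 3.0 * y + rng.normal(size=n)


def ols(features, target):
    """Least-squares slopes; an intercept is fitted and then dropped."""
    design = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


coef_x_only = ols([x], z)[0]       # Z ~ X: recovers the total effect (2.0 * 3.0)
coef_x_with_y = ols([x, y], z)[0]  # Z ~ X + Y: Y absorbs the whole effect

print(f"Z ~ X:     coefficient on X = {coef_x_only:.2f}")
print(f"Z ~ X + Y: coefficient on X = {coef_x_with_y:.2f}")
```

In this setup the first fit should land near the true total effect of 6, while the second should put the coefficient on X close to 0, which is the trap described above.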

So there's obviously a bit more to understanding how X affects Z.

The rest of this thread will focus on how exactly to do that robustly (or as robustly as possible with the observational data that you can get your hands on :) ).
/5

1. Understand Causality:
- You need to understand how the phenomenon above can be corrected for using the "Back-Door Adjustment"
- Here is a thread to help: https://twitter.com/parker_brydon/status/1209114528845316097 covering this great post “A Technical Primer On Causality” by @akelleh https://link.medium.com/8TvoYDLEE2
/6
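
For reference, the back-door adjustment in its standard textbook form, written with this thread's symbols. The assumption here is that Y stands for a set of variables satisfying the back-door criterion relative to (X, Z); a mediator on the path X -> Y -> Z does not qualify, because it is a descendant of X.

```latex
P(Z \mid do(X = x)) = \sum_{y} P(Z \mid X = x, Y = y)\, P(Y = y)
```

The sum runs over the values of the adjustment set, weighting each conditional distribution by the marginal P(Y = y) rather than by P(Y = y | X = x).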

2. Understand How To Identify Causal Structure in Your Data:

Now you know how to correct for the variables that satisfy the "back-door criterion"
But you still need to find those variables

Thread: https://twitter.com/parker_brydon/status/1312505624983343104 (also covering a great post from @akelleh :) )
/7

3. Understand How To Implement:

Now you just need to know how to leverage the "Back-Door Adjustment" in practice with those variables

Thread: https://twitter.com/parker_brydon/status/1209136670706143232 covering this great post "Causal Inference With pandas.DataFrames" by @akelleh
https://link.medium.com/64UeJw7KE2
/8
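
As a rough idea of what this can look like in practice, here is a minimal sketch of the back-door adjustment in plain pandas. It is not the method from the linked post: the data is simulated, the `backdoor_mean` helper is hypothetical, and it assumes a single discrete confounder Y, a binary treatment X, and that every Y stratum contains both treatment values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000

# Invented data: binary confounder Y drives both treatment X and outcome Z.
y = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, np.where(y == 1, 0.8, 0.2))
z = 1.0 * x + 2.0 * y + rng.normal(size=n)  # true effect of X on Z is 1.0
df = pd.DataFrame({"X": x, "Y": y, "Z": z})


def backdoor_mean(df, treatment, outcome, adjust, value):
    """Estimate E[outcome | do(treatment=value)] as
    sum over y of E[outcome | treatment=value, adjust=y] * P(adjust=y)."""
    p_y = df[adjust].value_counts(normalize=True)
    cond_mean = df[df[treatment] == value].groupby(adjust)[outcome].mean()
    # Both Series are indexed by the values of `adjust`, so they align by label.
    return float((cond_mean * p_y).sum())


# Naive comparison: difference in observed means, confounded by Y.
naive = df.loc[df["X"] == 1, "Z"].mean() - df.loc[df["X"] == 0, "Z"].mean()

# Adjusted comparison: difference in back-door-adjusted means.
adjusted = backdoor_mean(df, "X", "Z", "Y", 1) - backdoor_mean(df, "X", "Z", "Y", 0)

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

With this simulated data the naive difference should overstate the effect (roughly 2.2), while the adjusted difference should sit near the true value of 1.0. Continuous or high-dimensional adjustment sets need a model in place of the simple group-by.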

@akelleh, I would also love your thoughts on whether I'm missing anything in any of these that you think is critical for a Data Scientist to effectively understand the relationship between X and Z using observational data.