so - I once spent 1 billion hours copy-pasting little circles in PowerPoint in an effort to explain *marginal structural models*. Here is an unsolicited #tweetorial. If you already understand regression, skip to SpongeBob. THREAD. #epitwitter
Say you are an investigator in a world with people who are either stressed or unstressed, and either smokers or non-smokers.
You want to find out whether smoking causes lung cancer. You know that being stressed causes smoking, and being stressed causes lung cancer through pathways other than smoking.
so you recruit a cohort of people. some of them are smokers (S) and some of them are stressed (shown with dithering).
you follow them for a number of years, and find that a number of them develop lung cancer (red circles).
so how do you find out the effect of smoking on lung cancer? If you just calculate the risk of lung cancer among smokers and compare that to the risk among non-smokers, you have a problem: you're not sure whether it's smoking or stress that explains the difference in lung cancer risk.
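To make the problem concrete, here is a minimal Python sketch of the crude comparison. The counts in `COHORT` are made up for illustration (they are not from the thread): 20 unstressed and 20 stressed people, where stressed people are more likely to smoke.

```python
# Hypothetical cohort, made up for illustration:
# (stressed, smoker) -> (number of people, number who develop lung cancer)
COHORT = {
    (False, False): (14, 1),
    (False, True): (6, 2),
    (True, False): (6, 2),
    (True, True): (14, 7),
}


def crude_risk(cohort, smoker):
    """Lung cancer risk among smokers (or non-smokers), ignoring stress entirely."""
    people = sum(n for (_, s), (n, _) in cohort.items() if s == smoker)
    cases = sum(c for (_, s), (_, c) in cohort.items() if s == smoker)
    return cases / people


risk_smokers = crude_risk(COHORT, smoker=True)      # 9 / 20
risk_nonsmokers = crude_risk(COHORT, smoker=False)  # 3 / 20
print(f"crude risk difference: {risk_smokers - risk_nonsmokers:.2f}")  # 0.30
```

In this made-up cohort 14 of the 20 smokers are stressed but only 6 of the 20 non-smokers are, so some of that 0.30 is really stress.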
So maybe you run a regression that adjusts for stress. Effectively this groups people by whether they are stressed or not, and then within each of those groups compares lung cancer among smokers and non-smokers.
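The simplest version of "adjusting for stress" is just stratifying: compute the smoker vs non-smoker risk difference separately within each stress group. A sketch using the same hypothetical counts as before (still made up, not from the thread):

```python
# Hypothetical cohort: (stressed, smoker) -> (people, lung cancer cases)
COHORT = {
    (False, False): (14, 1),
    (False, True): (6, 2),
    (True, False): (6, 2),
    (True, True): (14, 7),
}


def stratum_risk_difference(cohort, stressed):
    """Risk among smokers minus risk among non-smokers, within one stress group."""
    n_smokers, cases_smokers = cohort[(stressed, True)]
    n_nonsmokers, cases_nonsmokers = cohort[(stressed, False)]
    return cases_smokers / n_smokers - cases_nonsmokers / n_nonsmokers


for stressed in (False, True):
    label = "stressed" if stressed else "unstressed"
    print(f"{label}: {stratum_risk_difference(COHORT, stressed):.3f}")
# unstressed: 0.262
# stressed: 0.167
```

A regression with main effects for smoking and stress (no interaction term) would combine these two stratum-specific differences into a single adjusted estimate.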
which is very nice. now you can be sure that it wasn't stress that explained the difference between smokers and non-smokers. you publish in the new england journal of medicine and get tenure *immediately*.
You could have done this another way, the marginal structural model way, but we have to use our imagination.
Imagine that there is a super-unethical researcher randomly assigning people to be smokers or non-smokers. But they decide your probability of assignment based on how stressed you are. stressed out people are more likely to be assigned to smoke.
The researcher then follows the cohort, to see who will end up having lung cancer and who will not.
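The unethical trial can be sketched as a small simulation. Everything numeric here is an assumption for illustration: 0.3 matches the 6/20 the thread later gives for unstressed people, while the 0.7 for stressed people and the cancer risks in `BASE_RISK` and `SMOKING_EXTRA_RISK` are hypothetical.

```python
import random

# Hypothetical assignment scheme: P(assigned to smoke | stress level).
# 0.3 matches the thread's 6/20 for unstressed people; 0.7 is an assumption.
P_SMOKE = {False: 0.3, True: 0.7}
# Hypothetical outcome model: stress raises baseline risk, smoking adds to it.
BASE_RISK = {False: 0.05, True: 0.30}
SMOKING_EXTRA_RISK = 0.20


def simulate_trial(n_people, seed=0):
    """Return a list of (stressed, smoker, lung_cancer) tuples."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n_people):
        stressed = rng.random() < 0.5
        # The researcher's coin is biased by stress, and by nothing else.
        smoker = rng.random() < P_SMOKE[stressed]
        risk = BASE_RISK[stressed] + (SMOKING_EXTRA_RISK if smoker else 0.0)
        lung_cancer = rng.random() < risk
        cohort.append((stressed, smoker, lung_cancer))
    return cohort


trial = simulate_trial(10_000)
```

In this sketch the true effect of smoking is +0.20 risk in both stress groups, which is what a correct method should recover from `trial`.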
You arrive mid-trial, and are interested in assessing the effect of smoking on lung cancer. you get IRB approval. your study benefits from the fact that there are already people who have been randomly assigned to smoke based only on their level of stress. How do you do it?
You decide - "I'm gonna undo the randomization scheme that other researcher used by ensuring that everyone had the same probability of being assigned to be a smoker." You give everyone a weight that is the inverse of the probability of being assigned the smoking status they actually got.
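The weight is 1 divided by the probability of the status you actually got, so smokers and non-smokers in the same stress group get different weights. A minimal sketch; the 6/20 for unstressed people is from the thread, while 14/20 for stressed people is a hypothetical value.

```python
# Probability of being assigned to smoke, by stress level.
# 6/20 for unstressed people is from the thread; 14/20 for stressed people is hypothetical.
P_SMOKE = {False: 6 / 20, True: 14 / 20}


def inverse_probability_weight(stressed, smoker, p_smoke=P_SMOKE):
    """1 / P(getting the smoking status you actually got | your stress level)."""
    p = p_smoke[stressed]
    prob_of_assigned_status = p if smoker else 1 - p
    return 1 / prob_of_assigned_status


print(inverse_probability_weight(stressed=False, smoker=True))   # about 3.33 (20/6)
print(inverse_probability_weight(stressed=False, smoker=False))  # about 1.43 (20/14)
```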
unstressed people originally had probability 6/20 of being assigned smoking. Now you pretend each of them represents 20/6 people.
The unstressed people originally had a probability of 14/20 of being assigned non-smoking. Now you pretend each of them represents 20/14 people.... and so on.
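The arithmetic for the unstressed group works out neatly: both arms get scaled up to the full size of the group. A quick check with exact fractions, using only the thread's numbers (20 unstressed people, 6 assigned to smoke, 14 not):

```python
from fractions import Fraction

# Unstressed group from the thread: 20 people, 6 assigned to smoke, 14 not.
n_smokers, n_nonsmokers = 6, 14
weight_smoker = Fraction(20, 6)      # 1 / (6/20)
weight_nonsmoker = Fraction(20, 14)  # 1 / (14/20)

pseudo_smokers = n_smokers * weight_smoker
pseudo_nonsmokers = n_nonsmokers * weight_nonsmoker
print(pseudo_smokers, pseudo_nonsmokers)  # 20 20
```

So the 20 unstressed people become 20 pseudo-smokers and 20 pseudo-non-smokers: in effect, the whole group shows up under both smoking statuses.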
Now look what you are left with in your imagination: a pseudo population. Notice that as you were undoing the randomization, you kept track of how many people developed the lung cancer outcome (red circle). You made sure to scale this number up by the weight.
sometimes this meant that a person has a fractional amount of lung cancer. but that's ok, since these people are in our imaginations - in the pseudo-population.
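Here is what a fractional amount of lung cancer looks like. Assume (hypothetically) that 2 of the 6 unstressed smokers develop lung cancer; each of them carries the weight 20/6 from the thread:

```python
from fractions import Fraction

# Hypothetical: 2 of the 6 unstressed smokers develop lung cancer.
cases = 2
weight = Fraction(20, 6)  # the unstressed smokers' weight from the thread
pseudo_cases = cases * weight
print(pseudo_cases, round(float(pseudo_cases), 2))  # 20/3 6.67
```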
now you can compare smokers to non-smokers in your pseudo-population. you have created a population in which the number of stressed people is the same among smokers as it is among non-smokers, so stress is no longer associated with smoking.
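A quick way to see the balance is to count stressed people among smokers and non-smokers after weighting. The unstressed counts below match the thread; the stressed counts (14 smokers, 6 non-smokers) are hypothetical. Before weighting, 14/20 of smokers but only 6/20 of non-smokers are stressed.

```python
from fractions import Fraction

# Numbers of people by (stressed, smoker); the stressed counts are hypothetical.
COUNTS = {(False, False): 14, (False, True): 6, (True, False): 6, (True, True): 14}


def p_smoke(stressed):
    """Share of people at this stress level who were assigned to smoke."""
    smokers = COUNTS[(stressed, True)]
    return Fraction(smokers, smokers + COUNTS[(stressed, False)])


def pseudo_count(stressed, smoker):
    """Number of people in this cell after inverse probability weighting."""
    p = p_smoke(stressed)
    weight = 1 / (p if smoker else 1 - p)
    return COUNTS[(stressed, smoker)] * weight


for smoker in (True, False):
    stressed_n = pseudo_count(True, smoker)
    total = stressed_n + pseudo_count(False, smoker)
    print("smokers" if smoker else "non-smokers", "share stressed:", stressed_n / total)
# smokers share stressed: 1/2
# non-smokers share stressed: 1/2
```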
and you can compare the number of lung cancer cases among smokers to that among non-smokers in your pseudo-population. Stress can't explain any difference you see, so (in this world, where stress is the only confounder) it can only be explained by smoking. You have used an MSM (*giggle*)
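Putting the pieces together, a minimal sketch of the weighted comparison. The cohort is made up (only the unstressed 6/14 split comes from the thread); with these counts the crude risk difference would be 0.30 (9/20 vs 3/20), while the weighted one is about 0.21.

```python
from fractions import Fraction

# Hypothetical cohort: (stressed, smoker) -> (people, lung cancer cases)
COHORT = {
    (False, False): (14, 1),
    (False, True): (6, 2),
    (True, False): (6, 2),
    (True, True): (14, 7),
}


def p_smoke(stressed):
    """P(assigned to smoke | stress level), estimated from the cohort."""
    n_smokers = COHORT[(stressed, True)][0]
    n_nonsmokers = COHORT[(stressed, False)][0]
    return Fraction(n_smokers, n_smokers + n_nonsmokers)


def weighted_risk(smoker):
    """Lung cancer risk for this smoking status in the pseudo-population."""
    pseudo_people = Fraction(0)
    pseudo_cases = Fraction(0)
    for stressed in (False, True):
        people, cases = COHORT[(stressed, smoker)]
        p = p_smoke(stressed)
        weight = 1 / (p if smoker else 1 - p)
        pseudo_people += people * weight
        pseudo_cases += cases * weight
    return pseudo_cases / pseudo_people


risk1 = weighted_risk(smoker=True)
risk0 = weighted_risk(smoker=False)
print(f"risk difference: {float(risk1 - risk0):.3f}")  # 0.214
print(f"risk ratio: {float(risk1 / risk0):.2f}")       # 2.06
```

Because stress is split 50/50 in this made-up cohort, the weighted risk difference (3/14) is exactly the average of the two stratum-specific differences.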
Thanks for coming to my TED talk.
my bad, here is the MSM. lol
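The MSM picture itself isn't reproduced in this text. As one common way to write it down (the notation is an assumption, not from the thread): let A be smoking (1 = smoker), L be stress, Y be lung cancer, and Y^a the outcome a person would have had under smoking status a. A linear MSM on the risk-difference scale, fit with the inverse probability weights W_i, gives a slope equal to the difference in weighted risks:

```latex
\begin{aligned}
\Pr[Y^{a} = 1] &= \beta_0 + \beta_1 a, \qquad a \in \{0, 1\} \\
W_i &= \frac{1}{\Pr[A = A_i \mid L = L_i]} \\
\hat\beta_1 &= \frac{\sum_i W_i A_i Y_i}{\sum_i W_i A_i}
             - \frac{\sum_i W_i (1 - A_i) Y_i}{\sum_i W_i (1 - A_i)}
\end{aligned}
```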
I would totally love for folks to use and tweak these slides. If you want them in pptx format, DM me your email address and I’ll send them to you!
You can follow @klts0.