Really excited for Dr. Patrick Ball of @hrdag's talk. His background is quantitative analysis for evaluating human rights violations, e.g., for truth commissions. @UCSF_Epibiostat #SamplingKnowledgeHub #epitwitter
This guy's incredible. Silicon Valley background: made software to help people safely document human rights violations. Started in 1991, as a grad student in sociology and demography at @UMich. Struggled after seeing what was happening in Guatemala and El Salvador: many crimes.
He adopted the defense of human rights as a personal mission. In El Salvador, began non-violent accompaniment: accompanying someone who is under threat of violence, e.g., a religious leader. You are 'noisy' & use your privilege to try to protect them, w/ camera, passport, & home-country network.
This actually worked. But as an aside, it was boring, because you mostly sat around waiting for important people to have important meetings while you, the accompanier, were not having important meetings. He spent his time fixing up floppy disks (1991!).
This led to requests to computerize hard-copy case files of eyewitness accounts. The group that asked for his help wanted to link these accounts to the career trajectories of El Salvador's military officers (constructed from newspapers and resistance reports).
Goal was to identify which crimes occurred under specific officers, to hold them accountable. This was a key input into the peace process then being negotiated, b/c they could force specific people who were the "worst" to retire from the military.
"We don't get a lot of wins in human rights. In human rights we're using moral force to contest ..." structures with real power. So when you do have a win, what's the lesson and how do you scale it up? He concluded: data.
In the 1990s, this was mostly about databases. But there's also a huge statistical problem: we don't know the sampling process that occurred amid all of this violence, so why does one event get reported and another not? We want to understand the whole picture.
From the human rights (HR) perspective, the goal is not just counting 'how many' people were murdered, kidnapped, or forced from their homes, but also disaggregating over time to see patterns. So population size estimation is about getting it right over and over across different time periods.
"Silences"= people don't always tell their stories.
Some technical issues with Patrick's computer. (note to self, consider contributing to @hrdag so they can buy him a new computer)
Goal of their work is to develop policy pushbacks, e.g., criminal accountability for people who committed mass crimes. This is rare. More common and often more meaningful to victims is historical memory. The idea that we will not forget - we will retain and remember the people.
To achieve either of these goals, we must be right. We must get the facts right. When talking about stats, e.g., how many people were killed? Did violence go up or down in April? This is tricky. We don't have data on all the events. Some people wouldn't talk to us.
Some people were inaccessible. Some people were afraid. Some people we didn't even know to ask about it. The catchphrase for this is "we don't know what we don't know." What we don't know is likely to be systematically different than what we know.
There's a social process that generated the reason for people to talk to us.
Slight disruption d/t technical problems. Non-linear but important note: the foundation of authoritarian govts is to tell you things that are absurdly, obviously false, then insist you believe those lies. The test of your loyalty is whether you're willing to believe them.
The way we must respond to these lies is to come back to the truth. Both the qualitative truth of the experience of the victim and the quantitative truth of how many people, when, who...
Back to "we don't know what we don't know". But sometimes there are clues. Project "Iraq Body Count" counted people killed from the 2003 allied invasion onward, collecting info from world media across multiple languages. He was concerned about their statistical inferences.
Not only underreporting some deaths but over-representing other types of deaths, so giving a very distorted picture. For example, consider event size (I think this is # of deaths) and # of sources. Most important data is the data for which you have 0 sources.
V. cool graph shows that large events have many, many sources of reporting. Small events (1 victim) are usually reported by only 1 or 2 sources. Large events (15+ victims) are usually reported by 15+ sources. Implies that most of the events with just 1 victim are not reported at all.
Important b/c large events are likely perpetrated by AQI (al-Qaeda in Iraq) or are coalition collateral damage, w/ IEDs or airstrikes and random victims; the goal is destabilization/control. Small events are totally different.
Small events are likely committed by Shi'a militias w/ firearms, killing adult men w/ the goal of ethnic cleansing. If we collect data in a way that shares the biases of our presuppositions, we are not testing, we are reinforcing our own priors. Naive statistics reinforce international biases.
Imagine you collect and combine 3 databases. Do the 3 databases in combination recover most of the events, or only a small fraction? E.g., in Peru, they knew most of the violations by the Peruvian army, but relatively few of the violations by Sendero Luminoso.
The relationship between what is observed (the sample) and what is true (the population) is the coverage rate. Only a formal, probability-based model can bridge that gap. This is multiple systems estimation (MSE, aka capture-recapture).
Toy example: you've got 2 lists, A (50 events) and B (100 events). Check the intersection of the lists (25 events); call the number on both lists M. Total population size can be estimated as A*B/M (50*100/25 = 200). Many unrealistic assumptions, but that's the idea.
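The toy estimator above (often called the Lincoln-Petersen estimator) is a one-liner; a minimal sketch in Python, using the talk's numbers:

```python
def lincoln_petersen(n_a: int, n_b: int, overlap: int) -> float:
    """Two-list capture-recapture estimate of total population size.

    Assumes the lists are independent and every unit is equally
    likely to be captured -- the unrealistic assumptions noted above.
    """
    return n_a * n_b / overlap

# Toy numbers from the talk: list A has 50 events, list B has 100, overlap M = 25.
print(lincoln_petersen(50, 100, 25))  # -> 200.0
```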
What goes wrong with this basic approach? Consider the case of police homicides in the US. BJS linked media reports of police homicide w/ the FBI homicide reports. (side note FBI only reports homicides if they determine that it's a legal homicide??)
Problem is that the probability of observation in the media and in FBI systems is highly correlated, so the simple MSE calc doesn't work. Imagine 2 people murdered: (1) a US citizen, whose murder is videotaped; within a day, the recording is everywhere. (2) an undocumented immigrant, not videotaped.
Homicide of undoc immigrant not widely reported. Social visibility creates a strong positive correlation between two sources. This leads to a downward bias in the estimate.
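A toy simulation (my illustration, not HRDAG's actual model) shows the mechanism: when one latent trait ("social visibility") raises the capture probability on BOTH lists, the two-list estimate lands well below the true total.

```python
import random

def simulate_dependent_lists(n: int = 10_000, seed: int = 0):
    """Simulate two lists whose capture probabilities share a common cause.

    Half the cases are 'visible' (captured w.p. 0.7 on each list),
    half are not (w.p. 0.1). The shared driver induces positive
    dependence between the lists, biasing A*B/M downward.
    """
    rng = random.Random(seed)
    in_a = in_b = in_both = 0
    for _ in range(n):
        visible = rng.random() < 0.5
        p = 0.7 if visible else 0.1   # same p drives BOTH lists
        a = rng.random() < p
        b = rng.random() < p
        in_a += a
        in_b += b
        in_both += a and b
    estimate = in_a * in_b / in_both
    return n, estimate

true_total, estimate = simulate_dependent_lists()
print(true_total, round(estimate))  # estimate comes out well below 10,000
```

With these parameters the expected estimate is about 0.4 * 0.4 / 0.25 = 64% of the truth, i.e., a large undercount.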
They went back and estimated list dependence, using estimates from Kosovo, Colombia, and others. Adjusted estimates incorporating list dependence from other countries vary a bit, but come out closer to 10,000 instead of the original estimate of 7,300.
Suggests 8-10% of all homicides in the US are caused by police. Really staggering b/c ~3/4 (I may have the precise # wrong) of homicides are by someone the victim knows, so if you calculate the probability that, given you're killed by a stranger, it was a police officer, it's very high...
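The back-of-envelope conditional probability being gestured at here, using the talk's rough numbers plus one simplifying assumption of mine (that essentially all police homicides fall in the "stranger" category):

```python
# Rough numbers as reported in the talk; treat them as approximate.
police_share = 0.09     # ~8-10% of all US homicides are by police
stranger_share = 0.25   # ~3/4 of homicides are by someone the victim knows

# Illustrative assumption (mine, not the talk's): all police
# homicides are homicides by a stranger.
p_police_given_stranger = police_share / stranger_share
print(round(p_police_given_stranger, 2))  # -> 0.36
```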
Expert testimony in the Ríos Montt trial in Guatemala. A 1990s truth commission concluded that acts of genocide occurred. Finally, a trial in 2013 brought former president Ríos Montt to court. Side note, what's genocide? Not just killing, but *targeted* killing of people in specific religious, ethnic, or other groups.
In epi-speak, we might say the relative risk of death for some groups is very high. Their group calculated whether deaths were disproportionately among indigenous vs. non-indigenous people.
They used the census in Guatemala to get denominators in a region and showed a relative risk of about 8 for indigenous people to be killed. Similar to Rwanda, RR for being Tutsi vs Hutu in the one area they had good data was ~5. In Bosnia, RR associated w/ being Muslim was ~3.
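The relative-risk arithmetic is simple once you have numerators (from MSE, per the next tweet) and denominators (from the census); a sketch with hypothetical counts chosen only to illustrate an RR of 8:

```python
def relative_risk(events_a: int, pop_a: int, events_b: int, pop_b: int) -> float:
    """Ratio of the death rate in group A to the death rate in group B."""
    return (events_a / pop_a) / (events_b / pop_b)

# Hypothetical counts (mine, for illustration): 800 deaths among
# 100,000 indigenous people vs. 100 among 100,000 non-indigenous.
print(relative_risk(800, 100_000, 100, 100_000))  # -> 8.0
```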
They did this using a log-linear model for population size estimation to get the numerators for the above calculation. They had 4 sources of info on killings and could evaluate the extent of overlap among each combination of lists.
side note: Rios Montt died during the trials.
Human rights work is using moral force and information against real power. It was important to make the compelling case that genocide occurred.
Side note about the stats methods of https://dmanriqu.pages.iu.edu/ : addressing the independence assumption by conceptualizing it as fulfilled within latent strata using LCMCR, allowing use of many more sources. This allows more robust estimates that omit certain sources. Quant bias analysis.
These stories are so incredible I cannot capture them. Huge discovery of documents in Guatemala. All national police actions over the past century +. How can they make sense of this? A warehouse of data. They undertook "topographic sampling", based on the piles and piles of docs
They had to periodically redo the sampling because, as the piles were being processed, the piles changed. The papers were often memos, signed off by multiple people: this office, then that office, which then files the memo. This helps show accountability.
Colleagues found documents ordering a sweep of subversives in the particular area where people were disappeared. They could find the officers who got the orders. When they found those cops, they said, yep, we did it, but we were just following orders... They got 40 years in prison.
And who gave those orders? The document identified the grand boss - charged w/ command responsibility for the disappearance. A standard defense in such cases is "I was really a reformer. Those were rogue agents". They could show: nothing special or rogue about this campaign.
Fully bureaucratic. Bureaucracies dedicated to violence are often much more controlling b/c there's a principal-agent problem: they have to be controlling to make sure the agents are doing the violence the bureaucracy wants.
We have to get it right. The only way we get to justice is to tell the truth. That's every bit as important when we're doing statistics as in any other human rights work.
This is truly among the most compelling talks about stats and epi methods that I've ever heard. Everyone should get to hear it.
You can follow @MariaGlymour.