Another evergreen reminder - ethnicity (or "race") is a process of self-identification, often by ticking a set of boxes, or a gestalt assessment by others using visible characteristics of people (skin colour, hair type, clothes). It is *not* a good representation of anyone's genetics.
Collapsing ethnicity or race concepts into some sort of crude readout of genetics is plain wrong.
We can sometimes go the other way - genetic measurements in some places can predict the ethnicity box you will tick on a form - but we definitely can't predict your genetics from the box you tick.
Some aspects of ethnicity - notably skin colour (like hair colour) - do have a genetic basis. However, the genes involved in skin colour are a small part of the human genome, and skin colour genetics is bamboozlingly complex.
So - even for this well-recognised physical attribute (skin colour) we can't use this aspect of the box ticking to predict ("impute" would be the more formal term) genetics. If you want to use genetics, you have to ... measure genetics.
Frustratingly, geneticists do continue to use many ethnicity terms in their research as shorthand - eg, "Caucasian" and "African American" - and it is common to talk about genetics for a particular "population" or "ancestry group".
It has taken me a while to work out what we are doing in these "population" groups. What we're not doing is handling genetic background effects. Rather we are navigating complex societal aspects using the fact we can predict ethnicity from genetics.
This is because humans are very social animals, and our social environment sets up many aspects of our life - from how much food and vitamins we have, to exercise, to how we access healthcare.
From a geneticist's perspective these are all "environmental" (ie, non-genetic) factors, but they are our social environment. In complex societies, much of this is determined by this strange-when-you-think-about-it process of separating many aspects of society by ethnicity.
We use this in everyday conversation ("Black British culture" and "White America") and some societies weave this through life in a deep way - for example, the caste system in Hindu India. These are real parts of our society.
So when we draw this "blob" in PC1/PC2 genetic space to select a subset of a cohort in a western society, we do this mainly to control (minimise) this social environment, and critically to remove aspects of social environment that are confounded with drift of allele frequencies.
These "drifted" allele frequencies give rise to "genomic inflation" of our association tests (in some sense the tests are valid - but the causal link of association goes via this drift process, not the biological effect process we want to capture).
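To make the inflation point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than anything from a real cohort: two groups whose allele frequencies have drifted apart (a Balding-Nichols style model with a made-up Fst of 0.01), a phenotype driven *only* by a group-level social environment with no genetic effect at all, a simple correlation-based chi-square per SNP, and a crude "PC1 > 0" cut standing in for the blob in PC1/PC2 space. The names (`genomic_inflation`, `standardise`) are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364


def standardise(x):
    """Centre each column (or a 1-D vector) and scale to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def genomic_inflation(genotypes, phenotype):
    """Median per-SNP association chi-square divided by its expected null median."""
    g = genotypes[:, genotypes.std(axis=0) > 0]  # drop monomorphic SNPs
    n = g.shape[0]
    r = standardise(g).T @ standardise(phenotype) / n  # per-SNP correlation
    chi2 = n * r ** 2  # approximately chi-square (1 df) under the null
    return np.median(chi2) / MEDIAN_CHI2_1DF


# --- Assumed toy parameters (illustrative only) ---
n_per_group, n_snps, fst = 600, 1500, 0.01

# Ancestral allele frequencies, then two groups that drifted independently.
p_anc = rng.uniform(0.1, 0.9, n_snps)
a = p_anc * (1 - fst) / fst
b = (1 - p_anc) * (1 - fst) / fst
freqs = np.stack([rng.beta(a, b), rng.beta(a, b)])  # shape (2, n_snps)

group = np.repeat([0, 1], n_per_group)
genotypes = rng.binomial(2, freqs[group])  # shape (n_individuals, n_snps)

# Phenotype depends only on the group's social environment, not on any SNP.
phenotype = 1.0 * group + rng.normal(size=group.size)

# PC1 of the standardised genotype matrix; keep one side as the "blob".
g_std = standardise(genotypes[:, genotypes.std(axis=0) > 0])
u, s, _ = np.linalg.svd(g_std, full_matrices=False)
pc1 = u[:, 0] * s[0]
keep = pc1 > 0

lam_mixed = genomic_inflation(genotypes, phenotype)
lam_subset = genomic_inflation(genotypes[keep], phenotype[keep])

print(f"lambda, whole cohort : {lam_mixed:.2f}")
print(f"lambda, PC1 subset   : {lam_subset:.2f}")
```

In this toy setup the whole-cohort lambda should come out well above 1 even though no SNP has any biological effect, and the subset lambda should sit close to 1. The subsetting has not "corrected for genetics" here; it has removed the environmental difference that happened to track the drifted allele frequencies.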
Importantly, this process is *not* like laboratory mouse backgrounds, or even complex pedigrees in (say) dairy cattle; geneticists there don't use these techniques but rather more effective (and aggressive) linear mixed model techniques for the genetic background.
Ironically the slightly clumsy phraseology we use for this subsetting - "Caucasian" or "European American" or "Japanese" - would be fine if one added the word "culture" or, if you want to be more abstracted, "social environment".
So - back to the starting point - divorce in your brain "ethnicity" and "human genetics". There are some aspects that link them (physical attributes, most notably skin colour) but these links are thin and not useful. They are different things.
You can follow @ewanbirney.