Super proud to share this work led by the talented @Kira_P_M, PhD student at @LieberInstitute and @HopkinsMedicine! in this paper we performed whole genome bisulfite sequencing (WGBS) in hundreds of brain samples to better understand the genetic regulation of DNA methylation 1/ https://twitter.com/Kira_P_M/status/1309474500585689088
i think there were enough cool findings across (a) biological, (b) clinical, and (c) technical/methodological domains that i'll dive into each in this thread 2/
bio 1: we used WGBS at high coverage (~20x post-QC) combined with common genetic variation to find extensive methylation quantitative trait loci (meQTLs) in two brain regions. many previous efforts used DNAm microarrays, which only profile a small fraction of the ~29M CpGs 3/
bio 2: we identified a large fraction of tested CpGs (~40%) and common genetic variants (~75%) as meQTLs, far more than we expected. these meQTLs were not driven by LD or ancestry differences. you can download the entire meQTL lists (~350M) from the links in the paper 4/
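for anyone new to meQTLs, here's a minimal sketch of what testing one SNP-CpG pair can look like, assuming a plain linear regression of methylation proportion on genotype dosage (0/1/2). this is an illustration on simulated data, not the paper's pipeline: the function name `meqtl_slope` and the toy numbers are hypothetical, and a real analysis would also adjust for covariates such as ancestry and correct for multiple testing.

```python
import math
import random


def meqtl_slope(genotypes, methylation):
    """Regress methylation on genotype dosage with ordinary least squares.

    Hypothetical helper for illustration only.
    Returns (slope, t_statistic) for the genotype effect.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_m = sum(methylation) / n
    # Sum of squares of genotype and cross-product with methylation
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (m - mean_m)
              for g, m in zip(genotypes, methylation))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_g
    # Residual sum of squares gives the standard error of the slope
    rss = sum((m - (intercept + slope * g)) ** 2
              for g, m in zip(genotypes, methylation))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Simulated toy data: each alternate allele adds 0.1 to the methylation level
rng = random.Random(0)
genotypes = [rng.choice([0, 1, 2]) for _ in range(200)]
methylation = [0.3 + 0.1 * g + rng.gauss(0, 0.05) for g in genotypes]

slope, t_stat = meqtl_slope(genotypes, methylation)
print(f"estimated effect per allele: {slope:.3f}, t = {t_stat:.1f}")
```

repeat that over every CpG and nearby variant and you can see how the number of tests gets very large very quickly.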
bio 3: we also found that a small fraction of CpH sites (~4%) showed association with genotype, even though we used homogenate tissue (and CpH DNAm is specific to neurons) - these effects were often independent of nearby CpG DNAm levels 5/
clinical 1: almost all #schizophrenia GWAS variants were associated with DNAm levels at individual CpGs, and these could further be clustered into many regional DNAm associations (which is likely a general property of SNPs and CpG DNAm, unrelated to genetic risk) 6/
clinical 2: these SCZD genetic DMRs explained a substantial fraction of the heritability of significant GWAS loci even though they occupied only 2% of genomic space, likely tagging plastic regions of the genome important for brain function 7/
methods 1: processing and analyzing WGBS data is no joke - it's huge. shoutout to Richard Wilton at @IDIESJHU, whose Arioc aligner (https://academic.oup.com/bioinformatics/article/34/15/2673/4938491) made processing this data at scale possible - it's much faster and more accurate than others 8/
methods 2: smoothing DNAm levels within each sample (e.g. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-10-r83) helped with precision and reduced potential technical variation - shoutout to the `bsseq` Bioconductor package by @PeteHaitch and @KasperDHansen https://www.bioconductor.org/packages/release/bioc/html/bsseq.html 9/
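to give a feel for why smoothing helps, here's a rough sketch that assumes a simple coverage-weighted average over neighboring CpGs within a fixed window. this is a simplified stand-in, not the algorithm `bsseq` implements; the function name `smooth_methylation`, the `window_bp` parameter, and the toy counts are all hypothetical.

```python
def smooth_methylation(positions, meth_counts, coverage, window_bp=1000):
    """Coverage-weighted local average of methylation around each CpG.

    Hypothetical helper: pools methylated and total read counts from all
    CpGs within `window_bp` of each site, so low-coverage sites borrow
    strength from their neighbors.
    """
    smoothed = []
    for center in positions:
        m_sum = 0
        c_sum = 0
        for pos, m, c in zip(positions, meth_counts, coverage):
            if abs(pos - center) <= window_bp:
                m_sum += m
                c_sum += c
        smoothed.append(m_sum / c_sum if c_sum > 0 else float("nan"))
    return smoothed


# Toy data for one sample: the second CpG has only 2 reads, so its raw
# estimate (1/2 = 0.5) is noisy compared with its well-covered neighbors
positions = [100, 150, 400, 5000, 5050]
meth_counts = [8, 1, 9, 2, 1]
coverage = [10, 2, 10, 10, 10]

smoothed = smooth_methylation(positions, meth_counts, coverage)
raw = [m / c for m, c in zip(meth_counts, coverage)]
for pos, r, s in zip(positions, raw, smoothed):
    print(pos, round(r, 3), round(s, 3))
```

the low-coverage site gets pulled toward its neighbors, while the distant cluster of CpGs is smoothed separately.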
methods 3: the WGBS samples were generated over time and there were large batch effects, particularly in ENCODE "blacklist" regions - we presented a comprehensive analysis of global and site-specific DNAm variation to help guide future studies 10/
thanks for reading the thread - big thanks to everyone involved, including many at @LieberInstitute and @JohnsHopkins ! 11/11