So, about that Genomics in the Cloud book (https://oreil.ly/genomics-cloud). @boconnor and I have received a lot of interest and a lot of questions about what exactly it covers, so I thought I'd do a thread, with one tweet for each chapter, summarizing topics/goals.
1-Introduction: Why you should care about the cloud, and how #bioinformatics / #lifesciences research benefits from moving to a cloud-based ecosystem for data sharing and analysis. No, the cloud environment is not perfect; yes, it really is a game changer.
2-Genomics in a Nutshell: A primer for newcomers to the field of genomics, covering foundational terms and concepts such as genes, DNA and genomic variation, plus the technical basics of sequencing and handling genomic data.
3-Computing Technology Basics for Life Scientists: CPU, GPU, TPU, FPGA, OMG GTFO -- no really, just some basic hardware terminology, plus an introduction to key concepts like parallelism, pipelining, containers and virtual machines in fairly plain language.
4-First Steps in the Cloud: Finally we get to do some hands-on work (on @googlecloud). Set up an account, get free credits, practice managing data in storage buckets and interacting with a Docker container, get a nice custom VM set up to do some genomics.
5-First Steps with #GATK: Let's meet the workhorse of genomics! We start with a general overview, requirements, command line syntax, the usual -- then dive into calling variants with HaplotypeCaller, plus some visual troubleshooting and variant filtering concepts.
6- #GATK Best Practices for Germline Short Variant Discovery: Step-by-step examination of what may be the most commonly run genomics pipeline in the world, with highlights on joint calling for populations and deep learning for single-sample analysis.
7- #GATK Best Practices for Somatic Variant Discovery: Switching gears to cancer genomics with a rundown of how somatic calling is different; step by step through the pipelines for somatic short variants (Mutect2) and copy number alterations.
8-Automating Analysis Execution with Workflows: Halfway point; we pivot to the challenges of automating and scaling up these analyses, introducing the Cromwell workflow system and the portable Workflow Description Language (WDL).
9-Deciphering Real Genomics Workflows: We pretend to stumble across 2 mystery workflows, go through a systematic process of investigating their content to understand what they do and how they do it, learning useful WDL features along the way.
10-Running Single Workflows at Scale with Pipelines API: So far we've been running everything on our little custom VM. Now it's time to unleash the full power of the cloud by dispatching workflow tasks to multiple machines -- with surprisingly little effort.
11-Running Many Workflows Conveniently in Terra: Now we're scaling up to arbitrary numbers of samples, using the managed Cromwell server in the @TerraBioApp workbench. Check out the workspace; it's public (modulo the Google ID sign-in) at https://app.terra.bio/#workspaces/help-gatk/Genomics-in-the-Cloud-v1
12-Interactive Analysis in Jupyter Notebook: Circling back to the #GATK work from earlier chapters, we examine what that would all look like done in Jupyter Notebooks instead of the terminal shell. Between embedded IGV and ggplots galore, it looks good!
13-Assembling Your Own Workspace in Terra: Crossing the bridge from canned examples to importing your own data and methods into @TerraBioApp in a few different scenarios. Draws on other services in the ecosystem including @DockstoreOrg and data repositories.
14-Making a Fully Reproducible Paper: Capstone case study on computational reproducibility involving synthetic data creation, #GATK, downstream analysis and real biological findings by @RealMattJM et al. https://app.terra.bio/#workspaces/help-gatk/Reproducibility_Case_Study_Tetralogy_of_Fallot
That's it, that's our book. Check it out online in the @OReillyMedia library at https://oreil.ly/genomics-cloud , get the Kindle version on Amazon at https://www.amazon.com/Genomics-Cloud-Using-Docker-Terra-ebook/dp/B086Q7D47V or pick up the paperback on Amazon and other major retailers.
As a coda, apologies for the several oopsies in this thread (screenshot swaps and unintentional image croppings bc I misunderstood a thing -- TIL). My wife is away on assignment and I've been wrangling small humans all day, so my brain is fried.
You can follow @VdaGeraldine.