A short story about #bioinformatics, #genomics, bacterial reference #standards and the challenges and complexity of doing reproducible science. (Names changed to protect the innocent! ☺)


A thread...
1/ Dr. Amy discovers a cool new bacterium, does whole genome sequencing on it, and submits the results for publication. "This is the first time this genome has been sequenced, and will pave the way for groundbreaking research in the future!"
2/ 9 months later, while the manuscript is still under review, Dr. Amy deposits the strain into the ATCC collection for safekeeping. ATCC gives her a strain ID, and she uses it in the publication the following year.
3/ 20 years later, Dr. Bob finds this bug through a bioinformatics search and decides to order it from ATCC. He tries and fails to get his experiments to work, and eventually tries to reproduce some of Dr. Amy's other published experiments. Those fail too.
4/ Frustrated, Dr. Bob calls ATCC and suggests that we sent him the wrong bug. We check, double check, and confirm it's the right one. He tries again, and still fails.
5/ Dr. Bob then calls Dr. Amy, and she also sends him the "same bug" from her working stock. He tries his experiments again, but those also fail.
6/ Hearing about Dr. Bob's frustration, a colleague of his named Dr. Cathy decides to sequence Dr. Amy's strain. Although this area of microbiology is not her focus, she finds major differences from the public genome, and publishes her result in a peer-reviewed journal anyway.
7/ NCBI receives her genome submission and decides it's the new reference. They replace Dr. Amy's original genome reference with Dr. Cathy's new version in RefSeq, but don't remove the original from GenBank. Both genomes bear the same ATCC strain ID.

Dr. Bob is just confused.
8/ ATCC, concerned there is some issue with the strain, decides to sequence the original founder strain that Dr. Amy deposited, using both high-coverage Illumina and Nanopore sequencing. We produce a fully closed genome assembly, and release the genome on our genome portal.
9/ Both Dr. Amy and Dr. Cathy (separately) call ATCC and inform us that the genome sequence on our portal is wrong. Dr. Amy tells us our strain is missing over 300 kb of plasmid sequence. Dr. Cathy informs us we're missing (a different) 145 kb plasmid.
10/ When we ask them about the authenticity of their strains, we discover that Dr. Amy sent us an isolate from her "working stock" months and months after the original was sequenced in her lab. That working stock was discarded years ago, though.
11/ Dr. Cathy informs us that she obtained Dr. Amy's strain from her postdoc who brought it with him from the lab where he did his graduate studies. His PhD advisor was a former postdoc of Dr. Amy and got the strain directly from her lab - not from ATCC.
12/ Dr. Amy retires and soon after unfortunately passes away. The fate of her original strain collection is currently unknown.
13/ Dr. Cathy abandons working on this bug, and becomes unresponsive when we attempt to obtain a sample to add to our collection.

Dr. Bob still doesn't know what to do...
14/ There are now 3 genomes in the public domain:
Dr. Amy's original strain genome is in GenBank, but is not the reference. Dr. Cathy's version is now the reference, but physical isolates of either are not available. ATCC's isolate is available, but its genome is not the reference.
What went wrong?
How could this have been avoided?
Who has the "right" genome? 😀😁

"Lab Adaptation" of bacterial strains is _real_, and can be costly.

The real loser here is Dr. Bob: he spent $$$$ and time on research that went nowhere because of it.
You can follow @bioinformer.