Thread by @MehrbodEstaki, This study recommends truncating reads with q=18 before DADA2. Please don't do [...]

This study recommends truncating reads with q=18 before DADA2. Please don't do this! You will lose lots of your useful reads. My first twitterant. 1/18 https://doi.org/10.1186/s12859-019-3187-5

Impact of quality trimming on the efficiency of reads joining and diversity analysis of Illumina...

To increase the accuracy of microbiome data analysis, solving the technical limitations of the existing sequencing machines is required. Quality trimming is suggested to reduce the effect of the...

I usually consider benchmarking tools pretty useful, but sometimes the comparisons are really forced and not necessary. And in some rare cases like this they can give really bad recommendations. 2/18

Here we see a comparison of "QIIME 1 vs QIIME 2", which right off the bat should raise concerns, since that alone gives you no information about the actual tools being compared. 3/18

If you're saying you're going to compare some tools but are actually labelling them by their 'wrappers', you're kind of missing the whole point of these platforms like QIIME. 4/18

In this case, what's really being compared is usearch 97% closed-reference OTU clustering and DADA2 denoising, both run without the recommended, optimized parameters that account for their strengths and weaknesses. 5/18

For example, fastq_join is not recommended for merging reads for usearch <- Edgar's own recommendations. https://www.drive5.com/usearch/manual/exp_errs.html 6/18

I won't even go into the massive differences between closed-ref OTU clustering and ASVs, or why the former should really not be used at all. I'll just focus on the DADA2 recommendations here. 7/18

I'll start by saying that we (Q2 mods) generally recommend that, if OTU clustering is desired, you still use a denoiser before clustering. The quality control of denoisers is simply far superior to simple q-score filtering. 8/18

This is partially because they rely on maxEE for their filtering and not q-based truncating. I find it very strange that a whole paper dedicated to this topic fails to mention maxEE even once, even though this has been favored over just q-based filtering for some years now. 9/18
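To make the maxEE point concrete, here is a minimal sketch of how expected errors are computed from Phred scores (each base contributes an error probability of 10^(-Q/10)) and how that differs from a minimum-q rule. The function names, the toy read, and the thresholds of 2.0 and 18 are illustrative assumptions, not code or settings taken from DADA2, usearch, or the paper.

```python
def expected_errors(quals):
    """Expected number of errors in a read: sum of per-base error
    probabilities, where a Phred score Q means p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_max_ee(quals, max_ee=2.0):
    """Keep the read if its expected errors do not exceed max_ee
    (2.0 is an illustrative threshold, not a recommendation)."""
    return expected_errors(quals) <= max_ee


def passes_min_q(quals, min_q=18):
    """Naive q-based rule: reject the read if any base is below min_q."""
    return min(quals) >= min_q


# Toy read (made-up scores): 249 bases at Q35 and a single base at Q15.
toy_read = [35] * 249 + [15]

print(round(expected_errors(toy_read), 2))  # about 0.11 expected errors
print(passes_max_ee(toy_read))              # kept under the maxEE rule
print(passes_min_q(toy_read))               # flagged by the q=18 rule
```

One dip in quality barely moves the expected-error total, so a maxEE filter keeps a read that a strict per-base q threshold would cut or discard.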

In fact, this is why the default q-score truncation parameter in DADA2 is set to 2: it instead relies on maxEE + reasonable trimming/truncation for its filtering. With good reason! 10/18

"We applied the default parameter with truncation length for both (forward and reverse) reads as zero." Bad idea! I kind of wish this parameter in DADA2 didn't have a default and users had to actually look at their quality scores and mindfully pick one. 11/18

I can't stress this part enough: with paired-end Illumina runs, if you are going to use DADA2, always truncate your poor-quality tails (within the limits your merging needs). Not truncating from the 3' end will always lead to -avoidable- significant losses of total reads! 12/18
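The "within your merging limits" constraint can be checked with simple arithmetic before picking truncation lengths. This is a rough sketch under stated assumptions: the amplicon length (253 nt), the truncation positions, and the 12 nt minimum overlap are placeholder values for illustration, and the calculation assumes both reads start at the ends of the amplicon (primers already removed). Check what your own merging step actually requires.

```python
def merge_overlap(amplicon_len, trunc_f, trunc_r):
    """Number of bases shared by forward and reverse reads after
    truncation; a negative value means the reads no longer meet."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(amplicon_len, trunc_f, trunc_r, min_overlap=12):
    """True if the truncated reads still overlap by at least min_overlap
    bases (12 is an assumed value for illustration)."""
    return merge_overlap(amplicon_len, trunc_f, trunc_r) >= min_overlap


# Hypothetical 253 nt amplicon with a few candidate truncation pairs.
for trunc_f, trunc_r in [(200, 150), (140, 125), (130, 120)]:
    overlap = merge_overlap(253, trunc_f, trunc_r)
    print(trunc_f, trunc_r, overlap, can_merge(253, trunc_f, trunc_r))
```

The idea is to cut the low-quality tails as far back as the quality plots suggest while keeping this check true.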

Truncating based on q-score prior to DADA2 is also not recommended. Several issues here: it introduces variable read lengths and inflates singleton occurrence (singletons will be discarded), to name a couple, plus many more I won't get into here... 13/18

Ultimately this leads to poor error model building and loss of sensitivity of the denoiser. 14/18
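The variable-length problem is easy to see on toy data. The sketch below assumes one simple trimming rule (cut each read at its first base below the threshold), which may differ from the exact procedure used in the paper, and the quality scores are invented for illustration.

```python
def truncate_at_first_low_q(quals, q_threshold):
    """Cut a read at the first base whose score is below q_threshold
    (an assumed, simplified version of q-based truncation)."""
    for i, q in enumerate(quals):
        if q < q_threshold:
            return quals[:i]
    return quals


def truncate_at_position(quals, trunc_len):
    """Cut every read at the same fixed position."""
    return quals[:trunc_len]


# Three made-up 250 nt reads whose quality drops at different positions.
toy_reads = [
    [35] * 200 + [17] + [30] * 49,
    [35] * 180 + [12] * 70,
    [35] * 250,
]

q_lengths = [len(truncate_at_first_low_q(r, 18)) for r in toy_reads]
fixed_lengths = [len(truncate_at_position(r, 180)) for r in toy_reads]

print(q_lengths)      # [200, 180, 250]: a different length for each read
print(fixed_lengths)  # [180, 180, 180]: one length for all reads
```

With q-based truncation, reads from the same template can end at different positions, so identical sequences stop looking identical to the denoiser. Fixed-position truncation keeps them directly comparable.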

Then there are the chimera removal steps, which -again- are not really comparable, so I won't even dig into that one, but... 14/18

I will raise the question of why the default min-fold-over-parent in DADA2 was changed to 8 (from 1) without an explanation if "default" parameters were being benchmarked. 15/18

q-score based filtering may have been standard some years ago, and benchmarks like this (+ numerous other more comprehensive ones!) were useful then, but the field has certainly progressed beyond this, which makes me question the usefulness of a benchmark like this in 2019. 16/18

I only raise my concerns here because I have already seen a user run into problems with their analysis because they were relying on this benchmark's recommendation for use with DADA2: 17/18

"Based on our results, we recommend trimming thresholds of 10–14 for QIIME1 and 18 for [DADA2]". In summary, DON'T follow this conclusion for your DADA2 denoising! 18/18
