This study recommends truncating reads at q=18 before DADA2. Please don't do this! You will lose a lot of useful reads. My first twitterant. 1/18 https://doi.org/10.1186/s12859-019-3187-5
I usually consider benchmarking studies pretty useful, but sometimes the comparisons are forced and unnecessary. And in some rare cases, like this one, they can give really bad recommendations. 2/18
Here we see a comparison of "QIIME 1 vs QIIME 2", which right off the bat should raise concerns, since that alone gives you no information about the actual tools being compared. 3/18
If you say you're going to compare tools but then label them by their 'wrappers', you're kind of missing the whole point of platforms like QIIME. 4/18
In this case, what's really being compared is usearch 97% closed-reference OTU clustering vs. DADA2 denoising. Both were run without the recommended, optimized parameters that would play to their strengths and/or weaknesses. 5/18
For example, fastq_join is not recommended for merging reads for usearch; that's Edgar's own recommendation. https://www.drive5.com/usearch/manual/exp_errs.html 6/18
I won't even go into the massive differences between closed-ref OTU clustering and ASVs, or why the former really shouldn't be used at all. I'll just focus on the DADA2 recommendations here. 7/18
I'll start by saying that we (Q2 mods) generally recommend that, even if OTU clustering is desired, you still run a denoiser before clustering. The quality control of denoisers is simply far superior to simple q-score filtering. 8/18
This is partly because they rely on maxEE (maximum expected errors) for filtering rather than q-based truncating. I find it very strange that a whole paper dedicated to this topic fails to mention maxEE even once, even though it has been favored over q-based filtering alone for some years now. 9/18
In fact, this is why DADA2's default q-score truncation parameter (truncQ) is set to 2: it relies instead on maxEE plus reasonable trimming/truncating for its filtering. With good reason! 10/18
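A minimal Python sketch of what maxEE filtering does, assuming Phred scores are already decoded to integers: a read's expected number of errors is the sum of its per-base error probabilities, 10^(-Q/10), and the read is kept if that sum is at most maxEE. The function names and the threshold of 2 are illustrative, not DADA2's code.

```python
def expected_errors(quals):
    # Each Phred score Q corresponds to an error probability of 10^(-Q/10)
    return sum(10 ** (-q / 10) for q in quals)

def passes_max_ee(quals, max_ee=2.0):
    # Keep the read only if its total expected errors stay within max_ee
    return expected_errors(quals) <= max_ee

# Hypothetical 250-base read: high quality except one Q15 base at position 200
read = [35] * 200 + [15] + [35] * 49
print(round(expected_errors(read), 3), passes_max_ee(read))
```

A single Q15 base barely moves the total, so the whole read survives; a rule that truncates at the first base below q=18 would instead cut this read down to 200 bases.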
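Roughly how these pieces fit together, as a hypothetical Python sketch (not DADA2's implementation): truncQ cuts a read at the first base with quality <= truncQ, reads that end up shorter than truncLen are dropped, survivors are cut to truncLen, and maxEE does the real quality filtering. Following DADA2's convention, a truncLen of 0 is treated here as "no truncation"; the function names are made up for this example.

```python
def apply_trunc_q(quals, trunc_q=2):
    # Cut the read at the first base whose quality is <= trunc_q
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return quals[:i]
    return list(quals)

def filter_read(quals, trunc_len=0, trunc_q=2, max_ee=2.0):
    """Return the kept qualities, or None if the read is discarded."""
    q = apply_trunc_q(quals, trunc_q)
    if trunc_len > 0:
        if len(q) < trunc_len:
            return None  # too short after truncQ, so the read is dropped
        q = q[:trunc_len]  # remove the 3' tail beyond trunc_len
    ee = sum(10 ** (-x / 10) for x in q)  # expected errors
    return q if ee <= max_ee else None

read = [35] * 250
read[240] = 2  # a Q2 base inside the tail that truncLen removes anyway
kept = filter_read(read, trunc_len=220)
print(len(kept))  # the read survives, cut to 220 bases
```

With truncQ=2, only the rare Q2 bases trigger truncQ cuts, so nearly all reads reach truncLen intact and maxEE decides the rest.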
"We applied the default parameter with truncation length for both (forward and reverse) reads as zero." Bad idea! I kind of wish this parameter in DADA2 didn't have a default, so users would have to actually look at their quality scores and mindfully pick one. 11/18
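The kind of look at quality scores I mean can be sketched in Python, purely as an illustration: compute the median quality at each position across reads and note where it first drops below some cutoff. The cutoff of 25 and the function name suggest_trunc_len are hypothetical choices for this example, not a DADA2 or QIIME 2 rule; in practice, look at the quality plots too.

```python
from statistics import median

def median_quality_profile(quals_per_read):
    # zip(*...) stops at the shortest read, so this assumes equal-length raw reads
    return [median(column) for column in zip(*quals_per_read)]

def suggest_trunc_len(quals_per_read, min_median=25):
    profile = median_quality_profile(quals_per_read)
    for pos, med in enumerate(profile):
        if med < min_median:
            return pos  # keep positions 0..pos-1
    return len(profile)  # median quality never drops below the cutoff

# Hypothetical quality profiles for three 250-base reads
reads = [
    [36] * 200 + [22] * 50,
    [35] * 190 + [30] * 10 + [18] * 50,
    [37] * 210 + [24] * 40,
]
print(suggest_trunc_len(reads))  # 200
```

Any position picked this way still has to respect the overlap needed for merging.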
I can't stress this part enough: with paired-end Illumina runs, if you are going to use DADA2, always truncate your poor-quality tails (within the limits your merging needs). Not truncating from the 3' end will always lead to significant, avoidable losses of total reads! 12/18
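"Within your merging limits" boils down to simple arithmetic: the truncated forward and reverse reads must still overlap. A Python sketch, assuming you know the length of the region actually sequenced (a hypothetical 290 bp here), that merging needs at least 12 bases of overlap (DADA2's mergePairs minOverlap default), and adding a hypothetical 20-base safety margin for natural length variation:

```python
def overlap(trunc_f, trunc_r, amplicon_len):
    # Bases shared by the forward and reverse reads after truncation
    return trunc_f + trunc_r - amplicon_len

def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=12, margin=20):
    # Require the minimum overlap plus a buffer for longer-than-expected amplicons
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap + margin

print(overlap(240, 160, 290), can_merge(240, 160, 290))  # 110 True
print(overlap(150, 150, 290), can_merge(150, 150, 290))  # 10 False
```

Within that constraint, truncating the low-quality 3' tails is what keeps reads from failing maxEE.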
Truncating based on q-score prior to DADA2 is also not recommended. There are several issues here: it introduces variable read lengths and inflates the number of singletons (which will be discarded), to name a couple, plus many more I won't get into here... 13/18
Ultimately, this leads to a poorly built error model and a loss of sensitivity in the denoiser. 14/18
Then there are the chimera removal steps, which, again, are not really comparable, so I won't dig into that one, but... 14/18
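A toy Python illustration of the variable-length problem, with hypothetical data: three reads of the same 12-base sequence, truncated truncQ-style at the first base with quality <= the threshold. At a threshold of 2 they collapse into one unique sequence seen three times; at 18, single quality dips produce three different lengths, i.e., three "unique" sequences each seen only once.

```python
from collections import Counter

def truncate_at_q(seq, quals, threshold):
    # Cut seq at the first base whose quality is <= threshold
    for i, q in enumerate(quals):
        if q <= threshold:
            return seq[:i]
    return seq

def uniques_after(seq, quals_list, threshold):
    # Count how many distinct sequences the truncated reads produce
    return Counter(truncate_at_q(seq, quals, threshold) for quals in quals_list)

true_seq = "ACGTACGTACGT"  # hypothetical: all three reads are this sequence
quals_list = [
    [35] * 12,                   # clean read
    [35] * 8 + [17] + [35] * 3,  # one Q17 dip at position 8
    [35] * 10 + [15, 35],        # one Q15 dip at position 10
]
for threshold in (2, 18):
    print(threshold, dict(uniques_after(true_seq, quals_list, threshold)))
```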
I will, however, ask why the default min-fold-parent-over-abundance in DADA2 was changed to 8 (from 1) without explanation, if "default" parameters were being benchmarked. 15/18
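For context, DADA2's documentation roughly describes minFoldParentOverAbundance as: only sequences more than this-fold more abundant than a sequence can be its "parents". A hypothetical Python sketch of just that rule (not DADA2's actual bimera search), with made-up abundances, shows why going from 1 to 8 matters: fewer sequences qualify as parents, so fewer chimeras can be flagged.

```python
def candidate_parents(abundances, query, min_fold=1.0):
    # Only sequences more than min_fold times as abundant as query can be its parents
    a = abundances[query]
    return sorted(s for s, n in abundances.items() if s != query and n > min_fold * a)

# Hypothetical abundances; "chim" stands in for a chimera of A and B
abundances = {"A": 1000, "B": 400, "C": 60, "chim": 50}
print(candidate_parents(abundances, "chim", min_fold=1))  # ['A', 'B', 'C']
print(candidate_parents(abundances, "chim", min_fold=8))  # ['A']
```

A bimera needs two parents, so with min_fold=8 only one candidate remains here and the A/B chimera could slip through.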
q-score-based filtering may have been standard some years ago, and benchmarks like this (plus numerous other, more comprehensive ones!) were useful then. But the field has certainly progressed beyond this, which makes me question the usefulness of a benchmark like this in 2019. 16/18
I only raise these concerns because I have already seen a user run into problems with their analysis after relying on this benchmark's recommendation for DADA2: 17/18
"Based on our results, we recommend trimming thresholds of 10–14 for QIIME1 and 18 for [DADA2]." In summary, DON'T follow this conclusion for your DADA2 denoising! 18/18