Hi,
I ran qiime1 and qiime2 analyses on the same dataset, and the taxa barplots from the two methods were very different. The qiime2 denoising step filtered out almost every sequence from 5 of my 17 samples, while qiime1 didn't seem to filter to that extent.
Quality check results to show that the same data has been used:
fastqc Per base sequence quality used for qiime1
Below are the stats after the filtering from qiime1
The filtering steps I did after otu picking step in qiime1 include:
parallel_identify_chimeric_seqs.py (ChimeraSlayer), filter_fasta.py (filtered out chimeric seqs), filter_alignment.py (filtered highly variable regions).
Looking at denoising-stats.qzv, a huge chunk of reads was removed at the 'filtered' step, whereas there was no such dramatic filtering in qiime1.
Did I skip this step in qiime1 or is this just due to differences in OTU picking and Denoising methods?
Thanks for providing us with details and investigation work regarding your analyses on this question!
Denoising methods and OTU picking methods are fundamentally different from each other, so some differences between the analyses are always expected. What you are seeing, however, is not so much related to the differences between these methods as to the parameters you chose for your denoising method, which I believe is dada2 in your case.
As you mentioned, the step where you're losing the vast majority of your reads is the initial filtering step. For dada2, the choice of trimming and truncating parameters is particularly important and can have a big effect on the outcome. Could you please provide us with the full commands you used to run dada2? My guess is that, without adequate truncation values set, you dropped most of your reads before even reaching the actual denoising step.
I didn't truncate any sequences because I thought the quality wasn't good and I might lose too many seqs.
I went back to my qiime1 steps and found that I used the default parameters for the split_libraries_fastq.py step, which set "-q" (--phred_quality_threshold, the maximum unacceptable Phred quality score) to 3. Open-reference OTU picking with this data gave me a ridiculously high number of OTUs (in the 80,000s) with singletons and chimeras removed. I'm thinking the -q 3 setting could've been too lenient compared to the qiime2 filters.
Please let me know if I should change the parameters.
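To put that threshold in perspective, here is a small sketch that converts Phred scores to per-base error probabilities using the standard definition Q = -10 * log10(P). The function name is just for illustration and is not part of qiime1 or qiime2.

```python
def phred_to_error_prob(q):
    """Per-base error probability for a Phred quality score q.

    Uses the standard Phred definition: Q = -10 * log10(P).
    """
    return 10 ** (-q / 10)


# Compare the lenient default threshold with commonly used stricter ones.
for q in (3, 4, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.3f}")
```

With -q 3, bases just above the threshold (Q4) are still accepted, and at that score the chance that a base call is wrong is still very high compared with something like Q20.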
Cheers,
With dada2 in qiime2, you'll certainly want to truncate the poor-quality tails of your reads, otherwise the reads will be discarded completely prior to denoising. Basically, you want to truncate as much of the poor-quality 3' tails as possible while making sure you leave at least 20 bp of overlap to properly merge your reads. The size of the overlap region depends on your primer set, so you'll need to calculate it for your set. There are lots of discussions on picking these values on the forum that you can browse through.
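As a rough sketch of that overlap calculation, assuming paired-end reads that span the whole amplicon: the overlap left after truncation is the forward truncation length plus the reverse truncation length minus the amplicon length. The function names and the example numbers below are hypothetical, so plug in the amplicon length for your own primer set (counted the same way as your reads, i.e. with or without primers).

```python
def merge_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Number of bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_ok(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=20):
    """True if the chosen truncation lengths leave at least min_overlap bases."""
    return merge_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical example: a 253 bp amplicon, forward reads truncated at 200
# and reverse reads truncated at 150.
print(merge_overlap(253, 200, 150))
print(truncation_ok(253, 200, 150))

# Hypothetical example: a 460 bp amplicon with reads truncated at 240 and 230
# leaves too little overlap for the 20 bp guideline above.
print(truncation_ok(460, 240, 230))
```

If a pair of truncation values fails this check, you would need to truncate less aggressively on one or both reads, even if that means keeping some lower-quality bases.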
The very high OTU count you saw is typical of OTU picking methods, and you'll notice that the number of amplicon sequence variants (ASVs) you get from dada2 will be much lower. Unless you have a specific reason to stick with OTU picking methods, I would generally advise against using them anymore. That said, if you do want to use OTU picking, you can still do that in qiime2 with the vsearch plugin instead of going back to qiime1. You can use the output of dada2 as the input to vsearch.