Differences in representative sequences between QIIME1 and QIIME2

I'll try to answer as much as I can as succinctly as possible.

So it seems that the problem is in the filtering/merging/dada2 step.

I know that ITS is variable in length, although the amplicons we got (at least the ones visible on the gel) were more or less of comparable size: a nice band with little smear, and any smear ran longer rather than shorter. In any case, if length variation were the problem, I would expect QIIME 1 to fail at merging as well.

You do not mention if/how reads were joined in QIIME 1.

I used:

multiple_join_paired_ends.py

parameters:

join_paired_ends:min_overlap 10
join_paired_ends:perc_max_diff 15
split_libraries_fastq:phred_offset 33
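
In full, the joining call looked roughly like this (a sketch; the directory names are placeholders, and params.txt holds the three settings listed above):

multiple_join_paired_ends.py \
  -i raw_reads/ \
  -o joined_reads/ \
  -p params.txt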

You do not mention if/how you trimmed amplicons in QIIME 1 or 2, specifically to address the read-through issue.

No trimming on the right side of the reads was done in either case, because merging worked well without it in QIIME 1 and no read-through was expected (also, a selection of the unpaired reads was checked, and they looked like they simply failed due to low quality).

Stats files

datasummary.qzv (285.7 KB)
denoisetable.qzv (310.9 KB)

You have read-through in QIIME 2 but not in QIIME 1 (possibly because you did not join in QIIME 1, so the single-end seqs are not long enough to have this problem), so the non-biological DNA is causing classification and alignment-to-expected issues.

I did the joining step in both QIIME1 and QIIME2. For QIIME1 the summary of the samples was as follows (many more joined reads than in QIIME2):
Summary.txt (590 Bytes)

I am curious, how are you deciding what are expected sequences? Are you sequencing positive controls or is this just based on previous examination of these samples?

Based on previous examination of these samples. Also, if we see specific taxa in the QIIME 1 representative sequences, but nothing even remotely similar in QIIME 2, something is clearly wrong.

Thanks!


I agree, merging should be pretty straightforward, but dada2 and QIIME 1's multiple_join_paired_ends.py probably have different parameters (e.g., dada2 requires a minimum of 20 bp overlap; I am not sure about the mismatch thresholds). This seems like a minor difference, given the lengths you are working with, but something to keep in mind.

Got it! Thanks for the summaries. The sample names look different but based on how I think they match up, indeed it looks like dada2 is dropping lots of sequences, especially from a few samples.

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).
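
(If it helps, a minimal sketch, assuming your stats artifact is named denoising-stats.qza: the stats output of denoise-paired can be tabulated into a QZV with qiime metadata tabulate.)

qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv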

All I can say is never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that QIIME 1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the stuff that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

so far it sounds like you have a merging issue with dada2, causing a vast disparity. Let's see that stats file, fix merging, and then we can discuss which watch is telling the right time! :clock1: :confused: :clock10:

dada2 will output a stats file — could you please share the QZV summary of that file? That will really clarify when dada2 is dropping sequences (e.g., if merging is the issue).

Yes, merging seems to be the issue. I should have uploaded the stats file in my previous reply, but there are so many different files in QIIME2 and I'm still figuring out the many options ... Here it is:
denoising-stats.qzv (1.2 MB)

I have no idea how to identify the parameter causing the problem without simply trying to change the parameters and see what happens. Is that the way to go?

All I can say is never trust previous observations or make assumptions about which one is correct. We immediately run into the two-watches problem. Unless you are using a sample with absolutely known composition, you can't cling too strongly to expectations. Alternative hypothesis: many of the sequences that QIIME 1 detects are noisy, error-riddled, chimeric, and otherwise trash. That is exactly the stuff that dada2 and other denoisers are designed to eliminate. But I don't want to make it sound like I am arguing that dada2 is not the problem here, because:

I agree, of course; I'm not saying QIIME 2 should give us the same output as QIIME 1, because then what would be the point of using QIIME 2 in the first place :slight_smile: However, if QIIME 1 produces a representative sequence that is an almost perfect match to a real ITS sequence in GenBank, while in QIIME 2 this sequence is gone, and this happens for several unrelated taxa, it is difficult to imagine that QIIME 1 fabricated these taxa as an artefact (in the sense that an artefact is any error in the perception or representation of information) without the taxa actually being present in the sample. Any fabricated trash sequences would hopefully not match any real taxa in the database very well (?).


Yes, trial and error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).
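
If you do go the trial-and-error route, a small sweep saves some typing. A sketch, assuming your demultiplexed reads are in demux.qza (trunc-len 0 disables position-based truncation, so only the quality-based truncation varies):

for q in 2 8 15 20; do
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux.qza \
    --p-trunc-len-f 0 \
    --p-trunc-len-r 0 \
    --p-trunc-q $q \
    --o-table table-q$q.qza \
    --o-representative-sequences rep-seqs-q$q.qza \
    --o-denoising-stats denoising-stats-q$q.qza
done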

qiime demux summarize will produce a length-distribution summary that you can view in the "interactive quality plot" tab. However, your summary seems to have been generated with an old version of QIIME 2 (2018.4), which does not have this feature. You could update to the latest release and re-run to get this information.
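
(Something like this, once you have updated; demux.qza stands in for whatever your demultiplexed artifact is called:)

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux-summary.qzv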

Not a problem! That's why I am here. :smile:

I agree — that is a reasonable test case. Sounds like you have narrowed down the issue — it is definitely a dada2 merging problem.

Let us know if you are able to solve that merging issue!

Yes, trial and error is one viable solution, particularly if you don't know precisely how much overlap there should be (which can be the case with ITS).

We have now tried to optimize the --p-trunc-q parameter. The highest sequence counts were observed at a value of 8 (but in some samples they are still three times lower than in QIIME 1, and we are still losing lots of sequences at the merging step): denoising-stats-8.qzv (1.2 MB)

Below --p-trunc-q 8 the counts drop. This is a bit confusing: usually we use a quality of 20 as the threshold (or is this something completely different from the typical FASTQ quality score?), but I now see that the default value is actually 2 (??) (denoise-paired: Denoise and dereplicate paired-end sequences — QIIME 2 2018.8.0 documentation)
At the default value of 2 our counts fall to 100 to 10,000, which is unusable, of course. However, in this case the sequences do not fail at the merging step, but at the filtering step.
Can you comment on that? Why would a less stringent quality truncation parameter lead to almost all reads failing the filter? The counts of filtered sequences stay more or less the same or increase slightly when dropping --p-trunc-q from 20 to 15, 10, and 8, but between 8 and 6 something happens and suddenly almost all sequences get filtered out.
This is at --p-trunc-q 2:
denoising-stats-2.qzv (1.2 MB)

We also increased the --p-max-ee from 2 (default) to 10, but this again decreased the sequence counts.

Yes, the level of merging loss is still unacceptably high.

--p-trunc-q uses FASTQ quality scores, so you are correct about that. But the --p-max-ee parameter you see is different: it decides whether to drop reads that are too noisy, based on the expected number of errors. So if you set --p-trunc-q too low (or do not trim at all), you wind up losing more sequences, because more noisy bases are retained in the reads prior to filtering, causing those reads to be tossed during the filtering step.

Let's imagine we have a sequence with high quality (-) and low-quality (*) bases:

---------------------------------------*-------*-----**************

If we read that whole read in, it will be thrown out by dada2 because it has too many low-quality bases. Now let's try trimming the read (and imagine this is manual trimming based on average quality profiles of all your sequences):

---------------------------------------*----

Okay, that will pass because now it only contains one probable error. But it might not merge, since it is much shorter!

Let's try a third way, with --p-trunc-q. If we set that parameter to 8, you might wind up with the scenario above (clean enough to pass filter but too short to merge) or you may wind up with something like this:

---------------------------------------*-------*-----

That contains > 2 erroneous bases, so will not pass.

QIIME 1 used a quality-filtering algorithm not unlike --p-trunc-q, but with some additional features for controlling trimming a bit more (QIIME 2 retains access to this method in the q2-quality-filter plugin). But QIIME 1 also did not have such a persnickety denoising method (well, it had none; OTU clustering is a sort of rough denoising method and is not squeamish about clustering noisy sequences together with good ones... dada2's defaults scrub out the very noisy reads before attempting to denoise the somewhat noisy ones).

So for dada2 there is a tough balance to strike here:

  1. you need your sequences to contain few erroneous bases so that they pass the filter (use --p-trunc-q with a sufficiently high value, or manual trimming, to achieve this)
    BUT
  2. you need the sequences to be long enough so that they merge successfully.

Another option is to loosen up the dada2 parameters a bit, e.g., set --p-max-ee higher so that fewer reads will be tossed out.
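
Concretely, striking that balance might look like this (a sketch only; the truncation positions are placeholders you would read off your quality plots, and max-ee is loosened from its default of 2):

# trunc-len-f + trunc-len-r must exceed the ~460 bp amplicon length plus the
# required overlap, while still cutting before the 3' quality crash
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 220 \
  --p-trunc-q 8 \
  --p-max-ee 5 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza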

Aha, so you already tested that — could you share the dada2 stats?

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).
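
Roughly, that pipeline looks like this (a sketch; for ITS you would use denoise-other with your UNITE reference, and the trim length is a placeholder):

# 1. join reads (qiime1-style)
qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences demux-joined.qza

# 2. qiime1-style quality filtering of the joined reads
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

# 3. denoise with deblur against a user-supplied reference
qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs reference-seqs.qza \
  --p-trim-length 375 \
  --p-sample-stats \
  --o-representative-sequences deblur-rep-seqs.qza \
  --o-table deblur-table.qza \
  --o-stats deblur-stats.qza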

Last resort is just to use the forward reads as if you had single-end data.
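
In that case the command is simply (the truncation length is a placeholder you would pick from the forward-read quality profile):

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 250 \
  --o-table table-se.qza \
  --o-representative-sequences rep-seqs-se.qza \
  --o-denoising-stats denoising-stats-se.qza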

In summary, this appears to be trouble with dada2 specifically, and the multiple filters that it performs. Your reads are moderately noisy (which is fine in itself), but this is causing trouble with dada2. You may want to try other denoising or OTU clustering methods in QIIME 2, which may simply be more compatible with your data.


Thank you very much, this actually makes the whole thing much clearer.

I'll be able to test all the helpful suggestions for the alternative merging steps tomorrow and I'll report on the results then.

For now, just the test of --p-max-ee that you asked about: increasing the value to 10 (not 10.0, but I presume that wouldn't make a difference?) does not change much in the better-performing samples, while in the worst-performing samples it actually decreases the number of successfully merged sequences:
denoising-stats-8.qzv (1.2 MB)
denoising-stats-ee10-8.qzv (1.2 MB)


Very interesting. It does do what it should — effectively zero sequences are filtered prior to denoising — but this seems to lead to fewer merges, which is unexpected. I can't really think of a reason for that, as it seems that this just causes an additional small number to pass the pre-filter, and presumably the other reads are unaffected. Perhaps the additional noisy sequences are disrupting the error model?

A report on progress (and the whole thing gets weirder, at least to my eyes). I tested three things:

  1. Trimming the 3' primer sequences with cutadapt, in case there was a significant amount of read-through in the dataset (sketched after this list).
    Result: No change in the dada2 paired-end results (compared to what was discussed earlier in the thread). No change in deblur either (point 2 below).

  2. The deblur workflow.

You could also try deblur instead of dada2, and see this tutorial for steps to join, quality filter, and then deblur. This will use more qiime1-esque read joining and pre-filtering, but then use deblur for denoising instead of qiime1-style OTU clustering methods (which you can still use in QIIME 2, by the way — see here).

Result:
Lots of joined sequences demux-joined.qzv (292.6 KB)
Nothing removed in the filtering step demux-joined-filter-stats.qzv (1.2 MB)
Only 35 :man_facepalming: sequences left after deblurring deblur-table.qzv (315.4 KB)
The statistics file deblur-stats.qzv (205.6 KB)

I used --p-trim-length 375 in qiime deblur denoise-other, and for --i-reference-seqs I used the dynamic UNITE database, imported with:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path $DBPATH/sh_refs_qiime_ver7_dynamic_s_01.12.2017.fasta \
  --output-path $DBPATH/dynamic_otus.qza

  3. Single-end workflow on the forward reads with dada2.
    Result: Better, but up to two thirds of the sequences are still removed, in this case at the step between the denoised and non-chimeric data: denoising-stats.qzv (1.2 MB)
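
For reference, the read-through trimming in point 1 was along these lines (a sketch; the primer sequences shown are placeholders, since in practice the reverse complement of the reverse primer is trimmed from R1 and the reverse complement of the forward primer from R2):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-f GCATATCAATAAGCGGAGGA \
  --p-adapter-r TCCTCCGCTTATTGATATGC \
  --o-trimmed-sequences demux-trimmed.qza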

Does this mean there is a deeper problem with the input data I'm somehow missing?

The issue with deblur appears to be that many of the sequences are unique and/or do not resemble the reference sequences you input (e.g., are non-fungal). See the fraction-artifact-with-minsize column in the stats visualization. You can adjust the min-size parameter to correct for this, if unique seqs are to blame (as I suspect) — however the deblur developers have warned against this, e.g., see here (that user has a similar issue with deblur).
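
If you want to test that hypothesis, it is a one-parameter change (with the caveat above that the deblur developers discourage it; min-size 1 retains singleton sequences):

qiime deblur denoise-other \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --i-reference-seqs dynamic_otus.qza \
  --p-trim-length 375 \
  --p-min-size 1 \
  --p-sample-stats \
  --o-representative-sequences deblur-rep-seqs.qza \
  --o-table deblur-table.qza \
  --o-stats deblur-stats.qza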

Aha, yes that could be the issue — if these sequences are being filtered out as chimera on single-end, they are probably noisy reads that have issues passing filter/merging with the paired-end. QIIME 1 does not use chimera filtering by default so that could explain part of the discrepancy.

Personally, I think I would proceed with the single-end data. It is probably better to have slightly shorter reads rather than proceed with longer joined reads but potentially introduce an amplicon length bias (which is essentially what is occurring, and clearly impacting some samples more than others).

But I recognize that is not satisfying when QIIME 1 seems to yield better results (though the possibility that chimeric seqs are passing through and masquerading as real data may taint that assumption, depending on whether you used a chimera filter with the QIIME 1 results). You could also use q2-vsearch to perform OTU clustering and see if that performs better for your data. I linked to the OTU clustering tutorial above. Your workflow would look like this:

  1. use q2-quality-filter to trim/filter sequences
  2. use vsearch dereplicate to dereplicate seqs
  3. use q2-vsearch to cluster
  4. use q2-vsearch to filter chimera
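
In commands, that might look like the following (a sketch, assuming joined demultiplexed reads in demux-joined.qza; 97% identity is just the conventional threshold):

# 1. qiime1-style quality filtering
qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-filtered.qza \
  --o-filter-stats filter-stats.qza

# 2. dereplicate
qiime vsearch dereplicate-sequences \
  --i-sequences demux-filtered.qza \
  --o-dereplicated-table derep-table.qza \
  --o-dereplicated-sequences derep-seqs.qza

# 3. de novo OTU clustering at 97% identity
qiime vsearch cluster-features-de-novo \
  --i-table derep-table.qza \
  --i-sequences derep-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-97.qza \
  --o-clustered-sequences seqs-97.qza

# 4. flag chimeras de novo, then drop them from the table
qiime vsearch uchime-denovo \
  --i-table table-97.qza \
  --i-sequences seqs-97.qza \
  --o-chimeras chimeras.qza \
  --o-nonchimeras nonchimeras.qza \
  --o-stats uchime-stats.qza

qiime feature-table filter-features \
  --i-table table-97.qza \
  --m-metadata-file chimeras.qza \
  --p-exclude-ids \
  --o-filtered-table table-97-nonchimeric.qza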

The chimera filter seems to suggest that it is a problem with the data — the sort of problem that can be fixed. Use single-end or try again with q2-vsearch and compare against (chimera-filtered) QIIME 1 results to see how they square up. Let us know what you find!

Thanks!

I went back through my logs and I did actually attempt the chimera-filtering step in QIIME 1, but it removed no sequences.

So, reading about a similar problem (Many chimeric reads after dada2, but only in some samples), I understand that all of the above means that a large portion of the chimeric sequences is "real" (i.e. arising in the amplicon preparation step, possibly due to the low amount of template DNA we were working with) and not a problem with the sequencing or analysis. In other words, it is safe to assume that the sequences being removed should be removed, and that this improves the result rather than introducing bias.

In any case most samples look much better now, so single-end seems to be the way to go. Am I right?


I agree, I think that is probably the best course of action and the correct interpretation of these results.

The read yields are looking better for those samples that had read joining issues with dada2 denoise-paired, so yes I think this is the safer route (since you are not systematically biasing samples that have longer amplicons). But you have not mentioned anything about how the results look, e.g., regarding the false positives/negatives that you mentioned seeing with dada2 denoise-paired at the start of this topic thread. I would be curious to hear what you find!

True, I forgot to mention that. In the single-end data the sequences are there, as expected. As a bonus, the number of slightly different representative sequences almost identical to the same taxon is much lower than in QIIME 1, so the sequence list looks less inflated (as expected from QIIME 2).

In short, the single-end dada2 workflow produces the result that is, as far as I can tell, the most credible given the biological background of the samples, throws away the fewest sequences, and (as you say) avoids introducing an amplicon length bias.


Finally, many thanks for all the help in this long discussion!
And as a reference for anyone who might have a similar problem, a short summary of conclusions:

  1. Fungal ITS 460 bp amplicon, paired-end 2×300 bp Illumina sequencing.
  2. In some samples over 90% reads failed to merge with paired-end dada2.
  3. The QIIME 1 results looked noisy/inflated, but they also contained sequences almost perfectly matching known taxa, and these sequences disappeared with paired-end dada2.
  4. Deblur workflow performed even worse than paired-end dada2.
  5. Single-end dada2 (on just the forward reads, after removing the primers with qiime cutadapt trim-single, as sketched below) performed best.
  6. In some samples, even with single-end dada2, many sequences (up to two thirds) were removed as chimeras. This is most likely not a problem with the sequencing or analysis, but an indication of real chimeric sequences in the data (i.e. errors introduced during amplicon preparation, possibly due to the very low amount of template in some samples).
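
For completeness, the primer removal in point 5 was along these lines (the primer sequence is a placeholder for the actual forward primer):

qiime cutadapt trim-single \
  --i-demultiplexed-sequences demux-single.qza \
  --p-front GCATCGATGAAGAACGCAGC \
  --o-trimmed-sequences demux-single-trimmed.qza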
