q2-dada2: adding a --collapse-no-mismatch option?

I have found that my q2-dada2 output always contains ASVs that are 100% identical to other ASVs, except that they are a nucleotide or two shorter, or the same length but shifted over by one nucleotide. These make up a very small proportion of the ASVs, but they are still present. The standalone DADA2 R package can combine them with its collapseNoMismatch() function. Will this feature be added to the QIIME 2 plugin?
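To make the request concrete, here is a minimal Python sketch of what "collapse without mismatch" could mean for a table of ASV counts. It is not DADA2's collapseNoMismatch() code and not a QIIME 2 API. The helper names and the `min_overlap` parameter are hypothetical. It assumes two ASVs should be merged when one contains the other (a length variant) or when they overlap end to end with no mismatches (a shift variant), and that the more abundant ASV keeps its identity.

```python
from typing import Dict


def overlaps_without_mismatch(a: str, b: str, min_overlap: int = 20) -> bool:
    """Return True if a and b agree everywhere they overlap.

    Two cases are covered: one sequence contains the other (a length
    variant), or a suffix of one equals a prefix of the other (a shift
    variant). Any mismatch or internal indel gives False.
    """
    lo = max(min_overlap, 1)  # an overlap of length 0 would match anything
    if a in b or b in a:
        return min(len(a), len(b)) >= lo
    for x, y in ((a, b), (b, a)):
        # try overlap lengths from longest to shortest
        for k in range(min(len(x), len(y)) - 1, lo - 1, -1):
            if x[-k:] == y[:k]:
                return True
    return False


def collapse_no_mismatch(counts: Dict[str, int], min_overlap: int = 20) -> Dict[str, int]:
    """Fold each ASV into the most abundant representative it overlaps without mismatch."""
    # Visit sequences from most to least abundant so that the abundant
    # variant becomes the representative of each collapsed group.
    ordered = sorted(counts, key=lambda s: counts[s], reverse=True)
    collapsed: Dict[str, int] = {}
    for seq in ordered:
        for rep in collapsed:
            if overlaps_without_mismatch(seq, rep, min_overlap):
                collapsed[rep] += counts[seq]
                break
        else:
            # no existing representative matched: start a new group
            collapsed[seq] = counts[seq]
    return collapsed
```

With a 30 nt ASV, a copy two nucleotides shorter and a copy shifted by one nucleotide, the two variants are folded into the parent, while an ASV with a single internal mismatch stays separate.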


Thanks for the suggestion @Rae309!
I have raised an issue to track this feature request.

Dear qiime developers,

I am also interested in that option.

I am running a script to process 7 MiSeq runs using the approach described in the FMT tutorial ("Fecal microbiota transplant (FMT) study: an exercise", QIIME 2 2019.1.0 documentation).

I am running RAxML to build a phylogenetic tree and got the following warning:

IMPORTANT WARNING: Sequences fd44b1f13e212f437ca57589d1400a64 and fde2810723c36e42553b9cc5b3ba7ac8 are exactly identical

IMPORTANT WARNING: Sequences ff3a18a7ca6051263f33923b9f7a4e02 and ff5eea9f3b569138a07edab9193247f5 are exactly identical

IMPORTANT WARNING
Found 9664 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.

Why are strictly identical ASVs not merged together? Should we add the --collapse-no-mismatch option?

Thanks a ton


Those ASVs are NOT identical. They may be substrings or superstrings of one another; you should confirm that by looking at the sequences themselves.
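One way to check the flagged pairs is to compare the unaligned representative sequences directly. Because q2-dada2 by default names each feature by the MD5 hash of its sequence (the 32-character IDs in the RAxML warning look like such hashes), two different IDs should correspond to two different sequence strings. The minimal Python sketch below assumes you have exported the representative sequences to a FASTA file. The helper names are hypothetical, and the exact hashing used by q2-dada2 is an assumption here.

```python
import hashlib
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {id: sequence}; the ID is the first word of the header."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def classify_pair(a: str, b: str) -> str:
    """Label two unaligned sequences as identical, nested (sub/superstring) or different."""
    if a == b:
        return "identical"
    if a in b or b in a:
        return "substring"
    return "different"


def md5_id(seq: str) -> str:
    """MD5 hex digest of a sequence, assumed to match q2-dada2's hashed feature IDs."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()
```

For example, open the exported FASTA, pass the file handle to `read_fasta`, and call `classify_pair` on the two sequences named in each warning line. A result of "substring" matches the explanation above, and recomputing `md5_id` for each sequence shows whether the IDs really are sequence hashes.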

If they are indeed of equal length and 100% identical, then you have done something wrong. I note that in another topic you described how you hot-wired q2-dada2 to change some of the hardcoded parameters under the hood; that is where such an error would have been introduced.

A collapse-no-mismatch option is not available in QIIME 2 yet.

One way to control this is to use the sequence truncation options in q2-dada2 to truncate your sequences at a uniform length.
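The effect of a uniform truncation length can be illustrated with a toy Python model. This is not q2-dada2's code. It only mimics the documented behaviour of the truncation length for single-end reads (reads shorter than the value are discarded, the rest are cut to that length) and then counts identical sequences. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def truncate_and_dereplicate(reads: Iterable[str], trunc_len: int) -> Dict[str, int]:
    """Cut every read to trunc_len, drop reads that are shorter, and count duplicates."""
    kept = (read[:trunc_len] for read in reads if len(read) >= trunc_len)
    return dict(Counter(kept))
```

Truncating the reads `ACGTACGTAA`, `ACGTACGTA`, `ACGTACGT`, `ACG` and `CGTACGTAA` at 8 yields `ACGTACGT` three times, drops `ACG`, and leaves the start-shifted read as a separate `CGTACGTA`. In this toy model, truncation removes 3' length variants, but shift variants would need consistent trimming at the 5' end (the trim-left options).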

Good luck!

Thanks for your help.

"I note that in another topic you described how you hot-wired q2-dada2"
That's right. I just added minLen = 175 to the filterAndTrim call and increased MAX_CONSIST from its default of 10 to 20 in the dada call.

"One way to control this is to use the sequence truncation options in q2-dada2 to truncate your sequences at a uniform length."

I will try to find the optimal trimming parameters for R1 and R2.

Thanks !