I am a QIIME 1 user and have been using pyNAST as one of the filters to remove reads that are not from 16S rRNA in my samples; I found it performed pretty well for my datasets. Why is pyNAST no longer available in QIIME 2? Are there any pitfalls that make it less favorable than MAFFT + FastTree?
Welcome to the forum, and thanks for your question!
The NAST algorithm makes some naive assumptions during alignment that result in lower-quality multiple sequence alignments than other approaches. For example, to make the aligned sequence length equal to the template alignment length, it will sometimes remove gap characters without evaluating how that affects the underlying alignment quality.
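To make that concrete, here is a toy illustration of the length-forcing step. It is not pyNAST's code: the function name is hypothetical, and it simply drops gaps from left to right, which is a simplification chosen for illustration. The point is only that gaps are removed to hit a target length with no check on what that does to the alignment.

```python
def force_to_template_length(aligned_seq: str, template_length: int, gap: str = "-") -> str:
    """Drop gap characters, left to right, until the sequence fits the template length.

    Hypothetical helper for illustration only; no alignment quality is evaluated.
    """
    excess = len(aligned_seq) - template_length
    if excess <= 0:
        return aligned_seq  # already short enough, nothing to remove

    kept = []
    for ch in aligned_seq:
        if ch == gap and excess > 0:
            excess -= 1  # drop this gap; every base to its right shifts one column
            continue
        kept.append(ch)

    if excess > 0:
        raise ValueError("not enough gap characters to reach the template length")
    return "".join(kept)


candidate = "AC-GTTA-CG"  # 10 columns, template has 8
squeezed = force_to_template_length(candidate, 8)
```

Every base to the right of a removed gap moves to a different column, so positions that were aligned to the template before may no longer be.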
If you’re interested in using alignment as a step for removing non-16S sequences, you can use q2-fragment-insertion in QIIME 2 (see docs linked from this page). This filters out non-16S sequences as part of its process. An alternative, taxonomy-based filter would be to remove any features that are not assigned to at least the phylum level (see here for an example of how to do this), which should also filter out any non-16S sequences (as long as the taxonomy is assigned based on a 16S reference database, such as SILVA or Greengenes).
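The logic of the taxonomy-based filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the QIIME 2 API: the function names are hypothetical, and it assumes Greengenes-style taxonomy strings in which the phylum level is written as `p__Name`.

```python
def has_phylum(taxonomy: str, prefix: str = "p__") -> bool:
    """Return True if the taxonomy string has a non-empty phylum-level label."""
    for level in taxonomy.split(";"):
        level = level.strip()
        # "p__" alone means the classifier stopped above the phylum level
        if level.startswith(prefix) and len(level) > len(prefix):
            return True
    return False


def filter_features(taxonomy_by_feature: dict) -> dict:
    """Keep only features assigned to at least the phylum level."""
    return {fid: tax for fid, tax in taxonomy_by_feature.items() if has_phylum(tax)}


# Made-up example data (feature IDs and assignments are placeholders)
example = {
    "feat1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "feat2": "k__Bacteria; p__",
    "feat3": "Unassigned",
}
kept = filter_features(example)
```

In QIIME 2 itself this kind of filter is normally done with `qiime taxa filter-table` and its `--p-include` option rather than with hand-written code; the sketch only shows what is being kept and dropped.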
Hope this helps!
Thank you for your explanation! A further look at our data shows that pyNAST did select OTUs that are genuine, but it starts to lose accuracy for the short reads generated by the joining algorithm in pandaSeq. I also noticed that fastq-join, which we used in QIIME 1, is not the default in QIIME 2; vsearch join-pairs is used instead. Is the new method more favorable? How would you compare it to the denoise + joining pipeline from
Apologies for the very slow reply!
I don’t have an answer for you on how the different joining pipelines compare. We started with vsearch join-pairs in QIIME 2 because we had already developed wrappers for other vsearch functionality, so it was easy for us to add. We intended to implement some others, but it never got very high on our priority list because we had our two main use cases, joining with denoising (via dada2 denoise-paired) and without denoising (via vsearch join-pairs), covered. Sorry to not have a better answer!
If you end up comparing results from different paired end read joining methods, we’d love to hear about the results. This document illustrates how to import reads that have been joined with other pipelines, if that’s something you’re interested in exploring.
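If you do join reads with an external tool, one way to prepare them for import is to build a manifest file listing one joined FASTQ per sample. The sketch below is a minimal example under stated assumptions: the function name and directory layout are hypothetical, each file is assumed to be named `<sample-id>.fastq.gz`, and the two column headers follow the tab-separated manifest format as I understand it from the importing documentation, so please check them against the docs for your QIIME 2 version.

```python
import csv
from pathlib import Path


def write_manifest(joined_dir: str, manifest_path: str, suffix: str = ".fastq.gz") -> int:
    """Write a tab-separated manifest for already-joined reads; return the sample count.

    Assumes one file per sample, named <sample-id><suffix> (a hypothetical layout).
    """
    rows = []
    for fq in sorted(Path(joined_dir).glob(f"*{suffix}")):
        sample_id = fq.name[: -len(suffix)]  # strip the suffix to get the sample ID
        rows.append((sample_id, str(fq.resolve())))  # manifest needs absolute paths

    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])  # assumed header names
        writer.writerows(rows)
    return len(rows)


# Example call (paths are placeholders):
# write_manifest("joined_reads/", "manifest.tsv")
```

The resulting manifest would then be passed to `qiime tools import`, with the type and input format given in the importing document for joined reads.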