Why is pyNAST no longer available in QIIME 2?

(Yuning) #1

Hi,

I am a QIIME 1 user and have been using pyNAST as one of the filters to remove reads in my samples that are not from 16S rRNA. It performed pretty well on my datasets. Why is pyNAST no longer available in QIIME 2? Are there any pitfalls that make it less favorable than MAFFT + FastTree?

Best,
Yuning

(Greg Caporaso) #3

Hi @ynshen,
Welcome to the forum, and thanks for your question!

The NAST algorithm makes some naive assumptions during alignment that result in lower-quality multiple sequence alignments than other approaches. For example, to make the aligned sequence length equal to the template alignment length, it will sometimes remove gap characters without evaluating how that affects the underlying alignment quality.
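To make the gap-removal issue concrete, here is a deliberately simplified toy in Python. It is not pyNAST's actual code: the function name `force_to_template_length` and the leftmost-gap rule are assumptions chosen only for illustration. It shows how dropping gaps purely to hit a target length shifts residues into different columns without any check on alignment quality.

```python
def force_to_template_length(aligned: str, template_len: int) -> str:
    """Toy illustration (hypothetical, NOT pyNAST's implementation).

    Drops gap characters, left to right, until the aligned sequence is
    no longer than template_len. Nothing here checks whether the
    remaining residues still sit in homologous columns.
    """
    excess = len(aligned) - template_len
    if excess <= 0:
        return aligned  # already fits the template length

    kept = []
    for ch in aligned:
        if ch == "-" and excess > 0:
            excess -= 1  # gap dropped with no evaluation of the alignment
            continue
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    before = "AC-GTT-A"  # 8 columns
    after = force_to_template_length(before, 7)
    print(before)
    print(after)  # every residue after the dropped gap moves one column left
```

In this toy, every residue after the dropped gap ends up one column to the left of where the aligner originally placed it, which is the kind of silent change described above.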

If you’re interested in using alignment as a step for removing non-16S sequences, you can use q2-fragment-insertion in QIIME 2 (see docs linked from this page), which filters out non-16S sequences as part of its process. An alternative, taxonomy-based approach is to filter out any features that are not assigned to at least the phylum level (see here for an example of how to do this). This should also remove non-16S sequences, as long as the taxonomy is assigned based on a 16S reference database such as SILVA or Greengenes.
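As a rough sketch of the taxonomy-based idea in plain Python (not the QIIME 2 plugin itself): the helper names `has_phylum` and `filter_features` are hypothetical, and the code assumes Greengenes-style labels in which ranks are separated by `;` and the phylum rank is prefixed with `p__`. Other databases may label ranks differently, so the prefix would need adjusting.

```python
def has_phylum(taxonomy: str) -> bool:
    """Return True if the taxonomy string names a phylum.

    Assumes Greengenes-style labels such as
    'k__Bacteria; p__Firmicutes; c__Bacilli'. A bare 'p__' with no
    name after it counts as unassigned at the phylum level.
    """
    for level in taxonomy.split(";"):
        level = level.strip()
        if level.startswith("p__") and len(level) > len("p__"):
            return True
    return False


def filter_features(taxonomy_by_feature: dict) -> dict:
    """Keep only features assigned to at least the phylum level."""
    return {
        feature_id: taxonomy
        for feature_id, taxonomy in taxonomy_by_feature.items()
        if has_phylum(taxonomy)
    }


if __name__ == "__main__":
    # Made-up feature IDs and assignments, for illustration only.
    example = {
        "feature-1": "k__Bacteria; p__Firmicutes; c__Bacilli",
        "feature-2": "k__Bacteria; p__",
        "feature-3": "Unassigned",
    }
    print(sorted(filter_features(example)))
```

Features that only reach the kingdom level, or that are unassigned entirely, are dropped, which is where non-16S reads would be expected to end up when the reference database contains only 16S sequences.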

Hope this helps!

(Yuning) #5

Hi @gregcaporaso,

Thank you for your explanation! A closer look at our data shows that pyNAST did select genuine OTUs, but it starts to lose accuracy on the short reads produced by the joining algorithm in PANDAseq. I also noticed that fastq-join, which we used in QIIME 1, is not the default in QIIME 2; vsearch join-pairs is used instead. Is the new method preferable? How would you compare it to the denoise + joining pipeline from PANDAseq?

Best,
Yuning
