How to assign all non-reference sequence hits a value? (Vsearch)

Hi @Micro_Biologist,

Using closed-reference OTU picking to assign taxonomy essentially just assigns the taxonomy of the top hit. I do not recommend that approach, as discussed elsewhere, and would instead recommend assigning taxonomy in a subsequent step with the vsearch classifier available in q2-feature-classifier. This classifier has a perc-identity parameter that you can adjust to set the similarity threshold, just as you would for closed-reference OTU picking. Its consensus taxonomy assignment also checks whether other near hits exist, in which case the top hit alone may not be a reliable classification. If you prefer to just grab the top hit, you can disable the consensus assignment by setting maxaccepts to 1.
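To illustrate why the consensus step matters, here is a toy sketch of consensus assignment in Python. It is a simplified illustration, not the actual q2-feature-classifier code: the function name, the taxonomy strings, the "Unassigned" placeholder, and the 0.51 threshold are all assumptions chosen for the example, and it assumes a threshold above 0.5 so that at most one lineage can win at each rank.

```python
from collections import Counter


def consensus_taxonomy(hit_taxa, min_consensus=0.51):
    """Toy consensus: keep each rank only while enough hits agree on it.

    hit_taxa is a list of semicolon-delimited lineages, one per accepted hit.
    Assumes min_consensus > 0.5, so at most one lineage can pass per rank.
    """
    if not hit_taxa:
        # Hypothetical placeholder label for queries with no accepted hits.
        return "Unassigned"

    lineages = [tuple(rank.strip() for rank in taxon.split(";"))
                for taxon in hit_taxa]
    consensus = ()
    deepest = max(len(lineage) for lineage in lineages)

    for level in range(1, deepest + 1):
        # Count how many hits share each lineage prefix down to this rank.
        prefixes = Counter(lineage[:level] for lineage in lineages
                           if len(lineage) >= level)
        prefix, count = prefixes.most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break  # the hits disagree here, so stop at the previous rank
        consensus = prefix

    return "; ".join(consensus) if consensus else "Unassigned"


# Two equally good hits that disagree at the genus level (made-up labels).
hits = [
    "k__Bacteria; p__Firmicutes; g__Bacillus",
    "k__Bacteria; p__Firmicutes; g__Paenibacillus",
]
print(consensus_taxonomy(hits))      # k__Bacteria; p__Firmicutes
print(consensus_taxonomy(hits[:1]))  # k__Bacteria; p__Firmicutes; g__Bacillus
```

With both hits considered, the assignment stops at the phylum because the genera conflict. Keeping only the first hit (the equivalent of maxaccepts 1) returns the full lineage of the top hit, even though an equally good hit disagrees with it.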

Additionally, that classifier reports unclassified sequences as unclassified, so no workarounds would be required.

Using the vsearch classifier (rather than the OTU picker) would be the correct way to do this in QIIME 2, and it provides the output you need.
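As a rough sketch of what that call could look like, the snippet below only assembles and prints the command line for the classifier; it does not run QIIME 2. The helper name, the file names, the output directory, and the 0.97 threshold are placeholders I made up for the example, and I am passing --output-dir so that all outputs land in one directory regardless of how many outputs your release produces. Please check `qiime feature-classifier classify-consensus-vsearch --help` for your version before running it.

```python
import shlex


def build_classify_command(query, reference_reads, reference_taxonomy,
                           output_dir, perc_identity=0.97,
                           top_hit_only=False):
    """Assemble (but do not run) a classify-consensus-vsearch command."""
    cmd = [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,
        "--i-reference-reads", reference_reads,
        "--i-reference-taxonomy", reference_taxonomy,
        "--p-perc-identity", str(perc_identity),
        "--output-dir", output_dir,
    ]
    if top_hit_only:
        # maxaccepts 1 keeps a single hit, so there is nothing to take a
        # consensus over and the top hit's taxonomy is reported.
        cmd += ["--p-maxaccepts", "1"]
    return cmd


# Placeholder file names; substitute your own artifacts.
cmd = build_classify_command(
    "rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza",
    "vsearch-taxonomy", perc_identity=0.97, top_hit_only=True,
)
print(shlex.join(cmd))
```

You can paste the printed command into a terminal, or hand the list to `subprocess.run(cmd, check=True)` if you prefer to drive it from a script.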

You may also be interested in exclude-seqs (see also this tutorial). That would allow you to remove all sequences that do not match a reference database above a specified percent identity. You can use this to split your feature table in two, for example to analyze the data in separate batches (classify diatoms with a diatom database and analyze non-diatoms separately downstream), or simply to discard the sequences you do not want (e.g., if you do not want diatom sequences, just get rid of them).
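The split described above could look roughly like the following. Again, this only builds and prints the commands rather than running them: one exclude-seqs call to separate hits from misses, then one filter-features call per batch. The helper name, all file names, and the 0.97 threshold are placeholders for illustration, so adjust them to your data and confirm the options against `--help` for your QIIME 2 release.

```python
import shlex


def build_split_commands(table, query_seqs, reference_seqs,
                         perc_identity=0.97):
    """Assemble (but do not run) commands that split a table by reference match."""
    # Step 1: sort query sequences into reference hits and misses.
    exclude = [
        "qiime", "quality-control", "exclude-seqs",
        "--i-query-sequences", query_seqs,
        "--i-reference-sequences", reference_seqs,
        "--p-method", "vsearch",
        "--p-perc-identity", str(perc_identity),
        "--o-sequence-hits", "hits.qza",
        "--o-sequence-misses", "misses.qza",
    ]

    def keep_only(seqs, out_table):
        # Step 2: keep just the features whose IDs appear in `seqs`.
        return [
            "qiime", "feature-table", "filter-features",
            "--i-table", table,
            "--m-metadata-file", seqs,
            "--o-filtered-table", out_table,
        ]

    return [
        exclude,
        keep_only("hits.qza", "hits-table.qza"),
        keep_only("misses.qza", "misses-table.qza"),
    ]


# Placeholder file names; e.g. the reference could be a diatom database.
commands = build_split_commands("table.qza", "rep-seqs.qza", "ref-seqs.qza")
for command in commands:
    print(shlex.join(command))
```

If you only want to discard one group, skip the corresponding filter-features command and carry on with the other table.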

I hope that helps!