Criteria to discard ASVs

Hi all!

I have a question about how to handle ASVs with no phylum assignment or with very low relative abundance. I have detected 160 of 20495 ASVs without a phylum, and most of the remaining ASVs are present at very low abundances (i.e., less than 0.001%).
Browsing the forum and some tutorials, I have found that the general practice is to discard sequences without taxonomy and to set an abundance threshold (for example, keep only ASVs with at least 10 raw counts, or apply even more aggressive filters such as retaining only ASVs with at least 0.01% relative abundance).
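To make the two thresholds concrete, here is a minimal pandas sketch. It assumes the feature table has been loaded as a DataFrame with ASVs as rows and samples as columns, and that "relative abundance" means an ASV's share of all reads in the whole table (a per-sample definition is also common). The function names and the toy counts are made up for illustration.

```python
import pandas as pd


def filter_by_min_count(table: pd.DataFrame, min_count: int = 10) -> pd.DataFrame:
    """Keep ASVs (rows) whose total raw count across all samples is >= min_count."""
    totals = table.sum(axis=1)
    return table.loc[totals >= min_count]


def filter_by_min_rel_abundance(table: pd.DataFrame, min_fraction: float = 1e-4) -> pd.DataFrame:
    """Keep ASVs whose share of all reads in the table is >= min_fraction (1e-4 = 0.01%)."""
    totals = table.sum(axis=1)
    rel = totals / totals.sum()
    return table.loc[rel >= min_fraction]


# Toy table with made-up counts: rows are ASVs, columns are samples.
toy = pd.DataFrame(
    {"S1": [5000, 3, 0, 1], "S2": [4000, 4, 1, 0], "S3": [6000, 2, 0, 0]},
    index=["asv_a", "asv_b", "asv_c", "asv_d"],
)

kept_counts = filter_by_min_count(toy, min_count=10)
kept_rel = filter_by_min_rel_abundance(toy, min_fraction=1e-4)
print(list(kept_counts.index))
print(list(kept_rel.index))
```

The two filters do not necessarily agree: in this toy table `asv_b` has only 9 reads in total, so it fails the raw-count filter, yet it is above 0.01% of all reads and passes the relative-abundance filter. Within QIIME 2 itself, `qiime feature-table filter-features` offers `--p-min-frequency` and `--p-min-samples` for this kind of filtering.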

However, I would like to know whether there is a justification for these discarding criteria.

In addition, I have the following questions:

----Regarding ASVs with no phylum

Could these ASVs with no phylum assignment be considered phylogenetic novelty?

Or are they artifacts of non-target DNA amplification?

----Regarding ASVs with very low abundance

What is the justification for removing ASVs with low relative abundance? Is there any reason to consider them artifacts?

If that is the case, how can one distinguish authentic ASVs belonging to the rare biosphere from those that are artifacts?

Thanks a lot for your time and feedback!



Hello Manuel!

That's a very complicated question.
Re1: ASVs with no phylum should not be considered phylogenetic novelty unless you can sequence the whole genome and describe it. A 16S rRNA sequence alone is not enough to make such a claim. Even obtaining MAGs is not sufficient, as they are not bacteria but a computational representation of bacteria. Opinions on that differ between wet-lab and dry-lab scientists.

Re2: It is common to remove singletons, as they are most likely sequencing artefacts (e.g. chimeras). Sequencing is a complex, error-prone process. Anything of low abundance is especially affected by such errors, and in statistical terms any comparison based on it would be underpowered, so you don't lose much analytically by removing low-abundance phyla (unless you're hunting for a specific bug).

The rare biosphere is a hard one to tackle, and many research labs are working on it. You can see what it takes to go from the analysis of sequencing data (metagenomes, not marker genes) to the proposal of new clades in:

DISCLAIMER: I am a PhD student in Sunagawa Lab.



Hello Valentyn!

Thanks for the quick feedback and the shared reference :slight_smile:

I have one more question. What's your opinion about ASVs with no phylum? Is it justified to discard them?

Thanks a lot for your time!

Best regards,

That's a decision to make on a case-by-case basis, depending on the study design, analysis, and specific objectives.

Unclassified ASVs still matter for different metrics (alpha and beta diversity), as QIIME 2 operates on ASVs by default. They only become problematic when they show up among the top results of different statistical analyses, so I advise keeping them.
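As a rough way to see how much the unclassified ASVs contribute in a given dataset, one can compare a simple alpha-diversity measure with and without them. This is only a sketch with made-up counts: it assumes a pandas DataFrame with ASVs as rows and samples as columns, and the `unclassified` list is a hypothetical set of ASV IDs lacking a phylum label.

```python
import pandas as pd


def observed_richness(table: pd.DataFrame) -> pd.Series:
    """Number of ASVs with a non-zero count in each sample (columns are samples)."""
    return (table > 0).sum(axis=0)


# Toy table with made-up counts: rows are ASVs, columns are samples.
toy = pd.DataFrame(
    {"S1": [120, 30, 4, 0], "S2": [80, 0, 0, 9]},
    index=["asv_a", "asv_b", "asv_c", "asv_d"],
)
unclassified = ["asv_c", "asv_d"]  # hypothetical: ASVs with no phylum label

with_all = observed_richness(toy)
without_unclassified = observed_richness(toy.drop(index=unclassified))

# Share of each sample's reads that comes from the unclassified ASVs.
read_share = toy.loc[unclassified].sum(axis=0) / toy.sum(axis=0)

print(with_all.to_dict())
print(without_unclassified.to_dict())
print(read_share.round(3).to_dict())
```

If richness or the read share changes noticeably after dropping them, the decision to discard deserves a closer look for that particular study.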



Thanks for your feedback Valentyn, deeply appreciated!


A couple of additional points. Overall I agree with @crusher083.

I typically think about ASV filtering when I notice problems, such as unexpected separations in my PCoA that might suggest some noise in my sample collection.

I'll typically remove sequences unassigned at the phylum level, because these do typically represent off-target amplification. You can cross-reference ASV IDs between your FeatureData[Taxonomy] and FeatureData[Sequence] artifacts to find the sequences for these and BLAST them against the NCBI nucleotide collection (nr/nt). That is a good way to try to determine what they are.

As for singleton filtering, my standard approach is to filter ASVs that appear in only one sample, on the reasoning that features observed in only one sample are less likely to be representative of the microbes in my samples. But these practices come more from experience than from hard data. Hope this helps!
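The cross-referencing and the one-sample filter described above can be sketched in plain Python. This assumes the taxonomy and sequences have already been exported from the artifacts (for example with `qiime tools export`) and loaded into dicts keyed by ASV ID, that taxonomy strings use a `p__` prefix for the phylum rank, and that the feature table is a pandas DataFrame with ASVs as rows. All function names and toy values are made up for illustration.

```python
import pandas as pd


def has_phylum(taxon: str) -> bool:
    """True if the taxonomy string carries a non-empty phylum-level label."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        if rank.startswith("p__") and len(rank) > len("p__"):
            return True
    return False


def unassigned_phylum_fasta(taxonomy: dict, sequences: dict) -> str:
    """FASTA text for ASVs that lack a phylum label and have a sequence on record."""
    records = []
    for asv_id, taxon in taxonomy.items():
        if not has_phylum(taxon) and asv_id in sequences:
            records.append(f">{asv_id} {taxon}\n{sequences[asv_id]}")
    return "\n".join(records) + ("\n" if records else "")


def drop_single_sample_features(table: pd.DataFrame, min_samples: int = 2) -> pd.DataFrame:
    """Keep ASVs (rows) observed with a non-zero count in at least min_samples samples."""
    present_in = (table > 0).sum(axis=1)
    return table.loc[present_in >= min_samples]


# Toy inputs (made up): ASV ID -> taxonomy string, ASV ID -> sequence.
taxonomy = {
    "asv1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "asv2": "d__Bacteria",
    "asv3": "Unassigned",
    "asv4": "d__Bacteria; p__",
}
sequences = {"asv1": "ACGT", "asv2": "GGCC", "asv3": "TTAA", "asv4": "CCGG"}

fasta = unassigned_phylum_fasta(taxonomy, sequences)
print(fasta, end="")  # paste or upload this text into BLAST

counts = pd.DataFrame(
    {"S1": [10, 0, 3, 1], "S2": [5, 7, 0, 1]},
    index=["asv1", "asv2", "asv3", "asv4"],
)
kept = drop_single_sample_features(counts)
print(list(kept.index))
```

Note that the one-sample filter works on presence across samples, not on total counts: `asv4` has only two reads in total but survives because it occurs in both samples.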


Surely helps!

Thanks a lot for your feedback. Deeply appreciated!

Best regards

