Removing Singletons and Filtering Out Low Relative Abundance

Hello,

It is my understanding that in QIIME 1.9.1 you were able to use the following commands to 1) remove singletons and 2) filter out anything with a relative abundance below a certain percentage:

  1. filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_no_singletons.biom -n 2

  2. filter_otus_from_otu_table.py -i table.biom -o table_no_low_abundances.biom --min_count_fraction 0.00005

From what I understand, DADA2 should take care of issue 1), correct? If so, how does DADA2 ensure that singletons are removed, or is there another parameter I need to add to make sure that happens?

For the low abundances, based on Bokulich et al. (2013), "Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing", we’d like to ensure anything with a relative abundance <0.005% is removed before doing our alpha and beta diversity analyses. I’ve seen that --p-min-frequency can be used with qiime feature-table filter-features to remove low-abundance features from a table. However, I am trying to find a way to do something similar to what was done in the 2013 paper and remove anything at <0.005% after obtaining the relative abundances as follows:

qiime taxa collapse \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table genus-table.qza
qiime feature-table relative-frequency \
  --i-table genus-table.qza \
  --o-relative-frequency-table rel-genus-table.qza
qiime tools export rel-genus-table.qza \
  --output-dir RelativeAbundanceTables/Genus
cd RelativeAbundanceTables/Genus
biom convert -i feature-table.biom -o rel-genus-table.tsv --to-tsv

Is there a way to do this? If not, I’m not really sure how to equate that 0.005% for --p-min-frequency in the FeatureTable step.

Thanks in advance!

You are correct. DADA2 dereplicates the sequences, then removes all singletons BEFORE merging. So it is still theoretically possible for singletons to appear after merging, e.g., if some merges fail or if merging denoised sequence pairs results in a unique sequence. The parameter to filter singletons is not exposed, as far as I can tell.

The ol’ “Bokulich Method” was designed specifically for OTU clustering. With denoising methods, this abundance-based filtering technique is not necessary. I’ve heard that old man Bokulich doesn’t even use it anymore, unless he’s clustering OTUs. :wink:

That said, I suppose you may still want to remove rare ASVs out of an abundance of caution:

Use qiime feature-table summarize to find the total number of sequences (“total frequency”) in your table. Total frequency * 0.00005 (i.e., 0.005%) = the minimum frequency to pass to --p-min-frequency.
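
To make the arithmetic concrete, here is a minimal Python sketch of that calculation. It is only an illustration: the function names (min_frequency_threshold, filter_features) and the toy counts are made up for this example and are not part of QIIME 2. It assumes you have already read the total frequency off the summarize output, and it rounds the threshold up to a whole count so that it can be passed as an integer to --p-min-frequency.

```python
import math


def min_frequency_threshold(total_frequency, fraction=0.00005):
    """Smallest whole per-feature count that reaches `fraction` of all sequences.

    `fraction` defaults to 0.00005, i.e. 0.005% expressed as a proportion.
    """
    # Round first to absorb floating-point noise (e.g. 50.00000000000001),
    # then round up so features below the fraction are excluded.
    raw = round(total_frequency * fraction, 6)
    return max(1, math.ceil(raw))


def filter_features(feature_totals, fraction=0.00005):
    """Keep features whose total count is at least the threshold.

    `feature_totals` is a hypothetical mapping of feature ID -> total count
    summed across all samples.
    """
    threshold = min_frequency_threshold(sum(feature_totals.values()), fraction)
    return {fid: n for fid, n in feature_totals.items() if n >= threshold}


# Toy example (made-up counts): total frequency is 1,000,000 sequences.
toy_table = {"ASV_a": 999_900, "ASV_b": 60, "ASV_c": 40}
threshold = min_frequency_threshold(sum(toy_table.values()))
kept = filter_features(toy_table)
print(threshold, sorted(kept))
```

The resulting threshold is the number you would then give to qiime feature-table filter-features via --p-min-frequency. Note that this filters on each feature's total count across the whole table, not on per-sample relative abundance.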

I hope that helps!

Thank you so much @Nicholas_Bokulich! This is very helpful and exactly what I was looking for! :raised_hands:
