Different --p-min-frequency values result in a different order of importance of the taxonomic groups

Hi! I'm running an amplicon sequence analysis (V3-V4 region) for a single sample. To make barplots, I ran qiime taxa barplot. Without any filtering the result was not visually acceptable, so I decided to use the "filter-features" command with "--p-min-frequency". I tried different values (50, 70, 100) to get a good-looking graph, expecting the majority groups to stay the same. However, the majority groups changed their order of importance.

In other words, this pipeline:
# table-no-unassigned.qza is the feature table without mitochondria-, chloroplast- and unassigned-related features
qiime taxa barplot \
  --i-table ./table-no-unassigned.qza \
  --i-taxonomy ./taxonomy.qza \
  --m-metadata-file ./metadata.tsv \
  --o-visualization ./barplots/taxa_barplot_no-unassigned.qzv

qiime tools view ./barplots/taxa_barplot_no-unassigned.qzv

Gives me Methanosaeta, Smithella, Bacteroidetes_vadinHA17 and uncultured as the major genera.

On the other hand, when I try this:

qiime feature-table filter-features \
  --i-table table-no-archaea-unassigned.qza \
  --p-min-frequency 50 \
  --o-filtered-table feature-frequency50-bacteria-table.qza

qiime taxa barplot \
  --i-table ./feature-frequency50-bacteria-table.qza \
  --i-taxonomy ./taxonomy.qza \
  --m-metadata-file ./metadata.tsv \
  --o-visualization ./barplots/taxa_barplot_feature_frecuency50_bacteria.qzv

qiime tools view ./barplots/taxa_barplot_feature_frecuency50_bacteria.qzv

Gives me Methanosaeta*, Smithella*, Leptolinea and Methanoregula as the major genera.

Different --p-min-frequency values result in a different order of importance of the taxonomic groups. How can I handle this? I expected the filtering option not to modify the majority groups (at least), but only to eliminate the minority groups. I really don't get it. Can you help me?

Hi @Cele_Blua,
The most likely explanation is that there are multiple ASVs assigned to, e.g., Bacteroidetes_vadinHA17, each of which was detected fewer than 50 times (total count across all samples) in your dataset.

The barplot action collapses your feature table by taxonomy, so the frequencies of all ASVs belonging to the same taxonomic group are added together.

The filter-features action, on the other hand, is operating on the feature table itself without any notion of taxonomic groupings. So it will filter rare features even if they belong to a group that was otherwise observed more frequently.
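To make the difference concrete, here is a minimal Python sketch of the two operations, assuming a feature table held in memory as nested dicts ({sample: {feature: count}}). The function names `filter_features` and `collapse` are hypothetical: they only mimic the behaviour described above and are not QIIME 2's actual implementation or API. The toy data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory feature table: {sample_id: {feature_id: count}}.
Table = dict[str, dict[str, int]]


def filter_features(table: Table, min_frequency: int) -> Table:
    """Drop features whose TOTAL count across all samples is below min_frequency.

    Sketch of the idea behind `filter-features --p-min-frequency`: the decision
    is made per feature (ASV), with no knowledge of taxonomy.
    """
    totals: dict[str, int] = defaultdict(int)
    for counts in table.values():
        for feature, n in counts.items():
            totals[feature] += n
    keep = {f for f, total in totals.items() if total >= min_frequency}
    return {
        sample: {f: n for f, n in counts.items() if f in keep}
        for sample, counts in table.items()
    }


def collapse(table: Table, taxonomy: dict[str, str]) -> Table:
    """Sum the counts of all features assigned to the same taxon, per sample.

    Sketch of the collapsing that a taxonomy barplot does before plotting.
    """
    collapsed: Table = {}
    for sample, counts in table.items():
        taxa: dict[str, int] = defaultdict(int)
        for feature, n in counts.items():
            taxa[taxonomy[feature]] += n
        collapsed[sample] = dict(taxa)
    return collapsed


# Toy single-sample table: taxon "A" is split across three rare ASVs,
# taxon "B" is a single ASV (hypothetical data).
table = {"s1": {"a1": 30, "a2": 30, "a3": 30, "b1": 60}}
taxonomy = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}

print(collapse(table, taxonomy))                       # {'s1': {'A': 90, 'B': 60}}
print(collapse(filter_features(table, 50), taxonomy))  # {'s1': {'B': 60}}
```

Filtering before collapsing removes taxon A entirely, even though A is the most abundant group once its ASVs are added together.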

So if, e.g., you have a table that looks like this (5 ASVs in total):

id       ASV1          ASV2          ASV3                     ASV4                     ASV5
taxon    Methanosaeta  Methanosaeta  Bacteroidetes_vadinHA17  Bacteroidetes_vadinHA17  Methanoregula
sample1  0             0             20                       20                       30
sample2  136           52            35                       0                        30

After filtering with --p-min-frequency 50, ASV4 (total frequency 20 + 0 = 20) is removed, and you would have a table that looks like this:

id       ASV1          ASV2          ASV3                     ASV5
taxon    Methanosaeta  Methanosaeta  Bacteroidetes_vadinHA17  Methanoregula
sample1  0             0             20                       30
sample2  136           52            35                       30

When running barplot you would get different results with the filtered and unfiltered tables. Summed across both samples, the unfiltered table would yield Bacteroidetes_vadinHA17 as the second most abundant taxon, whereas the filtered table would yield Methanoregula as the second most abundant, because removing ASV4 pushes the Bacteroidetes_vadinHA17 total below that of Methanoregula.
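As a quick arithmetic check of that ranking, here are the per-taxon totals summed over both samples of the example table (all numbers taken directly from the example tables):

```latex
\begin{aligned}
\text{Methanosaeta} &= (0 + 136) + (0 + 52) = 188 \\
\text{Bacteroidetes\_vadinHA17}_{\text{unfiltered}} &= (20 + 35) + (20 + 0) = 75 \\
\text{Methanoregula} &= 30 + 30 = 60 \\
\text{ASV4 total} &= 20 + 0 = 20 < 50 \;\Rightarrow\; \text{ASV4 removed} \\
\text{Bacteroidetes\_vadinHA17}_{\text{filtered}} &= 20 + 35 = 55 < 60
\end{aligned}
```

So Bacteroidetes_vadinHA17 (75) ranks ahead of Methanoregula (60) before filtering, and behind it (55 vs 60) afterwards, although no single Methanoregula count changed.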

I hope that clarifies what is happening.

Good luck!


Very clear!! Thank you so much for your help!