Filter taxonomy after normalization


I have normalized my feature table by rarefaction. After that, I filtered my sequences.qza file to keep only the sequences present in my rarefied feature table, so that I can build a tree.

In addition, I want to make a barplot, and now I am not sure whether:

  • I need to filter my taxonomy so that the file includes only the taxa present in my feature table (I mean the taxonomy for the IDs included in the rarefied feature table), or
  • this is not needed, because the program uses only the taxonomy entries that correspond to IDs in the feature table and ignores the rest.

If filtering the taxonomy is necessary, how can I do it?

Thank you very much in advance for your help!

@MMC_northS - Hi. I'm not sure I can help with your specific question, but your comment does raise another one. In my mind at least, applying rarefaction before plotting your bar plots has the potential to change the proportions seen in your data. I have always followed the approach of plotting before rarefaction, as plotting percentages takes care of the variable library sizes. It would be good to get your opinion on why you rarefy before plotting.

Hello @bmurph79 ,
I rarefy my data first because, if I do not, I cannot be sure whether the different proportions of my taxa are due to real differences between the samples or just to sequencing depth (I mean the number of sequences that I have in each sample). For example, maybe I have more taxa in one sample just because I had more sequences in it. So I think normalization is needed.
In QIIME 1 I used to normalize with CSS or DESeq, which allow using all the data without removing or ignoring some sequences, but those options are not present in QIIME 2, so I use rarefaction. I hope I am answering your question.
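
For readers unfamiliar with the term, rarefaction here means subsampling every sample to the same number of reads, without replacement. Below is a minimal sketch of that idea in plain Python. It is an illustration only, not QIIME 2's implementation; the function name `rarefy`, the ASV IDs, and the counts are all hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample one sample's feature counts to `depth` reads, without replacement.

    `counts` maps feature ID -> read count (hypothetical data layout).
    Returns None when the sample has fewer than `depth` reads, mirroring
    the usual practice of dropping such samples.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per read, then draw `depth` of them.
    reads = [fid for fid, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = rng.sample(reads, depth)
    return dict(Counter(drawn))


# Hypothetical sample with 1000 reads, rarefied to 100 reads.
sample_a = {"ASV1": 900, "ASV2": 95, "ASV3": 5}
rarefied = rarefy(sample_a, depth=100)
print(rarefied)
```

Features with very few reads (such as `ASV3` here) may be absent from the result, which is why some feature IDs can disappear from a rarefied table.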

@MMC_northS - Thanks for the reply. I will be interested to hear what others think. I believe that when you convert the read counts to percentages and plot them, you are essentially taking care of the different read depths, as long as you remove samples that fall below a required read threshold.
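
The point about percentages can be illustrated with a small sketch in plain Python. The two samples below are hypothetical: they have the same composition but a 100-fold difference in read depth, and converting counts to percentages gives identical values for both. The helper name `relative_abundance` is also an assumption made for illustration.

```python
def relative_abundance(counts):
    """Convert one sample's read counts to percentages of that sample's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {fid: 100.0 * n / total for fid, n in counts.items()}


# Hypothetical samples: same composition, very different sequencing depth.
shallow = {"ASV1": 60, "ASV2": 30, "ASV3": 10}
deep = {"ASV1": 6000, "ASV2": 3000, "ASV3": 1000}

print(relative_abundance(shallow))
print(relative_abundance(deep))
```

This only shows that percentages remove the scale difference between samples. It does not address whether a shallow sample has missed rare taxa, which is the concern raised above.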

OK @bmurph79. Thank you for your view. The % is also useful.
By the way, I think it would be interesting to know whether it is necessary to filter the taxonomy after filtering sequences out of the feature table, or whether it is unnecessary because the scripts only use the entries in the taxonomy file whose IDs appear in the feature table, ignoring the rest. Do you know anything about it? Anyone?
Thanks in advance!

@MMC_northS - If I understand the question correctly, I don't think you need to filter. I have filtered feature tables before by removing some low-read samples, and the overall number of ASVs and species may drop a bit. However, I have not had to remove any elements of the taxonomy file before plotting.
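
For anyone who still wants a taxonomy that contains only the IDs in the filtered table, the operation is just a lookup by feature ID. Here is a minimal sketch in plain Python, assuming the taxonomy and the table's feature IDs have already been exported to ordinary Python objects. The function name, IDs, and taxonomy strings are hypothetical, and this is not how QIIME 2 does it internally.

```python
def filter_taxonomy(taxonomy, table_feature_ids):
    """Keep only the taxonomy entries whose feature ID is in the feature table.

    `taxonomy` maps feature ID -> taxonomy string (hypothetical layout).
    Raises KeyError if the table contains an ID with no taxonomy entry,
    since that feature could not be labelled in a barplot.
    """
    ids = set(table_feature_ids)
    missing = ids - set(taxonomy)
    if missing:
        raise KeyError(f"no taxonomy for feature IDs: {sorted(missing)}")
    return {fid: tax for fid, tax in taxonomy.items() if fid in ids}


# Hypothetical taxonomy covering three features; the filtered table kept two.
taxonomy = {
    "ASV1": "k__Bacteria; p__Firmicutes",
    "ASV2": "k__Bacteria; p__Bacteroidetes",
    "ASV3": "k__Bacteria; p__Proteobacteria",
}
table_ids = ["ASV1", "ASV2"]

filtered = filter_taxonomy(taxonomy, table_ids)
print(filtered)
```

The extra `ASV3` entry is simply dropped, which matches the behaviour described above: taxonomy entries without a matching ID in the table do not affect the plot.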

Hi @MMC_northS,
You and @bmurph79 are both correct about this:

On the topic raised by @bmurph79:

I also personally plot before rarefaction, but I don’t think there is anything wrong, per se, with plotting after rarefaction. As long as your sampling depth is high enough, and as long as you don’t have a ton of rare taxa, the relative frequencies should not be disturbed so much as to cause major visual changes to the barplot (since theoretically only rare species should be dropping out).

And even if you are concerned about adjusted proportions in this visualization (which is a justifiable concern), I have heard several times on this forum what I think is a fair argument: that plotting barplots after rarefaction allows you to compare these abundances to differences that you may see in alpha/beta diversity results that use the same rarefaction depth.


Thank you very much @Nicholas_Bokulich and @bmurph79 for your answers. They have been very informative and useful.
