Taxa Filtering issue

Hi QIIME2 Community,

I am having an issue with the qiime taxa filter-table command. I am using qiime2-2019.4 installed via conda. Specifically, my issue is with the --p-exclude parameter. I want to exclude certain taxa from my data set. The taxa filter-table command completes without throwing any errors; however, when I load the barplot into QIIME 2 View, the taxa are still there. Please find the code I ran below.

qiime taxa filter-table --i-table table-NoMorC.qza --i-taxonomy classify/classification.qza --p-exclude "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;;" --o-filtered-table table-NoE.qza

Any help with this issue would be greatly appreciated.

Thank you,

Ryan

My guess is that there is a subtle typo (maybe whitespace?) in the exclude parameter value. Let's start by verifying feature counts before and after filtering:

qiime tools inspect-metadata table-NoMorC.qza --tsv | wc -l | awk '{print $1-1}'
qiime tools inspect-metadata table-NoE.qza --tsv | wc -l | awk '{print $1-1}'

Hi Matthew,
Thanks for your reply. Both tables have 21882 features. I played around with this and found out that when I use --p-exclude "f__Enterobacteriaceae" the number of features is reduced to 13249. However, when I use --p-exclude "f__ Enterobacteriaceae;g__;s__" or --p-exclude "f__ Enterobacteriaceae;;" the number of features is 21882 again.

Thank you,

Ryan

Hi @Rblucas,
Sorry for the delay. We have a backlog here due to travel and an impending QIIME 2 release, and this got buried in the rubble.

Check out how these taxonomies appear in the original reference database. I think what you are missing is spaces after the semicolons. This would also explain why f__Enterobacteriaceae works but f__Enterobacteriaceae;g__;s__ does not.
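To illustrate the point about spaces, here is a minimal sketch that uses plain grep as a rough stand-in for substring matching. The rows are hypothetical, laid out like an exported taxonomy.tsv (Feature ID, Taxon, Confidence); the feature IDs, confidences, and the count_matches helper are made up for the example, and grep is not how QIIME 2 itself performs the match.

```shell
# Hypothetical sample rows in the layout of an exported taxonomy.tsv
# (Feature ID <tab> Taxon <tab> Confidence); IDs and values are made up.
tax_file=$(mktemp)
printf 'Feature ID\tTaxon\tConfidence\n' > "$tax_file"
printf 'feat1\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__; s__\t0.99\n' >> "$tax_file"
printf 'feat2\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae\t0.95\n' >> "$tax_file"
printf 'feat3\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.90\n' >> "$tax_file"

# Count rows that contain the query as a literal substring (-F = fixed string).
# "|| true" keeps the script going when grep finds nothing and exits non-zero.
count_matches() {
  grep -c -F -- "$1" "$tax_file" || true
}

no_space=$(count_matches 'f__Enterobacteriaceae;g__;s__')      # no spaces after ';'
with_space=$(count_matches 'f__Enterobacteriaceae; g__; s__')  # spaces after ';'
family_only=$(count_matches 'f__Enterobacteriaceae')           # family name alone

echo "no spaces:   $no_space"
echo "with spaces: $with_space"
echo "family only: $family_only"

rm -f "$tax_file"
```

On real data, the same kind of check can be run against the taxonomy.tsv written by `qiime tools export --input-path classify/classification.qza --output-path <some-directory>`, which shows exactly how the strings are spelled and spaced.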

The form with empty levels between semicolons (f__Enterobacteriaceae;;) will not work under any circumstances, because neither the original reference database (I assume you are using Greengenes) nor QIIME 2 represents any taxon with empty levels between semicolons. To exclude organisms that are classified as f__Enterobacteriaceae but missing genus and species classification, you would need to use exact matching (--p-mode exact) with the full taxonomy string, quoted so that the shell does not split it at the semicolons:
--p-mode exact --p-exclude "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae"
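As a small sketch of why exact matching matters here, the following compares substring matching with whole-string matching on three hypothetical taxonomy strings (they are illustrative, not taken from the poster's data). grep is used only as a stand-in to show the two behaviours; it is not QIIME 2's implementation.

```shell
# Hypothetical taxonomy strings, one per line.
taxa=$(mktemp)
cat > "$taxa" <<'EOF'
k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae
k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Escherichia; s__coli
k__Bacteria; p__Firmicutes; c__Bacilli
EOF

query='k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae'

# Substring match: every line that contains the query anywhere.
contains_hits=$(grep -c -F -- "$query" "$taxa" || true)
# Whole-line match (-x): only lines identical to the query.
exact_hits=$(grep -c -F -x -- "$query" "$taxa" || true)

echo "substring matches:  $contains_hits"
echo "whole-line matches: $exact_hits"

rm -f "$taxa"
```

With substring matching, the genus-level Escherichia entry is caught along with the family-level one; with whole-string matching, only the entry that stops at the family level is selected, which is the behaviour wanted when removing features that lack genus and species assignments.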

Let me know if that does the trick!

Hi Nicholas,

That did the trick!

Thank you very much,

Ryan
