I am analyzing a dataset using qiime2-2020.2 where I need to remove all sequences under the domain Eukaryota and the phylum Cyanobacteria. I successfully removed these taxa with --p-mode contains using the following command:
Next, I want to remove a specific taxon, using the output of the first filtering step as the input for the second filtering step. Here is the command I have tried to use:
As the setting --p-mode exact implies, this filter is looking for taxonomy annotations that exactly match the full string. Taxa names that are slightly different will not be removed.
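To make the difference between the two modes concrete, here is a minimal Python sketch of the matching logic as I understand it. It is a simplified model, not the plugin's actual code: `should_exclude` is a hypothetical helper, the lineage is an illustrative SILVA-style string rather than the one from your data, and I am assuming that "contains" is a case-insensitive substring search while "exact" is a case-sensitive match on the full string.

```python
def should_exclude(taxon: str, query: str, mode: str) -> bool:
    """Simplified model of how an exclude query is matched against a taxon.

    Hypothetical helper for illustration only; not the q2-taxa implementation.
    """
    if mode == "exact":
        # The whole annotation must equal the query, character for character.
        return taxon == query
    if mode == "contains":
        # Assumption: substring search, ignoring case.
        return query.lower() in taxon.lower()
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative lineage (an assumption, not taken from the poster's data).
assigned = (
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
    "D_3__Sphingomonadales;D_4__Sphingomonadaceae;D_5__Sphingomonas"
)
# The same lineage with one extra trailing field, as a display might show it.
with_extra_field = assigned + ";__"

print(should_exclude(assigned, with_extra_field, "exact"))         # False
print(should_exclude(assigned, assigned, "exact"))                 # True
print(should_exclude(assigned, "D_5__Sphingomonas", "contains"))   # True
```

A single extra or missing character is enough to make the exact comparison fail, whereas the shorter "contains" query still matches.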
When I check to see whether this taxon has been removed, it is still in my dataset.
Would you be willing to post the taxonomy that you still see? Then we can compare it to the string you posted and look for differences!
You could try --p-mode contains --p-exclude "D_5__Sphingomonas", which should also remove that taxon (and has fewer letters to check for typos).
Does anyone else have a clue why Ryan’s text string did not match? It’s valid unicode and there are no issues with fancy quotes or strange characters.
If you look in the taxonomy assignments, you will not find this string; that is why your “exact” mode is failing to match.
The “;__” at the end is tacked on in the barplot visualization that you shared, because you are specifying that you want the level 7 taxonomy to be displayed. That specific annotation has no 7th level, so an empty annotation is added to the end.
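The padding described above can be sketched roughly like this in Python. Both functions are hypothetical helpers written for illustration, and the lineage is an assumed SILVA-style example with six levels, not a string copied from your files. The idea is that a label copied from a level 7 display can be turned back into the annotation stored in the taxonomy by dropping the empty trailing fields.

```python
def pad_to_level(taxon: str, level: int, delimiter: str = ";", filler: str = "__") -> str:
    """Pad (or truncate) a lineage so it has exactly `level` fields.

    Hypothetical helper mimicking how a display at a fixed level
    fills in ranks that the annotation does not have.
    """
    ranks = taxon.split(delimiter)
    ranks += [filler] * (level - len(ranks))  # no-op if already deep enough
    return delimiter.join(ranks[:level])


def strip_padding(label: str, delimiter: str = ";", filler: str = "__") -> str:
    """Remove empty trailing fields to recover the stored annotation."""
    ranks = label.split(delimiter)
    while ranks and ranks[-1] == filler:
        ranks.pop()
    return delimiter.join(ranks)


# Illustrative six-level lineage (an assumption for this example).
stored = (
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;"
    "D_3__Sphingomonadales;D_4__Sphingomonadaceae;D_5__Sphingomonas"
)

displayed = pad_to_level(stored, 7)
print(displayed.endswith(";__"))          # True
print(displayed == stored)                # False
print(strip_padding(displayed) == stored) # True
```

In other words, the displayed label and the stored annotation differ only by the filler field, which is exactly the kind of difference that an exact match will not tolerate.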
This is the exact mode string that you want to filter: