Removing taxa that aren't identified beyond c__Alphaproteobacteria while keeping others in that class that are identified to lower ranks


I am trying to remove a taxon that is only identified to c__Alphaproteobacteria;__ without removing the taxa in that class that are identified further (e.g., c__Alphaproteobacteria;o__Rhizobiales). It seems like it would be easy to remove all Alphaproteobacteria, but that is not what I want.

version: qiime2-2023.2

Code used:
qiime taxa filter-table --i-table reduced-table.qza --i-taxonomy taxonomy.qza --p-exclude c__Alphaproteobacteria;__,Mitochondria,Chloroplast,Eukaryota,Archaea,Wolbachia,unknown,Unknown,unassigned,Unassigned --o-filtered-table table.qza

error message:
bash: __,Mitochondria,Chloroplast,Eukaryota,Archaea,Wolbachia: command not found...

Hey Bryan,

Like this?

Before filter: taxa percent
o__Alphaproteobacteria;o__ .425
o__Alphaproteobacteria;o__Rhizobiales .425
o__Alphaproteobacteria;o__example .135
o__Alphaproteobacteria;o__example2 .110
o__Alphaproteobacteria;o__example2;f__family .110
After filter: taxa percent
o__Alphaproteobacteria;o__Rhizobiales .425
o__Alphaproteobacteria;o__example .135
o__Alphaproteobacteria;o__example2 .110
o__Alphaproteobacteria;o__example2;f__family .110

Are you sure you want to remove the entire percentage assigned to o__?
It makes o__Rhizobiales look much more common than it is in context.

Why do you want to remove this taxon specifically?

I should also answer your real question!

This works. You just need to wrap the string in "quotes".

qiime taxa filter-table \
  --i-table dada2_table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude "c__Alphaproteobacteria;__,Mitochondria,Chloroplast,Eukaryota,Archaea,Wolbachia,unknown,Unknown,unassigned,Unassigned" \
  --o-filtered-table dada2_out.qza
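
For anyone wondering why the quotes matter: in bash an unquoted `;` ends the command, so everything after it is run as a second command, which is where the `command not found` error comes from. Here is a small sketch of that behaviour. It uses a made-up helper function, `show_args` (not part of QIIME 2), in place of the real command, and a shortened exclude list:

```shell
# Hypothetical helper: reports how many arguments it received and the first one.
show_args() { printf '%d argument(s); first: %s\n' "$#" "$1"; }

# Quoted: the whole list, semicolon included, arrives as one argument.
quoted=$(show_args "c__Alphaproteobacteria;__,Mitochondria")

# Unquoted: bash stops the first command at ';' and then tries to run
# '__,Mitochondria' as a command of its own, which fails.
# '|| true' only keeps this demo going after that expected failure.
unquoted=$(show_args c__Alphaproteobacteria;__,Mitochondria 2>/dev/null || true)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

With the quotes, the helper sees the full list; without them, it only ever sees `c__Alphaproteobacteria`.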

Yes, except Alphaproteobacteria is a class.

I have large sequence libraries for my samples and an unidentified Alphaproteobacteria at > .80 that is overshadowing the other bacterial taxa. I want to take a look at the other bacteria while altering the data as little as possible.

Thank you for the quick response.


It is a bacterium that is present in only 5 of about 75 samples. I want to see if those 5 samples resemble the other 65 samples once that taxon is removed. Those 5 samples will still have 3000 to 5500 sequences in their libraries after that Alphaproteobacteria is removed.

I've been there :crying_cat_face:

Even if you resequence, you might as well look at the data you have!

Make sure the text matches exactly. These are all different:

--p-exclude "c__Alphaproteobacteria;o__,"
--p-exclude "c__Alphaproteobacteria;__,"
--p-exclude "c__Alphaproteobacteria;_,"
--p-exclude "c__Alphaproteobacteria;,"
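
To show what "matches exactly" means in practice, here is a rough sketch that tests a few search terms against one taxonomy string as plain substrings. The taxonomy string is made up for illustration. In particular, the space after each semicolon is an assumption about how your reference database formats its ranks, so check your own file:

```shell
# Hypothetical taxonomy string; real ones depend on the reference database.
taxon='k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__'

hits=''
for term in 'c__Alphaproteobacteria;o__' 'c__Alphaproteobacteria;__' \
            'c__Alphaproteobacteria; o__' 'c__Alphaproteobacteria'; do
  # The quoted "$term" is compared literally, so this is a substring test.
  case "$taxon" in
    *"$term"*) hits="$hits[$term] " ;;
  esac
done

echo "terms found in the string: $hits"
```

In this made-up case only the term with the space and the bare class name are found. The bare class name would also be found in every other Alphaproteobacteria entry, which is the situation you are trying to avoid.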

This didn't return an error but it also didn't filter the unidentified Alphaproteobacteria, or probably any other Alphaproteobacteria. Could I do a run with the "exact" option and find the whole taxonomic description? Which file could I view to find it? I read that it shouldn't have a semicolon.


Look inside the taxonomy.qza file. All .qza files are just .zip files, so you can rename it with a .zip extension and then open it up on your Desktop. It's a little different on Windows and OSX, so let us know if you need specific advice.
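
If you prefer the command line, something like the sketch below could list the distinct taxonomy strings that mention Alphaproteobacteria. It builds a small stand-in file, `demo-taxonomy.tsv`, with invented rows so it can run anywhere. The `unzip` line in the comment assumes the usual archive layout, with the table at `data/taxonomy.tsv` inside a top-level folder:

```shell
# With a real artifact, this should extract the table (assumed layout):
#   unzip -p taxonomy.qza '*/data/taxonomy.tsv' > taxonomy.tsv
# Stand-in file with hypothetical rows (tab-separated) for this demo:
printf 'Feature ID\tTaxon\tConfidence\n' > demo-taxonomy.tsv
printf 'f1\tk__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__\t0.99\n' >> demo-taxonomy.tsv
printf 'f2\tk__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rhizobiales\t0.95\n' >> demo-taxonomy.tsv
printf 'f3\tk__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__\t0.90\n' >> demo-taxonomy.tsv
printf 'f4\tk__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__\t0.97\n' >> demo-taxonomy.tsv

# Skip the header, keep the Taxon column where it mentions the class,
# and collapse duplicates so each distinct string is printed once.
matches=$(awk -F '\t' 'NR > 1 && $2 ~ /Alphaproteobacteria/ { print $2 }' demo-taxonomy.tsv | sort -u)
printf '%s\n' "$matches"
```

Whatever string your real file shows for the unidentified taxon is the one to copy, character for character, into `--p-exclude`. As far as I know, adding `--p-mode exact` makes the filter require a match on the whole string rather than a substring.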