Assigning taxonomy and filtering chloroplasts/mitochondria - removing all sample IDs?

Hello,

I am trying to assign taxonomy to my environmental 16S samples and also filter out chloroplasts and mitochondria, using QIIME2 version 2018.11. I haven't been able to find a post with a question like mine, but I think this is the closest: "ValueError: Feature IDs found in the table are missing from the taxonomy".

I have successfully made a classifier.qza file (using the SILVA 132 database) and removed the spaces from my taxonomy file.

When I try to run the following command to filter out chloroplasts and mitochondria:

qiime taxa filter-table --i-table table.qza --i-taxonomy taxonomy-without-spaces_LP.qza --p-exclude mitochondria,chloroplast --o-filtered-table no-mitochondria-no-chloroplast/table.qza

I get the following error:

Plugin error from taxa:

All features ids must be present in taxonomy, but the following feature ids are not: 2abacc568b628cc2dd55eac831adfae6, … (and it goes on to list many, many more IDs as well)

So I ran this command to remove the feature IDs in my table that are not present in the taxonomy file:

qiime feature-table filter-features --i-table table.qza --m-metadata-file taxonomy-without-spaces_LP.qza --o-filtered-table id-filtered-table.qza

And I used the new id-filtered-table.qza to re-run the previous command:

qiime taxa filter-table --i-table id-filtered-table.qza --i-taxonomy taxonomy-without-spaces_LP.qza --p-exclude mitochondria,chloroplast --o-filtered-table no-mitochondria-no-chloroplast/table.qza

But I got this error:

Plugin error from taxa:

ids_to_keep must contain at least one ID.

So it looks like none of the feature IDs in my table are present in my taxonomy file. Do you know what I could be doing wrong, or have any suggestions on something to try?

Thank you for the help!

You are correct, it sounds like there is no overlap between the feature IDs in these artifacts. A few questions:

  1. A basic question, but one I always need to ask (including of myself when I run into a problem like this :wink:): are you sure you have the right files? You can always check data provenance to make sure that the sequences you input for taxonomy classification are paired with the feature table you are attempting to filter.

  2. How and when did you remove the spaces? If you removed spaces after classification (e.g., by exporting and modifying the taxonomy classification results), it is possible you accidentally altered the feature IDs at that time.

  3. Have you looked directly at the sequences? I recommend exporting the taxonomy and spot-checking the feature IDs to see if they match up with those in your table (use qiime feature-table summarize to get a list of feature IDs in the table).
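
The spot check in the last point can also be scripted. The following is a minimal sketch, not part of QIIME 2: it assumes the taxonomy was exported to a tab-separated file whose first line is a header and whose first column holds the feature IDs, and that the table's feature IDs have been copied into a plain Python list. The function name `compare_feature_ids` and the sample data are hypothetical.

```python
import csv
import io


def compare_feature_ids(table_ids, taxonomy_tsv):
    """Compare table feature IDs with the IDs in an exported taxonomy TSV.

    taxonomy_tsv is an open text file (or file-like object) whose first
    line is a header and whose first column holds the feature IDs.
    """
    reader = csv.reader(taxonomy_tsv, delimiter="\t")
    next(reader, None)  # skip the header row
    taxonomy_ids = {row[0] for row in reader if row}
    table_ids = set(table_ids)
    return {
        "shared": sorted(table_ids & taxonomy_ids),
        "only_in_table": sorted(table_ids - taxonomy_ids),
        "only_in_taxonomy": sorted(taxonomy_ids - table_ids),
    }


# Hypothetical data for illustration only.
example_tsv = io.StringIO(
    "Feature ID\tTaxon\tConfidence\n"
    "aaaa\tD_0__Bacteria\t0.99\n"
    "bbbb\tD_0__Archaea\t0.87\n"
)
report = compare_feature_ids(["aaaa", "cccc"], example_tsv)
for name, ids in report.items():
    # Print the size of each group and up to five example IDs.
    print(name, len(ids), ids[:5])
```

An empty "shared" list for real data would suggest that the taxonomy and the table were not generated from the same set of sequences.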

Let's go through those 3 questions as a first pass, then we can figure out where to go from there!

Hello,

Thank you for your help! Here are my answers to your questions:

Question 1: I’ve double-checked my files. I also started from scratch and re-ran my entire QIIME2 pipeline, and I’m still getting the same error.

Question 2: To remove the spaces, I followed these steps:

  1. I downloaded the reference database (Silva 132)

wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip
unzip Silva_132_release.zip

  2. I imported it into QIIME2

qiime tools import --type FeatureData[Sequence] --input-path SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna --output-path silva132_99_seqs.qza

qiime tools import --type FeatureData[Taxonomy] --input-format HeaderlessTSVTaxonomyFormat --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_7_levels.txt --output-path silva132_99_taxonomy.qza

  3. I extracted the portion of the 16S sequences specific for my primers

qiime feature-classifier extract-reads --i-sequences silva132_99_seqs.qza --p-f-primer GTGYCAGCMGCCGCGGTAA --p-r-primer GGACTACNVGGGTWTCTAAT --p-min-length 100 --p-max-length 400 --o-reads ref-seqs.qza

  4. I formatted the Silva taxonomy to remove the spaces

qiime tools export --input-path silva132_99_taxonomy.qza --output-path taxonomy-with-spaces/

qiime metadata tabulate --m-input-file taxonomy-with-spaces/taxonomy.tsv --o-visualization taxonomy-as-metadata/taxonomy-as-metadata.qzv

qiime tools export --input-path taxonomy-as-metadata/taxonomy-as-metadata.qzv --output-path taxonomy-as-metadata/

qiime tools import --type 'FeatureData[Taxonomy]' --input-path taxonomy-as-metadata/metadata.tsv --output-path taxonomy-without-spaces.qza

  5. And then I trained the classifier

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy taxonomy-without-spaces.qza --o-classifier classifier_LP.qza

  6. Then I assigned taxonomy

qiime feature-classifier classify-sklearn --i-classifier classifier_LP.qza --i-reads rep-seqs.qza --o-classification taxonomy_NELP.qza

  7. And finally, I try to filter out the chloroplasts/mitochondria and I get the error.

qiime taxa filter-table --i-table table.qza --i-taxonomy taxonomy-without-spaces_LP.qza --p-exclude mitochondria,chloroplast --o-filtered-table no-mitochondria-no-chloroplast/table.qza

  8. And my output file still gives me this error:

Plugin error from taxa:
All features ids must be present in taxonomy, but the following feature ids are not: ...

But the IDs listed here are definitely present in my table.qzv and rep-seqs.qzv files (see below).

Question 3: I used the following to check the IDs:

qiime feature-table summarize --i-table table.qza --o-visualization table.qzv --m-sample-metadata-file metadata_16S_NE.txt
qiime feature-table tabulate-seqs --i-data rep-seqs.qza --o-visualization rep-seqs.qzv

And I couldn’t check for every single ID that was listed, but I didn’t see any errors (all of the IDs that I checked were present in both files without any repeats). And I searched for some of the IDs listed in the error message, and they were present in my table.qzv and rep-seqs.qzv files as well.

Please let me know if you need more information. Thank you!

Thanks for walking me through your process, @ncep112!

Not a solution to your problem (probably), but a useful tip: I recommend making a copy of the SILVA database and removing the whitespace in that file before importing it into QIIME 2. The whitespace comes from SILVA, not QIIME 2, so you can remove it at the start rather than exporting, modifying, and re-importing later on.
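
As a rough illustration of that tip, the whitespace can be stripped from a copy of the taxonomy text file before import. The sketch below is an assumption about how one might do it, not a QIIME 2 or SILVA tool: it treats each line as an ID, a tab, and a taxonomy string, removes whitespace only from the taxonomy string, and leaves the ID and the tab separator untouched. The function name and the sample lines are hypothetical.

```python
def strip_taxonomy_whitespace(lines):
    """Yield taxonomy lines with all whitespace removed from the taxon column."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so the ID is never modified.
        feature_id, _, taxon = line.partition("\t")
        # str.split() with no arguments splits on any run of whitespace.
        yield feature_id + "\t" + "".join(taxon.split())


# Hypothetical input lines for illustration only.
example = [
    "ref1\tD_0__Bacteria; D_1__Some phylum\n",
    "ref2\tD_0__Archaea;D_1__Other group\n",
]
cleaned = list(strip_taxonomy_whitespace(example))
print("\n".join(cleaned))
```

For a real file, the same generator could be fed an open file handle for the copied taxonomy file, with the yielded lines written to a new file that is then imported.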

Got it. This is very confusing then. Would you mind sharing your files? I will troubleshoot locally.

Thank you for the tip! I will try to make a copy of the SILVA database and remove the whitespace before importing it into QIIME 2. In the meantime, can I send you my files in a direct message, instead of uploading them here? Do you need the table.qza, rep-seqs.qza, classifier, and taxonomy files?

Yes, you can send a direct message to me via the forum and only moderators will be able to see those files.

Really? I just checked the inputs to this command:

qiime taxa filter-table --i-table table.qza --i-taxonomy taxonomy-without-spaces_LP.qza --p-exclude mitochondria,chloroplast --o-filtered-table no-mitochondria-no-chloroplast/table.qza

None of the feature IDs match. I checked both manually and with the following Python code, which confirms that no feature IDs are shared:

>>> import biom
>>> import qiime2
>>> import pandas as pd
>>> tab = qiime2.Artifact.load('table.qza')
>>> tax = qiime2.Artifact.load('taxonomy-without-spaces_LP.qza')
>>> tab_ids = tab.view(biom.Table).ids('observation')
>>> tax_ids = tax.view(pd.Series).index
>>> set(tab_ids) & set(tax_ids)
set()

Unfortunately, I cannot check the provenance since you exported the data, but looking at the commands you shared above it is not clear where taxonomy-without-spaces_LP.qza came from. I recommend re-checking those commands carefully to make sure you are using the correct files.
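
One rough way to judge whether a taxonomy file was produced from the same sequences as a table is to look at the shape of the IDs. The ID quoted in the error message above is a 32-character hexadecimal string, so taxonomy IDs that look different probably come from somewhere else. A minimal sketch, assuming the table's feature IDs are always such hex strings (which may not hold for every workflow); the function name and the example taxonomy IDs are hypothetical:

```python
import re

# Matches IDs made of exactly 32 lowercase hexadecimal characters.
HEX32 = re.compile(r"[0-9a-f]{32}")


def fraction_hash_like(ids):
    """Return the fraction of IDs that look like 32-character hex strings."""
    ids = list(ids)
    if not ids:
        return 0.0
    return sum(1 for i in ids if HEX32.fullmatch(i)) / len(ids)


# The table ID is the one from the error message; the taxonomy IDs are
# hypothetical placeholders standing in for IDs of a different style.
table_ids = ["2abacc568b628cc2dd55eac831adfae6"]
taxonomy_ids = ["REF000001.1.1500", "REF000002.1.1480"]

print("table:", fraction_hash_like(table_ids))
print("taxonomy:", fraction_hash_like(taxonomy_ids))
```

If the two fractions differ sharply, the taxonomy artifact is likely not the one generated by classifying the table's representative sequences.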

Good luck!

Thank you for the help! I was able to fix the problem by removing the spaces in the SILVA database prior to importing it to QIIME2, as per your suggestion.
