Feature IDs found in the table are missing from the taxonomy

Hi All,

A newbie to QIIME2 here!
I am training my classifier, and when I try to create the taxa barplot I keep getting the error below: "Feature IDs found in the table are missing from the taxonomy". I don't know how to convert the feature table IDs to their taxonomy, so I am now stuck at creating the taxa barplot.

This is not a new question; it was asked before, but the feedback wasn't clear (at least to a newbie).
Kindly advise. I would also appreciate a script for converting the feature table IDs to their taxonomy beforehand.

I have attached my taxonomy and table files.

taxonomy.qzv (1.5 MB) table_3.qzv (951.2 KB)

### The QIIME 2 version I am using on our server
Python version: 3.6.7
QIIME 2 release: 2019.7
QIIME 2 version: 2019.7.0
q2cli version: 2019.7.0

### Here is the script I ran

qiime taxa barplot \
  --i-table /home/elise/Practice/table_3.qza \
  --i-taxonomy /home/elise/Practice/97_taxonomy.qza \
  --m-metadata-file /home/elise/Practice/metadata3.txt \
  --o-visualization /home/elise/Practice/taxa-bar-plots.qzv \
  --verbose

### Here is the error message I received

Traceback (most recent call last):
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/q2cli/commands.py", line 327, in __call__
    results = action(**arguments)
  File "</opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/decorator.py:decorator-gen-144>", line 2, in barplot
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 445, in callable_executor
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/q2_taxa/_visualizer.py", line 34, in barplot
    collapsed_tables = _extract_to_level(taxonomy, table)
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/q2_taxa/_util.py", line 42, in _extract_to_level
    collapsed_table = _collapse_table(table, taxonomy, level, max_obs_lvl)
  File "/opt/anaconda2/envs/qiime2/lib/python3.6/site-packages/q2_taxa/_util.py", line 20, in _collapse_table
    'taxonomy: {}'.format(missing_ids))
ValueError: Feature IDs found in the table are missing from the taxonomy: {'ebed46494daae0c7dd626145c6a75b71', 'c8b3ad2a65f9c529a8d88d100388009f', '920723f1693e70acce269de3b7b89ea1', '7a094a8585ea34e0d7fb733575407dc3', '4adea76bb541afb91c8aa59a42301a4a', '38bf51e8c843cab9027d542b13771359', '9236119b56cce433d97c1419cf6e6947', 'db3adcf1d3e7f1b66466c5b01f540495', '38dbfba581802660b4b5a9900fb247a9', '6850714f3ba9323ff831a27ef91cfbe3', 'c353d962476a23d40065088fefffb445', '1c127f9eac1ca545c7735d49c1811c7c', 'b7ae3efee1f9e4fc3813ccdad2db4b4a', 'ef171c7978f7c18aea5e157fa1ecadc8', '9f5658a7e54fa044b49aafb70ae6f842', '8ecd49f1de951867cd34cd2b647bebf6', '2

Hi, @elise_nghalipo

I looked at your qzv files in QIIME 2 View and I think I see the problem. Your table has alphanumeric feature IDs (e.g. ebed46494daae0c7dd626145c6a75b71) while your taxonomy file has numbered feature IDs (e.g. 4, 7, 13). You need to find some way to relate both sets of feature IDs so that they match between your table and your taxonomy file.
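
If it helps to see the mismatch outside of QIIME 2 View, here is a minimal Python sketch that labels the style of each feature ID. The function names and the regular expressions are my own assumptions for illustration (they are not part of QIIME 2), and the example IDs are the ones quoted in this thread.

```python
import re

# Heuristic patterns; these are illustrative assumptions, not QIIME 2 rules.
MD5_LIKE = re.compile(r"^[0-9a-f]{32}$")  # 32 lowercase hex characters
NUMERIC = re.compile(r"^[0-9]+$")         # digits only, like reference OTU IDs


def id_style(feature_id):
    """Return a rough label for the style of a single feature ID."""
    if NUMERIC.match(feature_id):
        return "numeric"
    if MD5_LIKE.match(feature_id):
        return "md5-like"
    return "other"


def summarize_styles(feature_ids):
    """Count how many IDs fall into each style."""
    counts = {}
    for feature_id in feature_ids:
        style = id_style(feature_id)
        counts[style] = counts.get(style, 0) + 1
    return counts


# Example IDs taken from the error message and the taxonomy file in this thread.
table_ids = [
    "ebed46494daae0c7dd626145c6a75b71",
    "c8b3ad2a65f9c529a8d88d100388009f",
]
taxonomy_ids = ["4", "7", "13"]

print(summarize_styles(table_ids))     # {'md5-like': 2}
print(summarize_styles(taxonomy_ids))  # {'numeric': 3}
```

If the two summaries report different styles, the table and the taxonomy most likely describe different sets of sequences.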

Hi, @gibsramen

Thank you for pointing that out, I appreciate the feedback. Any advice on HOW to relate both sets so that they match? Any idea where this problem might have come from?

Many thanks,
E

I think we need a bit more information in order to say, @elise_nghalipo. It's a bit unusual to wind up with different types of feature IDs like this - can you tell us a bit about your process?

According to your provenance, it looks like you might've accidentally classified your reference reads, rather than your dataset:

In your provenance graph, I have highlighted the step where you extracted the reference reads (after importing them). That same Artifact is input into two separate steps: on the right as the reference reads, and on the left as the reads to classify. I think you might just need to rerun the classify-sklearn step, providing your own data.

Give that a shot and let us know.

Hi, @thermokarst

Many thanks for the feedback, truly appreciate it.

I am not sure where I went wrong. I followed the tutorial here https://docs.qiime2.org/2019.10/tutorials/feature-classifier/, and everything seems okay.

Here are my scripts, maybe another eye could help pick up the error in the scripts:

#When training a classifier you need to have reference sequences and reference taxonomy files in .txt format.

STEP 1

#Since our files are in text format the first step is to import them into QIIME to make the QIIME 2 qza artifact
#Let’s import the reference sequences first.
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path /home/elise/Practice/97_otus.fasta \
  --output-path /home/elise/Practice/97_otus.qza

#Next is the reference taxonomy file. Because 97_otu_taxonomy.txt is a tab-separated (TSV) file without a header, we must specify HeaderlessTSVTaxonomyFormat as the source format, since the default source format requires a header.

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path /home/elise/Practice/97_otu_taxonomy.txt \
  --output-path /home/elise/Practice/97_ref-taxonomy.qza

STEP 2

#NB Taxonomic classification accuracy improves when a Naive Bayes classifier is trained on only the region of the target sequences that was sequenced.
#NB Your reference sequences should be at least as long as the sequences you want to assign taxonomy to.
#Now we need to extract reads from the reference sequence file. You need to know the sequences of the primers that were used to perform this step.
#Here we truncate at position 150; extracted reads must be at least 100 and at most 400 bases long.

qiime feature-classifier extract-reads \
  --i-sequences /home/elise/Practice/97_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --p-trunc-len 150 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads /home/elise/Practice/97_rep-seqs_3.qza

STEP 3
#Now let’s train our classifier.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads /home/elise/Practice/97_rep-seqs_3.qza \
  --i-reference-taxonomy /home/elise/Practice/97_ref-taxonomy.qza \
  --o-classifier /home/elise/Practice/97_classifier.qza

STEP 4
#Now let’s test our classifier to see if it works
qiime feature-classifier classify-sklearn \
  --i-classifier /home/elise/Practice/97_classifier.qza \
  --i-reads /home/elise/Practice/97_rep-seqs_3.qza \
  --o-classification 97_taxonomy.qza

qiime metadata tabulate \
  --m-input-file /home/elise/Practice/97_taxonomy.qza \
  --o-visualization /home/elise/Practice/97_taxonomy.qzv

##################################

qiime taxa barplot \
  --i-table /home/elise/Practice/table_3.qza \
  --i-taxonomy /home/elise/Practice/97_taxonomy.qza \
  --m-metadata-file /home/elise/Practice/metadata3.txt \
  --o-visualization /home/elise/Practice/taxa-bar-plots.qzv

Thanks for the kind assistance.

Thanks for sharing that info, @elise_nghalipo!

Let's revisit the original error you shared, first (when running qiime taxa barplot):

This visualizer is telling us that it has no way to associate the feature IDs in your FeatureData[Taxonomy] Artifact with the feature IDs in your FeatureTable[Frequency].
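
As a rough illustration of that kind of check, here is a simplified Python sketch. It is not the actual q2-taxa source; the function names are made up for this example, and the IDs are the ones from this thread. The idea is just a set difference: any feature ID in the table that has no entry in the taxonomy is reported.

```python
def find_missing_ids(table_ids, taxonomy_ids):
    """Return the table feature IDs that have no entry in the taxonomy."""
    return set(table_ids) - set(taxonomy_ids)


def check_table_against_taxonomy(table_ids, taxonomy_ids):
    """Raise ValueError if any table feature ID lacks a taxonomy entry."""
    missing = find_missing_ids(table_ids, taxonomy_ids)
    if missing:
        raise ValueError(
            "Feature IDs found in the table are missing from the "
            "taxonomy: {}".format(sorted(missing))
        )


# IDs in the style seen in this thread: hashes in the table,
# numeric reference IDs in the taxonomy.
table_ids = [
    "ebed46494daae0c7dd626145c6a75b71",
    "c8b3ad2a65f9c529a8d88d100388009f",
]
taxonomy_ids = ["4", "7", "13"]

try:
    check_table_against_taxonomy(table_ids, taxonomy_ids)
except ValueError as error:
    print(error)
```

With these inputs every table ID is reported as missing, because the two sets share no IDs at all.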

So the next step is to figure out why (hint: this is related to the provenance graph I showed above!).

Let's take a closer look at "Step 4":

According to the earlier steps in your script, 97_rep-seqs_3.qza actually comes from your reference sequences dataset (the data used to train the classifier in the first place). So I think this makes sense - you are trying to combine two completely different datasets here, and they don't line up.

I think you should revise Step 4 to point the --i-reads option to the FeatureData[Sequence] that is produced in whatever step made table_3.qza (did you use q2-dada2 or q2-deblur?).
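
To make the change concrete, here is a small Python sketch that assembles the revised command. Only the three classify-sklearn options already used in Step 4 appear here. The file names rep-seqs.qza and taxonomy.qza are placeholders I made up, so substitute whatever your denoising step actually produced, and build_classify_command is just a helper for this example.

```python
import shlex


def build_classify_command(classifier, rep_seqs, output):
    """Assemble a classify-sklearn command as a list of arguments."""
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", rep_seqs,  # sequences from YOUR study, not the reference reads
        "--o-classification", output,
    ]


command = build_classify_command(
    "/home/elise/Practice/97_classifier.qza",
    "/home/elise/Practice/rep-seqs.qza",   # hypothetical name for the denoiser output
    "/home/elise/Practice/taxonomy.qza",   # hypothetical output name
)

# Print a copy-pasteable shell command; nothing is executed here.
print(" ".join(shlex.quote(part) for part in command))
```

The taxonomy produced this way carries the same feature IDs as the sequences passed to --i-reads, which is what lets it line up with table_3.qza.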

Also, a brief note: filenames don't matter to QIIME 2 - you get to pick the filename, and all QIIME 2 cares about is the semantic type of the Artifact. With that in mind, though, I think you have a typo in "Step 2" which might've led to this confusion. In Step 2 you named your output reference reads 97_rep-seqs_3.qza. Usually, though, "rep" means "representative" - as in "these sequences are representative of the samples in my study" - while "ref" is shorthand for "reference" - as in "these sequences are the reference sequences to which I will compare my representative sequences."

Hope that helps!

Hi, @thermokarst,

Thank you so much, this was so helpful. I will trace my steps back in my analysis and give feedback on what I got.

Also, I used DADA2.

Many thanks,
E

Hi, @thermokarst!

I retraced my steps and was able to point --i-reads to the correct FeatureData[Sequence], so I finally produced my taxa barplots.

Many thanks for your kind help, I got the solution.

Best,
E
