ValueError: Feature IDs found in the table are missing from the taxonomy

Hi,
I'm using the June 2018 release of QIIME 2, and I'm having a problem I didn't have with previous versions. I'm doing a community analysis of environmental samples taken at different time points, and I merge the analyses together as each new batch is sampled and sequenced. So far I've done this by importing the new sequences, demultiplexing, denoising with DADA2, assigning taxonomy with an sklearn classifier (Silva 128 release), and then merging the feature tables and taxonomy tables.
When I try to add my latest batch of sequences to the previously merged data set, I can merge and filter the feature table (removing singletons) and merge the taxonomies as before with no problem. But when I try to generate the barplot (qiime taxa barplot), I get the ValueError shown in the subject line.
I've gone back and repeated the merge step from my previous analyses (which worked before) and got the same error, so I wonder whether something has changed in the new version that I'm missing.
I've used the same feature classifier for all of the taxonomy assignments to avoid messing around with the feature IDs, but that strategy seems to have failed :slight_smile:

I've attached the merged and filtered feature table as well as the merged taxonomy and a screenshot of the error message.


July2018-merged-sklearn_taxonomy_silva128-97_515-806.qza (672.5 KB)
July2018-merged-table-no-singletons.qza (684.0 KB)

Thanks for any help or advice you have!

Also, for those of us who can't make it to ISME this year, it'd be great to have a workshop in Asia some time in 2018!!!

Hi again,

My last post is still pending, so if you have a way to merge these two posts on your end, please do :slight_smile:
I've rolled back to the April 2018 (2018.4) release and the problem disappears: same commands, same input files. I'm guessing something has gone a bit haywire with this change in the new release:

Fixed a UX bug by improving error transparency in the collapse method. Users are now notified if the Feature IDs in their FeatureTable[Frequency] are missing from their FeatureData[Taxonomy]. Thanks

It's not just notifying me; it's throwing an error and stopping the script.
Is there a way around this?
Thanks!

Hey there @matt_rogers!

This is the intended behavior: this missing-ID situation is a real problem. Previous versions of q2-taxa silently ignored it; now we raise an error and stop execution. It's a problem because the old behavior dropped features that were in your table but not in your taxonomy, which could be misleading when interpreting your data.

I would recommend sticking with the latest and greatest. As I mentioned above, this error message is not a bug; it is telling you that you have mismatched feature IDs.
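If you want to see exactly which IDs trigger the error before running the barplot, one option is to compare the ID lists yourself. Here's a minimal sketch. The function name and input files are hypothetical. It assumes you have exported the table's feature IDs to a plain file with one ID per line, and exported the taxonomy to a taxonomy.tsv with a one-line header and the feature ID in the first column.

```shell
# Print feature IDs that are in the table but have no taxonomy entry.
# $1: table feature IDs, one per line (hypothetical export, no header)
# $2: taxonomy.tsv (one header line, feature ID in column 1)
ids_missing_from_taxonomy() {
  awk 'BEGIN { FS = "\t" }
       NR == FNR { if (FNR > 1) tax[$1] = 1; next }  # pass 1: taxonomy IDs, header skipped
       !($1 in tax)                                   # pass 2: table IDs with no match
  ' "$2" "$1"
}
```

Piping the result to wc -l gives the number of unmatched features. An empty result means every table ID has a taxonomy entry, so this particular error should not come up.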

Yep, filter those missing features out of your feature table. In particular, you can provide your taxonomy artifact as metadata when performing ID-based filtering:

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file taxonomy.qza \
  --o-filtered-table id-filtered-table.qza
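Conceptually, filtering with the taxonomy as metadata keeps only the features whose IDs appear in the taxonomy. The sketch below imitates that on exported text files so you can see what gets dropped. It is an illustration only, not how q2-feature-table implements the filter. It makes three assumptions: the feature table was exported to TSV (for example with biom convert --to-tsv), so header lines start with # and feature IDs are in column 1; taxonomy.tsv has a one-line header with IDs in column 1; and the function name is hypothetical.

```shell
# Keep only the table rows whose feature ID appears in the taxonomy.
# $1: taxonomy.tsv (one header line, feature ID in column 1)
# $2: feature table exported as TSV (header lines start with '#')
keep_features_with_taxonomy() {
  awk 'BEGIN { FS = "\t" }
       NR == FNR { if (FNR > 1) tax[$1] = 1; next }  # pass 1: collect taxonomy IDs
       /^#/ || ($1 in tax)                            # pass 2: keep headers and matched rows
  ' "$1" "$2"
}
```

The number of data rows that disappear between the input and output tables corresponds to the count of features removed by the real filter-features step.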

Hope that helps! :t_rex: :qiime2:


Hi! Thanks so much @thermokarst!

I’ve filtered the feature table using the taxonomy as you described and this error has gone away. When I look at the new feature table artifact I can see that 1882 features are present in the unfiltered table and not in the filtered table.

Just to make sure I understand why I need this step:

  1. When I merge feature tables and taxonomies from different runs, not all of the features in the merged table are necessarily present in the merged taxonomy.
  2. I need to remove those features from the feature table. Otherwise, the output of anything that uses both the feature table and the taxonomy (e.g. the barplot) won't necessarily match the output of things I do with the feature table alone (e.g. alpha diversity), because the table contains features that aren't in the taxonomy.
  3. So it only makes sense to perform this filtering step before running core metrics, etc. on the feature table, so that everything is calculated from the same set of features.

Thanks so much for your help. You mods do an excellent job on this forum. Now if only there were a workshop somewhere in East Asia; Singapore and Hong Kong are nice almost any time of year :wink:

You can close this topic; thanks again!

I am not sure I understand the question: how are you "merging taxonomy"? When I merge multiple runs, I do the merge first and then assign taxonomy, so the resulting taxonomy has exactly one entry per feature in the table/seqs.
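The merge-first workflow can be sketched as a small shell function. This is a sketch under assumptions. The function name and output file names are hypothetical. It assumes qiime feature-table merge-seqs accepts a repeated --i-data option, the same form merge-taxa takes. It also assumes qiime feature-classifier classify-sklearn uses --i-classifier, --i-reads and --o-classification. Check qiime <command> --help for your release.

```shell
# Merge per-run representative sequences, then classify the merged set once,
# so the taxonomy has one entry per feature in the merged table.
# $1: trained classifier artifact; remaining args: per-run rep-seqs artifacts
merge_then_classify() {
  local classifier seqs
  classifier=$1
  shift
  # Rewrite the remaining arguments in place as "--i-data <file>" pairs.
  for seqs in "$@"; do
    set -- "$@" --i-data "$seqs"
    shift
  done
  qiime feature-table merge-seqs "$@" \
    --o-merged-data merged-rep-seqs.qza &&
    qiime feature-classifier classify-sklearn \
      --i-classifier "$classifier" \
      --i-reads merged-rep-seqs.qza \
      --o-classification merged-taxonomy.qza
}
```

Usage with hypothetical file names: merge_then_classify classifier.qza run1-rep-seqs.qza run2-rep-seqs.qza. The classification step only runs if the merge succeeds.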

:+1:

Talk to your institution; we teach QIIME 2 workshops worldwide. For more info, check out this link:

I merge taxonomies with "qiime feature-table merge-taxa" because if I try to assign taxonomy on the merged feature table, my machine hangs or runs out of memory :frowning: I'm already trying to use this as leverage to get my PI to buy me a new machine :crossed_fingers:

I think most of the discrepancies between the feature table and the taxonomy arise from table curation (singleton removal, etc.), but there are definitely some features in the merged feature table that are not present in the taxonomy, and I'm not sure how that happens. Here are the merged feature table (without curation), the merged taxonomy and the error log. I assigned taxonomy to each DADA2 output with an sklearn classifier trained on the Silva 128 database, and then merged the results with qiime feature-table merge and merge-taxa using the following commands:

qiime feature-table merge \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/July2018/2018-07-table-dada2.qza \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/Feb2018/2018-02-table-dada2.qza \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/May2018/2018-05-table-dada2.qza \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/Dec2017/2017-12_table-dada2.qza \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/Feb2017/2017-02-table-dada2.qza \
  --i-tables /home/matt/Desktop/community_analyses/combined_watersheds/Oct2016/2016-10-table-dada2.qza \
  --p-overlap-method sum \
  --o-merged-table merged-table.qza

qiime feature-table merge-taxa \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/Oct2016/2016-10-sklearn_taxonomy_silva128-97_515-806-new.qza \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/Feb2017/2017-02-sklearn_taxonomy_silva128-97_515-806-new.qza \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/Dec2017/2017-12-sklearn_taxonomy_silva128-97_515-806.qza \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/Feb2017/2017-02-sklearn_taxonomy_silva128-97_515-806-new.qza \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/May2018/2018-05-sklearn_taxonomy_silva128-97_515-806-new.qza \
  --i-data /home/matt/Desktop/community_analyses/combined_watersheds/July2018/2018-07-sklearn_taxonomy_silva128-97_515-806-new.qza \
  --o-merged-data merged-sklearn_taxonomy_silva128-97_515-806.qza

qiime taxa barplot \
  --i-table merged-table.qza \
  --i-taxonomy merged-sklearn_taxonomy_silva128-97_515-806.qza \
  --m-metadata-file /home/matt/Desktop/community_analyses/combined_watersheds/merged/July2018-merged-map.tsv \
  --o-visualization merged-sklearn-taxonomy-table.qzv

The barplot is where the error occurs, as before. I know how to fix this problem now but it is a bit of a mystery to me where these features are coming from.

I used the "sum" overlap method for the feature table merge because one sample was rerun. In my actual analysis I omitted that sample before merging the feature tables, but I wanted to run everything here with the original outputs.

If you have some ideas about where those features are coming from, I'd be grateful to hear them. But if you're busy with other people's actual problems don't worry about it since you've already solved mine :clap:

qiime2-q2cli-err-ro6tti4u.txt (72.2 KB)

merged-sklearn_taxonomy_silva128-97_515-806.qza (672.5 KB)
merged-table.qza (892.9 KB)

Hey there @matt_rogers!

Thanks for sharing some data, that is super helpful! Honestly, I have no clue what is wrong, and something isn't quite adding up here. Would you be willing to share the rep-seqs outputs from DADA2, too? You can send them to me in a DM if you don't want to share them publicly. These should be 5 FeatureData[Sequence] artifacts with the following UUIDs:

  • 5290cf7e-4e94-4b86-bed5-1b61218529ce
  • e09ed347-0920-4e94-a6e1-f96582e8ffa1
  • 10eab8f1-b0d6-4fdc-a07e-cda750c627a0
  • b5dc90b5-1b72-44fa-b091-467459a33892
  • a415d117-916a-4e2a-a134-7c80af04fd64

You can check an Artifact’s UUID by running qiime tools peek on it.

Thanks for your help!


Thanks for following up with more data in a DM, @matt_rogers!

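As you pointed out in your DM, the artifact with UUID 3b05cddb-75a2-4636-8889-cca97589ba59 was accidentally merged twice. This matches the merge-taxa command above: the Feb2017 taxonomy path is listed twice, and no Feb2018 taxonomy is included.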

So, that would explain why there were missing features in your taxonomy! Phew, glad you figured this one out - I was sweating over here :sweat_smile:
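To catch this kind of slip before merging, one option is to compare the UUIDs of the inputs, since qiime tools peek reports an artifact's UUID. This is a minimal sketch. The function name is hypothetical. It assumes peek prints a line that starts with "UUID:" followed by the UUID, which you should verify against your release.

```shell
# Print any artifact UUID that occurs more than once among the given files.
# Assumes `qiime tools peek FILE` emits a line like "UUID:        <uuid>".
duplicate_artifact_uuids() {
  local f
  for f in "$@"; do
    qiime tools peek "$f" | awk '/^UUID:/ { print $2 }'
  done | sort | uniq -d
}
```

Any line of output is a UUID that would be merged more than once. Comparing UUIDs rather than paths also catches a copy of the same artifact saved under a different file name.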

:t_rex:

