RIM DB classifier training

I’ve been trying to train a classifier specific to rumen taxa, but I keep running into the same issue.

From my understanding, the point of the classifier is simply to match my sequences to known sequences and their taxonomy. However, the classifier seems to be trying to match my samples' Feature IDs to matching Feature IDs in the classifier, which makes me think it isn't working correctly. I've previously used classifiers from the QIIME 2 website and had no issue with my samples' Feature IDs (which look like unique codes specific to my samples, not universal identifiers like the Feature IDs in the RIM DB and other databases).
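For readers with the same confusion: assuming the feature table came from DADA2 or Deblur with their default settings, those sample-specific-looking Feature IDs are MD5 hashes of each exact sequence, while a reference database's IDs are whatever accession labels its FASTA headers carry. A minimal sketch of the hashing idea (the `feature_id` function is hypothetical, and it uses GNU `md5sum`; on macOS, `md5 -q` plays the same role):

```shell
# Hypothetical helper: derive a hashed feature ID from a sequence string.
# Assumes the ID is the hex MD5 digest of the exact sequence text with no
# trailing newline (hence printf rather than echo).
feature_id() {
  printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Identical sequences always get identical IDs; any change alters the ID.
feature_id "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
```

Because the ID depends only on the sequence, the taxonomy written by `classify-sklearn` is keyed by the IDs of whatever sequences are passed to `--i-reads`.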

Using QIIME has been extremely difficult for me, so any suggestions or help are greatly appreciated. Thanks!

Version: qiime2-2019.10

qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path RIM_DB_14_07.fasta \
--output-path RIM_Db_14_07otus.qza

qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path 1RIM_DB_14_07_c.txt \
--output-path RIm_DB_14_07_taxonomy.qza

qiime feature-classifier extract-reads \
--i-sequences RIM_Db_14_07otus.qza \
--p-min-length 100 \
--p-max-length 500 \
--o-reads ref-seqs_RIM_Db_14_07.qza

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs_RIM_Db_14_07.qza \
--i-reference-taxonomy RIm_DB_14_07_taxonomy.qza \
--o-classifier RIM_DB_classifier.qza

qiime feature-classifier classify-sklearn \
--i-classifier RIM_DB_classifier.qza \
--i-reads ref-seqs_RIM_Db_14_07.qza \
--o-classification taxonomy.qza

qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv

qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file metadataa.tsv \
--o-visualization taxa-bar-plots2.qzv
Error when making the taxa barplot:
Plugin error from taxa:

Feature IDs found in the table are missing from the taxonomy: {'597b81d5a99fb4cc87e3adee12678044', '79519db77b2092809ad8e4dbd6dba7b5', '2b17f348d8675ca3a630e48452bfc6d9', 'e7d8089635a5bf9da663cf8ef662e506', '8ac74fa55ae72f62aaefc08064524c37', '0b1a00a4cba0853e37a10816c7a603fb', '43cd0e6d70e9db30bf1f7a3826de1ef4', '9832731d797cac11f1137b415c767699', 'f6fbb356a17f0b0ed2e4995fd7ded0f1', '7edbf217b92457b3816792ffaee45e38', 'be2408435e8d0f6c298d90f417bfab40', 'b15b0024785d32fb16dc51f85794ba30', '6e11c2905949394e6f7939433d0d361b', '4fdfc9d73421b74499bb214c9d342d37', '00ddcea7e6dd49dd2e6ffebb414158ba', '7757b56d2fc7d91bb960b48747653c97', '0dbf217435082bee4f431f17a857bed1', '72ce0f443b9c6d910675338459a3adbe', '4c425e998cca20bb6503573cc136110e', '011be4645eb0cff717a53baae2f9b4b7', '68b5a71871e78c124aa993447c1c51a2', 'd4e8874344b9fa1c5e62622f7199ff8b', '7404b6276a8ff249394240f4a0b03be1', '683742bd58252f1821cef2c3b46a6d5b', 'db505b6608cc02989b922571d0c4de4b', 'deae0334adde073b15dc35336a544c12', '0a8898a0a7f2894b52465d6cd27c51fb', '282d523a33b4cd514b63df675f4f3766', '565f63052272185eba6d5415ed27c875', 'f74c290898b6fc477924d086f340eb7e', '73f6d7f64df4ba9306ef836c8d53dd31', '8bbeb4484167d1c49a75cef0ed18f521', '5a016786a0623077649689049b39e55f', 'f85de9ef69e1fbe910988c3f95a657f7', '5750f76e5b86bb07fd279cedfde39129', '58f2240ab1429874c2bcaccbbd199961', '0044a9ce832be433a54e2f7a118929f7', '9fc55600caff43ced2169d4c772c4d7a', 'ee1866e981005a16f6a27807cca933bc', 'e404157f42ab00a8f17b0ec5d7bbb37e', '86624c24905d50a6a0c6fff34465468e', '6eb57dc07511e5ea2958c9423b86b09b', 'e33cbb8729a7ad3740d1ca8982b5149d', '511c0e4ed7c2446bedf3aee3ad301b32', 'aa3ffc60e1d185a52abb50462e454d6d'}

Debug info has been saved to /var/folders/m7/f3lgm2bx7yn6vtpcc1d6jmv40000gn/T/qiime2-q2cli-err-xo_wwa0x.log

Hi @matthewbur!

There is a little typo in your commands: you appear to be classifying your reference sequences, when you probably intend to classify your representative sequences. These should be the sequences that came from the same step that produced your FeatureTable[Frequency]. The error message is telling you this in a roundabout way: it's saying "hey look, I have all of the features in the feature table you gave me, but they aren't matching any known features in the taxonomy!"
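One way to check this diagnosis yourself is to export both artifacts and compare their IDs. A minimal sketch, assuming you have already run `qiime tools export` on your representative sequences (producing `dna-sequences.fasta`) and on `taxonomy.qza` (producing `taxonomy.tsv`, with a header line and the Feature ID in the first column); `missing_ids` is a hypothetical helper, not part of QIIME 2:

```shell
# Hypothetical helper: print feature IDs that appear in a FASTA of
# representative sequences but are absent from an exported taxonomy TSV.
# Usage: missing_ids <dna-sequences.fasta> <taxonomy.tsv>
missing_ids() {
  # The taxonomy file is read first (NR == FNR), then the FASTA.
  awk -F '\t' '
    # Taxonomy: skip the header line, remember every Feature ID in column 1.
    NR == FNR { if (FNR > 1) known[$1] = 1; next }
    # FASTA: take each header line, drop ">" and any description text.
    /^>/ {
      id = substr($0, 2)
      sub(/[ \t].*/, "", id)
      if (!(id in known)) print id
    }
  ' "$2" "$1"
}
```

If it prints nothing, every representative sequence has a taxonomy entry. With a taxonomy built from the reference reads instead, it would likely list the same IDs as the barplot error above.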

Once you identify the correct sequences (again, these should be the ones produced at the same step as your feature table), rerun the classify-sklearn step (swapping out ref-seqs_RIM_Db_14_07.qza for the right file), and then rerun tabulate and barplot, since both are affected by the change in classify-sklearn.
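Put together, the corrected tail of the workflow might look like the sketch below. It is wrapped in a hypothetical `classify_and_plot` function so that the representative-sequence file is an explicit argument instead of being hard-coded; `rep-seqs.qza` is a placeholder for whatever your denoising step actually produced, and the remaining file names are copied from the commands in the question.

```shell
# Hypothetical wrapper: classify the representative sequences (not the
# reference reads) and rebuild the downstream visualizations.
# $1 = representative sequences from the same step that produced table.qza
classify_and_plot() {
  rep_seqs="$1"

  qiime feature-classifier classify-sklearn \
    --i-classifier RIM_DB_classifier.qza \
    --i-reads "$rep_seqs" \
    --o-classification taxonomy.qza || return 1

  qiime metadata tabulate \
    --m-input-file taxonomy.qza \
    --o-visualization taxonomy.qzv || return 1

  qiime taxa barplot \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --m-metadata-file metadataa.tsv \
    --o-visualization taxa-bar-plots2.qzv
}

# Usage (rep-seqs.qza is a placeholder name):
# classify_and_plot rep-seqs.qza
```

Stopping at the first failing step (`|| return 1`) avoids building a barplot from a stale taxonomy.qza left over from the earlier run.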

Check out the "Training feature classifiers with q2-feature-classifier" tutorial in the QIIME 2 2021.2.0 documentation for a full example (note rep-seqs.qza, and pay close attention to where it is used and how that differs from the commands you have shared here).

Keep us posted!


PS: QIIME 2 2019.10 is ancient, so I would suggest upgrading to the latest version. We release QIIME 2 four or five times a year, and each release has lots of enhancements, bug fixes, etc.

It worked. Thank you!
