classify-hybrid-vsearch-sklearn: Features Omitted from Taxonomy?

I earlier ran classify-hybrid-vsearch-sklearn on a query artifact, using a trained classifier together with the reference reads and taxonomy that were used to train it as the reference inputs. The run appeared to finish fine, with no error messages. However, when I tried to use the outputs with taxa collapse, I got the following error:

Plugin error from taxa:

Feature IDs found in the table are missing from the taxonomy: {'bb1a55d80d8cbef4dde0b521256785de', '83a80f42edefedecd8f119cdd296ea9d', '0fa96f49bb45475db680eb8b58fd4022', '5dceff6789381005144075ec09259571', '9f7eb301194f70ea2181ed3ba0dc18a5', '025be4e9675638f97b766c4ef40648d4', '706b57891980a8eac734014afa3e069a', '80d8e7f086dff760d0124f6a973f2cca', '15c00734df9b80afd2288dae47f6d6ee', '10f7f4a08f3097bdae1946ca30f4e19b', '9a35c6065cbd25d51e60e73814101a00', '4342dd71629d793453b9e6d5548d5e10', '7a6d6b560ad050447928134e767a1af6', 'ce148c5262a1692445d7c5d697dc8b38', '797bb1f0130fe57502c44d7b34d7559a', '068aeb5350f024f10aaa8ebbb77c62aa', 'b0c12aa01458706615f940417ec86fc8', '7dd01adf51786cfa5e27672f85c91e6a', '046bbc9c2255d3ec93f56e627a5f1709', '47a020432071b52a131b62955ca5577d', '35085a587cbc0be0673c44e1afc2d9f8', '6f6c28b8c9d5c541f8e2ac636b4fd23d', 'f547025b576db19da690b6d010704a52', '531f51ca18aeb21cae6746eb72587c51', 'b35891a8b770d2f65d612ec85c07dfe2', '27d20b3b7d9e5f67b1d5fe6ed43bb731', 'fb07e5cc95013b6d60e54b6798f6fbd0', 'a51263eb6e3615e87672f021fa48f9eb', '3c2b7ef4ce715dbeca4c79645daa7c6c', '35a60e19c6cdb5552d9f62c071c18a25', 'c986a15036d57b5e54f3d0ee2e510a5f', 'e5153f80e953180125c6618db3fa26c2', '746cc52f2d0104b309ecc209173571e4', '6dcb7fd53ad53490caab5b05b0c26472', '4d86c1ba7469207d3fb27b34390d8399', '35653a38f3cda8ea84a71d21425daae1', '98b04008af020b09271941ea2af614f8', 'a319b34e3d14dd07734177a1f9af3012', '8944bcc0ef88760f172feba7ae1b7ab0', '1f5409e31f927c8d432b72d6f4b9190d', '56371caab2cc702325b083201ed1bf0b', '9031013d15a5a53907f92743ba2f066d'}

I ran this again on a different file, with the same seed (413) and the same other parameters for classify-hybrid-vsearch-sklearn, and hit the same situation: many features were not retained.
Also, unfortunately, I am not able to access the debug info due to permissions on the system I am using. (Related: does the --output-dir option available in most methods and plugins allow debug info to be redirected? Its description is rather vague.)
While I could simply filter these features out of the table, some of them are abundant enough that I am hesitant to completely drop them.
Is there a better way to approach this? Is this a common result for the classify-hybrid-vsearch-sklearn pipeline of feature-classifier?
Is there a way to forcibly add these features to the extant taxonomy artifact?

Hi @szymanski ,
Thanks for trying out this method. This is expected behavior, see here:

Yes, filtering these features out of the table is what should be done. They are most likely non-target DNA (e.g., host DNA), which is why they are abundant but still get dropped: they do not align to the reference (< 60% similarity, I think).
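If you want to see how much you would lose before filtering, here is a minimal Python sketch of the idea. The feature IDs and counts are made-up toy values, and `split_by_taxonomy` is a hypothetical helper, not part of QIIME 2; in practice you would export the table and taxonomy first.

```python
# Hypothetical toy data: feature ID -> total count across all samples.
table_counts = {
    "feat_a": 1200,
    "feat_b": 300,
    "feat_c": 500,
}
# Hypothetical set of feature IDs that received a classification.
classified_ids = {"feat_a", "feat_b"}


def split_by_taxonomy(counts, taxonomy_ids):
    """Split counts into features present in / absent from the taxonomy."""
    kept = {f: c for f, c in counts.items() if f in taxonomy_ids}
    dropped = {f: c for f, c in counts.items() if f not in taxonomy_ids}
    return kept, dropped


kept, dropped = split_by_taxonomy(table_counts, classified_ids)
total = sum(table_counts.values())
# Share of all counts carried by features with no taxonomy entry.
dropped_fraction = sum(dropped.values()) / total if total else 0.0
print(f"dropping {len(dropped)} features, {dropped_fraction:.1%} of all counts")
```

Within QIIME 2 itself, I believe the usual way to do the filtering is qiime feature-table filter-features with the taxonomy artifact passed as the metadata file, so that only features with a taxonomy entry are kept.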

You could create an artificial taxonomy for those features outside of QIIME 2, import, and then merge with the output of classify-hybrid-vsearch-sklearn.
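A rough sketch of the "outside of QIIME 2" step in Python: pull the IDs out of the error message and write a two-column TSV. The "Unassigned" label, the output file name, and both helper functions are assumptions for illustration, and the "Feature ID" / "Taxon" header is what I understand the taxonomy TSV format to expect; the error text below is truncated to two of your IDs.

```python
import csv
import re

# Truncated copy of the error message; paste the full text in practice.
error_text = (
    "Feature IDs found in the table are missing from the taxonomy: "
    "{'bb1a55d80d8cbef4dde0b521256785de', '83a80f42edefedecd8f119cdd296ea9d'}"
)


def extract_feature_ids(text):
    """Return the sorted, de-duplicated 32-character hex IDs quoted in text."""
    return sorted(set(re.findall(r"'([0-9a-f]{32})'", text)))


def write_placeholder_taxonomy(feature_ids, path, label="Unassigned"):
    """Write a tab-separated taxonomy file giving every ID the same label."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["Feature ID", "Taxon"])  # assumed header
        for feature_id in feature_ids:
            writer.writerow([feature_id, label])


ids = extract_feature_ids(error_text)
write_placeholder_taxonomy(ids, "placeholder_taxonomy.tsv")
```

The resulting file could then be imported as FeatureData[Taxonomy] with qiime tools import and combined with the classifier output (qiime feature-table merge-taxa should do the merge, as far as I know).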

No, the debug info cannot be redirected to the output directory, but you could run all commands with the --verbose flag so that all messages are written to stdout.

Good luck!
