Plugin error from feature-classifier: this classifier does not support confidence values

Hello Everyone,

I hope everyone is well and healthy. I am trying to train my own classifier using the MaarjAM database. I have been at it for 10 hrs straight and am still stuck, so needless to say I need help.

I have been following the qiime2 tutorial on training feature classifiers, and I created my own FeatureData[Sequence] artifact using the command below:

(qiime2-2020.2) Calvin-Cornells-MacBook-Pro:Taco eyehillentertainment$ qiime tools import --type 'FeatureData[Sequence]' --input-path MercatorValidFasta.fa --output-path please_work.qza
Imported MercatorValidFasta.fa as DNASequencesDirectoryFormat to please_work.qza

I created my FeatureData[Taxonomy] artifact using the command below:

(qiime2-2020.2) Calvin-Cornells-MacBook-Pro:Taco eyehillentertainment$ qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path tax_maybe.txt --output-path please_work_taxa_please.qza

Imported tax_maybe.txt as HeaderlessTSVTaxonomyFormat to please_work_taxa_please.qza

However, I skipped the next step (extracting reference reads) and went straight to training the classifier, which completed. I used the command below:
(qiime2-2020.2) Calvin-Cornells-MacBook-Pro:Taco eyehillentertainment$ qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads please_work.qza --i-reference-taxonomy please_work_taxa_please.qza --o-classifier please_god_work_class.qza
Saved TaxonomicClassifier to: please_god_work_class.qza

It is when I get to testing the classifier that I run into problems. I used the command below:

(qiime2-2020.2) Calvin-Cornells-MacBook-Pro:Taco eyehillentertainment$ qiime feature-classifier classify-sklearn --i-classifier please_god_work_class.qza --i-reads maybe_rep-seqs.qza --o-classification taxonomy.qza

I got the following error:
Plugin error from feature-classifier:
this classifier does not support confidence values

I have looked through almost all of the other forum topics in search of an answer but haven’t found one. After 10 hrs of working on this continuously I am getting really worried, so if someone could please help me that would be amazing.

Here is my --verbose output for it

Additionally, I have semicolons between each of the taxonomic levels (e.g., between g_; s_).

Here is the txt file for my FeatureData[Taxonomy]

tax_maybe.txt (44.1 KB)

If someone could please help me solve this specific problem it would be very much appreciated.

Thank you very much, and if you need more info most definitely let me know.

And here is the txt version of my FeatureData[Sequence] (please note that it is in FASTA format; it is only attached as txt for this post):

MercatorValidFasta copy.txt (284.6 KB)

Therefore, if someone could please help me solve this specific problem it would be greatly appreciated, because I am getting very stressed. Thank you very much.

Hi @CalvinCornell,

My 2 cents after a quick look at the files:
In the ‘MercatorValidFasta.txt’ file, many sequence IDs start with ‘gb|’, which is not present in the corresponding IDs in the ‘tax_maybe’ file. I suspect that any sequence whose ID starts with ‘gb|’ will be reported as missing from the taxonomy table by the ‘feature-classifier’ plugin.

In the ‘tax_maybe’ file, the ‘;’ between taxonomy levels is fine; I think the problem is the ‘\t’ (tab) character after each ‘;’.
My understanding is that the taxonomy file should include only two columns:
the sequence ID (exactly matching the ID in the FASTA file) and the taxonomy description (all levels in one column, separated by ‘;’).
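The two checks described here (matching IDs and a two-column taxonomy table) could be sketched in plain Python roughly as follows. This is only an illustration: the function names and the example records are made up for this post, not taken from the attached files or from QIIME 2 itself, and it assumes the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
def read_fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def check_taxonomy(lines, fasta_ids):
    """Report rows without exactly two tab-separated columns and unmatched IDs."""
    bad_rows = []
    tax_ids = set()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            bad_rows.append(number)
        tax_ids.add(fields[0].strip())
    missing_in_taxonomy = sorted(fasta_ids - tax_ids)
    missing_in_fasta = sorted(tax_ids - fasta_ids)
    return bad_rows, missing_in_taxonomy, missing_in_fasta


# Made-up example data (NOT from the attached files).
fasta_lines = [
    ">gb|AB000001 example sequence",
    "ACGTACGTACGT",
    ">AB000002",
    "ACGTTTGACCGT",
]
taxonomy_lines = [
    "AB000001\tk__Fungi; p__Glomeromycota",
    "AB000002\tk__Fungi;\tp__Glomeromycota",  # stray tab gives three columns
]

fasta_ids = read_fasta_ids(fasta_lines)
bad_rows, missing_in_taxonomy, missing_in_fasta = check_taxonomy(
    taxonomy_lines, fasta_ids
)
print("rows without exactly two columns:", bad_rows)
print("FASTA IDs with no taxonomy row:", missing_in_taxonomy)
print("taxonomy IDs with no sequence:", missing_in_fasta)
```

For real files, the same functions can be fed open file handles, e.g. `read_fasta_ids(open("MercatorValidFasta.fa"))` and `check_taxonomy(open("tax_maybe.txt"), fasta_ids)`.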

Hope this helps,
Cheers

Also, I don’t know if it is needed but here is the visualization of my --i-reads for feature-classifier classify-sklearn

rep-seqs.qzv (309.1 KB)

Hello @llenzi

Thank you so very much for your help, it is greatly appreciated. With your advice, some trial and error, and combining all of my taxonomic info into a single column for each sequence (see the New-Microbiology_tax visualization below), I was able to get past the step of testing my classifier.

New-Microbiology_tax.qzv (1.2 MB)

However, when I went to visualize my result I found that all of the Feature IDs (each one different) had exactly the same taxonomic description (I have attached the visualization below), which is very odd.

NEWER_taxonomy.qzv (1.2 MB)

Could I please have some help in overcoming this step? It would be very much appreciated.

Thank you very much and have a wonderful day

Hi @CalvinCornell,
This issue you are observing usually occurs when abnormally short sequences are left in the reference database. These become the top hit merely because they are short snippets of sequences that match to all of your queries “erroneously” (they may be perfect matches, but only because they are extremely short and do not fully cover your query). You should filter out abnormally short/long sequences from your database before proceeding.
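As a rough illustration of that kind of length filtering, here is a minimal Python sketch. The function names, the toy records and the `min_len`/`max_len` values are all placeholders invented for this example; sensible bounds depend on the marker region and on the length distribution of your own reference database.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len, max_len):
    """Keep only records whose sequence length lies in [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Toy records (hypothetical): lengths 20, 3 and 40.
example_lines = [
    ">ref1", "ACGTACGTAC", "GTACGTACGT",
    ">ref2", "ACG",
    ">ref3", "ACGT" * 10,
]

# Placeholder thresholds chosen only so the toy data shows both cases.
kept = filter_by_length(parse_fasta(example_lines), min_len=10, max_len=30)
for header, seq in kept:
    print(header, len(seq))
```

Any ID removed this way should also be removed from the taxonomy file so that the two stay in sync.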

Hello @Nicholas_Bokulich

Awesome, thank you very much for the help and advice, it is much appreciated. I found that the source of the problem was a Feature ID whose sequence contained a very large number of Ns.

Once I deleted it from both my reference sequences and the corresponding taxonomic classifications, my taxonomic classification worked perfectly.
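For anyone hitting the same thing, a clean-up like this could be sketched in Python along the following lines. It is only a sketch under assumptions: the function names, the toy data and the `max_n_fraction` cutoff are hypothetical, and the sequences and taxonomy are assumed to be already loaded into dictionaries keyed by Feature ID.

```python
def n_fraction(seq):
    """Fraction of positions that are 'N'; an empty sequence counts as 1.0."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def drop_ambiguous(sequences, taxonomy, max_n_fraction=0.1):
    """Remove N-rich sequences from both the sequence and taxonomy mappings."""
    bad = {sid for sid, seq in sequences.items()
           if n_fraction(seq) > max_n_fraction}
    kept_seqs = {sid: seq for sid, seq in sequences.items() if sid not in bad}
    kept_tax = {sid: tax for sid, tax in taxonomy.items() if sid not in bad}
    return kept_seqs, kept_tax, sorted(bad)


# Toy data (hypothetical IDs and taxa).
sequences = {
    "ref1": "ACGTACGTAC",
    "ref2": "NNNNNNNNAC",  # 80% N
    "ref3": "ACGTNACGTA",  # 10% N
}
taxonomy = {"ref1": "taxon A", "ref2": "taxon B", "ref3": "taxon C"}

# The 0.2 cutoff is a placeholder, not a recommended value.
clean_seqs, clean_tax, removed = drop_ambiguous(
    sequences, taxonomy, max_n_fraction=0.2
)
print("removed:", removed)
print("kept:", sorted(clean_seqs))
```

Removing the ID from both mappings at once keeps the reference sequences and the taxonomy table consistent with each other.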

Hence thank you so much for your help, it is much appreciated.

Have a wonderful day!
