Thank you for explaining this issue!
I have a separate problem:
I extracted the 626 representative sequences in the UNITE ITS reference set (ver7_99, 2017), “sh_taxonomy_qiime_ver7_99_01.12.2017_dev.txt”, that are labeled “k__Fungi;p__unidentified;c__unidentified;o__unidentified;f__unidentified;g__unidentified;s__unidentified”.
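For reference, the extraction step can be sketched roughly as below. This is a minimal sketch, not the exact procedure I used: it assumes each line of the taxonomy file is `<sequence ID><TAB><lineage>`, and the function name `ids_with_label` and the IDs `seqA`/`seqB` are hypothetical.

```python
import io

# The all-"unidentified" label quoted above.
UNIDENTIFIED_FUNGI = (
    "k__Fungi;p__unidentified;c__unidentified;o__unidentified;"
    "f__unidentified;g__unidentified;s__unidentified"
)


def ids_with_label(taxonomy_lines, label=UNIDENTIFIED_FUNGI):
    """Return the IDs whose lineage exactly matches `label`.

    Assumption: each line is '<sequence ID>\t<lineage>'.
    """
    ids = []
    for line in taxonomy_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        seq_id, lineage = line.split("\t", 1)
        # strip() guards against stray trailing whitespace in the lineage column
        if lineage.strip() == label:
            ids.append(seq_id)
    return ids


# Hypothetical two-line taxonomy file: only seqA carries the all-unidentified label.
example = io.StringIO(
    "seqA\t" + UNIDENTIFIED_FUNGI + "\n"
    "seqB\tk__Fungi;p__Ascomycota;c__unidentified;o__unidentified;"
    "f__unidentified;g__unidentified;s__unidentified\n"
)
print(ids_with_label(example))  # ['seqA']
```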
However, when I classified these sequences with a naive Bayes classifier trained on the same reference set (which contains them), many were assigned with high confidence (>0.7) to lineages other than this label.
In fact, 130 of these 626 representative sequences were assigned at species-level resolution.
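The counting above can be sketched as follows. This is a hypothetical illustration only: it assumes the classification results were exported as a tab-separated table with `Feature ID`, `Taxon` and `Confidence` columns, and it treats an assignment as species-level when its last rank is a named `s__` rank (not empty and not `s__unidentified`). The function name `summarize` and the example rows are made up.

```python
import csv
import io


def summarize(tsv_text, threshold=0.7):
    """Return (confident, species_level) counts for a classification table.

    Assumption: tab-separated columns 'Feature ID', 'Taxon', 'Confidence'.
    """
    confident = 0
    species = 0
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if float(row["Confidence"]) <= threshold:
            continue  # keep only assignments above the confidence threshold
        confident += 1
        last = row["Taxon"].split(";")[-1].strip()
        # A named s__ rank counts as species-level resolution
        if last.startswith("s__") and last not in ("s__", "s__unidentified"):
            species += 1
    return confident, species


# Hypothetical rows: one species-level, one phylum-level, one below threshold.
table = (
    "Feature ID\tTaxon\tConfidence\n"
    "seqA\tk__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;"
    "f__Nectriaceae;g__Fusarium;s__Fusarium_oxysporum\t0.93\n"
    "seqB\tk__Fungi;p__Basidiomycota\t0.81\n"
    "seqC\tk__Fungi\t0.55\n"
)
print(summarize(table))  # (2, 1)
```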
My three related questions are:
1. What could be the cause of these discrepancies?
2. Isn’t “k__Fungi;p__unidentified;c__unidentified;o__unidentified;f__unidentified;g__unidentified;s__unidentified” equivalent to “k__Fungi”?
3. Does including this type of sequence in the training set add any value to the trained classifier, or is it merely noise?
Thank you very much for the help!