Some questions about taxonomy of classify-sklearn

I trained a classifier to classify V3V4-region sequences from bacteria. However, there are two things that I can’t understand.
I ran the command “qiime feature-classifier classify-sklearn --i-classifier classifier_for_V3V4.qza --i-reads rep_seq.qza --o-classification greengenes_taxonomy.qza” and got the following results.
There are three columns in the results: Feature ID, Taxon, and Confidence. My first question: the Taxon column contains different taxonomic levels, such as family, genus, and species. Why are features assigned to different taxonomic levels? My second question: should I filter features by their Confidence value? If so, what threshold should I use?

Hi @Nanaaaaa, welcome to :qiime2:!

Are you asking why some are classified down to genus while others are only classified to family, etc.?

The classifier is trying its best, given your reference database, to determine what organism a given sequence is from. Often there is not enough information in the sequence to be specific, but the classifier can still determine that the sequence likely comes from a particular taxonomic group, so the assignment stops at that level.
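If you want to see how deep each assignment goes, here is a minimal Python sketch that tallies the deepest named rank in each Taxon string. It assumes Greengenes-style labels (`k__`, `p__`, ..., `s__`) separated by semicolons; the example taxa and the helper name `deepest_rank` are made up for illustration and are not part of QIIME 2.

```python
from collections import Counter

# Rank names in the order Greengenes-style taxonomy strings list them.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(taxon: str) -> str:
    """Return the deepest rank that carries an actual name (hypothetical helper)."""
    deepest = "unassigned"
    for rank, label in zip(RANKS, taxon.split(";")):
        label = label.strip()
        # Labels look like "g__Bacteroides"; anything without "__" is not a rank label.
        if "__" not in label:
            break
        # A bare prefix such as "s__" means no name was assigned at this rank.
        name = label.split("__", 1)[1]
        if not name:
            break
        deepest = rank
    return deepest


# Invented example strings, in the style of the Taxon column.
taxa = [
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__fragilis",
    "Unassigned",
]

depth_counts = Counter(deepest_rank(t) for t in taxa)
for rank, count in depth_counts.items():
    print(rank, count)
```

A tally like this only summarizes where the classifier stopped for each feature; it does not change any assignment.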

For more details see the following:

I’d check out this post: taxonomy confidence values. You can adjust the --p-confidence setting (default 0.7), but I wouldn’t recommend doing this.
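To give a rough feel for why a confidence threshold leads to assignments of different depths, here is a simplified Python sketch. It is only an illustration of the idea, not the plugin's actual code: the function name `trim_to_confidence`, the per-rank confidence numbers, and the example lineage are all invented.

```python
def trim_to_confidence(labels, confidences, threshold=0.7):
    """Keep ranks from the top down until confidence drops below the threshold.

    Simplified illustration; per-rank confidences are assumed to be
    non-increasing from kingdom toward species.
    """
    kept = []
    for label, conf in zip(labels, confidences):
        if conf < threshold:
            break
        kept.append(label)
    return "; ".join(kept) if kept else "Unassigned"


# Invented lineage and confidences for a single feature.
labels = [
    "k__Bacteria", "p__Firmicutes", "c__Clostridia",
    "o__Clostridiales", "f__Lachnospiraceae", "g__Blautia",
]
confidences = [1.0, 0.99, 0.98, 0.95, 0.85, 0.55]

print(trim_to_confidence(labels, confidences, threshold=0.7))
print(trim_to_confidence(labels, confidences, threshold=0.5))
```

With these made-up numbers, the 0.7 threshold stops at family while a looser 0.5 threshold reaches genus, which is the trade-off between depth and reliability that the setting controls.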


Thanks, your answer is just what I needed :grinning: