Evaluating a self-made classifier: what F-measure is sufficient?

I recently built my own classifier following the instructions at https://forum.qiime2.org/t/processing-filtering-and-evaluating-the-silva-database-and-other-reference-sequence-data-with-rescript/15494

I used the following primers and the "lca" mode:
27F: AGAGTTTGATCCTGGCTCAG
533R: TTACCGCGGCTGCTGGCAC

I obtained the evaluation results and found that at level 6 (genus level) the F-measure was 0.924.
silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-evaluation.qzv (470.8 KB)

I wonder whether this would be adequate for a taxonomic analysis.
Is there a recommended "minimum F-measure"?
What does the F-measure refer to? Can it be thought of as "accuracy" (i.e., does an F-measure of 90% mean that 90% of reads were correctly assigned)?

Many thanks for your help!

My detailed commands were as follows:

# Step 1: Cull low-quality sequences with cull-seqs
qiime rescript cull-seqs \
    --i-sequences silva-138-ssu-nr99-seqs.qza \
    --o-clean-sequences silva-138-ssu-nr99-seqs-cleaned.qza
# Step 2: Filter sequences by length and taxonomy
qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138-ssu-nr99-tax.qza \
    --p-labels Archaea Bacteria Eukaryota \
    --p-min-lens 900 1200 1400 \
    --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza 
# Step 3: Dereplicate sequences and taxonomy
qiime rescript dereplicate \
    --i-sequences silva-138-ssu-nr99-seqs-filt.qza \
    --i-taxa silva-138-ssu-nr99-tax.qza \
    --p-rank-handles 'silva' \
    --p-mode 'lca' \
    --o-dereplicated-sequences silva-138-ssu-nr99-seqs-derep-lca.qza \
    --o-dereplicated-taxa silva-138-ssu-nr99-tax-derep-lca.qza

# Step 4: Extract the amplicon region for the region-specific classifier
qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-lca.qza \
    --p-f-primer AGAGTTTGATCCTGGCTCAG \
    --p-r-primer TTACCGCGGCTGCTGGCAC \
    --p-n-jobs 20 \
    --p-read-orientation 'forward' \
    --o-reads 27F-533R-v1-v3-seqs.qza

# Step 5: Dereplicate the extracted region
qiime rescript dereplicate \
    --i-sequences 27F-533R-v1-v3-seqs.qza \
    --i-taxa silva-138-ssu-nr99-tax-derep-lca.qza \
    --p-rank-handles 'silva' \
    --p-mode 'lca' \
    --o-dereplicated-sequences 27F-533R-v1-v3-lca-seqs-derep.qza \
    --o-dereplicated-taxa  27F-533R-v1-v3-lca-taxa-derep.qza


# Step 6: Build and evaluate the classifier
qiime rescript evaluate-fit-classifier \
    --i-sequences 27F-533R-v1-v3-lca-seqs-derep.qza \
    --i-taxonomy 27F-533R-v1-v3-lca-taxa-derep.qza  \
    --o-classifier silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier.qza \
    --o-observed-taxonomy silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-predicted-taxonomy.qza \
    --o-evaluation silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-evaluation.qzv

WOW!

When I swapped the forward and reverse primers, the F-measure stayed at 1 at all levels. Why?
I also found that the newly extracted sequences and the newly produced classifier were smaller.
Maybe the classifier depends on the sequence orientation? And if we have reversed sequences, do we need to set the parameter '--p-read-orientation' to 'reverse-complement' or 'auto' in classify-sklearn?

# Step 4 (primers swapped): Extract the amplicon region for the region-specific classifier
qiime feature-classifier extract-reads \
    --i-sequences silva-138-ssu-nr99-seqs-derep-lca.qza \
    --p-f-primer TTACCGCGGCTGCTGGCAC \
    --p-r-primer AGAGTTTGATCCTGGCTCAG \
    --p-n-jobs 20 \
    --p-read-orientation 'forward' \
    --o-reads 27F-533R-v1-v3-seqs.qza

# Step 5: Dereplicate the extracted region
qiime rescript dereplicate \
    --i-sequences 27F-533R-v1-v3-seqs.qza \
    --i-taxa silva-138-ssu-nr99-tax-derep-lca.qza \
    --p-rank-handles 'silva' \
    --p-mode 'lca' \
    --o-dereplicated-sequences 27F-533R-v1-v3-lca-seqs-derep.qza \
    --o-dereplicated-taxa  27F-533R-v1-v3-lca-taxa-derep.qza


# Step 6: Build and evaluate the classifier
qiime rescript evaluate-fit-classifier \
    --i-sequences 27F-533R-v1-v3-lca-seqs-derep.qza \
    --i-taxonomy 27F-533R-v1-v3-lca-taxa-derep.qza  \
    --o-classifier silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier.qza \
    --o-observed-taxonomy silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-predicted-taxonomy.qza \
    --o-evaluation silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-evaluation.qzv 

silva-138-ssu-nr99-27F-533R-v1-v3-lca-classifier-evaluation2.qzv (470.2 KB)

Hi @Moon, I think I should be able to address your questions. Before we begin, I just want to mention that we cover several aspects of F-measure interpretation in our paper.

It depends. That is, the context of that number is everything. For example, in our paper we discuss a case in which we observed F = 1 when we clustered our reference sequences. In other words, a high F can sometimes be an artifact of a much smaller reference database. The key is to compare your results across different filtering criteria and try to observe what is changing and why. As with most things, you want to avoid relying on a single piece of information to make a decision.

Check out the F-measure wiki. In our paper, we refer to this as classification accuracy. That is, our ability to classify taxa.
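
To make the definition concrete, here is a minimal sketch of the generic textbook F-measure, i.e. the harmonic mean of precision and recall. The counts are hypothetical and are not taken from the .qzv files in this thread, and the helper name f_measure is made up for illustration. How the evaluation itself counts true and false positives at each rank is not shown here.

```shell
# Hypothetical counts for one taxonomic level (illustration only).
tp=90   # reads assigned to the correct taxon
fp=5    # reads assigned to a wrong taxon
fn=10   # reads that should have received this taxon but did not

f_measure() {
    # Prints precision, recall and F-measure for the given TP, FP and FN counts.
    # Assumes tp + fp > 0 and tp + fn > 0 (no division-by-zero guard).
    awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
        p = tp / (tp + fp)          # precision
        r = tp / (tp + fn)          # recall
        f = 2 * p * r / (p + r)     # harmonic mean of precision and recall
        printf "precision=%.3f recall=%.3f F=%.3f\n", p, r, f
    }'
}

f_measure "$tp" "$fp" "$fn"
# prints: precision=0.947 recall=0.900 F=0.923
```

So an F of about 0.92 does not literally mean that 92% of reads were assigned correctly. It summarizes both how often assignments are right (precision) and how many of the expected assignments were recovered (recall).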

I am not sure why you'd exchange the primers, as the SILVA reference database is curated in the correct orientation. That being said, primers used to extract sequence segments can occasionally align spuriously to off-target sites, especially if you are mapping incorrectly. This would account for the length differences you see. Thus, you should visually check the outputs to confirm that you are extracting what you think you are extracting. I suspect that very few reads were extracted compared to your prior attempt. As I mentioned earlier, artificially inflated F values can result from a very small reference database. Again, read our paper for more insight into how we interpreted these values.
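
One rough way to check how many reads were extracted, and how long they are, is to summarize the exported FASTA for each primer setup and compare the numbers. The sketch below assumes the reads have already been exported to a plain FASTA file (for example with qiime tools export, shown only as a comment). The output directory name and the helper name fasta_summary are hypothetical.

```shell
# Assumed export step (requires QIIME 2; not run by this sketch):
#   qiime tools export --input-path 27F-533R-v1-v3-seqs.qza --output-path exported-reads
# The FASTA is then assumed to be at exported-reads/dna-sequences.fasta.

fasta_summary() {
    # Prints the number of sequences and their min/mean/max length for a FASTA file.
    # Records with an empty sequence are ignored.
    awk '
        function add(l) {
            n++; total += l
            if (n == 1 || l < min) min = l
            if (l > max) max = l
        }
        /^>/ { if (len > 0) add(len); len = 0; next }   # header: close previous record
        { len += length($0) }                            # sequence line (may be wrapped)
        END {
            if (len > 0) add(len)
            if (n == 0) { print "sequences=0"; exit }
            printf "sequences=%d min=%d mean=%.1f max=%d\n", n, min, total / n, max
        }
    ' "$1"
}

# Usage (path is an assumption):
#   fasta_summary exported-reads/dna-sequences.fasta
```

If the swapped-primer run yields far fewer sequences or very different lengths than the original run, that would be consistent with off-target extraction and a much smaller reference.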

This would depend on your reference database. There can be cases where the reference data are of mixed orientation (e.g. sequences downloaded from GenBank), in which case altering --p-read-orientation would probably be a good idea.
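
As a quick sanity check for orientation, one could test whether a sequence contains the forward primer as written or its reverse complement. This is only a sketch under simplifying assumptions: it uses exact string matching, ignores mismatches and degenerate bases, and the helper names revcomp and guess_orientation are made up for illustration.

```shell
revcomp() {
    # Reverse-complements a DNA string (A<->T, C<->G); other symbols are left unchanged.
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTacgt' 'TGCAtgca'
}

guess_orientation() {
    # $1 = sequence, $2 = forward primer (exact match only).
    case "$1" in
        *"$2"*)              echo "forward" ;;
        *"$(revcomp "$2")"*) echo "reverse-complement" ;;
        *)                   echo "unknown" ;;
    esac
}

# The 27F primer from this thread and its reverse complement:
revcomp AGAGTTTGATCCTGGCTCAG
# prints: CTGAGCCAGGATCAAACTCT
```

A reference in which many sequences report "reverse-complement" for the forward primer would be a hint that the data are of mixed orientation.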

-I hope this helps!
-Mike

Thank you, Mike! :grinning:
I really appreciate QIIME 2 and the specialists and staff who keep maintaining and updating it. I have learned a lot.