I wonder whether this would be adequate for a taxonomic analysis?
Is there a recommended "minimum F-measure"?
What does F-measure refer to? Can it be thought of as "accuracy" (i.e., does a 90% F-measure mean that 90% of reads have been correctly assigned)?
When I swapped the forward and reverse primers, the F-measure remained 1 at all levels. Why?
I also found that the newly extracted seqs and the newly produced classifier were smaller.
Maybe the classifier depends on the sequence orientation. If we have reversed sequences, do we need to set the parameter '--p-read-orientation reverse-complement' (or 'auto') in classify-sklearn?
Hi @Moon, I think I should be able to address your questions. Before we begin, I just want to mention that we cover several aspects of F-measure interpretation in our paper.
It depends; the context of that number is everything. For example, in our paper we discuss a case in which we observed F=1 when we clustered our reference sequences. In other words, a high F can sometimes be an artifact of a much smaller reference database. The key is to compare your results across different filtering criteria and work out what is changing and why. As with most things, you want to avoid relying on a single piece of information to make a decision.
Check out the F-measure wiki. In our paper, we refer to this as classification accuracy. That is, our ability to classify taxa.
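To make the definition concrete, here is a minimal sketch of the standard F-measure as the harmonic mean of precision and recall. The function name and the example counts are hypothetical and chosen only for illustration; this is not necessarily the exact evaluation code used in the paper.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Standard F-measure (F1): harmonic mean of precision and recall.

    The counts are hypothetical inputs, e.g. tallied at one taxonomic level.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or expected.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall are both 0.9 here, so F is about 0.9.
balanced = f_measure(true_positives=90, false_positives=10, false_negatives=10)

# Precision is 1.0 but recall is 0.8, so F is about 0.889, not 0.8 or 1.0.
conservative = f_measure(true_positives=80, false_positives=0, false_negatives=20)
```

So F is close to "percent correct" only when precision and recall are similar. When they differ, F falls between the two, which is one reason the number needs context.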
I am not sure why you'd swap the primers, as the SILVA reference database is curated in the correct orientation. That being said, primers used to extract sequence segments can occasionally align spuriously to off-target sites, especially if you are mapping incorrectly. This would account for the length differences you see. You should therefore visually check the outputs to confirm that you are extracting what you think you are extracting. I suspect that far fewer reads were extracted than in your prior attempt. As I mentioned earlier, artificially inflated F values can result from a very small reference database. Again, read our paper for more insight into how we interpreted these values.
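One quick way to do that check is to compare the number of sequences and their lengths between the two extractions. Below is a minimal sketch, assuming you have already exported each extracted-reads artifact to a plain FASTA file (for example with qiime tools export). The function name and the toy records are hypothetical.

```python
def summarize_fasta(lines):
    """Return the sequence count and basic length statistics for FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Toy in-memory example; with real data, pass an open file handle instead,
# e.g. summarize_fasta(open("dna-sequences.fasta")) for each extraction.
example = summarize_fasta([">a", "ACGT", "ACG", ">b", "AC"])
```

If the swapped-primer extraction has a much lower count, or lengths far from the expected amplicon size, that would be consistent with off-target extraction and a shrunken reference.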
This depends on your reference database. There can be cases where the reference data is of mixed orientation (e.g., sequences downloaded from GenBank), and altering --p-read-orientation would probably be a good idea.