Understanding q2-clawback weights learning

biojack · June 15, 2022, 6:06am

Hi, all

People on this forum have advised that a bespoke classifier (a classifier with non-uniform class weights, where the weights are proportional to the observed taxon frequencies in the domain of interest) can give much better taxon detection at the species level.

For example, the article "Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin" (PMC) reports excellent results on mock communities. Moreover, there is the q2-clawback plugin, which helps to build such a classifier (see "Using q2-clawback to assemble taxonomic weights").

So this looks like a great opportunity, but I have some concerns:

Could the dataset used to obtain the weights also introduce a technical or biological batch effect into those weights?
Suppose we have a dataset called A. We analyze dataset A with a uniform classifier and then construct a bespoke classifier from the resulting read counts. What happens if we repeat this process on dataset A, but with the bespoke classifier as input? Will the process stabilize after, say, 10 such iterations? Has anyone checked this?
Would it be possible to catch rare pathogens with a bespoke classifier?
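
To make the second concern concrete, here is a minimal toy sketch of the loop I have in mind. It is not the q2-clawback or q2-feature-classifier code: the likelihood table, the pseudocount, and all function names are my own assumptions, and each read is simply assigned to the taxon with the highest likelihood times weight.

```python
# Toy sketch only: hypothetical names and made-up numbers, not the QIIME 2 implementation.

def classify(likelihoods, weights):
    """Assign each read to the taxon with the highest likelihood * weight."""
    labels = []
    for row in likelihoods:
        scores = [p * w for p, w in zip(row, weights)]
        labels.append(scores.index(max(scores)))
    return labels


def weights_from_labels(labels, n_taxa, pseudocount=1.0):
    """Turn assignment counts into normalized weights (pseudocount is an assumption)."""
    counts = [pseudocount] * n_taxa
    for taxon in labels:
        counts[taxon] += 1
    total = sum(counts)
    return [c / total for c in counts]


def iterate_weights(likelihoods, n_taxa, max_iter=10):
    """Start from uniform weights and re-learn them until the assignments stop changing."""
    weights = [1.0 / n_taxa] * n_taxa
    previous_labels = None
    labels = []
    for iteration in range(1, max_iter + 1):
        labels = classify(likelihoods, weights)
        if labels == previous_labels:
            return weights, labels, iteration
        weights = weights_from_labels(labels, n_taxa)
        previous_labels = labels
    return weights, labels, max_iter


# Made-up P(read | taxon) rows for 3 taxa:
# 6 reads clearly from taxon 0, 3 ambiguous reads slightly favouring taxon 1,
# and 1 read clearly from the rare taxon 2.
toy_likelihoods = (
    [[0.90, 0.05, 0.05]] * 6
    + [[0.45, 0.50, 0.05]] * 3
    + [[0.05, 0.05, 0.90]] * 1
)

weights, labels, n_rounds = iterate_weights(toy_likelihoods, n_taxa=3)
print(n_rounds, labels, [round(w, 3) for w in weights])
```

In this toy case the assignments stop changing on the third pass, but the three ambiguous reads have moved from taxon 1 to the dominant taxon 0, while the single clear read from the rare taxon 2 keeps its label. Whether real data behaves like this is exactly what I am asking.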

I also have an optional question about the article mentioned above. The article says:

Where we have set the class weights to the known taxonomic composition of a sample, we have labeled the results “bespoke”

As I understand it, this means that non-zero weights were assigned only to the species actually present in the mock community (20 species), based on prior knowledge of its bacterial composition. So there is almost no way for the classifier to go wrong. Is my understanding correct? If so, do you still think the results of this article confirm the benefits of a bespoke classifier? In a real analysis there is no such prior knowledge.
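
A small sketch of why I think so, assuming my reading is right and that taxa outside the mock community really get a weight of zero (I have not verified this in the article). The numbers and names below are made up for illustration:

```python
# Toy sketch only: hypothetical likelihoods and weights, not data from the article.

def classify_read(likelihood_row, weights):
    """Return the index of the taxon with the highest likelihood * weight."""
    scores = [p * w for p, w in zip(likelihood_row, weights)]
    return scores.index(max(scores))


# A read whose sequence strongly suggests taxon 2.
read = [0.10, 0.05, 0.85]

uniform_weights = [1 / 3, 1 / 3, 1 / 3]
# Assumed "known composition": only taxa 0 and 1 are in the mock community.
mock_only_weights = [0.5, 0.5, 0.0]

uniform_call = classify_read(read, uniform_weights)
mock_call = classify_read(read, mock_only_weights)
print(uniform_call, mock_call)
```

With a zero weight, taxon 2 can never be called, no matter how well the read matches it, so wrong calls outside the known composition are ruled out by construction.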

Thank you very much for your attention.