Can we use q2-sample-classifier (supervised machine learning) to confirm indicator taxa?

Hello,
I am testing q2-sample-classifier on a subset of samples that includes control and pathogen-treated groups. All samples come from the same plant tissue, and the most important metadata parameter is pathogen treatment, so taxonomic composition is fairly similar across samples; the main changes are a reduction in diversity and increased frequencies of specific taxa. After running the commands from the q2-sample-classifier tutorial, I can see that the most important features are the same indicator taxa that DESeq2 (in R) identified.
So my question is: can I use these results as a confirmatory test for the indicator taxa?
N.B. The accuracy statistics are as follows:
Accuracy ratio: 1.44
Overall accuracy: 0.76
Control (AUC): 0.88
Pathogen group (AUC): 0.88
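A short sketch of how these statistics relate, assuming q2-sample-classifier defines the accuracy ratio as overall accuracy divided by baseline accuracy, where the baseline is the accuracy of always predicting the most common group. Under that assumption, 0.76 / 1.44 implies a baseline of about 0.53, which would mean the two groups are roughly balanced. The helper names below are hypothetical. AUC is computed with the rank (Mann-Whitney) formulation; with only two groups the predicted probabilities are complementary, which is why both groups typically show the same AUC.

```python
from collections import Counter


def baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_ratio(y_true, y_pred):
    """Overall accuracy relative to the majority-class baseline (assumed definition)."""
    return overall_accuracy(y_true, y_pred) / baseline_accuracy(y_true)


def binary_auc(scores, labels, positive):
    """ROC AUC: probability that a random positive sample scores higher than a
    random negative sample (ties count as 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Baseline implied by the reported numbers, under the assumed definition.
implied_baseline = 0.76 / 1.44
print(round(implied_baseline, 3))  # 0.528
```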
One more question: for --p-n-estimators, I use roughly half the total number of samples, i.e., if I have 99 samples, I use 50. Is that a sensible choice?
Also, how much could the value passed to --p-random-state influence the results?

Thanks


Sounds like a reasonable confirmation of the DESeq2 results.
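One way to put a number on that agreement is to check how many of the top-k most important features are also DESeq2 hits, and how likely that overlap would be if k features were picked at random (a hypergeometric upper-tail probability). This is a minimal sketch, assuming the feature importances have been exported to a mapping of feature ID to importance score; the function names and toy data are hypothetical. Both analyses use the same samples, so this quantifies agreement between methods rather than independent validation.

```python
from math import comb


def top_k_features(importances, k):
    """Return the k feature IDs with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def hypergeom_upper_tail(n_total, n_hits, k, overlap):
    """P(X >= overlap) when k features are drawn at random, without replacement,
    from n_total features of which n_hits are DESeq2 hits."""
    upper = min(k, n_hits)
    favourable = sum(
        comb(n_hits, x) * comb(n_total - n_hits, k - x)
        for x in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, k)


def compare_with_deseq2(importances, deseq2_hits, k):
    """Shared features between the top-k ranking and DESeq2 hits, plus the
    chance probability of seeing at least that many shared features."""
    hits = set(deseq2_hits) & set(importances)  # only hits present in the table
    shared = top_k_features(importances, k) & hits
    p = hypergeom_upper_tail(len(importances), len(hits), k, len(shared))
    return shared, p


# Hypothetical toy data, for illustration only.
importances = {"ASV1": 0.30, "ASV2": 0.25, "ASV3": 0.15, "ASV4": 0.10,
               "ASV5": 0.08, "ASV6": 0.05, "ASV7": 0.04, "ASV8": 0.03}
deseq2_hits = {"ASV1", "ASV2", "ASV5"}
shared, p = compare_with_deseq2(importances, deseq2_hits, k=3)
print(sorted(shared), round(p, 3))  # ['ASV1', 'ASV2'] 0.286
```

With real data the feature table is much larger, so the same overlap would usually give a much smaller chance probability.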

Use more estimators; it will increase your accuracy. You do not need to set n-estimators as a function of the number of samples.
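To illustrate why n-estimators is about the number of trees rather than the number of samples, here is a toy simulation, not scikit-learn's random forest: each tree is treated as an independent voter that picks the correct group with a hypothetical probability q = 0.6. As the number of trees grows, the vote fraction varies less from seed to seed and the majority vote is right more often. Real trees are correlated, so in practice the gains level off rather than continuing indefinitely.

```python
import random
import statistics


def forest_vote_fraction(q, n_trees, seed):
    """Fraction of n_trees idealized, independent trees voting for the true group.
    Each tree is assumed to vote correctly with probability q (a simplification)."""
    rng = random.Random(seed)
    return sum(rng.random() < q for _ in range(n_trees)) / n_trees


def summarize(q, n_trees, n_seeds=200):
    """Seed-to-seed standard deviation of the vote fraction, and the fraction of
    seeds for which the majority vote is correct (ties count as incorrect)."""
    fractions = [forest_vote_fraction(q, n_trees, seed) for seed in range(n_seeds)]
    spread = statistics.pstdev(fractions)
    majority_correct = sum(f > 0.5 for f in fractions) / n_seeds
    return spread, majority_correct


for n in (10, 50, 100, 500, 1000):
    spread, correct = summarize(q=0.6, n_trees=n)
    print(f"n_estimators={n:5d}  seed-to-seed SD={spread:.3f}  majority correct={correct:.2f}")
```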

If you have a sufficiently large number of samples, the random state should not matter much, i.e., you should get roughly the same results each time (some variation is expected, but not much). You can also run the classify-samples-ncv method, which uses nested cross-validation to train K different classifiers, to get a sense of the variance in performance (use --verbose to report the mean accuracy and variance).
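Whether the accuracies come from the K cross-validation folds or from re-running with several --p-random-state values, the spread is what tells you how much the random state matters. A minimal sketch of that summary, with a hypothetical helper name and invented accuracies (not results from this thread):

```python
import statistics


def summarize_runs(accuracies):
    """Mean, sample variance and range of accuracies from repeated runs or CV folds."""
    return {
        "mean": statistics.mean(accuracies),
        "variance": statistics.variance(accuracies),  # sample variance (n - 1)
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Hypothetical per-fold (or per-seed) accuracies, for illustration only.
run_accuracies = [0.70, 0.75, 0.80, 0.72, 0.78]
summary = summarize_runs(run_accuracies)
print(summary)
```

A small variance relative to the gap between overall and baseline accuracy suggests the choice of random state is not driving the conclusions.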

Good luck!

Thank you very much!
