Sorry for struggling with this…
I want to assign taxonomy to my reads, say
Am I correct that I start by training the classifier first, using my reference sequences, say
COIdbSeqs.qza and reference taxonomy
To do this, I run
qiime feature-classifier fit-classifier
COIdb*.qza as inputs, and generate a
The next step in the tutorial talks about testing the classifier. Maybe that’s where I’m mistaken. I wanted to do that, but also want to actually classify my guano sequences -
I have mock communities as well. Is each of these just a separate input to the following function?
qiime feature-classifier classify-sklearn
I'm worried I'm doing this wrong because, so far, I’ve only supplied the reference sequences as input for all steps (either 2 million or about 1.6 million, depending on the database), yet in each case I got back a test result with 10,000 sequences. If I train with 2 million sequences and test with 2 million, shouldn’t my results include 2 million comparisons?
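One way I could double-check that number is to count the rows in the exported table directly. This is only a minimal sketch: it assumes the export step writes a tab-separated taxonomy.tsv with a single header line, and count_classified is a helper name I made up.

```shell
# Count classified features in an exported taxonomy table.
# Assumption: the file is tab-separated with exactly one header line.
count_classified() {
    # Skip the header (NR > 1) and ignore blank lines (NF == 0).
    awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

# Hypothetical usage with the export directory from my run:
# count_classified classifierTax_all_raw/taxonomy.tsv
```

If this prints 10,000 as well, then the short result really is in the classification output and not an artifact of how I opened the file.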
Thanks for the help!
P.S. The full code executed for one particular database was as follows:
REFSEQ=/path/to/COIdbSeqs.qza
REFTAX=/path/to/COIdbTax.qza

## train the classifier
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads "$REFSEQ" \
  --i-reference-taxonomy "$REFTAX" \
  --o-classifier classifier_all_raw.qza

## test the classifier
qiime feature-classifier classify-sklearn \
  --i-classifier classifier_all_raw.qza \
  --i-reads "$REFSEQ" \
  --o-classification classifierTax_all_raw.qza

## export data for analysis
qiime tools export \
  --input-path classifierTax_all_raw.qza \
  --output-path classifierTax_all_raw
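If the answer to my question is yes, I assume testing and classifying would just be the same classify-sklearn command run on different reads. Here is a rough sketch of what I have in mind. The names guanoSeqs.qza and mockSeqs.qza are placeholders (not files from the run above), and DRY_RUN is a made-up switch that only prints the commands unless it is set to 0.

```shell
CLASSIFIER=classifier_all_raw.qza

# Hypothetical read artifacts; placeholder names, not files from my run.
READ_SETS="guanoSeqs.qza mockSeqs.qza"

# By default only print the commands; set DRY_RUN=0 to actually call qiime.
DRY_RUN="${DRY_RUN:-1}"

classify_reads() {
    reads="$1"
    # Name each output after its input, e.g. classifierTax_guanoSeqs.qza
    out="classifierTax_$(basename "$reads" .qza).qza"
    if [ "$DRY_RUN" = "1" ]; then
        echo "qiime feature-classifier classify-sklearn --i-classifier $CLASSIFIER --i-reads $reads --o-classification $out"
    else
        qiime feature-classifier classify-sklearn \
            --i-classifier "$CLASSIFIER" \
            --i-reads "$reads" \
            --o-classification "$out"
    fi
}

# Same trained classifier, one call per set of reads.
for r in $READ_SETS; do
    classify_reads "$r"
done
```

The point is that the classifier is trained once and then reused, so only the --i-reads argument would change between the mock community test and the real guano data.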