It returned a large percentage (30-90%) of "D_0__Bacteria;;;;;" in about half of my samples. I would like to compare these to the BLAST results available through rep-seqs.qzv, but I don't know how to match the feature IDs to their current taxonomic assignments. How do I find out which sequences were assigned to "D_0__Bacteria;…"?
(Also, would I be better off training the SILVA 132 classifier downloaded from your data resources page instead?)
It sounds like you are probably using the wrong classifier. Either something went wrong when training it, or it is not appropriate for your data (e.g., it was trained on the V4 region but your sequences are not from V4).
You can use qiime metadata tabulate to merge the rep-seqs and taxonomy files into a single visualization. It will not link directly to NCBI BLAST, but it will let you browse the sequences alongside their assignments. Another option is to use qiime taxa filter-seqs to keep only the sequences with that classification (see the filtering tutorial at qiime2.org for more details), and then summarize the filtered sequences with qiime feature-table tabulate-seqs to get a QZV with the NCBI BLAST links.
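If you would rather do the same lookup outside QIIME 2, here is a minimal sketch. It assumes you have already exported both artifacts with qiime tools export, giving you a taxonomy.tsv (columns Feature ID, Taxon, Confidence) and a dna-sequences.fasta. The helper names are hypothetical, and treating empty levels and bare placeholders such as "D_3__" as "not assigned" is an assumption about how your lineages are written.

```python
import csv


def read_taxonomy(lines):
    """Map feature ID -> taxon string from an exported taxonomy.tsv (hypothetical helper)."""
    taxonomy = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank rows, the header row and any '#'-prefixed directive rows.
        if not row or row[0] == "Feature ID" or row[0].startswith("#"):
            continue
        taxonomy[row[0]] = row[1]
    return taxonomy


def read_fasta(lines):
    """Map feature ID -> sequence from an exported dna-sequences.fasta (hypothetical helper)."""
    seqs, seq_id, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                seqs[seq_id] = "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        seqs[seq_id] = "".join(chunks)
    return seqs


def assigned_levels(taxon):
    """Return the named levels of a lineage, dropping empty or placeholder levels."""
    levels = (level.strip() for level in taxon.split(";"))
    return [level for level in levels if level and not level.endswith("__")]


def features_assigned_only_to(taxonomy, seqs, label="D_0__Bacteria"):
    """Return {feature ID: sequence} for features classified no deeper than `label`."""
    return {
        fid: seqs[fid]
        for fid, taxon in taxonomy.items()
        if assigned_levels(taxon) == [label] and fid in seqs
    }
```

Both readers accept any iterable of lines, so open file handles for taxonomy.tsv and dna-sequences.fasta work directly; printing each hit as ">ID" followed by its sequence gives FASTA you can paste into BLAST.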
Give it a try; a "second opinion" never hurts.
The data resources page contains pre-trained classifiers for use with the classify-sklearn method; I had thought you were describing those classifiers.
If you want to train your own classifier and/or use SILVA with the classify-consensus-vsearch or classify-consensus-blast methods, you are following the right steps, but you should use the unaligned FASTA files in the rep_set directory, not those in the rep_set_aligned directory.
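If you are not sure which reference FASTA you picked up, one quick check is whether its sequence lines contain alignment gap characters. This is a minimal sketch that assumes gaps are written as "-" or "." (common in aligned reference files, but check your own release); the function name is hypothetical.

```python
# Assumed gap symbols for aligned reference FASTA files.
GAP_CHARS = frozenset("-.")


def looks_aligned(lines):
    """Return True as soon as any sequence (non-header) line contains a gap character."""
    for raw in lines:
        line = raw.strip()
        # Header lines are skipped because IDs and descriptions may contain '.' or '-'.
        if line and not line.startswith(">") and not GAP_CHARS.isdisjoint(line):
            return True
    return False
```

Passing an open file handle works because it is iterated line by line, and the function stops at the first gapped line, so even a large reference file is cheap to check. A True result suggests you are pointing at the aligned file rather than the one in rep_set.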
It worked great. However, I didn't get any difference between this custom classifier and the V4/V5 classifier provided by the sequencing center. Could I have repeated exactly the steps they used to make classifier.qza? I have to admit I'm very surprised this could be the case: the results are identical down to the thousandths of a percent. Is that possible?
It is certainly possible, and you can use the data provenance to confirm that the exact same steps were used. The read extraction and classification steps are not random, so results should be consistent if you have identical classifiers.
Want a great way to compare these quantitatively? Use qiime quality-control evaluate-taxonomy. It compares the annotations for each individual sequence and reports how well the two taxonomies agree at each taxonomic level. If you get an error, then the IDs in the two files must somehow differ even if their content does not.
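evaluate-taxonomy is the tool to use here. For a quick sanity check outside QIIME 2, the sketch below computes a much simpler statistic: the fraction of shared feature IDs whose lineages match when truncated at each level. It is not what evaluate-taxonomy reports, the function names are hypothetical, and the default of seven levels is an assumption based on SILVA's D_0 to D_6 ranks. Like the QIIME 2 comparison, it only makes sense when both taxonomies use the same IDs, so it raises an error if they share none.

```python
def split_levels(taxon):
    """Split a lineage into named levels, ignoring empty or placeholder levels."""
    levels = (level.strip() for level in taxon.split(";"))
    return [level for level in levels if level and not level.endswith("__")]


def per_level_agreement(tax_a, tax_b, depth=7):
    """Fraction of shared features whose lineages agree down to each level (1..depth).

    tax_a and tax_b map feature ID -> taxon string.
    """
    shared = sorted(set(tax_a) & set(tax_b))
    if not shared:
        raise ValueError("No feature IDs in common; check that both files use the same IDs.")
    agreement = []
    for level in range(1, depth + 1):
        # A feature counts as a match if both lineages are identical up to this level.
        matches = sum(
            split_levels(tax_a[fid])[:level] == split_levels(tax_b[fid])[:level]
            for fid in shared
        )
        agreement.append(matches / len(shared))
    return agreement
```

Two identical classifiers run on the same sequences should give 1.0 at every level; any value below 1.0 points to the levels where the assignments start to diverge.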
So the real question, I suppose, is why you are getting so many unclassified reads. Start here: