how to compare taxonomy assignments from two different SILVA classifiers

Ryan_Kerney · January 10, 2020, 2:46pm

I have a quick question based on this advice:
"by comparing classification results with BLAST results and/or classification against another reference database such as Greengenes"

I ran a first-pass analysis with the SILVA 132 classifier made available by my sequencing center: http://kronos.pharmacology.dal.ca/public_files/taxa_classifiers/qiime2-2019.7_classifiers_new_stringent/

It returned a large percentage (30-90%) of "D_0__Bacteria;;;;;" in about half my samples. I would like to compare these to the BLAST results available through rep-seqs.qzv, but I don't know how to match the feature IDs to their current taxonomic assignments. How do I know which sequences were assigned to "D_0__Bacteria;...."?

Any advice?

(also - would I be better off training the Silva-132 classifier downloaded from your data resources page instead?)

Thanks!

Nicholas_Bokulich · January 10, 2020, 3:41pm

Hi @Ryan_Kerney,

Sounds like you are probably using the wrong classifier... either something went wrong when training that classifier or you are not using an appropriate classifier for your data (e.g., using a classifier trained on the V4 region when you are not classifying V4 sequences).

You can use qiime metadata tabulate to merge the rep-seqs and taxonomy files into a single visualization. It will not link directly to NCBI BLAST, but it will let you browse each feature ID alongside its sequence and taxonomic assignment. Another option is to use qiime taxa filter-seqs to keep only the sequences with that classification (see the filtering tutorial at qiime2.org for more details), and then tabulate the filtered sequences into a QZV, the same way rep-seqs.qzv was made, to get the NCBI BLAST links.
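If you would rather pull the list of affected feature IDs out by hand, here is a minimal Python sketch. It assumes you have exported the taxonomy artifact to a tab-separated file with "Feature ID" and "Taxon" columns (for example with qiime tools export, which should give you a taxonomy.tsv). The function name and the example rows are made up for illustration.

```python
import csv
import io


def unresolved_features(tsv_text, max_ranks=1):
    """Return feature IDs whose taxon has at most `max_ranks` non-empty ranks.

    With max_ranks=1 this catches "D_0__Bacteria" and "D_0__Bacteria;;;;;",
    and also single-label assignments such as "Unassigned".
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        # Split the taxon string on ";" and drop empty trailing ranks.
        ranks = [r.strip() for r in row["Taxon"].split(";") if r.strip()]
        if len(ranks) <= max_ranks:
            hits.append(row["Feature ID"])
    return hits


# Hypothetical rows standing in for an exported taxonomy.tsv.
example = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tD_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria\t0.98\n"
    "feat2\tD_0__Bacteria\t0.71\n"
    "feat3\tD_0__Bacteria;;;;;\t0.80\n"
)

print(unresolved_features(example))  # ['feat2', 'feat3']
```

On real data you would pass in the file contents instead, e.g. unresolved_features(open("taxonomy.tsv").read()), and then look those IDs up in rep-seqs.qzv.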

As for the SILVA 132 classifier from the data resources page: give it a try. A "second opinion" never hurts.

I hope that helps!

Ryan_Kerney · January 11, 2020, 12:22am

Thanks Nick,

I clicked on the Silva 16S/18S link on the QIIME Data Resources page and downloaded the large Silva_132_release.zip file, which contains loads of stuff but no FASTA files.

Instead there are .fna files, which typically contain FASTA-formatted sequences, but apparently not in a form that my commands can read.

I managed to import these using a manifest based on my metadata earlier, but I doubt the same trick will work here since I don't have metadata for these sequences. Or maybe it would?

It looks like the same issue came up in this post, but I don't think it was resolved.

The author was taking the same approach as me and likely many others, which is to insert our files into the moving pictures tutorial for as many steps as possible before retreating to the forum.

Nicholas_Bokulich · January 11, 2020, 12:25am

The data resources page contains pre-trained classifiers for use with the classify-sklearn method... I had thought you were describing those classifiers.

If you want to train your own classifier and/or use SILVA with the classify-consensus-vsearch or blast methods, you are following the right steps but should use the unaligned fasta files in the rep_set directory, not the rep_set_aligned directory.
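If you are ever unsure whether a given .fna file is the aligned or unaligned version, a quick check is to look for alignment gap characters in the sequence lines. This is only a rough sketch: it assumes gaps are written as "-" or ".", and the function name and example sequences are made up.

```python
def has_alignment_gaps(fasta_text, gap_chars="-."):
    """Return True if any sequence line contains a gap character."""
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        if any(ch in gap_chars for ch in line):
            return True
    return False


# Hypothetical records: the second set mimics an aligned file.
unaligned = ">seq1\nACGTACGT\n>seq2\nGGCATTAC\n"
aligned = ">seq1\nAC--GTAC.GT\n>seq2\nGGC-ATTAC..\n"

print(has_alignment_gaps(unaligned))  # False
print(has_alignment_gaps(aligned))    # True
```

A file that comes back True is most likely from the aligned set and is not the one you want for training a classifier.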

Let me know if that gets you on the right track!

Ryan_Kerney · January 15, 2020, 11:59pm

Thanks again Nick,

It worked great. However, I didn't get any difference between this custom classifier and the V4/V5 classifier provided by the sequencing center. I must have repeated their steps to make classifier.qza? I have to admit that I'm very surprised that this could possibly be the case. They are exactly the same down to the thousandths of a percent. Is that possible?

Nicholas_Bokulich · January 16, 2020, 12:28am

It is certainly possible, and in fact you can use the data provenance to confirm that you used exactly the same steps. The read extraction and classification steps are not random, so results should be consistent if you have identical classifiers.

Want a great way to quantitatively compare these? Use qiime quality-control evaluate-taxonomy: it will compare the annotations for each individual sequence and score how closely they agree. If you get an error, then the accession IDs must somehow be different even if the content is not...
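For a quick sanity check by hand (this is not what evaluate-taxonomy does internally, just a plain per-feature comparison), something like the following Python sketch would work. It assumes both taxonomies were exported to tab-separated files with "Feature ID" and "Taxon" columns. The function names and example rows are made up for illustration.

```python
import csv
import io


def load_taxonomy(tsv_text):
    """Map each feature ID to its taxon string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Feature ID"]: row["Taxon"].strip() for row in reader}


def compare_taxonomies(a, b):
    """Summarize where two feature-ID-to-taxon mappings agree and differ."""
    shared = sorted(set(a) & set(b))
    return {
        "shared": len(shared),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        # Shared feature IDs whose taxon strings are not identical.
        "differing": [fid for fid in shared if a[fid] != b[fid]],
    }


# Hypothetical exports from two classifiers.
custom = load_taxonomy(
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tD_0__Bacteria;D_1__Firmicutes\t0.99\n"
    "feat2\tD_0__Bacteria\t0.75\n"
    "feat3\tD_0__Bacteria;D_1__Bacteroidetes\t0.91\n"
)
center = load_taxonomy(
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tD_0__Bacteria;D_1__Firmicutes\t0.99\n"
    "feat2\tD_0__Bacteria;D_1__Proteobacteria\t0.82\n"
    "feat4\tD_0__Bacteria;D_1__Actinobacteria\t0.95\n"
)

report = compare_taxonomies(custom, center)
print(report)
```

If "differing", "only_in_a" and "only_in_b" all come back empty, the two classifiers gave identical assignments for every feature.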

So the question, I suppose, is really why you are getting so many unclassified reads? Start here:
