Combining the NCBI Taxonomy and SILVA Taxonomy

nmgduan · October 25, 2019, 5:48am

In the previous post, "The high ratio of unassigned sequences", the "unassigned" reads generated by feature-classifier were kept for downstream analyses because they can be classified by NCBI BLAST.

I have another question: could I merge the NCBI taxonomy classification and the SILVA taxonomy classification to do abundance analyses at the phylum or genus level?

Thanks!

Morton

Nicholas_Bokulich · October 25, 2019, 2:43pm

Yes, in your case I think this makes sense since the BLAST report is showing some good hits (though I cannot see what the taxonomy is — you should not just take the top hit but try to find consensus among the top hits).
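
To illustrate what "consensus among the top hits" can mean in practice, here is a minimal Python sketch. It assumes you have already collected the lineage strings of the top BLAST hits for one sequence as semicolon-delimited text; the function name `consensus_lineage` and the `min_fraction` threshold are hypothetical choices for illustration, not part of any QIIME 2 or BLAST API.

```python
from collections import Counter


def consensus_lineage(lineages, min_fraction=0.51):
    """Return the deepest lineage prefix supported by at least
    min_fraction of the hits, or "Unassigned" if none qualifies.

    lineages: list of strings such as "Bacteria; Firmicutes; Bacilli".
    min_fraction: illustrative threshold (an assumption, tune as needed).
    """
    # Split each lineage into a list of non-empty rank labels.
    split = [
        [rank.strip() for rank in lineage.split(";") if rank.strip()]
        for lineage in lineages
    ]
    split = [ranks for ranks in split if ranks]
    if not split:
        return "Unassigned"

    total = len(split)
    consensus = []
    candidates = split
    depth = 0
    while True:
        # Count the labels at this depth among hits that still agree
        # with the consensus found so far.
        labels = Counter(
            ranks[depth] for ranks in candidates if len(ranks) > depth
        )
        if not labels:
            break
        best, count = labels.most_common(1)[0]
        # Support is measured against all hits, not only the survivors.
        if count / total < min_fraction:
            break
        consensus.append(best)
        candidates = [
            ranks for ranks in candidates
            if len(ranks) > depth and ranks[depth] == best
        ]
        depth += 1

    return "; ".join(consensus) if consensus else "Unassigned"


top_hits = [
    "Bacteria; Firmicutes; Bacilli; Lactobacillales",
    "Bacteria; Firmicutes; Bacilli; Bacillales",
    "Bacteria; Firmicutes; Clostridia; Clostridiales",
]
print(consensus_lineage(top_hits))
```

With these three made-up hits, two of three agree down to class, so the sketch stops there instead of trusting the single top hit's order-level label.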

You would need to make the taxonomy conform to the SILVA taxonomy (i.e., have the same lineage), otherwise this will cause a nightmare bioinformatically.

It may be easiest to re-classify these unassigned sequences with the classify-consensus-blast or classify-consensus-vsearch classifier against the SILVA database, to avoid issues with incompatibility of taxonomy.

Speaking of nightmares, you may have some difficulty explaining and justifying this approach in publication. That is not to say it can't or shouldn't be done, just that reviewers may "raise an eyebrow" at your description of assembling a Frankenstein taxonomy. But I think it is okay to say something along the lines of "sequences were classified with q2-feature-classifier's naive Bayes classifier (cite) against the SILVA version XXXX database (cite). A small number of unclassified sequences (XXX % of total) were re-checked using NCBI-BLAST with the XXXX database (cite) to manually assign consensus taxonomy among the top hits with a minimum of XXXX% query coverage and XXX% identity."
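
If it helps to make the coverage and identity cutoffs reproducible, something like the following Python sketch could apply them before any manual review. It assumes the BLAST results were exported as tab-separated text with the columns `qseqid sseqid pident qcovs` in that order; the function name `filter_hits` is hypothetical, and the thresholds are deliberately left as required arguments because the right values depend on your data.

```python
import csv
import io


def filter_hits(tsv_text, min_identity, min_coverage):
    """Group subject IDs by query, keeping only hits that meet both
    the percent-identity and query-coverage thresholds.

    Assumes four tab-separated columns: qseqid, sseqid, pident, qcovs.
    """
    kept = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        qseqid, sseqid, pident, qcovs = row[:4]
        if float(pident) >= min_identity and float(qcovs) >= min_coverage:
            kept.setdefault(qseqid, []).append(sseqid)
    return kept


# Made-up rows purely for illustration; the cutoffs below are
# placeholders, not recommended values.
example = (
    "asv1\tsubjA\t99.2\t100\n"
    "asv1\tsubjB\t91.0\t100\n"
    "asv2\tsubjC\t98.5\t60\n"
)
print(filter_hits(example, min_identity=97.0, min_coverage=90.0))
```

Queries that end up with no surviving hits (like `asv2` here) simply do not appear in the result and would stay unassigned.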

nmgduan · October 25, 2019, 4:05pm

Thank you very much for your further explanation!

I will re-classify these unassigned sequences against the SILVA database with the two methods above.

So do I need to re-classify the already "classified" sequences with those two methods as well, to avoid incompatibility of taxonomy? I think that if I use the same database, it is not necessary.

Nicholas_Bokulich · October 25, 2019, 4:13pm

Not necessary to reclassify everything, you are correct. You would just filter out the unclassified sequences using qiime taxa filter-seqs, classify those, then merge back in with qiime feature-table merge-taxa --i-data new-classification.qza --i-data old-classification.qza. Order matters for merge-taxa; see the help documentation for more details.
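
To show why order matters, here is a toy Python sketch of an order-sensitive merge. It is not the plugin's implementation: it assumes that when the same feature ID appears in more than one input, the entry from the earlier input is kept, which is consistent with listing the new classification first. The function name `merge_taxonomies` and the feature IDs are made up, and the merge-taxa help text remains the authority on the real behavior.

```python
def merge_taxonomies(*tables):
    """Merge feature-ID -> taxonomy mappings, keeping the entry from
    the earliest table when an ID appears more than once (assumed
    precedence rule, for illustration only)."""
    merged = {}
    for table in tables:
        for feature_id, taxon in table.items():
            # setdefault leaves an existing entry untouched, so earlier
            # tables take priority over later ones.
            merged.setdefault(feature_id, taxon)
    return merged


new_classification = {"asv2": "Bacteria; Firmicutes"}
old_classification = {
    "asv1": "Bacteria; Proteobacteria",
    "asv2": "Unassigned",
}

# New first: the re-classified label replaces "Unassigned".
print(merge_taxonomies(new_classification, old_classification))
# Old first: the stale "Unassigned" label would survive instead.
print(merge_taxonomies(old_classification, new_classification))
```

Under that assumption, swapping the two inputs would silently keep the old "Unassigned" labels, so it is worth spot-checking a few re-classified features after merging.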

classify-consensus-vsearch is better than the blast-based classifier, in my opinion, since I recently added an option to do the LCA consensus classification across the top hits only, but certainly worth trying both.

Good luck!
