Using other classifiers

Hello,

What is the ‘best’ way to classify sequences with RDP?

I’ve browsed some older posts, and it seems like there are two options:

  1. Create a FASTA file and use the RDP database (and thus the RDP Classifier too) directly via the Galaxy Portal Server or locally. From my understanding, there aren’t any plugins that export DADA2 ASVs as a multi-sequence FASTA organized by sample headers. I may be able to create this FASTA file with a Unix for loop instead.

  2. Download the RDP taxonomy and rep-seqs, then follow QIIME2’s feature classifier tutorial. The “fit-classifier-naive-bayes” and “feature-classifier classify-sklearn” methods are based on the RDP Classifier implementation.

Are the above two approaches adequate?

Which might be ‘cleaner’ to implement?

Does anyone suggest a different approach?

Thanks!

Welcome to the forum, @Jasmine!

Do you want to use RDP classifier or use the RDP database? The two are not necessarily the same thing…

If you want to use the RDP classifier, then yes, option 1 is the only way: there is no RDP classifier in QIIME 2 (though there is one in QIIME 1, if you have a working installation of that).

I’m not sure I follow the part about a per-sample FASTA. You can export the dereplicated ASV sequences file output by DADA2, and this is what you want to classify. Creating some sort of per-sample FASTA would just make the classifier repeat the same work and slow you down.

Instead, if you want to use RDP classifier, export that dereplicated sequences file, classify with RDP, then import those taxonomy classifications back into QIIME 2 to annotate your feature table (linking taxonomy classification to per-sample ASV abundances).
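
The last step, getting RDP classifier results into a shape QIIME 2 can import, could look roughly like the sketch below. It assumes the classifier wrote tab-separated lines holding the sequence ID, an orientation column, and then repeated (taxon name, rank, confidence) triples; please check that against your own output file before relying on it. The function names, the `min_conf` cutoff, and the choice to report the confidence of the deepest retained rank are my own illustrative choices, not part of either tool. The `Feature ID`, `Taxon`, `Confidence` header is the layout QIIME 2 reads for a taxonomy TSV.

```python
import csv
import io


def rdp_to_qiime2_taxonomy(rdp_lines, min_conf=0.0):
    """Convert RDP-style result lines to rows for a QIIME 2 taxonomy TSV.

    Assumed input layout (verify against your file): tab-separated
    seq_id, orientation, then name/rank/confidence triples.
    """
    rows = [("Feature ID", "Taxon", "Confidence")]
    for line in rdp_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        seq_id, triples = fields[0], fields[2:]
        names, last_conf = [], None
        # Walk the complete (name, rank, confidence) triples in order.
        for i in range(0, len(triples) - 2, 3):
            name = triples[i].strip('"')
            conf = float(triples[i + 2])
            if conf < min_conf:
                break  # truncate the lineage at the first low-confidence rank
            names.append(name)
            last_conf = conf
        if not names:
            # Nothing passed the cutoff; 0.0 is a placeholder confidence.
            names, last_conf = ["Unassigned"], 0.0
        rows.append((seq_id, "; ".join(names), f"{last_conf:.2f}"))
    return rows


def write_tsv(rows, handle):
    """Write the rows as tab-separated text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
```

The resulting file can then be brought in with `qiime tools import` as `FeatureData[Taxonomy]`, with the feature IDs matching the IDs in the exported sequences file.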

Yes, if you want to use the RDP database (not the classifier), then go with option 2.

Not quite “based on”: both RDP classifier and fit-classifier-naive-bayes use similar implementations of naive Bayes classifiers to classify sequences based on k-mer frequency, and they yield quite similar results (when default settings are used).
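
To make “naive Bayes on k-mer frequency” a bit more concrete, here is a toy sketch of the general idea. It is not the code of either RDP classifier or fit-classifier-naive-bayes: the 0.5 pseudocount, the uniform prior over taxa, and all function names are simplifications I chose for illustration, and real tools add things like confidence estimation on top.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=8):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references, k=8):
    """Build a model from an iterable of (taxon, sequence) pairs."""
    kmer_counts = defaultdict(Counter)  # taxon -> k-mer -> sequences containing it
    seq_counts = Counter()              # taxon -> number of training sequences
    for taxon, seq in references:
        seq_counts[taxon] += 1
        kmer_counts[taxon].update(kmers(seq, k))
    return kmer_counts, seq_counts


def classify(query, model, k=8):
    """Return (taxon, score) with the highest summed log-likelihood."""
    kmer_counts, seq_counts = model
    best_taxon, best_score = None, -math.inf
    for taxon, n in seq_counts.items():
        # The pseudocount keeps unseen k-mers from zeroing out the likelihood.
        score = sum(
            math.log((kmer_counts[taxon][km] + 0.5) / (n + 1))
            for km in kmers(query, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score
```

Each k-mer is treated as independent evidence (the “naive” part), so the taxon whose reference sequences share the most k-mers with the query ends up with the highest score.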

Yes, either will give you taxonomy classifications.

Option 2 (import RDP sequences and taxonomy) is probably the easiest and most streamlined. Because the naive Bayes classifiers are similar, importing the RDP sequences into QIIME 2 for classification will be “easier”. More importantly, your methods will be stored in provenance, so you can retrace these steps by examining the provenance of an individual QIIME 2 output. Exporting to RDP classifier will break this provenance.

Those sequences and taxonomy need to be in the correct format, however, so that will be some additional work.
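
As a rough idea of what that reformatting might involve, the sketch below splits an annotated reference FASTA into plain sequences plus a two-column ID/lineage table. The header layout it parses (`>ID`, whitespace, then a semicolon-separated lineage) is a hypothetical one I picked for illustration, so adjust the parsing to whatever the RDP files you download actually look like. The function name is mine as well.

```python
def split_reference(fasta_lines):
    """Split annotated FASTA lines into plain FASTA lines and (id, lineage) pairs.

    Assumes each header looks like '>SEQ_ID<whitespace>Rank1;Rank2;...'
    (a hypothetical layout; adapt this to the real file).
    """
    fasta_out, taxonomy = [], []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if not parts:
                raise ValueError("FASTA header without a sequence ID")
            seq_id = parts[0]
            lineage = parts[1] if len(parts) > 1 else "Unassigned"
            # Drop empty ranks and surrounding quotes, then rejoin.
            ranks = [r.strip().strip('"') for r in lineage.split(";") if r.strip()]
            taxonomy.append((seq_id, "; ".join(ranks)))
            fasta_out.append(f">{seq_id}")  # keep only the ID in the header
        else:
            fasta_out.append(line.upper())  # normalise case (my choice)
    return fasta_out, taxonomy
```

Writing `fasta_out` to one file and the `taxonomy` pairs to a tab-separated file with no header gives the two inputs to import, as `FeatureData[Sequence]` and as `FeatureData[Taxonomy]` (using `HeaderlessTSVTaxonomyFormat` for the latter).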

I hope that helps!
