Import RDP database to .qza file to use it with feature-classifier

Hi!

I want to import the RDP database into a .qza file in order to use it with feature-classifier classify-sklearn. When you download the Greengenes or Silva database you get two separate files, sequences and taxonomy, which you can import into .qza files and then use to train the classifier with fit-classifier-naive-bayes. I would like to do the same for this database.

I’ve downloaded the RDP database from [here](https://rdp.cme.msu.edu/misc/resources.jsp), specifically Bacteria16S fasta unaligned. There is a single .txt file containing both the sequences and the taxonomic assignments. I've attached an extract from this file.
exampleRDPdatabase.txt (2.3 KB)

Is it possible to build a classifier.qza from this database? Is there a specific source-format for this kind of file?

Finally, I have another question: does anybody know whether there are big differences between the RDP database and Greengenes?

Thanks in advance,
SLa

Hey there @SLa!

Check out this quote from another post that came up on the forum when I searched for "RDP":

Check out that tutorial and let us know how it goes!

There are almost certainly differences (two different approaches, two different databases, one has been updated more recently than the other), but as to how "big" those differences are, I can't say. Why don't you compare results produced by classifying against both databases? That would give you the confidence to stand by one DB vs another, in theory.
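On the import question, here is a minimal sketch of one way to split the single RDP file into the two inputs (sequences and taxonomy) that the Greengenes/Silva workflow expects. It assumes each FASTA header starts with the sequence ID and carries a `Lineage=` field made of alternating taxon names and ranks separated by semicolons; the attached extract isn't reproduced in this thread, so check that against your file. `split_rdp_fasta` is a hypothetical helper written for illustration, not part of QIIME 2.

```python
def split_rdp_fasta(text):
    """Split RDP-style FASTA text into (fasta_text, taxonomy_tsv_text).

    ASSUMPTION: headers look roughly like
        >ID description<TAB>Lineage=Root;rootrank;Bacteria;domain;...
    i.e. taxon names and rank labels alternate after "Lineage=".
    """
    fasta_lines, tax_lines = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            seq_id = header.split()[0]  # first whitespace-delimited token
            # Everything after "Lineage=" (empty string if the field is absent)
            _, _, lineage = header.partition("Lineage=")
            fields = [f.strip().strip('"') for f in lineage.split(";") if f.strip()]
            names = fields[0::2]  # keep taxon names, skip rank labels
            if names and names[0] == "Root":
                names = names[1:]  # drop the artificial root node
            taxonomy = "; ".join(names) if names else "Unassigned"
            fasta_lines.append(">" + seq_id)
            tax_lines.append(seq_id + "\t" + taxonomy)
        else:
            # Sequence line; uppercased as a precaution for downstream tools
            fasta_lines.append(line.upper())
    return "\n".join(fasta_lines) + "\n", "\n".join(tax_lines) + "\n"


# Example with a made-up record (not taken from the real database)
example = (
    ">S000000001 Example organism; strain X\t"
    'Lineage=Root;rootrank;Bacteria;domain;"Firmicutes";phylum;Bacilli;class\n'
    "acgtacgt\n"
    "ggcc\n"
)
fasta_text, taxonomy_text = split_rdp_fasta(example)
```

The two strings could then be written to files and imported with `qiime tools import` as `FeatureData[Sequence]` and `FeatureData[Taxonomy]` (using the headerless TSV taxonomy format), after which `fit-classifier-naive-bayes` should work as it does for Greengenes. The exact import options depend on your QIIME 2 version, so check `qiime tools import --help`.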
