RDP Reference Database in QIIME2 format

Thank you so much for the suggestions @Nicholas_Bokulich. I have looked at both and am not sure if I have an answer yet.
For suggestion #1, below are the first lines of my OTU.fa and taxonomy files (note, these were copied into notepad so the formatting looks strange). The feature IDs seem to match up, but is there a way to check if they are labeled as numeric?

head rep_set_99_rdp.fa
>S000494589
 GCGGCGTGCTACACATGCAGTCGTACGCGGTGGCACACCGAGT
 GGCGAACGGGTGCGTAACACGTGAGGAACCTACCCCGAAGTGGG
 GGATAACACCGGGAAACCGGTGCTAATACCGCATACGCTCCCCGGAC
 CGCATGGTCCAGGGAGCAAAGCCTCCGGGCGCTTCGGGACGGCCTC
 GCGGCCTATCAGCTTGTTGGTGGGGTAACGGCCCACCAAGGCGA
 CGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGGACT
 GAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGC
 GCAATGGGCGAAAGCCTGACGCAGCAACGCCGCGTGGAGGACGAAG
 GCCTTCGGGTTGTAAACTCCTTTCAGCAGGGACGAAACTGACGGTACC
 TGCAGAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTAAG
>S000632122
GACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGTACGCGGT
GGCAACACCGAGTGGCGAACGGGTGCGTAACACGTGAGGAACCTAC
CCCGAAGTGGGGGATAACACCGGGAAACCGGTGCTAATACCGCATA
CGCTCCCCGGACCGCATGGTCCA

head rdp_qiime_taxonomy.txt
S000494589	Bacteria;domain;Actinobacteria;phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;Acidimicrobineae;suborder;Acidimicrobiaceae;family;Acidimicrobium;genus
S000632122	Bacteria;domain;Actinobacteria;phylum;Actinobacteria;class;Acidimicrobidae;subclass;Acidimicrobiales;order;Acidimicrobineae;suborder;Acidimicrobiaceae;family;Acidimicrobium;genus
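
One way the two ID sets could be compared is sketched below in Python. This is only a rough sketch under a few assumptions: the taxonomy file is tab-separated with the feature ID in the first column, the FASTA ID is the text after ">" up to the first whitespace, and "numeric" means an ID made up entirely of digits. The function names are made up for illustration, and the sample lines are the two records shown above with the sequences and lineages shortened.

```python
def fasta_ids(lines):
    """Collect IDs from FASTA header lines (text after '>' up to the first whitespace)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def taxonomy_ids(lines):
    """Collect IDs from the first tab-separated column (assumed layout), skipping blank lines."""
    ids = []
    for line in lines:
        if line.strip():
            ids.append(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(seq_ids, tax_ids):
    """Report IDs found in only one file, plus any IDs made up entirely of digits."""
    seq_set, tax_set = set(seq_ids), set(tax_ids)
    return {
        "missing_from_taxonomy": sorted(seq_set - tax_set),
        "missing_from_sequences": sorted(tax_set - seq_set),
        "numeric_only": sorted(i for i in seq_set | tax_set if i.isdigit()),
    }


# Shortened copies of the two records shown above (sequences and lineages truncated).
sample_fasta = [">S000494589", "GCGGCGTGCTACACATGCAG", ">S000632122", "GACGAACGCTGGCGGCGTGC"]
sample_taxonomy = [
    "S000494589\tBacteria;domain;Actinobacteria;phylum",
    "S000632122\tBacteria;domain;Actinobacteria;phylum",
]

print(compare_ids(fasta_ids(sample_fasta), taxonomy_ids(sample_taxonomy)))
```

For the real files, the lines could be read with something like `open("rep_set_99_rdp.fa").readlines()` and `open("rdp_qiime_taxonomy.txt").readlines()` in place of the sample lists.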

For suggestion #2, I ran dos2unix as you suggested:

dos2unix rep_set_99_rdp.fa                                      
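
A way to confirm the conversion independently of dos2unix would be to count the line-ending styles left in the file. The Python sketch below does that on raw bytes; the function name is made up for illustration and the sample bytes are a toy input, not the contents of my file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF), and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # LF not preceded by CR
    cr_only = data.count(b"\r") - crlf   # CR not followed by LF
    return {"crlf": crlf, "lf": lf_only, "cr": cr_only}


# Toy input with Windows line endings, for illustration only.
sample = b">S000494589\r\nGCGGCGTGCTACACATGCAG\r\n"
print(count_line_endings(sample))
```

On the real file this could be called as `count_line_endings(open("rep_set_99_rdp.fa", "rb").read())`. After a successful conversion the "crlf" and "cr" counts should both be zero.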

The file shows that it was updated, but do I need to add any flags/options? After converting the file, I imported it without a problem, then tried to run vsearch taxonomy classification and got the following errors. Note, I ran classification on several V regions simultaneously, so there is one error for each job. Does the classification automatically terminate if something in the taxonomic search results is not present in the reference taxonomy? Is this error saying that identifiers S002156889, S000615995, and S000269333 are missing from one or both of the files?

Plugin error from feature-classifier:
'Identifier S002156889 was reported in taxonomic search results, but was 
not present in the reference taxonomy.'
Debug info has been saved to /tmp/qiime2-q2cli-err-xjrxpvce.log

Plugin error from feature-classifier:
'Identifier S000615995 was reported in taxonomic search results, but was 
not present in the reference taxonomy.'
Debug info has been saved to /tmp/qiime2-q2cli-err-9evq16yi.log

Plugin error from feature-classifier:
'Identifier S000269333 was reported in taxonomic search results, but was 
not present in the reference taxonomy.'
Debug info has been saved to /tmp/qiime2-q2cli-err-irecsajs.log
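
To narrow this down, each reported identifier could be looked up in both files directly. The Python sketch below is one way to do it, assuming a tab-separated taxonomy file with the ID in the first column. It also flags the case where an ID only matches after stripping surrounding whitespace, since the taxonomy lines pasted above appear to have trailing spaces after the ID; whether that is really the cause here is a guess on my part. The function name and the toy input lines are made up for illustration.

```python
# Identifiers taken from the three error messages above.
REPORTED = ["S002156889", "S000615995", "S000269333"]


def locate(identifier, fasta_lines, taxonomy_lines):
    """Say where an identifier appears; both inputs should be lists of lines."""
    in_fasta = any(
        line.startswith(">") and line[1:].split()[:1] == [identifier]
        for line in fasta_lines
    )
    # Keep the first column exactly as written so that padding stays visible.
    raw_tax_ids = [
        line.rstrip("\r\n").split("\t", 1)[0]
        for line in taxonomy_lines
        if line.strip()
    ]
    exact = identifier in raw_tax_ids
    padded = (not exact) and any(raw.strip() == identifier for raw in raw_tax_ids)
    return {
        "in_fasta": in_fasta,
        "in_taxonomy": exact,
        "in_taxonomy_if_stripped": padded,
    }


# Toy lines for illustration only; these are NOT the real file contents.
toy_fasta = [">S002156889", "ACGT"]
toy_taxonomy = ["S002156889   \tBacteria;domain"]

for identifier in REPORTED:
    print(identifier, locate(identifier, toy_fasta, toy_taxonomy))
```

With the real files, the two lists would come from `readlines()` on rep_set_99_rdp.fa and rdp_qiime_taxonomy.txt. An ID that shows up in the FASTA but not in the taxonomy (or only after stripping) would be consistent with the error text.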

Interestingly, if I extract the variable region from the RDP database based on primers and run the vsearch workflow, I am able to classify taxonomy. Unfortunately, we cannot recommend this process for the IonTorrent ThermoFisher analysis workflow because ThermoFisher’s primers are proprietary and therefore cannot be supplied to extract V regions from RDP. I am curious, though, why things would run without issue on the extracted RDP database but not on the full RDP database. Logically, if there were a missing-feature issue, wouldn't it be more likely to show up with the smaller, extracted database?

I have also attached the qza files for the OTUs and taxonomy. I am not sure whether these or the original OTU/taxonomy files would give a hint as to why I am getting these errors.
QZA fasta OTUs
QZA taxonomy

Thank you so much for helping me troubleshoot! :hugs:
Katherine
