Replace missing taxonomic ranks


The database that I used for the taxonomic assignment has gaps in taxonomic ranks:
Seq1 d,Eukaryota;p,Bacillariophyta;c,Coscinodiscophyceae;o,Thalassiosirales;f,Thalassiosiraceae;g,Thalassiosira;s,Thalassiosira_pseudonana
Seq2 d,Eukaryota;c,Oomycota;o,Peronosporales;f,Peronosporaceae;g,Phytophthora;s,Phytophthora_infestans
For example, Seq2 is missing the phylum level. As a result, qiime taxa barplot displays class and order levels at the same time if I set Taxonomic Level 3. Is there any way to limit the results to one rank, class for example, and display NA instead of the order when the class is missing (or to reformat the database file so that missing levels are filled with NA)?

Hi @sbombin,

The only way I can think of to do what you want is to use a bit of scripting to close the gaps in your taxonomy annotation. You can fill in something like 'p,na' in the annotation for your Seq2, and so on.
Cheers
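
A minimal sketch of that kind of gap-filling script, in Python. It assumes each lineage uses the seven single-letter prefixes seen in the Seq1 example (d, p, c, o, f, g, s) separated by semicolons; the function name fill_missing_ranks and the "na" placeholder are illustrative choices, not part of QIIME 2.

```python
# Rank prefixes in the order they should appear (assumed from the Seq1 example).
RANKS = ("d", "p", "c", "o", "f", "g", "s")


def fill_missing_ranks(lineage, ranks=RANKS, placeholder="na"):
    """Return the lineage with every rank present, using a placeholder for gaps."""
    found = {}
    for field in lineage.strip().split(";"):
        if not field:
            continue  # skip empty fields, e.g. from a trailing semicolon
        prefix, _, name = field.partition(",")
        found[prefix.strip()] = name.strip()
    # Rebuild the lineage in a fixed rank order, filling anything missing or empty.
    return ";".join(f"{rank},{found.get(rank) or placeholder}" for rank in ranks)


seq2 = ("d,Eukaryota;c,Oomycota;o,Peronosporales;f,Peronosporaceae;"
        "g,Phytophthora;s,Phytophthora_infestans")
print(fill_missing_ranks(seq2))
# d,Eukaryota;p,na;c,Oomycota;o,Peronosporales;f,Peronosporaceae;g,Phytophthora;s,Phytophthora_infestans
```

Applied to every lineage in the taxonomy file, this keeps each rank in the same position, so Taxonomic Level 3 would always correspond to class.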


Hi @sbombin,

Which database is this? Custom?

I recommend trying RESCRIPt to construct your reference database.

You might be able to leverage qiime rescript dereplicate after running several "find and replace" steps. That is, search for d, and replace with d__, etc. Though I do not think this will handle completely missing rank information.
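
The "find and replace" steps could be done in one pass with a regular expression, as in this rough Python sketch. It assumes every rank prefix is a single lowercase letter followed by a comma, as in the example lineages; the function name to_double_underscore is hypothetical. As noted above, this only rewrites the prefixes and does not fill in ranks that are missing entirely.

```python
import re


def to_double_underscore(lineage):
    """Rewrite prefixes such as 'd,' into 'd__' throughout a lineage string."""
    # Match a single-letter prefix only at the start of the string or right
    # after a semicolon, so commas elsewhere in a name are left alone.
    return re.sub(r"(^|;)([a-z]),", r"\1\2__", lineage)


seq2 = ("d,Eukaryota;c,Oomycota;o,Peronosporales;f,Peronosporaceae;"
        "g,Phytophthora;s,Phytophthora_infestans")
print(to_double_underscore(seq2))
# d__Eukaryota;c__Oomycota;o__Peronosporales;f__Peronosporaceae;g__Phytophthora;s__Phytophthora_infestans
```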

If this does not work, then we might consider adding this functionality (i.e. filling in missing rank information) to RESCRIPt.

What do you think @Nicholas_Bokulich?


Hi @sbombin,

Having uneven taxonomic ranks like this could cause more severe, fundamental issues down the line, so I strongly agree with @SoilRotifer that using a better reference database (or fixing yours) is the solution.
