Taxonomy assignment output doesn't give full taxonomy even if it's present in the reference database

Hello,
I am assigning taxonomy to my features using the Greengenes_13_8 database (99_otu_taxonomy and 99_otus.fasta). I used both the naive-bayes and the consensus-blast methods.

At first I obtained features with no species-level or genus-level taxonomic annotation. I understand this might be because some reference sequences are themselves not annotated up to that level.

So I filtered my reference database, removing every taxonomy entry (and its corresponding sequence) that lacked genus-level or species-level annotation. (I was curious which fully annotated taxa my features would map to: a big chunk of them, in fact the most abundant features, are annotated only up to order/class level, and we want deeper taxonomy to infer some information from the data.) I fully understand that it is difficult to get species/genus-level annotation using only 16S sequences.
But after taxonomic assignment with both naive-bayes and consensus-blast against the filtered database, I still get taxonomies only up to phylum/class/order level.
Ideally it should give me species-level annotation.
Why does this happen even when the database I supply has all 7 taxonomic levels annotated?
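
For reference, the filtering step described above could be sketched roughly as follows. This is only an illustration, not the exact procedure used here: it assumes a headerless, tab-separated Greengenes-style taxonomy file in which an unannotated species appears as a bare `s__`, and the function and file names are hypothetical.

```shell
# Keep only taxonomy rows whose species rank is non-empty.
# Assumed input: "<otu_id><TAB>k__..; p__..; c__..; o__..; f__..; g__..; s__.."
# (no header line; an unannotated species is written as a bare "s__").
filter_species_level() {
    awk -F'\t' '$2 ~ /(^|;) *s__[^; \t]+/' "$1"
}

# Hypothetical usage (file names are placeholders):
#   filter_species_level 99_otu_taxonomy.txt > 99_otu_taxonomy_species.txt
#   cut -f1 99_otu_taxonomy_species.txt > otu_ids_to_keep.txt
```

The resulting ID list can then be used to subset the matching reference sequences, so that the taxonomy and the sequences stay in sync.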

These are my commands and their outputs

#Classifying using consensus-blast
qiime feature-classifier classify-consensus-blast --i-query MB_rep_seqs.qza \
--i-reference-reads 99_V3V4_otus_with_s_level.qza \
--i-reference-taxonomy 99_taxonomy_with_s_level.qza \
--p-maxaccepts 100 \
--p-perc-identity 0.35 \
--p-query-cov 0.35 \
--o-classification MB_gg_with_s_taxonomy_consensus_blast_V3V4.qza

#classifying using naive-bayes

##Extracting V3V4 specific sequences from the reference dataset
qiime feature-classifier extract-reads \
--i-sequences gg_99_otus_with_S_level.qza \
--p-f-primer CCTACGGGNGGCWGCAG \
--p-r-primer GACTACHVGGGTATCTAATCC \
--p-min-length 30 \
--o-reads gg_99_otus_with_S_level_V3V4.qza

##Training the classifier
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads gg_99_otus_with_S_level_V3V4.qza \
--i-reference-taxonomy 99_taxonomy_with_S_level.qza \
--o-classifier classifier_with_S_level_V3V4.qza

#Test the classifier on MB rep-seqs data
qiime feature-classifier classify-sklearn --i-reads MB_rep_seqs.qza \
--i-classifier classifier_with_S_level_V3V4.qza \
--o-classification MB_taxonomy_with_S_level_V3V4.qza

Welcome to the forum @vkk_24!

This is quite common and you already gave the answer:

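Basically, some of your sequences are too short to differentiate those specific groups. It looks like many of your sequences are getting species-level classification, but others get only phylum-level classification because those query sequences match reference sequences from two or more different lineages (for example, different orders), so only the ranks that those lineages share can be reported. Removing incompletely annotated references does not change this: a short read can still match several fully annotated references equally well.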

If you think there is a problem with the sequences, you can see this topic for some more details and troubleshooting:

That's where I'd begin — look at the sequences that are underclassifying, since many of your classifications look good.
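
One way to pull out the underclassifying features is sketched below. It is a rough illustration rather than an official recipe: it assumes the classification artifact has been exported to a tab-separated `taxonomy.tsv` with a header row and columns for feature ID and taxon (as `qiime tools export` typically produces), and the function name is hypothetical.

```shell
# List features whose taxonomy string has fewer than 7 non-empty ranks.
# Assumed input: TSV with a header row; column 1 = feature ID, column 2 = taxon
# in Greengenes style ("k__..; p__..; ...").
# Output: feature ID, number of annotated ranks, original taxon string.
list_underclassified() {
    awk -F'\t' 'NR > 1 {
        n = split($2, ranks, ";")
        depth = 0
        for (i = 1; i <= n; i++) {
            r = ranks[i]
            gsub(/^ +| +$/, "", r)           # trim spaces around each rank
            if (r ~ /^[a-z]__./) depth++     # count only ranks with a name after "x__"
        }
        if (depth < 7) printf "%s\t%d\t%s\n", $1, depth, $2
    }' "$1"
}

# Hypothetical usage (paths are placeholders):
#   qiime tools export --input-path MB_taxonomy_with_S_level_V3V4.qza --output-path exported
#   list_underclassified exported/taxonomy.tsv | sort -k2,2n
```

Sorting by the rank count puts the shallowest classifications first, which are the sequences worth inspecting by hand.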
