Question about taxonomy

I used "SILVA_128_QIIME_release/rep_set/rep_set_18S_only/99/99_otus_18S.fasta" and "majority_taxonomy_7_levels.txt" to train my classifier. First, I extracted the reference reads with my primers and ‘--p-trunc-len 200’ to get ref-seqs.qza. Then I trained and tested the classifier following the steps in the "Moving Pictures" tutorial. In the end, I got taxa bar plots for my library, but I found that some samples in my library do not get a deep annotation (as shown in the picture). It looks like they stop near the kingdom level. There may be a problem somewhere in my analysis. I need some help!

Hi @liucong2018,
When I see a mixture of shallow assignments (kingdom level) and deep assignments (species?) such as you are seeing, I begin to suspect the query sequences, rather than the database or classifier (which are usually at fault if all sequences are poorly classified).

See this post and this post for some other examples on the forum, and related advice.

I recommend looking at the unassigned query sequences:

  1. What is their length? If they are particularly short, that’s a very clear reason why they are receiving very shallow classifications.
  2. Try using NCBI BLAST to classify a handful of these unassigned sequences and see what their closest match is (make sure to exclude uncultured sequences).

If these are in fact 18S sequences that receive good hits with NCBI BLAST and are of adequate length, then perhaps we should examine the classifier that you trained and the steps you used to train it. But I’d start with the query sequences.

I hope that helps!

How can I find the length of the unassigned query sequences? Or how can I pick out the unassigned query sequences?

Use qiime metadata tabulate to merge feature metadata artifacts, e.g., your representative sequences artifact and taxonomy artifact. You can then open this visualization with qiime tools view and sort by taxonomy.
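
If you would rather check the lengths in a script than by eye, here is a minimal Python sketch of the same idea. It is only an illustration, not part of QIIME 2: the function names (read_fasta, taxonomy_depth, shallow_features) are made up for this example, and it assumes you have already exported your representative sequences and taxonomy artifacts with qiime tools export, giving a FASTA file and a tab-separated file with "Feature ID" and "Taxon" columns. It also assumes taxonomy levels are separated by ";" as in the SILVA QIIME release.

```python
import csv
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open FASTA file."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first token of the header as the ID.
            seq_id = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def taxonomy_depth(taxon):
    """Count the non-empty ';'-separated levels; 'Unassigned' counts as 0."""
    if taxon.strip() == "Unassigned":
        return 0
    return len([level for level in taxon.split(";") if level.strip()])


def shallow_features(fasta_handle, taxonomy_handle, max_depth=1):
    """Return (id, length, depth, taxon) for features classified no deeper
    than max_depth, sorted from shortest to longest sequence."""
    lengths = {sid: len(seq) for sid, seq in read_fasta(fasta_handle)}
    rows = []
    for rec in csv.DictReader(taxonomy_handle, delimiter="\t"):
        fid = rec["Feature ID"]
        depth = taxonomy_depth(rec["Taxon"])
        if depth <= max_depth and fid in lengths:
            rows.append((fid, lengths[fid], depth, rec["Taxon"]))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Tiny made-up data standing in for the exported files (hypothetical).
    demo_fasta = io.StringIO(">f1\nACGTACGT\n>f2\nACG\nTT\n>f3\nACGTAC\n")
    demo_taxonomy = io.StringIO(
        "Feature ID\tTaxon\tConfidence\n"
        "f1\tD_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa\t0.99\n"
        "f2\tD_0__Eukaryota\t0.80\n"
        "f3\tUnassigned\t0.50\n"
    )
    for fid, length, depth, taxon in shallow_features(demo_fasta, demo_taxonomy):
        print(fid, length, depth, taxon, sep="\t")
```

With real data you would replace the two io.StringIO objects with open(...) calls on your exported files. The shortest entries in the output are the first candidates to check with NCBI BLAST.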
