Some samples are classified only at the kingdom level

Hi to all.

I ran a taxonomy analysis, but in the taxa bar plot some samples from another project are classified only to the class level. I used the SILVA full-length pre-trained classifier. Do I have to train my own classifier for these samples? As I understand it, the SILVA full-length pre-trained classifier is trained on all of the taxonomy and reference sequences (16S, 18S, ITS) at full length, not limited to the V4 region. That would mean that, as long as I have enough time to analyze my samples, I don’t need to train my own classifier, since the pre-trained classifier is more specific and precise. That’s why I chose it, even though it is time-consuming…
If what I did is fine, is the problem the sequence quality of my samples?

Thank you!

My colleague had the same issue with the pre-trained SILVA classifier, and training his own classifier solved it. I’m not sure that is the case with your samples too, but it is definitely worth trying; it’s not a very long process.

Hi to all!

I’m trying to train a classifier for one project whose description says they used a different primer set for bacteria and for archaea (a-1 and a-2 for bacteria, b-1 and b-2 for archaea).

I used the SILVA full-length pre-trained classifier, but the results were either unassigned;;;_ or classified only at the kingdom level…

So I decided to train a classifier for the project, and here I am :frowning:

Do I have to train two classifiers, one for each primer set? Or is it possible to train one classifier on the reference sequences extracted with both primer sets?

And what is wrong with the pre-trained classifier? Is it a problem with the project’s data?

Thank you!

Hi @1115,
See this post. Kingdom-level classification is almost always due to using the wrong reference sequences/classifier for your sequences. If you are sure you have 16S query sequences and a 16S classifier, the issue could be non-biological sequence (e.g., adapters) attached to your sequences.
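
As a rough way to check for leftover non-biological sequence, the sketch below estimates what fraction of reads still begin with a given primer. It is only an illustration in plain Python and is not part of QIIME 2: the function names, the `max_mismatches` tolerance, and the primer and reads in the example are all made-up placeholders, so substitute the primers your project actually used.

```python
# Hypothetical helper (not a QIIME 2 function): estimate how many reads
# still start with a primer, allowing IUPAC degenerate bases in the primer.

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer, max_mismatches=1):
    """Return True if `read` begins with `primer` within the mismatch limit."""
    if len(read) < len(primer):
        return False
    mismatches = 0
    for base, code in zip(read.upper(), primer.upper()):
        # A base matches if it is one of the bases the primer code allows.
        if base not in IUPAC.get(code, ""):
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def fraction_with_primer(reads, primer, max_mismatches=1):
    """Fraction of reads that begin with the primer (0.0 for no reads)."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(starts_with_primer(r, primer, max_mismatches) for r in reads)
    return hits / len(reads)


if __name__ == "__main__":
    # Placeholder primer and reads, for illustration only.
    primer = "GTGYCAGCMG"
    reads = ["GTGCCAGCAGTTTT", "GTGTCAGCCGAAAA", "TTTTTTTTTTTTTT"]
    print(f"{fraction_with_primer(reads, primer):.2f} of reads start with the primer")
```

If most reads still start with a primer or adapter, trimming them before classification is a reasonable next step. This check only looks at the 5' end in one orientation, so treat a low fraction as a hint and not as proof that the reads are clean.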

Nothing is wrong with the pre-trained classifier. This is either a problem with your data, or just that the sequences are a different marker gene.

You will need to use different classifiers for different projects if they use different marker genes.

I hope that helps!

Hi @Nicholas_Bokulich ! Thank you for your reply!

I have an additional question about your reply. I double-checked that I am using the full-length pre-trained classifier, so I don’t think the choice of classifier is the problem. I’ll check whether there are contaminants in my query sequences.

As for the project, it uses two different primer sets (one for bacteria and the other for archaea). So do you mean I have to prepare two classifiers for the project, one for bacteria and one for archaea?

Thank you!

If both are 16S subdomains, both can be classified with the full-length classifier.

If they are different marker genes, they need separate classifiers.

I hope that helps!

Thank you, @Nicholas_Bokulich!! All problems are solved :slight_smile:
