How to train my classifier for the V3-V4 16S rRNA gene region

Cele_Blua · July 10, 2023, 9:18am

Hi everyone! I'm a new QIIME 2 user and this is my first post. First of all, I want to thank you for your patience and this wonderful work! I really love it.
Secondly, I'm trying to understand how to train my classifier for the V3-V4 16S rRNA gene region (primers 341F and 805R). I think I got it, but I had two issues:
1- My computer crashed when I got to the "make our classifier for use on full-length SSU sequences" step. Is that worrying? I mean, I couldn't get the "silva-138.1-ssu-nr99-classifier.qza". Do I actually need it? Is it available on the web?
2- On the other hand, I did successfully get my silva-138.1-ssu-nr99-341f-805r-classifier.qza, but I don't understand how to test it. How can I do that? Just to check that I'm okay.
Thank you again so much for your work!!

SoilRotifer · July 10, 2023, 1:17pm

Hi @Cele_Blua,

Training the full-length 16S rRNA gene classifier can take a substantial amount of memory, often anywhere from 24 to 64 GB of RAM! You can make use of the premade classifiers from the data resources page. However, if you do not have enough memory to train the classifier, there is a chance you will not have enough memory to use it either, at least without modifying some options.

To test your classifier, you can:

- simply use the classifier to see if the outputs make sense.
- use some of the built-in evaluation functionality of RESCRIPt.
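
As a rough way to check whether the outputs make sense, one option is to tally the domain-level labels in an exported taxonomy table. The sketch below is only an illustration: it assumes the table is tab-separated with a header line and the taxon string in the second column (as in a taxonomy.tsv written by qiime tools export), and the function name tally_domains is hypothetical.

```shell
# Tally top-level (domain) labels in a tab-separated taxonomy table.
# Assumptions: line 1 is a header, column 2 holds the taxon string,
# and ranks are separated by ';' (e.g. "d__Bacteria; p__Firmicutes").
tally_domains() {
  awk -F'\t' 'NR > 1 {
      split($2, ranks, ";")            # ranks[1] is the domain-level label
      gsub(/^ +| +$/, "", ranks[1])    # trim surrounding spaces
      count[ranks[1]]++
    }
    END { for (d in count) printf "%s\t%d\n", d, count[d] }' "$1" | sort
}

# Example call (the path is an assumption):
#   tally_domains exported-taxonomy/taxonomy.tsv
```

For a 16S amplicon dataset you would generally expect most features to land in d__Bacteria or d__Archaea, so a large count for any other label is a hint that something needs a closer look.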

-Cheers!
-Mike

Cele_Blua · July 10, 2023, 5:14pm

I really appreciate your kind and fast answer.
In this context, I have the following question: do I really need the full-length SSU classifier for taxonomic classification? (Let's say that I'm trying to apply a pipeline similar to the Parkinson's Mouse Tutorial.)

As a first guess, I understand that the answer is no, since my sequences do not span the full-length SSU but rather the V3-V4 region. Since I've got my classifier trained for the V3-V4 region, I could use it and that would be appropriate. Am I right?

Thank you so much!

SoilRotifer · July 10, 2023, 7:30pm

Either classifier is appropriate. Many people prefer to use the amplicon-specific classifier.

Cele_Blua · July 10, 2023, 9:03pm

Got it! Thank you so much.

Cele_Blua · July 12, 2023, 5:42pm

Following up on this question, I've got a problem:
I've found a lot of features classified as
d__Eukaryota

That makes no sense since I'm working with 16S amplicons... I guess I've made a mistake with the classifier. Which pre-made classifier should I use?

I want to run this command (I need a SILVA classifier here in place of the Greengenes one):
qiime feature-classifier classify-sklearn \
  --i-reads ./dada2_rep_set.qza \
  --i-classifier ./gg-13-8-99-515-806-nb-classifier.qza \
  --o-classification ./taxonomy.qza

I'm upset and don't know what to do. Thank you so much for your help.

SoilRotifer · July 12, 2023, 6:13pm

Not necessarily. You can validate by running against pre-made full-length Greengenes or SILVA classifiers. If you obtain similar results, then it is likely that one or more of the following is occurring:

- contamination
- too many off-target taxa were sequenced
- your sequence data is in mixed or reverse orientation
  - That is, your data is the reverse complement with respect to the classifier.
  - You can try running qiime feature-classifier classify-consensus-vsearch ... as this does not care about orientation. If you get reasonable results, then sequence orientation is the issue. However, you'll still need to fix the mixed orientation, as any sequence alignment or phylogeny will otherwise be inaccurate.
  - If your vsearch results are still not reasonable, then I'd suspect that one of the prior issues, or something else, is occurring.
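
To make the orientation point concrete, here is a minimal sketch of what "reverse complement" means for a single read. It is only an illustration, not a fix for a whole dataset; the function name revcomp is hypothetical, and it assumes the rev and tr utilities are available on your system.

```shell
# Print the reverse complement of a DNA string.
# Only A/C/G/T (upper or lower case) are complemented; any other
# character, such as N, passes through unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Example call:
#   revcomp ACGTTG
```

If a representative sequence only matches the reference database after being passed through something like this, that suggests the reads are in the opposite orientation to the reference used to train the classifier.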

You can find other discussions in the forum about these issues.
