How to choose between full-length Greengenes or SILVA for 16S rRNA V1-V2 analysis with the QIIME 2 pipeline?

kida_miska · August 26, 2024, 2:00pm

I'm a beginner when it comes to 16S analysis, microbiome work, and QIIME 2, so I apologize if my question is naive, but I'm struggling to understand how to justify choosing between Greengenes and SILVA (the full-length classifiers of each) when working with 16S rRNA sequences from the V1-V2 region.

I came across papers and forum posts highlighting the rollout of Greengenes2, but my understanding is that if my data is not from the V4 region, the new pre-trained classifier is not a good fit for my needs. My lab doesn't have anyone who's very proficient in the bioinformatics details of QIIME 2 or workflows for this kind of data, so I don't know where to turn to understand why the guy who worked here previously (whose old scripts I've been using to teach myself) always used the Greengenes full-length classifier when assigning taxonomy.

His old notes emphasize that I should be using Deblur and not DADA2 (and I've stuck with this for now because I'm still learning the ins and outs of OTUs versus ASVs), but is this the reason why Greengenes is a better fit? If Greengenes is outdated compared to SILVA, shouldn't that outweigh the OTU versus ASV concerns? Do the 18S sequences in SILVA have any impact that I need to worry about?

In short, can anyone help me understand how I should be methodically choosing which classifier to use in my situation? If it matters, I'm working with human stool samples for one project and bat stool samples for another (both are 16S).

I'm getting lost trying to work it out for myself. Any assistance is greatly appreciated!

colinbrislawn · August 27, 2024, 3:15pm

Hello Kida,

Welcome to the QIIME 2 forum!

This is a great question. These are two separate decisions: first choose a denoising/clustering pipeline like Deblur or DADA2, then select a reference database afterwards.

The database should match your amplicons, like Greengenes for 16S or UNITE for ITS.
Some classifiers are trained on a specific region, such as V4 or, in your case, V1-V2.

If you want to use Greengenes2 for V1-V2, use the qiime greengenes2 non-v4-16s action.
Or go for something simple and classic like classify-consensus-vsearch from the feature-classifier plugin.
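
As a rough illustration of those two options, the sketch below uses a small Python helper that only assembles and prints the command lines; it does not run QIIME 2. Every file name is a made-up placeholder rather than an actual download name, and the flags are assumptions based on the q2-greengenes2 and q2-feature-classifier plugins, so check them against each action's --help output for your installed version.

```python
import shlex


def gg2_non_v4_commands(table, seqs, backbone, ref_taxonomy, prefix="gg2"):
    """Build the two-step Greengenes2 route for non-V4 16S data (e.g. V1-V2).

    Step 1 maps the features onto the full-length backbone sequences.
    Step 2 reads taxonomy off the mapped table.
    All artifact names passed in or generated here are placeholders.
    """
    mapped_table = f"{prefix}.table.qza"
    map_cmd = [
        "qiime", "greengenes2", "non-v4-16s",
        "--i-table", table,
        "--i-sequences", seqs,
        "--i-backbone", backbone,
        "--o-mapped-table", mapped_table,
        "--o-representatives", f"{prefix}.rep-seqs.qza",
    ]
    tax_cmd = [
        "qiime", "greengenes2", "taxonomy-from-table",
        "--i-reference-taxonomy", ref_taxonomy,
        "--i-table", mapped_table,
        "--o-classification", f"{prefix}.taxonomy.qza",
    ]
    return [map_cmd, tax_cmd]


def vsearch_consensus_command(seqs, ref_reads, ref_taxonomy, out_dir="vsearch-taxonomy"):
    """Build the classify-consensus-vsearch call against any 16S reference.

    --output-dir collects every output of the action in one new directory,
    which sidesteps differences in the named outputs between releases.
    """
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", seqs,
        "--i-reference-reads", ref_reads,
        "--i-reference-taxonomy", ref_taxonomy,
        "--output-dir", out_dir,
    ]


if __name__ == "__main__":
    # Placeholder artifact names; substitute your own files.
    commands = gg2_non_v4_commands(
        "table.qza", "rep-seqs.qza",
        "gg2-backbone.qza", "gg2-taxonomy.qza",
    )
    commands.append(
        vsearch_consensus_command("rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza")
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Both routes start from the outputs of the denoising step (the feature table and representative sequences), so they apply equally to Deblur and DADA2 results.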

Human stool is well-studied, so any of the major 16S databases should work well!