Closed reference OTU picking of V3-V4 amplicon against full-length reference database

Hello everyone!

I have 16S rRNA sequencing data from two different regions, V3-V4 and V1-V9 (full-length HiFi reads), and want to minimize the effect of the different regions. After denoising each dataset (V3-V4 / full-length), I would merge the representative sequences and use them for closed-reference clustering (vsearch-global). However, I'm a bit confused about a couple of things:

  1. Which reference database should I use for OTU clustering — full-length or region-specific (e.g., V3–V4)?
    I'm not sure whether using a full-length 16S reference database (e.g., SILVA 138 99% OTUs) to cluster short-read sequences from the V3–V4 region would cause any issues.

  2. Would it be appropriate to merge the tables and sequences from both datasets after closed-reference OTU clustering and taxonomy assignment using classifiers specific to each region?

Hi @SEOK ,

You could cluster both against the full-length 16S reference. It is fine (and typical) to cluster short amplicons against a full-length reference for 16S.

Short answer: yes. This is part of the reason to use closed-reference clustering: both datasets are clustered against the same reference, so they will share OTU IDs and taxonomy and can be compared directly. The longer answer is that there will be technical biases between the methods (e.g., amplification bias, sequencing bias, batch effects), so watch for these and proceed with caution when analyzing the results.
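To make the "shared OTU IDs" point concrete, here is a minimal Python sketch of what merging two closed-reference tables amounts to. The table layout (`{otu_id: {sample_id: count}}`), the function name `merge_otu_tables`, and the IDs `ref_A`, `ref_B`, `ref_C` are all made up for illustration; in practice you would use your pipeline's own merge tooling rather than hand-rolled code.

```python
from collections import defaultdict


def merge_otu_tables(*tables):
    """Merge {otu_id: {sample_id: count}} tables, summing counts on overlap."""
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for otu_id, samples in table.items():
            for sample_id, count in samples.items():
                merged[otu_id][sample_id] += count
    # Convert nested defaultdicts back to plain dicts
    return {otu_id: dict(samples) for otu_id, samples in merged.items()}


# Hypothetical toy tables: OTU IDs stand in for reference sequence IDs,
# which both datasets share because they were clustered against one reference.
v3v4_table = {
    "ref_A": {"S1_v3v4": 10, "S2_v3v4": 0},
    "ref_B": {"S1_v3v4": 5, "S2_v3v4": 7},
}
full_length_table = {
    "ref_A": {"S1_fl": 8},
    "ref_C": {"S1_fl": 3},
}

merged = merge_otu_tables(v3v4_table, full_length_table)

# OTUs detected by both regions are directly comparable across datasets
shared = sorted(set(v3v4_table) & set(full_length_table))
print("OTUs seen in both datasets:", shared)
```

OTUs that appear in only one dataset (here `ref_B` and `ref_C`) are worth a look, since region-specific detection is one place where technical bias can show up.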

I hope that helps!

Hi @Nicholas_Bokulich

Your answer was very helpful, thank you!
Can you recommend typical methods to check for the technical biases you mentioned (e.g., amplification bias, sequencing bias, batch effects)?
In my case, I usually look at whether samples cluster by amplification region in a PCoA plot of beta diversity, and run a statistical test (such as adonis or PERMANOVA) to confirm whether the difference between sequencing regions is statistically significant.

Hi @SEOK ,
This will really depend on your experimental design, etc. But yes, something like a PERMANOVA test can be one way to check for batch effects or other biases.
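For anyone curious what such a test does under the hood, here is a rough, self-contained Python sketch of a one-way PERMANOVA (pseudo-F with label permutations) on Bray-Curtis distances. The abundance profiles and region labels are invented toy data, and the function names are my own; for real analyses you would use an established implementation rather than this.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(profiles):
    """Symmetric pairwise Bray-Curtis matrix as a list of lists."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(profiles[i], profiles[j])
    return d


def pseudo_f(d, groups):
    """Pseudo-F statistic from squared distances, total vs within-group."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(
            d[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the grouping."""
    rng = random.Random(seed)
    observed = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and distances
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data (made up): three samples per sequencing region, three OTUs each
profiles = [
    [30, 10, 5], [28, 12, 6], [31, 9, 4],     # V3-V4
    [10, 25, 15], [12, 27, 13], [9, 24, 16],  # full-length
]
regions = ["V3-V4"] * 3 + ["full-length"] * 3

f_stat, p_value = permanova(distance_matrix(profiles), regions)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

With only three samples per region there are very few distinct label arrangements, so the p-value here cannot get small; the toy example only shows the mechanics. A large pseudo-F for the region label suggests that region (or something confounded with it) explains a sizable share of the beta-diversity variation.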

Good luck!