Thinking about Hypervariable regions/ database

annguyen9724 · August 7, 2021, 2:36pm

Hi everyone, I'm new to metagenomic analysis. I have some questions about hypervariable regions and hope you can help me with them.
1- Why is V4 the most commonly selected region?
2- Which is best for taxonomy at the genus level: V1-V2, V3-V4, V4, or 16S?
3- If I analyze the V1-V2 region, can I use the SILVA 16S rRNA database as a reference, or do I need another database? Does SILVA have data for the V1-V2 region?
Thanks for helping!

jwdebelius · August 9, 2021, 4:23pm

Hi @annguyen9724,

Welcome to the QIIME 2 forum!

V4 is a relatively universal target and can identify a broad range of organisms. The V4 (515F-806R) primers are also used in the EMP (Earth Microbiome Project) protocol, which is one of the more popular protocols. Using the same primer pair improves comparability across studies, so there's a potential benefit if you plan to do a meta-analysis or compare your results with other studies.
My best advice here is that if you're starting a specific project or working in a specific environment, look at what others in that environment are doing. Certain primers are less likely to amplify chloroplasts, for instance, so if you're working with plants, you may want to look for those primers.
If you're setting up a general pipeline, or planning to work across a wide variety of environments, then I'd think about universality (across the tree of life) and overall popularity, because that will increase your ability to do meta-analyses.

This depends on what you're interested in. Different regions have higher or lower specificity for particular organisms/genera. So, if you're interested in a specific target, you might improve classification by choosing a region that resolves it well.

You should be able to mix and match databases. My experience is that coverage tends to be lower for the hypervariable regions toward the ends of the gene (V1-V3 and V7-V9). So, you may lose information for your classifier if you use a poorly covered region. However, if V1-V2 is standard for your ecosystem, then maybe that's the right region to explore.
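
If you want a rough feel for how well a region is covered in a reference database, one option is to check how many reference sequences contain both of your primer sites. Here is a minimal sketch in plain Python, assuming exact primer matches (degenerate IUPAC bases allowed, no mismatches) and a dictionary of reference sequences you have already loaded. The function names and the toy primers in the usage note are hypothetical and only for illustration; they are not real 16S primers, and this is not how QIIME 2 does it internally.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also handles degenerate bases
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (IUPAC codes supported)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(seq, fwd, rev):
    """Return the region between the primers (primers excluded), or None.

    The reverse primer is given 5'->3' as ordered, so it is
    reverse-complemented before searching the reference strand.
    """
    pattern = re.compile(
        primer_regex(fwd) + "(.+?)" + primer_regex(reverse_complement(rev))
    )
    match = pattern.search(seq.upper())
    return match.group(1) if match else None


def coverage(references, fwd, rev):
    """Fraction of reference sequences that contain both primer sites."""
    hits = sum(
        extract_amplicon(seq, fwd, rev) is not None
        for seq in references.values()
    )
    return hits / len(references)
```

With toy data, calling `coverage({"full": "TTACGTCGGGAAATTTCATCCAA", "partial": "ACGTCGGGAAATTT"}, "ACGTY", "GGATN")` counts only the first sequence as a hit, because the second one is cut off before the reverse primer site. Partial reference sequences like that are one reason regions near the ends of the gene can end up with lower coverage. Real primers often need some mismatch tolerance, which this sketch does not handle.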

Best,
Justine

P.S. If you're using 16S rRNA primers, you're doing marker gene sequencing. Metagenomic sequencing covers the full genomes of the organisms in the sample rather than a single marker gene.