Alpha diversity values and sequencing biases

I have a question regarding the values that are obtained for Shannon alpha diversity.
I am analyzing gut microbiota samples sequenced with Illumina. For some of these samples the V3-V4 region was sequenced, and for others only V4. The Shannon alpha diversity is approximately 5-8 for the V3-V4 samples, while for the V4 samples it is around 1-5.

Do these results make sense? Which factors can have an effect on alpha diversity?


Hi Marina

That is a great question! Alpha diversity is usually calculated from ASVs/OTUs, which means that taxonomic assignment (which may be affected by the region of interest) plays no role. Shannon diversity is high when a sample has many ASVs with evenly distributed abundances, and that’s it. That is, the metric itself doesn’t care whether your ASVs are V3-V4 or V4 amplicons. In this sense, if in sample A you have 2 reads, representing 2 ASVs that come from V3-V4, and in sample B you have 2 reads, representing 2 ASVs that come from V4, the Shannon index for both samples will be exactly the same.
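To make this concrete, here is a minimal sketch of the Shannon index computed from per-ASV read counts only. The function name `shannon` and the toy counts are my own, and the log base of 2 is an assumption (check which base your tool reports); the point is just that the amplicon region never enters the calculation.

```python
import math

def shannon(counts, base=2):
    """Shannon index from a list of per-ASV read counts (hypothetical helper)."""
    total = sum(counts)
    # Zero counts are skipped: an absent ASV contributes nothing.
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) / math.log(base) for p in props)

# Toy samples from the example above: 2 reads, each a different ASV.
sample_a = [1, 1]  # ASVs happen to be V3-V4 amplicons
sample_b = [1, 1]  # ASVs happen to be V4 amplicons

print(shannon(sample_a), shannon(sample_b))

# Evenness matters: same richness (4 ASVs), very different index.
print(shannon([25, 25, 25, 25]))
print(shannon([97, 1, 1, 1]))
```

Only the counts go in, so two samples with the same count profile always get the same value, whatever region the ASVs came from.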

It still makes sense that your V3-V4 samples have higher Shannon diversity overall: because the amplicon covers 2 variable regions, there is a bigger chance of finding more ASVs than when looking at V4 only.

Actually, for this reason, as far as I know it is not advised to compare diversity metrics between samples sequenced with different primer pairs.

Hope this helps!


My first reaction to @Marmur’s question was: “this feels a lot like a test question, and thank goodness I’m not in school any more!” :smile:

One additional thought regarding @vheidrich’s response:

While it’s true that alpha diversity might not care about the taxonomic classification of an ASV/OTU, the sequences themselves can be used to convey information in particular alpha diversity estimates. See the alpha diversity section in the Moving Pictures tutorial, specifically, the alpha diversity calculation using Faith’s Phylogenetic Diversity.

Please note that I’m not disagreeing with Vitor’s post in any way; I just wanted to point out that you can also include additional information when calculating alpha diversity beyond the number of sequences of a feature.


Thank you very much for your answer and feedback, it was very useful !!

Regarding this issue

I have read that you can extract the V4 region from the V3-V4 datasets using cutadapt to be able to compare between datasets. So, which would be the correct process to do that?


I never did anything like that, but I will take a wild guess here.

First I would make sure which exact regions are spanned by each of your primer pairs. Then, as you suggested, I would trim the raw sequences (V3-V4 and V4, if necessary) so that all amplicons span the very same region. Afterwards, I would separately denoise each dataset (originally V3-V4 and originally V4) using dada2. Finally, I would merge the outputs of dada2, ending up with a single dataset with comparable samples ready for diversity analyses.
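As a rough illustration of the trimming step only, here is a small sketch of what "cut a V3-V4 read down to the V4 region" means. This is not cutadapt and not how cutadapt is implemented: it only finds an exact, mismatch-free match of a degenerate primer and drops everything up to and including it. The primer shown (515F, `GTGYCAGCMGCCGCGGTAA`) is an assumption, so check it against the primers actually used for your V4 samples; the helper names and the toy read are made up.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_to_regex(primer):
    """Turn a degenerate primer into a compiled regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in primer))

def trim_to_primer(read, primer):
    """Return the part of `read` after the first primer match, or None.

    Returning None mimics discarding reads in which the primer is not found.
    """
    match = primer_to_regex(primer).search(read)
    if match is None:
        return None
    return read[match.end():]

V4_FORWARD = "GTGYCAGCMGCCGCGGTAA"  # assumed V4 forward primer (515F)

# Toy V3-V4 read: fake V3 stretch + one primer variant + fake V4 stretch.
read = "ACGTACGTAC" + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"

print(trim_to_primer(read, V4_FORWARD))
print(trim_to_primer("ACGTACGTAC", V4_FORWARD))
```

On real data you would let cutadapt do this, since it also tolerates mismatches and handles paired reads and quality scores, then denoise and merge as described above.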

Good luck!


My guess is that V3-V4 is much longer than V4 alone, so ASVs or OTUs that are identical in V4 can carry further variation in the V3 region, which makes for a richer collection of unique sequences and hence higher diversity. Or, seen from the other direction, distinct V3-V4 OTUs or ASVs may collapse into one when the differences that separate them lie only in V3.
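A toy sketch of that collapsing effect, with entirely made-up sequences and counts: each "V3-V4 ASV" is a fake 4-base V3 part followed by a fake 4-base V4 part, and truncating to the V4 part merges the ASVs that differ only in V3. Real amplicons are of course hundreds of bases long, and the log base of 2 is an assumption.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon index (log base 2) from per-ASV read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

V3_LEN = 4  # length of the fake V3 part in this toy example

# Three V3-V4 ASVs; the first two differ only in the V3 part.
v3v4_table = {
    "AAAA" + "GGGG": 10,
    "AAAT" + "GGGG": 10,
    "AAAA" + "GGCC": 10,
}

# Keep only the V4 part and sum the counts of ASVs that become identical.
v4_table = Counter()
for seq, count in v3v4_table.items():
    v4_table[seq[V3_LEN:]] += count

print(len(v3v4_table), "ASVs before,", len(v4_table), "ASVs after")
print(shannon(v3v4_table.values()), shannon(v4_table.values()))
```

Both richness and Shannon drop after truncation even though the underlying community is identical, which is the same reason the two primer sets are not directly comparable.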