Extracting 16S V4 from 16S V3-V4 reads

SoilRotifer · September 19, 2022, 9:41pm

I am not sure why you'd do this. Perhaps I am misunderstanding...

Why not simply extract separate V4 and V3V4 amplicon regions from the initial full-length marker gene sequence data, i.e. the available SILVA and GreenGenes files on the Data resources page, or via RESCRIPt?

This would not be a good approach. PCR primer amplification bias would be an issue, especially if you are extracting V4 sequences from data generated from V3V4 reads. That is, V3V4 primers will have different amplification biases than V4 primers. In fact, even different primer sets that target the same region can have different biases from one another.

The same goes for in silico extraction of V3V4 (e.g. from the full-length SILVA reference sequences) and then using that extracted region to extract V4. You'll bias the V4 output by the V3V4 output, depending on how successfully the in silico primer-pair search performs across different taxa whose primer-binding sites differ.
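To make the nested-extraction problem concrete, here is a minimal sketch in plain Python. It is not how RESCRIPt or any QIIME 2 plugin implements extraction: the primers, the reference sequences, and the helper names (`extract_region`, `primer_regex`, `revcomp`) are all made up for illustration, and matching is exact apart from IUPAC degenerate bases.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including IUPAC degenerate codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer)


def extract_region(seq, fwd, rev):
    """Return the amplicon, primer sites included, or None if a primer is not found.

    Primer sites are kept (not trimmed) so that a second extraction
    using the same reverse primer is still possible.
    """
    f = re.search(primer_regex(fwd), seq)
    if f is None:
        return None
    r = re.search(primer_regex(revcomp(rev)), seq[f.end():])
    if r is None:
        return None
    return seq[f.start():f.end() + r.end()]


# Hypothetical toy primers, NOT real 16S primers.
V3_FWD = "ACGTAC"
V4_FWD = "GGCAYT"   # Y matches C or T
V4_REV = "GGTCAA"   # binds the site TTGACC on the forward strand

# Hypothetical toy references; taxon_b has a mismatch in the V3 forward site.
references = {
    "taxon_a": "TTTACGTACAAAAAGGCATTCCCCCCCCTTGACCGGG",
    "taxon_b": "TTTACGAACAAAAAGGCATTCCCCGCCCTTGACCGGG",
}

# Direct V4 extraction from the full-length references.
direct = {}
for name, seq in references.items():
    v4 = extract_region(seq, V4_FWD, V4_REV)
    if v4 is not None:
        direct[name] = v4

# Two-step extraction: V3V4 first, then V4 from the V3V4 output.
two_step = {}
for name, seq in references.items():
    v3v4 = extract_region(seq, V3_FWD, V4_REV)
    if v3v4 is None:
        continue  # the taxon is lost before the V4 step ever runs
    v4 = extract_region(v3v4, V4_FWD, V4_REV)
    if v4 is not None:
        two_step[name] = v4

print("direct:  ", sorted(direct))
print("two-step:", sorted(two_step))
```

In this toy case `taxon_b` has a perfectly good V4 region but drops out of the two-step result, purely because its V3 forward site does not match. Real tools allow some mismatches, so the effect is less binary than here, but the V4 set is still filtered through whatever the V3V4 search managed to find.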

If you are trying to merge data from a V3V4 study and a V4 study for some combined analyses, then a word of caution: just because you can extract the V4 sequences does not make it easier to compare your sequence data across studies, mainly because of the inherent differences in PCR amplification bias between primer sets. Even closed-reference OTU picking will not help much in this case. Unless you have a way of controlling or minimizing this effect, you may artificially inflate differences among samples simply due to the biases of the different variable regions and primer sets used.
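A small back-of-the-envelope sketch of how this inflation can arise. The per-taxon amplification efficiencies below are invented numbers, not measurements, and the model (observed abundance = true abundance × efficiency, then renormalized) is a deliberate simplification.

```python
def relative_abundance(values):
    """Scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def observe(community, efficiencies):
    """Apply per-taxon amplification efficiency, then renormalize."""
    amplified = [a * e for a, e in zip(community, efficiencies)]
    return relative_abundance(amplified)


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity for two relative-abundance vectors.

    When both vectors sum to 1 this is half the sum of absolute differences.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# One and the same true community, three taxa.
true_community = [0.5, 0.3, 0.2]

# Hypothetical amplification efficiencies per taxon for each primer set.
eff_v3v4 = [1.0, 0.5, 1.0]
eff_v4 = [1.0, 1.0, 0.25]

obs_v3v4 = observe(true_community, eff_v3v4)
obs_v4 = observe(true_community, eff_v4)

dissimilarity = bray_curtis(obs_v3v4, obs_v4)
print(f"Bray-Curtis between the two 'studies': {dissimilarity:.3f}")
```

The two profiles come from an identical community, yet the dissimilarity is nonzero. Trimming the V3V4 reads down to V4 afterwards would not change this, because the bias was introduced at the PCR step.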

I am not sure if I was able to answer your questions, as it is not clear what you are trying to do.

-Cheers!