Combining Datasets with 2 sets of Primers

Nicholas_Bokulich · February 17, 2018, 12:28am

This is a great suggestion, but a potential issue occurs to me: the nucleotide positions for these primers are approximate and may not be exact in all bacteria (e.g., slight length variation may make the V3/V4 regions slightly longer or shorter). I do not know off the top of my head how much variation there is in these regions, and it is probably not very extreme, but even a 1 nt shift is enough to cause two otherwise identical sequences to become separate features. So trimming a fixed N nucleotides (e.g., 515 minus 341 = 174) to approximate the position could land you in hot water...
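To make the concern concrete, here is a minimal Python sketch with made-up toy sequences (not real 16S data; all names and the offset are hypothetical). Two reads share an identical downstream region, but one carries a single extra nucleotide upstream, so cutting a fixed number of bases leaves them out of register:

```python
# Toy illustration only: the sequences below are invented, not real 16S reads.
shared_region = "GTGCCAGCAGCCGCGGTAATACGGAGGGTGC"  # identical in both organisms
upstream_a = "CCTACGGGAGGCAGCAGTGGGGAATATTG"

# Organism B has a 1 nt insertion somewhere in the upstream region.
upstream_b = upstream_a[:10] + "A" + upstream_a[10:]

read_a = upstream_a + shared_region
read_b = upstream_b + shared_region


def trim_fixed(seq, n):
    """Drop the first n nucleotides, regardless of sequence content."""
    return seq[n:]


# A fixed offset that is exactly right for read A...
offset = len(upstream_a)
trimmed_a = trim_fixed(read_a, offset)
trimmed_b = trim_fixed(read_b, offset)

# ...is off by one for read B, so the two results no longer match.
print(trimmed_a == trimmed_b)      # False
print(trimmed_b[1:] == trimmed_a)  # True: same region, shifted by 1 nt
```

A denoiser sees those two trimmed reads as different sequences, even though the underlying region is the same.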

You can instead trim at the actual primer sites (in one or both datasets) with q2-cutadapt (qiime cutadapt trim-single or trim-paired). Then denoise each dataset with dada2 and merge the results, as @jairideout has suggested.
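As a rough illustration of what trimming at the primer site means, here is a small pure-Python sketch. It is not how cutadapt is implemented (cutadapt also tolerates mismatches, which this does not); it only shows that anchoring on the primer gives the same result regardless of upstream length. The primer shown is the commonly published 515F sequence, used here as an assumption; substitute the primers from your own protocol. The reads and helper names are hypothetical.

```python
import re

# IUPAC degenerate nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into an exact-match regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_at_primer(seq, primer):
    """Return everything after the first primer match, or None if not found."""
    match = primer_regex(primer).search(seq)
    return seq[match.end():] if match else None


# Assumption: commonly published 515F primer; replace with your own.
PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"

# Toy reads (invented): same primer site and downstream region,
# but read B has a 1 nt insertion in the upstream flank.
flank = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
site_and_target = "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGTGC"
read_a = flank + site_and_target
read_b = flank[:10] + "A" + flank[10:] + site_and_target

trimmed_a = trim_at_primer(read_a, PRIMER_515F)
trimmed_b = trim_at_primer(read_b, PRIMER_515F)
print(trimmed_a == trimmed_b)  # True: both start right after the primer
```

For real data, use the q2-cutadapt actions mentioned above rather than a script like this, since they handle mismatches, paired reads, and reads where the primer is missing.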

You can also check out this post from a user who wants to perform what sounds like the same analysis. There are a few different options for comparing datasets: trim to the same primer pair, collapse on taxonomy, or use q2-fragment-insertion. Trimming is possibly the easiest option, and possibly also the best (depending on your analysis goals).

I hope that helps!