Combining Datasets with 2 sets of Primers

Hi @jbethany,

This is a great suggestion, but a potential issue occurs to me: the nucleotide positions for these primers are approximate and may not be exact in all bacteria (e.g., slight length variation may make the V3/V4 domains slightly longer or shorter). I do not know off the top of my head how much variation there is in these domains, and it is probably not very extreme, but even a 1 nt offset is enough to cause two otherwise identical sequences to become separate features. So trimming a fixed N nucleotides (e.g., 515 minus 341) to approximate the position could land you in hot water... :man_playing_water_polo:
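To make the concern concrete, here is a minimal toy sketch in Python. The sequences and the `PRIMER` string are made up for illustration (they are not real 16S data or a real primer), and the helper names are hypothetical. It only shows that a 1 nt length difference upstream makes fixed-length trimming produce two different sequences, while trimming at the primer site does not.

```python
# Toy data, invented for illustration only (not real 16S sequences).
PRIMER = "GTGCCAGC"            # hypothetical stand-in for the downstream primer site
upstream_a = "ACGTACGTAC"      # 10 nt before the primer site in organism A
upstream_b = "ACGTACGTACG"     # same region with a 1 nt insertion in organism B
downstream = "TTGACCGGATAC"    # region of interest, identical in both organisms

read_a = upstream_a + PRIMER + downstream
read_b = upstream_b + PRIMER + downstream


def trim_fixed(seq, n):
    """Drop the first n nucleotides, regardless of content."""
    return seq[n:]


def trim_at_primer(seq, primer):
    """Drop everything up to and including the first exact primer match."""
    idx = seq.find(primer)
    if idx == -1:
        return None  # primer not found
    return seq[idx + len(primer):]


# Fixed-offset trimming: the 1 nt insertion shifts the frame in read_b.
fixed_a = trim_fixed(read_a, 10)
fixed_b = trim_fixed(read_b, 10)

# Primer-site trimming: both reads start right after the primer.
site_a = trim_at_primer(read_a, PRIMER)
site_b = trim_at_primer(read_b, PRIMER)

print("fixed offset gives identical sequences:", fixed_a == fixed_b)
print("primer-site trim gives identical sequences:", site_a == site_b)
```

With fixed-offset trimming the two reads would end up as separate features after denoising, even though the region of interest is identical.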

You can instead trim at the actual primer sites (in one or both datasets) with q2-cutadapt trim-single or trim-paired. After that, denoise each dataset with dada2 and merge the results as @jairideout has suggested.
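For intuition about what "trimming at the primer site" means when the primer contains degenerate bases, here is a rough Python sketch. It is not how q2-cutadapt is implemented (cutadapt also tolerates mismatches, which this exact-match toy does not), and `trim_front` and `primer_regex` are hypothetical helper names. The primer shown is a commonly cited 515F sequence and is an assumption on my part, so substitute whatever primers were actually used in your datasets.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_front(read, primer):
    """Return the read after the first primer match, or None if absent."""
    match = primer_regex(primer).search(read)
    return read[match.end():] if match else None


# Assumed primer (verify against your own protocol before using).
primer_515f = "GTGCCAGCMGCCGCGGTAA"

# Made-up read: 5 nt of upstream sequence, the primer site, then the amplicon.
read = "TTACG" + "GTGCCAGCAGCCGCGGTAA" + "TACGGAGGGT"

print(trim_front(read, primer_515f))
```

Reads where the primer is not found come back as `None` here; in a real run you would decide whether to keep or discard those.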

You can also check out this post from a user who wants to perform what sounds like the same analysis. There are a few different options for comparing datasets: trim to the same primer pair, collapse on taxonomy, or use q2-fragment-insertion. Trimming is probably the easiest and possibly also the best (depending on your analysis goals).

I hope that helps!
