Combining studies from same 16S region but with different primers

azan · May 10, 2021, 3:57am

Hi everyone,
I am attempting to perform a meta-analysis in which all of the studies examined V3-V4, but with different primer sets. There are obvious problems with comparing different variable regions, but I’m not sure whether these problems also apply when the same variable region is amplified with different primers. To my understanding, these sequences also do not align at the 3’ end and, as a result, can cause the same problems when defining the abundance of unique sequences (as described here).
This leads me to the following question: should sequences from the same region but generated with different primers be treated the same way as sequences from different regions, i.e. preferentially using q2-fragment-insertion, or are there other, more appropriate methods for combining this kind of data? (Unfortunately, the sequences I obtained already have the primers removed.)

Thank you for your help. I look forward to your response.

SoilRotifer · May 10, 2021, 2:20pm

Hi @azan, welcome to :qiime2:!

For the case in which you have data from the same amplicon region, but generated with different primer pairs, there are a few options you can try below. Note: regardless of the approach you use, you'll still have to worry about slight variations in PCR and sequencing biases, which may inflate the differences between data sets. It only takes a single base difference in a primer to alter what you observe in your data.

1) Alter the trim and truncation settings for each run, accounting for the slightly different primers, such that the same ASVs will be generated. Then you can merge the tables and sequence files for downstream processing.
2) Closed-reference OTU picking is another option. I suggest you read He et al. 2015, Rideout et al. 2014, Westcott et al. 2015, and finally Callahan et al. 2017 for more information.

EDIT: Do not use approach 3 below for combining multiple data sets. I erred in my thinking. See later post.
~~3) You can use q2-sidle to treat these as different amplified regions and combine your data for each primer set.~~

Just as an FYI, I am also currently combining data sets with different V3-V4 primer sets. In my case, only the reverse primer differs between the two. Thus, I am using the 1st approach I mentioned above, as I only have to trim a few bases off the end of one of the sequencing runs.
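
To make the 1st approach a bit more concrete, here is a toy Python sketch of why the trimming matters. It is not QIIME 2 code: the function names, the made-up sequences, and the 3-base offset are all assumptions for illustration. In a real workflow the trimming would normally be done through the denoiser's trim and truncation settings before the ASVs are called, rather than on finished ASVs as shown here.

```python
from collections import Counter


def trim_to_common_region(seq, left=0, right=0):
    """Drop `left` bases from the 5' end and `right` bases from the 3' end."""
    return seq[left:len(seq) - right]


def collapse(table, left=0, right=0):
    """Re-key a {sequence: count} table by trimmed sequence.

    Sequences that become identical after trimming have their counts summed.
    """
    collapsed = Counter()
    for seq, count in table.items():
        collapsed[trim_to_common_region(seq, left, right)] += count
    return collapsed


def merge_tables(*tables):
    """Sum counts across studies, keyed by exact sequence."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


# Hypothetical data: study A's reverse primer sits 3 bases further
# downstream, so its amplicons carry 3 extra bases at the 3' end.
study_a = {"ACGTACGTAAGGTCA": 10, "ACGTTCGTAAGGTCA": 4}
study_b = {"ACGTACGTAAGG": 7, "ACGTTCGTAAGC": 2}

# Merging as-is: no sequence is shared, so the same organism is
# counted as two different features.
naive = merge_tables(study_a, study_b)

# Trimming study A to the region both studies cover lets identical
# biological sequences collapse onto one feature.
merged = merge_tables(collapse(study_a, right=3), study_b)

print(len(naive), "features without trimming")
print(len(merged), "features after trimming")
```

The point of the sketch is only that ASVs are matched by exact sequence, so every run has to start and end at the same positions before the tables can be merged meaningfully.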

Anyway, that's my two cents.

-Mike

azan · May 10, 2021, 8:49pm

Thank you for this wonderful advice! q2-sidle looks very promising for my datasets.

SoilRotifer · May 10, 2021, 8:57pm

Hi @azan, I just received some comments from the main q2-sidle developer. It is not a good idea to use sidle to merge regions for meta-analyses. I sort of half-forgot you are combining different datasets. Sidle doesn’t really support this use case. So, I’d recommend using either option 1 or 2! I’ll make a note in my original post.

azan · May 11, 2021, 6:35pm

I will explore other options you suggested then, thank you for this clarification.