Combining QIIME2 artifacts at the phylogenetic tree creation step for meta-analysis

Hi all, still very much a beginner in using QIIME2. I'm doing a meta-analysis involving 6 studies, across which 3 different primer pairs were used. I processed each dataset separately up to the taxonomy assignment step. Before creating the phylogenetic tree, I merged the pertinent QIIME2 artifacts, then proceeded to create the tree. I used DADA2 for denoising.

Is this the right approach?

Thank you in advance.

Hi @Jean0521, I'd be cautious about interpreting phylogenies and performing microbial community analysis from data that have been generated with different primers. :warning:

The reason is that each primer pair will have its own set of amplification biases, i.e. some primer pairs are better (or worse) at amplifying some taxa than others, which will skew your interpretation of which taxa are present and/or more or less abundant.
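To make the bias point concrete, here is a minimal sketch in Python. The taxa names and the efficiency values are entirely made up for illustration; real amplification efficiencies are not known in advance and vary by primer, template, and PCR conditions. The only idea it shows is that the same community can yield very different relative abundances depending on the primer pair.

```python
# Hypothetical community: three taxa present in equal true abundance.
true_abundance = {"TaxonA": 100, "TaxonB": 100, "TaxonC": 100}

# Made-up relative amplification efficiencies for two primer pairs (assumption).
efficiency = {
    "primer_pair_1": {"TaxonA": 1.0, "TaxonB": 0.5, "TaxonC": 0.1},
    "primer_pair_2": {"TaxonA": 0.2, "TaxonB": 1.0, "TaxonC": 0.8},
}

def observed_proportions(true_counts, eff):
    """Relative abundances after scaling each taxon by its amplification efficiency."""
    amplified = {taxon: true_counts[taxon] * eff[taxon] for taxon in true_counts}
    total = sum(amplified.values())
    return {taxon: value / total for taxon, value in amplified.items()}

observed = {
    name: observed_proportions(true_abundance, eff)
    for name, eff in efficiency.items()
}

for name, props in observed.items():
    print(name, {taxon: round(p, 3) for taxon, p in props.items()})
```

In this toy case TaxonA looks dominant with one primer pair and minor with the other, even though the underlying community is identical.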

Also, you will not be able to construct a robust phylogeny if the sequenced regions do not overlap. You'd likely have to perform closed-reference OTU picking, or use Greengenes2 to map your reads to a reference sequence / phylogeny. Again, this will not remove primer biases and will lead to erroneous interpretation of your data.

I'd advise against merging the data in this way, especially if these samples are from different environments!

If they are from the same environments, then you can compare / contrast which primer pairs work best.

Hi @Jean0521

While 3 primer pairs sounds like it could be messy, I think using a logical approach should allow you to gain some valuable outputs.

When you say each dataset was processed separately, does that mean each dataset = 1 primer pair? If so, running each dataset through DADA2 individually sounds good. Assigning taxonomy separately (by primer pair) is also good, as different pairs target different regions, which leads to variation in taxonomic assignment.

As @SoilRotifer mentioned, be cautious merging these datasets as the regions amplified by these different primer sets might not overlap. You won't be able to compare the results as the phylogenetic relationships have been inferred from different regions.
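A quick way to sanity-check overlap is to compare the approximate positions each primer pair spans. The sketch below assumes three commonly used 16S primer pairs and takes rough E. coli coordinates from the primer names; these are illustrative only and are not the primers from the studies in question, so substitute your own.

```python
# Approximate E. coli 16S coordinates implied by common primer names (assumption).
# Replace with the primer pairs actually used in each study.
regions = {
    "V1-V2 (27F/338R)": (27, 338),
    "V3-V4 (341F/785R)": (341, 785),
    "V4 (515F/806R)": (515, 806),
}

def overlap_length(a, b):
    """Number of shared positions between two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

names = list(regions)
overlaps = {}
for i, first in enumerate(names):
    for second in names[i + 1:]:
        shared = overlap_length(regions[first], regions[second])
        overlaps[(first, second)] = shared
        print(f"{first} vs {second}: {shared} shared positions")
```

Pairs that share no positions cannot be aligned against each other directly, so their ASVs have no common basis for a de novo tree. Even where regions partly overlap, the usable shared stretch after primer removal and trimming may be shorter than this estimate suggests.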

I wonder whether you could try aligning the amplified regions from the different primer pairs prior to merging, though I'm not sure if this is the best approach.

Best of luck!

Thank you so much, @Mike_Stevenson and @SoilRotifer! (And yes, 1 dataset = 1 primer pair.) I'm thinking of doing away with the phylogenetic tree and just using a different distance metric that doesn't require one (it will still work for my research question).

This will not remove the primer bias issues we've discussed. The biases arise from the PCR amplification of the sequences, and they will occur with or without phylogeny-based methods.

If anything, the best you could do is a qualitative / richness comparison, i.e. presence or absence. That is, avoid evenness / abundance based metrics. You can try 'detecting' certain taxa... but again you run into the bias issue.
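As a rough illustration of a presence/absence comparison, here is a small Python sketch that computes a Jaccard distance between two samples. The genus names and counts are hypothetical, and it assumes features have been collapsed to a shared taxonomic label (genus here), since ASVs from different regions would not match each other directly.

```python
# Hypothetical genus-level counts for two samples from different studies.
sample_a = {"Bacteroides": 120, "Prevotella": 0, "Faecalibacterium": 45, "Roseburia": 3}
sample_b = {"Bacteroides": 10, "Prevotella": 80, "Faecalibacterium": 200, "Roseburia": 0}

def present(counts):
    """Set of taxa with a non-zero count; abundances are deliberately discarded."""
    return {taxon for taxon, n in counts.items() if n > 0}

def jaccard_distance(a, b):
    """1 - |shared taxa| / |taxa seen in either sample|."""
    pa, pb = present(a), present(b)
    union = pa | pb
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(pa & pb) / len(union)

distance = jaccard_distance(sample_a, sample_b)
print(f"Jaccard distance: {distance}")
```

Note that a taxon a primer pair fails to amplify shows up as "absent", so even this metric inherits the bias.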

Either way, making inferences across these different primer sets is generally not advised, for the reasons outlined earlier.

-Mike

I suggest you also read these papers about primer choice and bias. There are many more, but these are a few highlights:
