Combining data from different sequencing centers and primers

ninaxhua · May 22, 2018, 2:49pm

If we used two sequencing centers that used different primers (V4 only and V3+V4), would it be fine to generate two separate DADA2 feature tables, then merge the feature tables later to analyze the samples together? All samples are unique in both data sets.

Mehrbod_Estaki · May 23, 2018, 1:11am

Hi @ninaxhua,

You should definitely run DADA2 separately on each run, as you suggested, but I wouldn't merge the tables just yet. Two different regions will give you different sequences for the same taxon simply because you are looking at a different stretch of the gene. As you can imagine, this can create false separation on a PCoA plot and confound downstream analyses. The two options that come to mind are:
a) Treat the two tables separately all the way until you have assigned taxonomy to them, then merge the tables.
b) A better and more powerful approach would be to use the fragment-insertion plugin described here, which is specifically designed for combining data from different regions.
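
To make the "false separation" point concrete, here is a minimal Python sketch with toy data. The feature IDs, counts, and genus labels are all made up for illustration, and the presence/absence Jaccard distance is only a stand-in for whatever beta-diversity metric you actually use. It shows that two samples with the same genera look completely different at the sequence-variant level when they come from different amplicons, and identical once collapsed to taxonomy (option a).

```python
from collections import Counter

def jaccard_distance(a, b):
    """Presence/absence Jaccard distance between two samples (feature -> count)."""
    fa = {f for f, n in a.items() if n > 0}
    fb = {f for f, n in b.items() if n > 0}
    union = fa | fb
    if not union:
        return 0.0
    return 1.0 - len(fa & fb) / len(union)

def collapse(sample, taxonomy):
    """Sum the counts of all features assigned to the same taxon."""
    out = Counter()
    for feature, count in sample.items():
        out[taxonomy[feature]] += count
    return dict(out)

# Hypothetical toy data: the same two genera seen through two amplicons.
v4_sample = {"asv_v4_a": 120, "asv_v4_b": 30}
v3v4_sample = {"asv_v3v4_a": 100, "asv_v3v4_b": 45}
taxonomy = {
    "asv_v4_a": "g__Bacteroides", "asv_v4_b": "g__Prevotella",
    "asv_v3v4_a": "g__Bacteroides", "asv_v3v4_b": "g__Prevotella",
}

# No shared sequence variants, so the samples look maximally different.
raw = jaccard_distance(v4_sample, v3v4_sample)

# After collapsing to genus, both samples contain the same two features.
collapsed = jaccard_distance(
    collapse(v4_sample, taxonomy), collapse(v3v4_sample, taxonomy)
)
print(raw, collapsed)
```

The trade-off with option a) is resolution: anything finer than the taxonomic level you collapse to is lost, which is part of why fragment insertion is the more powerful route.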

antgonza · May 23, 2018, 1:48pm

Just to complement this answer, it is perhaps worth checking this other discussion:

ninaxhua · May 24, 2018, 6:53pm

Would you recommend doing what Nicholas Bokulich stated in the discussion Antonio linked: trimming both datasets to the same primer sites, running DADA2 separately, then merging?

So I would trim the run that used V3+V4 primers to the region that only includes V4? Would I trim off the adapter sequence and preceding bases?

The sequencing center that used the V4 primer linked this as their adapter sequences. Will the adapter sequence be the rRNA gene-specific primer sequences or Illumina platform-specific sequences?

I'm also a little confused about where the fragment-insertion plugin can be used.

antgonza · May 25, 2018, 5:41pm

Hi @ninaxhua,

To be honest, I haven't seen any published meta-analysis with DADA2 that combines different primers and sequencing technologies, but that doesn't mean they don't exist or that it's not possible to do one. In my personal experience, we have normally used closed-reference OTU picking, but in recent months we have been moving to Deblur. In fact, the fragment-insertion tutorial is an example of a Deblur meta-analysis combining different regions.

On trimming both datasets to the same region: in theory this should work, but in practice I haven't seen great success using closed-reference. I believe this is due to primer biases. However, I don't know if anyone has actually run this test with either DADA2 or Deblur; it is perhaps worth checking if this is something you are interested in.

On the adapter question: I'm not sure I follow. In any case, the primer is normally present in the forward read and sometimes in the reverse read; however, this depends on the sequencing protocol (not on the sample preparation, where you add the primers). My suggestion would be to ask your sequencing center to be sure. By the way, it is often pretty easy to tell whether the primer is in your sequences, because you will see it "clearly" once you inspect them.
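
If you want something more systematic than eyeballing the reads, a small Python sketch like the one below counts how many reads begin with a given primer, expanding IUPAC degenerate bases. The primer shown is the commonly used 515F V4 forward primer, included only as an illustration; the toy reads are made up, and your sequencing center's actual primers may differ.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def fraction_starting_with(reads, primer, max_offset=0):
    """Fraction of reads in which the primer matches at or before max_offset."""
    pattern = primer_regex(primer)
    hits = 0
    for read in reads:
        match = pattern.search(read.upper())  # leftmost match, if any
        if match and match.start() <= max_offset:
            hits += 1
    return hits / len(reads) if reads else 0.0

# Illustrative primer (515F); confirm the real one with your sequencing center.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"

# Hypothetical toy reads: two begin with the primer, one does not.
toy_reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGG",
    "GTGCCAGCCGCCGCGGTAATACGTAGG",
    "TACGGAGGATCCGAGCGTTATCC",
]

fraction = fraction_starting_with(toy_reads, PRIMER_515F)
print(f"{fraction:.2f} of reads start with the primer")
```

A fraction near 1 suggests the primers are still in the reads and need trimming before denoising; a fraction near 0 suggests they were already removed or were never sequenced. Setting `max_offset` above 0 allows for a few leading bases, such as a spacer, before the primer.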

Basically, fragment insertion is used after you get your denoised sequences (produced by DADA2 or Deblur), and it seems to reduce the effect of the different primers, at least based on the tutorial linked in my first answer in this thread.

Hope this helps.

ninaxhua · May 25, 2018, 7:06pm

So following the insertion tutorial, I need to generate the denoised sequences (using DADA2 or deblur) then use the insertion plug in. At what point are the datasets merged? After insertion?

antgonza · May 25, 2018, 9:37pm

Both DADA2 and Deblur will produce a feature table and a set of representative sequences, both as .qza artifacts. You can merge them right after that step with `qiime feature-table merge` and `qiime feature-table merge-seqs`. The merged sequences will be the input to fragment insertion, and the output of merging the feature tables will be your new merged feature table.
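
As a rough sketch of the order of operations, the Python snippet below only builds and prints the command lines; it does not run QIIME 2. All file names are hypothetical, and the option names are written as I recall them from the QIIME 2 command-line interface, so treat them as assumptions and check `--help` for each command in your installed release. In particular, newer releases of `qiime fragment-insertion sepp` also require a reference database, which is why that argument is optional here.

```python
import shlex

def merge_and_insert_commands(tables, seqs, reference_db=None):
    """Build (but do not run) the merge and fragment-insertion commands.

    Option names are assumptions to verify against `--help`.
    """
    merge_tables = ["qiime", "feature-table", "merge"]
    for table in tables:
        merge_tables += ["--i-tables", table]
    merge_tables += ["--o-merged-table", "merged-table.qza"]

    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for seq in seqs:
        merge_seqs += ["--i-data", seq]
    merge_seqs += ["--o-merged-data", "merged-seqs.qza"]

    # Fragment insertion takes the merged sequences, not the merged table.
    sepp = ["qiime", "fragment-insertion", "sepp",
            "--i-representative-sequences", "merged-seqs.qza"]
    if reference_db is not None:
        sepp += ["--i-reference-database", reference_db]
    sepp += ["--o-tree", "insertion-tree.qza",
             "--o-placements", "insertion-placements.qza"]

    return [merge_tables, merge_seqs, sepp]

# Hypothetical artifact names for the two denoised datasets.
commands = merge_and_insert_commands(
    tables=["v4-table.qza", "v3v4-table.qza"],
    seqs=["v4-rep-seqs.qza", "v3v4-rep-seqs.qza"],
)
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

The merged table is then used together with the insertion tree for phylogenetic diversity analyses.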
