Using datasets created with different technologies

tenguzame · June 8, 2018, 4:23pm

Hi everybody,
I have several prokaryotic 16S amplicon sequencing datasets I'd like to analyze. However, they were all created at different times, using either 454 pyrosequencing or Illumina MiSeq. Is either DADA2 or Deblur suitable for analyzing all these datasets at once, or should I analyze them separately? If the latter, how can I compare them?
Best regards

Michael Tangherlini

Mehrbod_Estaki · June 8, 2018, 4:46pm

Hi @tenguzame,

Unfortunately, no. Deblur's error model is designed for reads from the Illumina MiSeq and HiSeq platforms, and while DADA2 can handle 454 data, it requires each run to be denoised separately. Your best bet is to run DADA2 separately on each run and merge the results downstream.
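A minimal sketch of what "merge downstream" amounts to, using a hypothetical dict-of-dicts stand-in for a feature table rather than the real QIIME 2 `FeatureTable` type. The key point is that features are matched by exact sequence, so an ASV is shared between runs only when both runs produced the identical string. Raising an error on a repeated sample ID is an assumption made here (each sample sequenced in one run only), not a documented QIIME 2 behaviour.

```python
from typing import Dict, Set

# Hypothetical, simplified stand-in for a feature table:
# {sample_id: {asv_sequence: read_count}}
FeatureTable = Dict[str, Dict[str, int]]


def merge_feature_tables(*tables: FeatureTable) -> FeatureTable:
    """Merge tables that were denoised separately, one per run.

    Features are keyed by their exact sequence, so an ASV is shared across
    runs only when both runs produced the identical string.
    """
    merged: FeatureTable = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                # Assumption: each sample was sequenced in only one run.
                raise ValueError(f"sample {sample!r} appears in more than one table")
            merged[sample] = dict(counts)
    return merged


def all_features(table: FeatureTable) -> Set[str]:
    """Return every ASV sequence observed in any sample of the table."""
    return {asv for counts in table.values() for asv in counts}


miseq = {"S1": {"ACGT": 10, "ACGA": 2}}
pyro = {"S2": {"ACGT": 5, "ACGTTT": 3}}
merged = merge_feature_tables(miseq, pyro)
print(sorted(all_features(merged)))  # ['ACGA', 'ACGT', 'ACGTTT']
```

In this toy case "ACGTTT" could be a longer 454 read of the same organism as "ACGT", yet it ends up as a separate feature, which is why read length matters when combining runs.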

tenguzame · June 11, 2018, 4:36pm

Hi Mehrbod,
Thanks for clarifying. How should I then merge the two feature tables? I guess I'd get both identical and different ASVs in the two tables... how can I handle this?
Best regards

Michael Tangherlini

Mehrbod_Estaki · June 11, 2018, 7:06pm

Hi @tenguzame,

To be frank, I've never combined 454 with Illumina data, and I'm not aware of any benchmarking I could point you to. As you suggested, due to the differences between the reads, especially their different lengths, you will get multiple ASVs for the same features, so we certainly need to account for this. First, do the primers target the same region, or are they different? There are a few strategies worth trying, depending on the answer.
According to this reply from the DADA2 developer, if you can truncate your 454 reads to match the Illumina reads in length and position, then merging them after denoising should be fine. This assumes the target region is the same. If the reads come from different regions, you might benefit from the fragment-insertion plugin, though I'm not sure whether, or how much, reads from different sources would affect its performance; perhaps @Stefan can weigh in on this.
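A minimal sketch of the truncation idea, assuming both datasets used the same primers and that `trim_left` (a hypothetical parameter name here) accounts for any difference in where the reads start. Every read is cut to exactly `trunc_len` bases, and reads that are too short are dropped, as DADA2's truncation length does during filtering, so identical templates from both platforms yield identical strings.

```python
from typing import Iterable, List, Optional


def truncate_read(seq: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Drop `trim_left` bases from the 5' end, then keep exactly `trunc_len` bases.

    Returns None if the read is too short to reach `trunc_len`, so that every
    surviving read covers the same positions of the amplicon.
    """
    trimmed = seq[trim_left:]
    if len(trimmed) < trunc_len:
        return None
    return trimmed[:trunc_len]


def truncate_reads(reads: Iterable[str], trim_left: int, trunc_len: int) -> List[str]:
    """Truncate every read, discarding those shorter than the target length."""
    kept = []
    for read in reads:
        truncated = truncate_read(read, trim_left, trunc_len)
        if truncated is not None:
            kept.append(truncated)
    return kept


# Toy example: a MiSeq read and a longer 454 read from the same template.
miseq_read = "ACGTAC"
pyro_read = "ACGTACGGA"
print(truncate_read(pyro_read, 0, len(miseq_read)) == miseq_read)  # True
```

In practice this would be done through the denoiser's own truncation settings rather than by hand; the sketch only shows why matching length and start position makes the resulting ASVs comparable.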
As a last resort, you could simply process the two datasets separately until you assign taxonomy to them, then merge the feature tables based on their taxonomic classification.
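A minimal sketch of the taxonomy-based fallback, again using a hypothetical dict-of-dicts feature table and a `{asv: label}` taxonomy mapping. Counts of all ASVs sharing a label are summed per sample, so a short MiSeq ASV and a longer 454 ASV assigned to the same taxon land in the same column. It assumes both datasets were classified against the same reference and collapsed at the same rank; in QIIME 2 the collapsing step corresponds roughly to `qiime taxa collapse` with `--p-level`.

```python
from collections import Counter
from typing import Dict

FeatureTable = Dict[str, Dict[str, int]]  # {sample_id: {feature: count}}
Taxonomy = Dict[str, str]                 # {asv_sequence: taxonomy_label}


def collapse_by_taxonomy(table: FeatureTable, taxonomy: Taxonomy,
                         unassigned: str = "Unassigned") -> FeatureTable:
    """Sum the counts of all ASVs that share a taxonomy label, per sample."""
    collapsed: FeatureTable = {}
    for sample, counts in table.items():
        by_taxon: Counter = Counter()
        for asv, n in counts.items():
            by_taxon[taxonomy.get(asv, unassigned)] += n
        collapsed[sample] = dict(by_taxon)
    return collapsed


def merge_collapsed(*tables: FeatureTable) -> FeatureTable:
    """Combine collapsed tables; samples are assumed to be unique across runs."""
    merged: FeatureTable = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"sample {sample!r} appears in more than one table")
            merged[sample] = counts
    return merged


taxonomy = {"ACGT": "g__Bacillus", "ACGA": "g__Vibrio", "ACGTTT": "g__Bacillus"}
miseq = collapse_by_taxonomy({"S1": {"ACGT": 10, "ACGA": 2}}, taxonomy)
pyro = collapse_by_taxonomy({"S2": {"ACGTTT": 3}}, taxonomy)
print(merge_collapsed(miseq, pyro))
# {'S1': {'g__Bacillus': 10, 'g__Vibrio': 2}, 'S2': {'g__Bacillus': 3}}
```

The cost of this approach is resolution: everything below the chosen rank is lost, which is why it is the fallback rather than the first choice.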
Sorry I couldn't provide any definite answers; I'm curious to see how this turns out, though!

Stefan · June 11, 2018, 7:24pm

Hi @tenguzame,
I don't see a way to assess the quality of a merged analysis, so be very careful with any interpretations. Technically, if you are able to run DADA2 separately on the data from both sequencing methods, you should be able to insert both fragment sets into one reference tree with the q2-fragment-insertion plugin and then compute beta diversities.
Honestly, I would be very surprised if any metadata variable clustered samples more strongly than the sequencing method does. But it might be worth checking. If I am right, I don't see any purpose in a combined analysis.
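A rough sketch of the check suggested here: compare how strongly the sequencing method separates samples with how strongly a metadata variable does. It uses Bray-Curtis dissimilarity as a simple non-phylogenetic stand-in for the tree-based beta diversity described above, and a crude "mean between-group minus mean within-group distance" score. This is a heuristic, not a significance test; a formal comparison would typically use PERMANOVA (for example `qiime diversity beta-group-significance` in QIIME 2). The table and group labels below are invented toy data.

```python
from itertools import combinations
from statistics import mean
from typing import Dict

FeatureTable = Dict[str, Dict[str, int]]  # {sample_id: {feature: count}}


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical)."""
    shared = sum(min(n, b.get(f, 0)) for f, n in a.items())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def separation(table: FeatureTable, groups: Dict[str, str]) -> float:
    """Mean between-group distance minus mean within-group distance.

    Larger values mean the grouping separates samples more strongly.
    Requires at least two groups and at least one group with two samples.
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(table), 2):
        d = bray_curtis(table[s1], table[s2])
        (within if groups[s1] == groups[s2] else between).append(d)
    return mean(between) - mean(within)


table = {
    "S1": {"A": 10}, "S2": {"A": 9, "C": 1},
    "S3": {"B": 10}, "S4": {"B": 8, "C": 2},
}
method = {"S1": "MiSeq", "S2": "MiSeq", "S3": "454", "S4": "454"}
habitat = {"S1": "soil", "S2": "water", "S3": "soil", "S4": "water"}
print(separation(table, method) > separation(table, habitat))  # True
```

If the method score dominates every biologically meaningful variable, as in this toy table, that supports the concern that a combined analysis mostly recovers the sequencing technology.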

Mehrbod_Estaki · June 11, 2018, 8:12pm

There was a recent related discussion about merging runs generated with different primers and technologies that might also be of interest. I forgot to include it earlier.

tenguzame · June 12, 2018, 9:10pm

Hi @Mehrbod_Estaki,
Thanks for pointing me to the DADA2 developer's reply. My situation is pretty much the same as the one in that post: I have two datasets produced with the same primers (so targeting the very same region) but with two different technologies (MiSeq vs. 454).
It's very interesting that the suggested procedure closely resembles the one recommended for using Minimum Entropy Decomposition with 454 datasets.
Thus, if I understand correctly, the steps involved should be:

pre-process MiSeq sequences and denoise them to produce a Feature Table as usual;
truncate my 454 reads to the same length used for the MiSeq reads and produce a Feature Table;
merge the two Feature Tables and go ahead with the analysis.
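One possible sanity check after these three steps, offered as an editorial suggestion rather than something proposed in the thread: measure what fraction of each run's reads falls in ASVs that also appear in the other run. If truncation made the two platforms comparable, shared taxa should produce shared ASVs; a very low fraction may hint that length or start position still differ, although genuinely different communities would lower it too. The dict-of-dicts table format and the toy data are hypothetical.

```python
from typing import Dict, Tuple

FeatureTable = Dict[str, Dict[str, int]]  # {sample_id: {asv_sequence: count}}


def shared_read_fraction(table_a: FeatureTable, table_b: FeatureTable) -> Tuple[float, float]:
    """Fraction of each table's reads that fall in ASVs also present in the other table."""
    features_a = {f for counts in table_a.values() for f in counts}
    features_b = {f for counts in table_b.values() for f in counts}
    shared = features_a & features_b

    def fraction(table: FeatureTable) -> float:
        total = sum(n for counts in table.values() for n in counts.values())
        in_shared = sum(n for counts in table.values()
                        for f, n in counts.items() if f in shared)
        return in_shared / total if total else 0.0

    return fraction(table_a), fraction(table_b)


miseq = {"S1": {"ACGTAC": 6, "TTGCAA": 4}}
pyro = {"S2": {"ACGTAC": 1, "GGCATT": 3}}
print(shared_read_fraction(miseq, pyro))  # (0.6, 0.25)
```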
Is this correct?
I'll give it a try!
Thanks again!

Michael Tangherlini

Mehrbod_Estaki · June 12, 2018, 10:07pm

With the same primer regions, I'd say that's a good place to start, @tenguzame! I'm very interested in the results of this setup, so do keep us posted.
My concerns echo @Stefan's, so be cautious in your interpretation, but it would be cool if it all works out. Also, don't forget to use DADA2's 454 plugin when denoising your 454 data, keeping in mind that these are single-end reads.
Good luck!

system · July 14, 2018, 4:08am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.