Merging Raw Reads from Multiple Runs for the Same Samples

Ellenphant · October 3, 2019, 2:33pm

Hello!

I just got my sequencing data back (yay!) but am a bit confused about how to handle it.

All of my samples (17 in total) were run on three sequencing runs to increase my number of reads. But now I am at the tricky part of deciding whether to merge the reads before or after denoising with DADA2.

Normally when I have used merge-table (that might not be the exact command name, but hopefully you know the one), it has been for separate sample sets run at completely different times, so it made sense (to me at least) to denoise them separately.

However, now I can’t decide what to do! I want to say that since they are from different runs they should be denoised separately, but maybe not? What is the standard approach here?

In addition, if I do denoise them all separately… does that complicate merging the tables, since the same sample IDs will show up three times?

Thanks!

jwdebelius · October 3, 2019, 3:07pm

Hi @Ellenphant,

If you’re denoising with DADA2, you need to denoise each run separately. This will also be really expensive, because I think DADA2 training scales with the number of samples and the number of reads, so, if I remember correctly, training will take a long time. With Deblur, you can denoise the pool, and then perhaps collapse the replicates via qiime feature-table group.

You may find your sequencing depth is far more than you need, though. I’ve gotten very good results with 2500 or 5000 sequences/sample, and I’ve worked with runs where 400 samples were multiplexed on a single sequencing run. In my experience, the limitation in microbiome analysis is usually sample size rather than depth. Just… food for thought for next time?

Best,
Justine

Ellenphant · October 3, 2019, 5:12pm

I don’t really understand your response, sorry. What do you mean by expensive?

I want to continue using DADA2 because I have used it for the rest of the samples in a larger experiment. Submitting for multiple sequencing runs wasn’t the plan; it’s just what the lab technician ended up having space to run, so now that I have the data I feel like I should put it to use.

If I go the route of running DADA2 on each of my runs, is there a way to merge them so that the reads from the same sample ID end up as just one sample with all the reads? If that makes sense…

jwdebelius · October 3, 2019, 8:38pm

If you’re running DADA2, it behaves best if you use the full sequencing run, and not just single samples. It trains a model based on your error profiles. With fewer samples, building the error model ends up being resource intensive and time consuming (that’s what I meant by computationally expensive; sorry for the jargon!), so if you’ve got 17 samples on a single sequencing run, it’s going to take a long time.

Whether you use DADA2 or Deblur, you can still use qiime feature-table group to collapse your samples. So, I’d name them something like s1.1, s1.2 and s1.3 when you do your import, denoise as you see fit, and then merge the tables. Then, you have two options. You can choose to compare the samples (I’d use Bray-Curtis and Jaccard, probably) to make sure they’re similar before combining them (or not). If you’re happy with the similarity, I would make a new column in your mapping file that holds your original sample name, and use the group function.
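
To make the similarity check a bit more concrete, here is a minimal sketch in plain Python of what Bray-Curtis and Jaccard measure for a pair of replicates. This is only an illustration, not how QIIME 2 computes them (in practice you would use the diversity plugin). The function names and the toy counts are made up, and it assumes the replicates have already been rarefied to a comparable depth.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {feature: count} dicts."""
    features = set(a) | set(b)
    # Sum of absolute count differences over every feature seen in either sample.
    diff = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def jaccard(a, b):
    """Jaccard distance based only on presence/absence of features."""
    present_a = {f for f, n in a.items() if n > 0}
    present_b = {f for f, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical counts for two replicates of the same sample (illustration only).
s1_1 = {"asv1": 10, "asv2": 5, "asv3": 0}
s1_2 = {"asv1": 8, "asv2": 7, "asv4": 1}

print(round(bray_curtis(s1_1, s1_2), 3))  # 0.161
print(round(jaccard(s1_1, s1_2), 3))      # 0.333
```

Values near 0 mean the replicates look alike. How close is close enough before combining them is a judgment call.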

So, your combined map would look like this:

sample_name	original_name	…
s1.1	s1	…
s1.2	s1	…
s2.1	s2	…

And then you should be able to use the group function to collapse back to a single sample.
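
For intuition about what that collapse does, here is a rough sketch in plain Python, assuming the counts for replicates are summed (I believe the group action has a sum mode, but check the plugin help). It is not QIIME 2's implementation. The helper names read_mapping and group_samples and the feature counts are hypothetical; only the map layout comes from the example above.

```python
import csv
import io
from collections import defaultdict


def read_mapping(tsv_text, id_column="sample_name", group_column="original_name"):
    """Parse a tab-separated map into {sample_name: original_name}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row[group_column] for row in reader}


def group_samples(table, mapping):
    """Sum feature counts over all samples that share an original name."""
    grouped = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in table.items():
        group = mapping[sample_id]  # raises KeyError if a sample is missing from the map
        for feature, n in counts.items():
            grouped[group][feature] += n
    return {g: dict(c) for g, c in grouped.items()}


# Same layout as the combined map above (extra columns omitted).
MAPPING_TSV = "sample_name\toriginal_name\ns1.1\ts1\ns1.2\ts1\ns2.1\ts2\n"

# Hypothetical merged feature table: {sample_name: {feature: count}}.
table = {
    "s1.1": {"asv1": 10, "asv2": 5},
    "s1.2": {"asv1": 8, "asv3": 2},
    "s2.1": {"asv2": 4},
}

collapsed = group_samples(table, read_mapping(MAPPING_TSV))
print(collapsed)
# {'s1': {'asv1': 18, 'asv2': 5, 'asv3': 2}, 's2': {'asv2': 4}}
```

The important part is that every sample_name in the merged table has a row in the map; otherwise there is nothing to group it under.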

Best,
Justine
