Can I merge the sequences from two runs with the same sample?

Susun · May 19, 2021, 3:29am

Dear all,
Several of my samples yielded very little data in the first sequencing run, only a few thousand sequences each. I re-sequenced these samples using the same DNA, the same primers, and the same parameters, and again obtained only a few thousand sequences.
Can I combine these few thousand sequences with the sequences obtained from the first run for analysis? Is this feasible, and what methods can be used?
I think the two runs need to be denoised separately with DADA2. If so, how should the results be combined?

Thank you!

wburgess · May 19, 2021, 12:04pm

I'm not positive I understand precisely what you have, nor what you want, nor am I a great expert. But I'll share a thought and you can see if it's useful to you.

One inelegant approach, assuming computer time and some redundant work are no issue for you, is to put all the sequences you may want to merge into one run, denoise them all, and get a feature table. Then use qiime feature-table group to combine what needs combining.

Try qiime feature-table --help, or possibly (again, not 100% sure what you have or want) qiime feature-table group --help or qiime feature-table merge --help or qiime feature-table merge-seqs --help for some possibly useful tools.
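
As a rough illustration of what grouping does, here is a plain-Python sketch (not QIIME 2's implementation): samples that share a value in a metadata column are collapsed into one, and their feature counts are summed. The sample IDs, feature IDs, the `sample_to_group` mapping, and the `group_samples` helper are all hypothetical.

```python
from collections import Counter, defaultdict


def group_samples(table, sample_to_group):
    """Collapse samples that map to the same group, summing feature counts."""
    grouped = defaultdict(Counter)
    for sample_id, counts in table.items():
        # Counter.update adds counts rather than overwriting them
        grouped[sample_to_group[sample_id]].update(counts)
    return {group: dict(counts) for group, counts in grouped.items()}


# Hypothetical feature table: sample ID -> {feature ID: read count}
table = {
    "S1_run1": {"asv_a": 1200, "asv_b": 300},
    "S1_run2": {"asv_a": 900, "asv_c": 50},
    "S2_run1": {"asv_b": 4000},
}
# Hypothetical metadata column: which original sample each library came from
sample_to_group = {"S1_run1": "S1", "S1_run2": "S1", "S2_run1": "S2"}

grouped = group_samples(table, sample_to_group)
print(grouped["S1"])
```

Here the two libraries of sample S1 end up as a single column whose counts are the sums across both runs, while S2 is left unchanged.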

I hope this helped. But if not, this seems a common enough problem that I'm sure you can find (or be pointed to) a superior solution.

[Edit for grammar.]

llenzi · May 19, 2021, 2:27pm

Hi @Susun,
If you want to use DADA2, then yes, it is necessary to denoise each run separately, then merge the feature tables and sequences with feature-table merge and feature-table merge-seqs, as @wburgess is suggesting.
In this way, you will work with the samples as biological replicates run on different lanes.
If you would like to denoise samples from different runs together, you can use Deblur instead of DADA2.
If you would like to create new sequence files by adding together the sequences from the two runs (although they are from the same samples with the same kits), I have never done that. In my mind, it would make more sense to process them as biological replicates to give more statistical power to your analysis, rather than trying to pool the sequences to get deeper coverage.
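
A minimal plain-Python sketch of why separately denoised tables can still be merged. It assumes that feature IDs are derived from the sequence itself (by default, QIIME 2's DADA2 plugin uses a hash of each ASV sequence, so the same ASV gets the same ID in both runs) and that counts for a sample ID present in both tables should be summed. The sequences, counts, and helper names below are hypothetical, and this is not how QIIME 2 implements the merge.

```python
import hashlib
from collections import Counter


def feature_id(sequence):
    """MD5 hex digest of the sequence, used here as the feature ID."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def merge_tables(*tables):
    """Merge sample -> {feature ID: count} tables, summing overlapping samples."""
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            merged.setdefault(sample_id, Counter()).update(counts)
    return {sample_id: dict(counts) for sample_id, counts in merged.items()}


# Hypothetical ASVs recovered independently from each run of sample S1
run1 = {"S1": {feature_id("ACGTACGT"): 1200, feature_id("TTGACCAA"): 300}}
run2 = {"S1": {feature_id("ACGTACGT"): 900, feature_id("GGCATTAC"): 50}}

merged = merge_tables(run1, run2)
shared = feature_id("ACGTACGT")
print(merged["S1"][shared])
```

In QIIME 2 itself, feature-table merge refuses overlapping sample IDs unless an overlap method such as sum is chosen. To keep the two runs as replicates instead, the samples from the second run need distinct sample IDs before merging.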

Hope it helps
Luca

Susun · May 21, 2021, 3:03pm

Thank you, I will try your suggestion

Susun · May 21, 2021, 3:05pm

Thank you for your suggestion, I will seriously consider it