Hi everyone!
I have sequenced 19 samples (Illumina 2x300 PE, without multiplexing) in three different Illumina runs (as technical replicates) to reach a high read count. My question is: how do I merge the three runs?
Following the tutorial, it seems I should run DADA2 separately on the three runs and then merge everything together, but in the FMT tutorial the different runs contain different samples…
How should I proceed?
Hi @besimauda! We currently don’t support merging feature tables that have overlapping samples and features. In the current release of QIIME 2 (2017.7) we support merging overlapping features but not overlapping samples. In the upcoming QIIME 2 release (2017.8) we’ll also support merging overlapping samples, but we still won’t support merging both overlapping samples and overlapping features. There’s a few ways to tackle this and we have an open issue to port this “sample collapsing” functionality from QIIME 1 into QIIME 2. That functionality should be available within the next couple of releases (a month or two); we’ll follow up here when it’s available!
In the meantime, you can either analyze your technical replicate samples separately (i.e. treat each replicate as its own sample), or you can export your data, use QIIME 1’s collapse_samples.py script, and then import the resulting data into QIIME 2.
Hi @jairideout! Thank you for your reply!
I followed your suggestions and tried all the approaches you mentioned, but the best option seems to be treating the technical replicates as separate samples...
I also tried to merge the fastq files with the UNIX 'cat' command, but you have to be sure the order of the paired-end sequences is preserved... and the commands are too long to write...
The approach using 'collapse_samples.py' afterwards treats the individual sequencing lines/runs separately, as reported in the
... It leaves me in doubt, because treating lines/runs separately may produce different OTUs/SVs (different sequences and/or names!), which would make the data difficult to merge... Is my understanding right?
There's a few sides to this coin, but in short, this isn't a problem when the trimming parameters you set for your multiple denoise steps are the same. This means that each read from your replicates has the same opportunity to end up as the same sequence (in length and identity). Now because there is the error-correction and merging step, it is possible for the same sequence to be interpreted differently in the context of its independent run, but that is kind of the point of denoising separately.
What should ultimately happen is that real errors are corrected or removed, and you are left with a set of real forward reads that are the same length and real reverse reads that are the same length (across all replicates). When the reads merge well (sufficient quality trimming was done, with enough overlap to keep the merged read from being discarded), you'll end up seeing the same alignments between replicates. And so, you'll get the same merged sequences. Once you've got that, we by default create feature IDs that are just a hash of the sequence. That means that the same sequence always receives the same ID. It is also possible (and how DADA2 works under the hood) to just use the sequence itself as the ID.
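To make the hashing point concrete, here is a minimal Python sketch. It assumes MD5 as the hash function (the post above only says "a hash", so treat the choice of MD5 as an assumption), and the sequences and the `feature_id` helper are made up for illustration. It only shows that identical sequences from independent runs end up with identical IDs:

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return a deterministic ID for a denoised sequence.

    MD5 is an assumption made for this sketch; any stable hash
    would illustrate the same property.
    """
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical denoised sequences from two independently processed runs.
run_a = ["ACGTACGTAC", "TTGACCGTAA"]
run_b = ["TTGACCGTAA", "GGCATTACGA"]

ids_a = {feature_id(seq) for seq in run_a}
ids_b = {feature_id(seq) for seq in run_b}

# The sequence present in both runs hashes to the same ID in both sets.
shared = ids_a & ids_b
print(len(shared))  # 1
```

So as long as the denoised sequences really are identical between runs, the IDs line up without any extra bookkeeping.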
From there, you should be able to merge the tables and the rep-seqs like in the tutorial (each sample ID will need to be unique between replicates). Also, you'll need to merge twice, once for replicates A and B, to make INTERMEDIATE, and then again between INTERMEDIATE and C to make FINAL.
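As a rough picture of what the two-step merge does, here is a small Python sketch. It is not the QIIME 2 implementation: tables are modeled as plain nested dicts (`{sample_id: {feature_id: count}}`), and `merge_tables`, the sample IDs, and the feature IDs are all hypothetical names chosen for illustration. It does enforce the same constraint mentioned above, that sample IDs must be unique between replicates:

```python
def merge_tables(table1, table2):
    """Merge two {sample_id: {feature_id: count}} tables.

    Sample IDs must not overlap; feature IDs are allowed to overlap.
    """
    overlap = set(table1) & set(table2)
    if overlap:
        raise ValueError(
            f"Sample IDs must be unique across runs: {sorted(overlap)}"
        )
    # Copy the inner dicts so the inputs are left untouched.
    merged = {sample: dict(counts) for sample, counts in table1.items()}
    merged.update({sample: dict(counts) for sample, counts in table2.items()})
    return merged


# Hypothetical per-run tables; the run suffix keeps sample IDs unique.
run_a = {"S1.runA": {"feat1": 10, "feat2": 3}}
run_b = {"S1.runB": {"feat1": 8}}
run_c = {"S1.runC": {"feat2": 5, "feat3": 1}}

intermediate = merge_tables(run_a, run_b)    # A + B -> INTERMEDIATE
final = merge_tables(intermediate, run_c)    # INTERMEDIATE + C -> FINAL
print(sorted(final))  # ['S1.runA', 'S1.runB', 'S1.runC']
```

Because the feature IDs are derived from the sequences, `feat1` in run A and `feat1` in run B refer to the same feature after the merge.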
Once that is done, using collapse_samples.py as suggested should give you what you need.
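Conceptually, collapsing just combines the counts of all replicates that belong to the same original sample. The Python sketch below is not the QIIME 1 script; it assumes summing as the way to combine replicates, and the `collapse_replicates` helper, the table, and the replicate-to-sample mapping are hypothetical:

```python
from collections import defaultdict


def collapse_replicates(table, sample_of):
    """Sum feature counts over replicates that map to the same sample.

    table:     {replicate_id: {feature_id: count}}
    sample_of: {replicate_id: original_sample_id}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for replicate_id, counts in table.items():
        sample = sample_of[replicate_id]
        for fid, count in counts.items():
            collapsed[sample][fid] += count
    # Convert back to plain dicts for easier printing and comparison.
    return {sample: dict(counts) for sample, counts in collapsed.items()}


# Hypothetical merged table with one sample sequenced in three runs.
merged = {
    "S1.runA": {"feat1": 10, "feat2": 3},
    "S1.runB": {"feat1": 8},
    "S1.runC": {"feat2": 5, "feat3": 1},
}
sample_of = {"S1.runA": "S1", "S1.runB": "S1", "S1.runC": "S1"}

collapsed = collapse_replicates(merged, sample_of)
print(collapsed)  # {'S1': {'feat1': 18, 'feat2': 8, 'feat3': 1}}
```

In practice the mapping from replicate to sample would come from a column in your sample metadata.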
I'm not sure where in the process you are attempting this, but it shouldn't be necessary.
Let me know if I misunderstood something, or if the above doesn't make sense. Basically, I think the merge step is the "riskiest" part of making your replicates match in sequence identity, but choosing conservative trimming parameters should help, and worst case, you can always just analyze the forward reads without looking at the reverse (pass the same data to denoise-single).