Questions about DADA demux/merge workflow

moonlight · February 7, 2020, 2:53am

1>“The option of justConcatenate is not available in Qiime2 as there is in native DADA2 in R”

Just to confirm – if I have 100 F reads and 100 R reads and their quality is perfect, they should show up as “200” reads in the merged column (https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2019.10%2Fdata%2Ftutorials%2Fatacama-soils%2Fdenoising-stats.qzv).

Am I correct? Since we don’t concatenate anything, we just pair them.

2> I was advised not to concatenate into long reads. – I am wondering whether long reads would have some advantage (more accuracy) when assigning taxonomy in downstream analysis?

3> Thanks for the suggestion on fungal analyses. I will read that link. Hmm, my fungal primers’ barcode is on the reverse reads (EMP primers), so I am not sure whether I should use only the reverse reads in this case. Basically, I have three fastq files: F.fastq, R.fastq, and barcodes.fastq. Since F.fastq has no barcode, I might not be able to demux if I just use the single-read denoising workflow on the forward reads. Normally, forward read quality is better, so it doesn’t make sense to me to use the reverse reads. I will see, and if I have any questions I will ask. Any suggestions in advance?

colinbrislawn · February 7, 2020, 8:18pm

Good to hear from you again, John.

If you have 100 forward reads and 100 reverse reads, after pairing you will have 100 paired reads. Illumina sequences each amplicon from both ends, producing two reads per amplicon. Pairing... uh... pairs them together to make one read per amplicon.

That's correct! Long reads == more information == more taxonomic resolution.

The reads should match up between forward and reverse, so the barcode only has to be in one read to cover both.
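A toy sketch of why one barcode covers both reads: record i in the forward, reverse, and barcode files all come from the same cluster, so the barcode at index i assigns the whole pair. This is not how q2-demux is implemented; the function name, sample names, and sequences below are all made up for illustration.

```python
from collections import defaultdict


def demultiplex(forward, reverse, barcodes, barcode_to_sample):
    """Assign each read pair to a sample using the barcode at the same index."""
    if not (len(forward) == len(reverse) == len(barcodes)):
        raise ValueError("the three files must contain the same number of records")
    per_sample = defaultdict(list)
    for fwd, rev, bc in zip(forward, reverse, barcodes):
        sample = barcode_to_sample.get(bc)
        if sample is None:
            continue  # unknown barcode: the pair is left unassigned
        per_sample[sample].append((fwd, rev))
    return dict(per_sample)


# Hypothetical toy data: three clusters, two samples.
forward = ["ACGTAC", "TTGACA", "GGCATC"]
reverse = ["GTACGT", "TGTCAA", "GATGCC"]
barcodes = ["AAAA", "CCCC", "AAAA"]
barcode_to_sample = {"AAAA": "soil_1", "CCCC": "soil_2"}

pairs = demultiplex(forward, reverse, barcodes, barcode_to_sample)
for sample, read_pairs in pairs.items():
    print(sample, len(read_pairs), "pair(s)")
```

Three forward reads plus three reverse reads give three pairs in total, which is the same counting as the 100 + 100 = 100 pairs case above.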

Once demultiplexed, you can use the visualization to see the quality and choose which read to use!

Colin

moonlight · February 7, 2020, 8:54pm

Hi Colin,

1> “The reads should match up between forward and reverse, so the barcode only has to be in one read to cover both.” – I will try this. If I have a problem, I will ask you again. Do people normally use only F reads for fungal analysis (EMP primers)?

2> “That’s correct! Long reads == more information == more taxonomic resolution.” – I agree, but currently the DADA2 workflow in QIIME2 only supports merging F and R reads (which means it treats F and R as two reads to be merged rather than concatenating them). I have heard DADA2 in R can do this. Will QIIME2 support a concatenate option in the future?

Mehrbod_Estaki · February 7, 2020, 10:19pm

Hi @moonlight,
I think you may have misunderstood the purpose of justConcatenate vs merging paired-end reads.

Paired-end reads get merged (including in q2-dada2) by aligning the overlap regions, for example:

Forward reads:  =======================
                         overlap_region
Reverse reads:           ========================== 
Merged reads:   ===================================

justConcatenate, by contrast, assumes there is no overlap region: it inserts a spacer of 10 Ns between the two reads and pastes them together. So…

concatenated reads: =======================NNNNNNNNNN==========================
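To make the two pictures concrete, here is a toy sketch in Python. It is not DADA2's algorithm: as far as I know, mergePairs aligns the reads rather than requiring an exact match, and the default minimum overlap of 12 used here is assumed to mirror its minOverlap default. The function names and the 40 nt amplicon are invented for illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(fwd, rev, min_overlap=12):
    """Merge on the longest exact overlap; return None if it is too short."""
    rev_rc = reverse_complement(rev)
    for k in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return fwd + rev_rc[k:]
    return None  # no usable overlap: the pair cannot be merged


def just_concatenate(fwd, rev, spacer="N" * 10):
    """Paste the reads together around a 10 N spacer, ignoring any overlap."""
    return fwd + spacer + reverse_complement(rev)


# Hypothetical 40 nt amplicon, sequenced from both ends.
amplicon = "ACGTTGCAAGCTTAGGCATCGATCGGATCCTAGCTAGGCA"
fwd = amplicon[:28]                      # 28 nt forward read
rev = reverse_complement(amplicon[14:])  # 26 nt reverse read, 14 nt overlap

merged = merge_pair(fwd, rev)
concatenated = just_concatenate(fwd, rev)
print(len(merged), merged == amplicon)   # merged read recovers the amplicon
print(len(concatenated))                 # overlap counted twice, plus 10 Ns

# Reads too short to reach each other: nothing to merge on.
unmerged = merge_pair(amplicon[:15], reverse_complement(amplicon[25:]))
print(unmerged)
```

In this toy case the merged read is the original 40 nt amplicon, while the concatenated one is 64 nt long and contains the overlap region twice with a run of Ns in the middle.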

So, while this may look like a ‘longer’ read that would give better resolution, it almost certainly will not. First, these reads were likely supposed to share some overlap anyway and perhaps were just a bit short, so the artificial insertion makes this feature a non-biological sequence. Second, it is going to align very poorly to a reference sequence, if at all, again limiting the taxonomic classification you can assign to it. For these (and a bunch of other) reasons this method is not recommended in most cases, including yours. If you have insufficient overlap in paired-end reads that were meant to overlap, I would just discard your reverse reads and carry on.
And while it’s true, and intuitive, that taxonomic resolution is higher with longer reads, the gain is not as significant as you may think. It has been a while, though, since a proper benchmark of this has been done.

moonlight · February 8, 2020, 12:27am

What if there is no overlap region because the reads are too short? Does q2-dada2 just discard these reads? I suppose only the successfully joined reads go on to the next step.

Would it be possible to keep these reads? Sometimes they are just too short to join but not necessarily bad, for example with fungal ITS.

Mehrbod_Estaki · February 8, 2020, 1:11am

Yes, if merging fails then there is no use for those reads so DADA2 discards them.

Not if you are running denoise-paired.

That is debatable. How would you really use these unmerged reads downstream? When you don't know their expected length, inserting 10 Ns between them doesn't really do you any favors in downstream analyses. How would you align those to a reference database? How do you distinguish features that differ only within the overlap or 10-N region? There are lots of issues with it, which is why most people don't use unmerged pairs.
If you really want to keep everything, I would suggest analysing your forward and reverse reads separately (that is, run the whole workflow on F and on R independently). Then there is no merging issue.
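Whether a pair can merge at all comes down to simple arithmetic: the reads overlap by (forward length + reverse length - amplicon length) bases. A minimal sketch follows, assuming the amplicon length is measured over the same span the reads cover and that 12 nt is the minimum overlap (an assumption taken from DADA2's mergePairs default). The argument names only echo q2-dada2's --p-trunc-len-f and --p-trunc-len-r; the numbers are hypothetical.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the two reads; a negative value means a gap between them."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the expected overlap reaches the assumed minimum."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical example: 2 x 150 nt reads against amplicons of varying length,
# as happens with a length-variable marker such as ITS.
for amplicon_len in (250, 290, 320):
    overlap = expected_overlap(150, 150, amplicon_len)
    print(amplicon_len, overlap, can_merge(150, 150, amplicon_len))
```

With these made-up numbers only the 250 nt amplicon would merge; the longer ones would be dropped by a paired-end workflow, which is the case where analysing the forward reads alone keeps them.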
