Low quality forward reads


(Meha) #1

I used the tool and now have the plots, but the quality of my forward reads looks low compared to the reverse reads.
*I used the demux plugin for paired-end data. Since QIIME 2 does not currently support dual-indexed sequences, I used the method for the forward reads only. Is that correct?

Thanks a lot


Interactive plots missing in demux summarize plot
(Meha) #2


(Mehrbod Estaki) assigned Mehrbod_Estaki #3

(Mehrbod Estaki) #4

Hi @Mehrdad,

Thanks for providing your plots. You’re right that QIIME 2 currently doesn’t have a way to deal with dual-indexed reads, but don’t let that stop you from using the full potential of your reads. You can demultiplex your dual-indexed reads outside of QIIME 2; for example, QIIME 1 can handle this, and I believe bcl2fastq does as well. Once the reads are demultiplexed, you can simply import them into QIIME 2 and go from there.
That being said, here are a few things I noticed.

  1. In your plots, your forward reads actually appear to be in much better shape than your reverse reads. The drop in quality scores occurs around 170 bp in your forward reads but around 100 bp in your reverse reads. So if you plan on using only one set of your reads, I would stick with the forward reads and not the reverse.
  2. Your DADA2 denoising stats summary certainly implies that you used both sets of reads and that you lose much of your data at the merging step. This usually means that your reads had insufficient overlap for merging, which is a common issue on this forum, so you can search for it for further details. You’ll have to adjust your truncation values.
  3. If you use only the forward reads, you won’t run into this issue. So if that was your intention, just re-run DADA2 but make sure you use the dada2 denoise-single option.
  4. Finally, the quality score plots you are showing appear very clean-cut compared to a typical plot, which looks more fluid. This may actually be how the distributions are, but in my experience this is more often indicative of some prior quality control or filtering, which you want to avoid for DADA2 as it may interfere with the error model it builds. Just something to consider.
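To make the overlap point in note 2 concrete, here is a minimal sketch of the arithmetic. The function name is hypothetical, the truncation positions are simply the approximate quality-drop positions mentioned above, and the 500 bp amplicon length is an illustrative assumption, not a measured value. It also assumes the reads span the amplicon from opposite ends with no extra trimming at the 5' end.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the forward and reverse reads after truncation.

    A negative result means there is a gap: the two reads never reach
    each other, so they cannot be merged.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Illustrative values only (assumptions, not taken from real data):
# truncate forward reads at 170 bp and reverse reads at 100 bp,
# on a hypothetical 500 bp amplicon.
overlap = expected_overlap(170, 100, 500)
print(overlap)  # -230, i.e. a 230 bp gap between the two reads
```

A negative or very small value here is consistent with losing most reads at the merging step, since truncating more aggressively to remove low-quality bases also shrinks the overlap.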

(Mehrbod Estaki) unassigned Mehrbod_Estaki #5

(Meha) #6

@Mehrbod_Estaki I like the way you explain things!
Thanks a lot for the informative explanation; it was simple and comprehensive.

Regarding this issue, I would like to ask a few questions for clarification.

  1. If I follow the single-end method, should I import only the forward fastq files? Or can I use the previously imported file, which includes both the forward and reverse fastq files?

  2. Would it be a problem for the final result if I use the single-end method even though I have both types of reads (forward and reverse)? I mean, will it give the same result as processing the data as paired-end?

Thanks a zillion


(Meha) #7
  1. Does the PCR amplicon length affect quality (in my case it is around 500 bp)?
    *The sequencing platform was Illumina HiSeq

(Mehrbod Estaki) #8

Hi @Mehrdad,
Glad you found it useful!

  1. Yes! You can simply reuse your already imported file that has both the forward and reverse reads and just run dada2 denoise-single. The reverse reads will simply be ignored.

  2. The difference between the outcomes of using paired-end vs single-end reads really depends on your samples and questions. In general, the overall patterns should be the same, but with single-end reads you may have lower resolution. With paired-end reads you have the added advantage of longer merged sequences, which may improve taxonomic classification, and in some instances better quality, as the overlap region can help correct ambiguous base calls. But paired-end data requires some extra considerations for it to be used effectively. In the case of DADA2, you need good overlap between your forward and reverse reads (20 bp minimum) for proper merging. If your reads fail to overlap and therefore fail to merge, they will simply be dropped, and you may actually end up with fewer reads than if you had just used your forward reads.

  1. Yes, certainly! If your expected amplicon length is 500 bp and you had a 2x250 bp HiSeq run, as it appears you did from your quality plots, then you probably have no overlap region, or at least not sufficient overlap. Your stats summary figure shows a massive drop at the merging step, which supports this notion. In your case you may not benefit from using paired-end reads, so I would just stick with the forward reads. In the future you may wish to consider revising your primers so that your reads overlap more, or moving from a 2x250 bp run to 2x300 bp so that more bases fall in the overlap region.
    All the best.
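
As a rough planning aid for the run-length suggestion above, the following sketch computes how long each read must be for a given amplicon, and how many bases are left over for quality truncation. The function names are hypothetical, the 20 bp default is the minimum overlap quoted in this thread, and the calculation assumes forward and reverse reads of equal length that together span the whole amplicon.

```python
import math


def min_read_length(amplicon_len: int, min_overlap: int = 20) -> int:
    """Shortest per-read length (equal forward and reverse reads) that still
    leaves `min_overlap` shared bases across the amplicon."""
    return math.ceil((amplicon_len + min_overlap) / 2)


def truncation_budget(read_len: int, amplicon_len: int, min_overlap: int = 20) -> int:
    """Total bases that can be truncated (forward plus reverse combined)
    before the overlap falls below `min_overlap`. Negative means the run
    is too short even without any truncation."""
    return 2 * read_len - amplicon_len - min_overlap


print(min_read_length(500))         # 260
print(truncation_budget(250, 500))  # -20
print(truncation_budget(300, 500))  # 80
```

Under these assumptions a 500 bp amplicon needs at least 260 bp per read, so a 2x250 bp run falls short before any truncation, while a 2x300 bp run leaves about 80 bases in total to trim off the low-quality ends.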