Sequence length increase after paired-end merge?

Hi all,
I received paired-end FASTQ files for 20 samples (10 WT and 10 KO) as separate files, so I used the “Fastq manifest” format to import them into QIIME 2 v2020.8. After I ran "qiime demux summarize", I saw the quality plot below:



I didn't see any boxplots, and I'm not sure how much I should trim the reads. So I kept everything and used 0 for both --p-trunc-len-f and --p-trunc-len-r in the dada2 denoise-paired step. This is the sequence length summary I then got for rep-seqs.qza:

It seems that before merging I have a sequence length of 250 for both ends, but after merging I got a mean length of 300? How is that possible? Should I trim my reads? Is there something wrong with the quality plot? Thank you so much for your time!

Hi!
Which rRNA region did you target in the PCR step?
It is expected that joined reads are longer than either the forward or the reverse reads, since the two reads are merged across their overlapping region.
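
To make the arithmetic concrete, here is a minimal sketch in Python. The function names are made up for illustration and are not part of QIIME 2 or DADA2. It only assumes that the overlapping region is counted once in the merged read.

```python
def merged_length(forward_len: int, reverse_len: int, overlap: int) -> int:
    """Length of a merged read when the shared overlap is counted once."""
    if overlap < 0 or overlap > min(forward_len, reverse_len):
        raise ValueError("overlap must be between 0 and the shorter read length")
    return forward_len + reverse_len - overlap


def implied_overlap(forward_len: int, reverse_len: int, merged_len: int) -> int:
    """Overlap implied by the read lengths and the observed merged length."""
    return forward_len + reverse_len - merged_len


# Numbers from this thread: 250 nt forward, 250 nt reverse, ~300 nt merged.
print(implied_overlap(250, 250, 300))  # 200
print(merged_length(250, 250, 200))    # 300
```

So two 250 nt reads that merge into a roughly 300 nt sequence simply overlap by about 200 nt.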

Thank you for your answer. I got the data from my collaborator and was told that the hypervariable V3 region of the 16S rDNA was amplified and sequenced on an Illumina second-generation sequencing platform. Does the pattern in the quality plot make sense? I'm not sure why it looks 'weird', since other quality plots I have seen usually show a decreasing trend in quality scores towards the end.

They look fine to me. Check the merging stats: how many reads were successfully merged and retained for the analysis. If you think you lost too many, try trimming and check again. You should know the approximate size of your amplicons and only trim the reads so that at least 20 nt of overlap remains between the forward and reverse reads.
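
A rough way to sanity-check truncation settings before rerunning the denoising step is sketched below. This is only an illustration: the helper names, the example truncation lengths, and the read counts are hypothetical, and the 20 nt threshold is the rule of thumb from this reply, not a value read from any tool.

```python
MIN_OVERLAP = 20  # rule of thumb from this thread (assumption, not a tool default)


def overlap_after_truncation(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Overlap left between forward and reverse reads after truncation.

    Pass the actual lengths kept; a setting of 0 (no truncation) must be
    replaced by the full read length before calling this.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def truncation_ok(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
                  min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the truncated reads still overlap by at least min_overlap."""
    return overlap_after_truncation(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


def percent_merged(input_reads: int, merged_reads: int) -> float:
    """Share of input reads that were successfully merged, in percent."""
    return 100.0 * merged_reads / input_reads


# Hypothetical example: a ~300 nt amplicon with two candidate settings.
print(truncation_ok(250, 250, 300))  # True  (200 nt overlap)
print(truncation_ok(160, 150, 300))  # False (only 10 nt overlap)
print(percent_merged(10000, 8500))   # 85.0
```

The amplicon length is the key input here, which is why knowing the primers matters.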

Hi @ynano,

just to add a note to the excellent answer from @timanix about your box plot looking ‘weird’. What it shows is that your quality scores are so close together that each box plot collapses to a horizontal line. That potentially means one of two things: (a) your sequences are really good; or (b) the quality scores were transformed. Case (b) is not unusual: it may happen at some sequencing providers or even upstream, and the newest Illumina machines (NovaSeq) bin the quality scores by default, which tends to produce plots like yours. So I suggest going back to your collaborator to ask for more information about the primers used (which will tell you the expected amplicon size) and the exact Illumina machine used to generate those sequences.
If the machine used is either a MiSeq or a HiSeq, double-check whether the facility performed any quality transformation on the data; if not, you are a lucky one and good to go.
If the quality scores were transformed or the machine used was a NovaSeq (or you are in doubt), I would suggest using deblur for denoising instead of dada2. The reason is that dada2 needs unchanged quality scores for its processes, while deblur does not rely on quality scores!
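
One quick check you can run yourself is to count how many distinct quality values appear in a FASTQ file: binned data show only a small handful. The sketch below is a minimal illustration, assuming plain 4-line FASTQ records and Phred+33 encoding. The function names and the `max_distinct` cut-off are my own arbitrary choices, not part of any tool.

```python
from collections import Counter
from typing import Iterable


def quality_char_counts(fastq_lines: Iterable[str]) -> Counter:
    """Count quality characters, assuming strict 4-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the 4th line of each record holds the quality string
            counts.update(line.rstrip("\n"))
    return counts


def looks_binned(counts: Counter, max_distinct: int = 8) -> bool:
    """Heuristic: very few distinct quality values suggests binned scores.

    max_distinct is an arbitrary cut-off chosen for this sketch.
    """
    return len(counts) <= max_distinct


# Tiny made-up example with two records.
example = [
    "@read1", "ACGTACGT", "+", "FFFF,,::",
    "@read2", "TTGGCCAA", "+", "FF:FFFF,",
]
counts = quality_char_counts(example)
phred_scores = sorted(ord(c) - 33 for c in counts)  # Phred+33 decoding
print(phred_scores)          # [11, 25, 37]
print(looks_binned(counts))  # True
```

For a real file you could pass an open file handle instead of the list, for example `gzip.open(path, "rt")` for a gzipped FASTQ. A few distinct values alone cannot tell binning apart from genuinely uniform high quality, so it is still worth asking the facility.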
Hope it helps
Luca

Hi @llenzi!
Thank you for the clarification.
So, to process reads from a NovaSeq, you would recommend qiime deblur denoise-16S? In my case I have paired reads (as does @ynano). Should we use qiime vsearch join-pairs to merge the reads before deblur, or is there another way to denoise paired NovaSeq reads?

Hi @timanix,
I think that is a good question!
Last I read on this was:

and

I am not sure whether the most recent dada2 releases address the issue, or whether any of them have been released into qiime2.

Maybe @benjjneb could help clarify this?
Still, we are now straying from the initial question, so if there are more questions I suppose we had better create a new topic, for the sake of the forum!
Cheers
Luca

Hi @llenzi,

This is very helpful, I will check more details with my collaborator. Thank you so much!

The percentage of reads merged ranges from 82 to 88 across samples, so I thought it was OK without any trimming. But I will first check for more information about the sequencing steps. Thank you again for your help!