Quality of demultiplexed joined sequences

Hello everyone,

I recently learned that Deblur works only with forward reads, so if we have both forward and reverse reads, we must join them before filtering the sequences through Deblur. I have finished joining my sequences. Before denoising them with Deblur, I wanted to check the quality of their bases. Attached below is a screenshot of the quality score graph for my joined reads:

The following are the questions that I need help with:

  1. Can someone help me understand whether the quality of my joined reads is good enough to proceed? To be honest, this is the first time I have seen the values dip and then rise even higher. Does this happen often?
  2. The first 30-40 bases have surprisingly low scores, so I was thinking of using the "--p-left-trim-len n" parameter of the deblur-16S method to remove those initial bases. Would that be the right way to approach this issue?
  3. For bases between 150 and 300, the quality scores are above 40. Is that normal, and can these bases be included in the deblur-16S filtering step? In other words, what would be a good --p-trim-length value for me? The tutorial mentions that the Deblur developers recommend 115 to 130; however, if I chose that range for my sequences, I would be dropping the majority of my bases.
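For questions 2 and 3, one rough way to see where the low-quality leading positions end is to compute the median quality at each read position directly from the joined FASTQ. This is a minimal sketch outside QIIME 2, not how QIIME 2 builds its plot; it assumes uncompressed 4-line FASTQ records with Phred+33 encoding, and the threshold of 30 and the helper names (`read_fastq`, `per_position_median`, `suggest_left_trim`) are hypothetical choices for illustration.

```python
import statistics
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield seq, qual


def per_position_median(records: Iterable[Tuple[str, str]], offset: int = 33) -> List[float]:
    """Median Phred score at each read position (Phred+33 encoding assumed)."""
    columns: List[List[int]] = []
    for _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])  # first read long enough to reach position i
            columns[i].append(ord(ch) - offset)
    return [float(statistics.median(col)) for col in columns]


def suggest_left_trim(medians: List[float], threshold: float = 30.0) -> int:
    """Count the leading positions before the median first reaches `threshold`."""
    for i, m in enumerate(medians):
        if m >= threshold:
            return i
    return len(medians)


if __name__ == "__main__":
    # Tiny made-up example: '#' = Q2, '+' = Q10, 'I' = Q40 in Phred+33.
    example = ["@r1", "ACGT", "+", "##II", "@r2", "ACGT", "+", "#+II"]
    medians = per_position_median(read_fastq(example))
    print(medians)                     # [2.0, 6.0, 40.0, 40.0]
    print(suggest_left_trim(medians))  # 2
```

For a real `.fastq.gz` file, the open handle from `gzip.open(path, "rt")` can be passed to `read_fastq`. The returned count is only a candidate value for a left trim; it says nothing about whether the low scores come from primers or from the sequencing itself.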

Looking forward to hearing your thoughts on this.

Best wishes,
Aakarsha.

Hi @aakarsharao,

  1. The quality scores look fine to me. Have you already removed the primers? If not, I would recommend running cutadapt on the raw reads before merging, with the option to discard sequences without primers enabled. Another option is simply trimming those bases.
  2. Yes, you can try it. But in my opinion, if the primers are still attached, it is a good idea to cut them with cutadapt before merging and then check the quality plots again.
  3. On the plots you can see the overlapping region, where the forward and reverse reads were merged. The original quality scores cannot be recovered for this region, so they are set to this level artificially.
    There is no need to truncate to 110-130, since those values were recommended for the tutorial dataset, not as a universal recommendation. You can choose a value based on the length distribution and quality plots of your own data.
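Choosing the trim length from the length distribution can be sketched as a simple trade-off: a longer trim length keeps more bases per read but fewer reads. The sketch below assumes, as I understand Deblur's behavior, that reads shorter than the trim length are discarded and longer reads are cut down to it; the function names and the example lengths are made up for illustration, and real lengths would come from your joined reads.

```python
import math
from typing import Dict, Iterable, List


def retention_by_trim_length(lengths: Iterable[int], candidates: Iterable[int]) -> Dict[int, float]:
    """Fraction of reads long enough to survive each candidate trim length.

    Assumes reads shorter than the trim length are discarded and longer
    reads are cut down to it.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no read lengths given")
    return {t: sum(1 for n in lengths if n >= t) / len(lengths) for t in candidates}


def longest_trim_keeping(lengths: Iterable[int], min_fraction: float) -> int:
    """Largest trim length that still keeps at least `min_fraction` of reads."""
    ordered: List[int] = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no read lengths given")
    # Using the k-th longest length as the trim length keeps at least k reads.
    k = min(len(ordered), max(1, math.ceil(min_fraction * len(ordered))))
    return ordered[k - 1]


if __name__ == "__main__":
    # Made-up joined-read lengths, for illustration only.
    lengths = [250, 252, 253, 253, 300, 400, 410, 420, 425, 430]
    print(retention_by_trim_length(lengths, [250, 300, 400]))  # {250: 1.0, 300: 0.6, 400: 0.5}
    print(longest_trim_keeping(lengths, 0.9))                   # 252
```

The 0.9 fraction is an arbitrary example; whether losing 10% of reads is acceptable depends on the study, and the quality plots should still be checked for the positions you keep.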

Hi @timanix! Thanks a bunch for your response.

I do not know whether the primers have already been removed, since these samples were sequenced at another facility. I have written to the people responsible and will know more soon. If the primers have not been removed, I will try your suggestion of running cutadapt on the raw reads before merging them. Thanks again for your advice.
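While waiting for an answer from the facility, a quick local check is to count how many reads start with the forward primer. This is a minimal sketch under stated assumptions: it handles IUPAC degenerate bases, but unlike cutadapt it allows no mismatches and only looks at the start of each read. The primer shown (515F, a common 16S V4 forward primer) and the reads are examples only; substitute the primer your facility actually used and reads from your own raw forward FASTQ.

```python
from typing import Iterable

# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(seq: str, primer: str) -> bool:
    """True if `seq` begins with `primer`, allowing IUPAC degeneracy, no mismatches."""
    if len(seq) < len(primer):
        return False
    return all(base in IUPAC[p] for base, p in zip(seq.upper(), primer.upper()))


def fraction_with_primer(seqs: Iterable[str], primer: str) -> float:
    """Fraction of reads that begin with the primer."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(starts_with_primer(s, primer) for s in seqs) / len(seqs)


if __name__ == "__main__":
    primer = "GTGYCAGCMGCCGCGGTAA"  # example only: 515F; use your own primer
    reads = [  # made-up reads for illustration
        "GTGCCAGCAGCCGCGGTAATAC",
        "GTGTCAGCCGCCGCGGTAAGGC",
        "TACGTAGGGGGCAAGCGTTATC",
    ]
    print(fraction_with_primer(reads, primer))  # 2 of the 3 reads match
```

A high fraction suggests the primers are still attached; a fraction near zero suggests they were already removed or that a different primer was used.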

Regarding your point #3, are you referring to the region between bases 150 and 300 as the region where the forward and reverse reads were merged? I am sorry, I could not follow that part.

Additionally, did I understand correctly that a truncation length is not mandatory if the quality plots of my data look good?

Thank you,
Aakarsha.

Yes. The quality scores are lost during merging and artificially set to higher values, so there is no need to worry about this part.

If I remember correctly, the trim length parameter is required for Deblur. But you can choose a value higher than 400 and keep most of the bases in your reads.