Query about false read merging or no read merging

Hello everyone! I hope that you are well.

I have performed 2x250 bp paired-end sequencing of the V4-V5 region using a V2 500-cycle kit (7.5pM library with 20% PhiX). I got a cluster density of 1100K/mm², and a cluster passing filter of 79%. Q>30 at the start of the sequencing was 93%. However, when I got the data the read quality was not as expected. There was a sharp decrease in read quality with Q-scores falling below Q20 at 180bp for the forward read. The reverse read was even worse. I have the following questions about this:

  1. What could have gone wrong to produce such poor-quality data when the sequencing run started off so well?
  2. Because read quality drops so early, DADA2 will discard most of the reads. At what length should I trim the reads?
  3. If I trim the reads too much, I will not get an overlap between R1 and R2. How should I handle this?
  4. Is it a good approach to use only the forward reads, trimmed to 180 bp, where the quality scores are still good?
  5. Is single-end data still acceptable for a publication?

Thanks in anticipation!


Hello!

I don't have experience with the region you are working with, so it is hard for me to tell whether such drops are typical for it. You can try contacting the facility that performed the sequencing. That said, the V4 region alone usually works pretty well for me.

That's right, you should try one of the following:

  • Relax the maxEE setting. This allows more expected errors in the reads that pass the filter.
  • Select optimal truncation parameters based on the quality plots.
    My advice here is to try multiple combinations to see if you can get good results with at least one.
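
To make the maxEE idea concrete: the expected errors (EE) of a read are the sum of its per-base error probabilities, 10^(-Q/10). Here is a minimal Python sketch, assuming Phred+33 quality strings. The function name and the example read are made up for illustration and are not part of DADA2:

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


# Hypothetical read: 180 bases at Q30 ('?') followed by 70 bases at Q15 ('0').
read_quality = "?" * 180 + "0" * 70

full = expected_errors(read_quality)
truncated = expected_errors(read_quality[:180])
print(f"EE over 250 bp: {full:.2f}")
print(f"EE over first 180 bp: {truncated:.2f}")
```

A read like this one would fail a maxEE threshold of 2 at full length but pass easily once truncated to 180 bp, which is why truncation and maxEE are usually tuned together.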

That's right. I assume you meant "truncate" instead of "trim". You can try to reduce the required overlap, but if your expected region size is around 400 bp and R1 quality is dropping at 180 bp, with R2 even worse, I doubt you will manage to properly merge the R1 and R2 reads. But again, you should try several combinations just to be sure.
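
The overlap arithmetic can be sketched quickly. The overlap left after truncation is roughly forward length + reverse length - amplicon length. The sketch below assumes an amplicon of about 400 bp (the real V4-V5 length depends on your primers and on whether they are still in the reads) and a minimum overlap of 12 bases, which, as far as I know, is the default `minOverlap` in DADA2's mergePairs. The candidate lengths are example values only:

```python
from itertools import product

AMPLICON_LEN = 400   # assumption: approximate amplicon length from the discussion
MIN_OVERLAP = 12     # assumption: minimum overlap required for merging


def overlap(trunc_f: int, trunc_r: int, amplicon_len: int = AMPLICON_LEN) -> int:
    """Bases shared by R1 and R2 after truncation (negative means a gap)."""
    return trunc_f + trunc_r - amplicon_len


# Hypothetical truncation lengths to try for forward and reverse reads.
forward_candidates = [180, 200, 220, 240]
reverse_candidates = [160, 180, 200, 220]

# Keep only the combinations that leave enough overlap to merge.
mergeable = [
    (f, r, overlap(f, r))
    for f, r in product(forward_candidates, reverse_candidates)
    if overlap(f, r) >= MIN_OVERLAP
]

for f, r, ov in mergeable:
    print(f"truncLen=c({f}, {r}) -> overlap {ov} bp")
```

Under these assumptions, forward reads truncated at 180 bp would need reverse reads of at least 232 bp to merge, so none of the combinations with a 180 bp forward read qualify.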

Good question! I wouldn't say this approach is good, but it is valid, and sometimes it is the only way to recover the data. I use it as a last resort when other options are not working. In your case, I would go for it after checking the other options, since the R1 reads cover the V4 region.

Yes, you can specify in the Methods section that, due to quality issues, you weren't able to merge the reads and therefore used only forward reads for the analyses. Optionally, you can attach a relevant figure as supplementary material.

The good news is that the V4 region is pretty short and often used alone for metabarcoding, and your forward reads cover it, at least partially.

Hope that helps,
Timur
