Read quality suddenly drops for no apparent reason

Hey guys,

I recently started a new project with 2x250bp Illumina reads. The FastQC reports for the forward reads all looked pretty decent:

I then split the reads into my specific libraries using cutadapt and demultiplexed them using my barcodes, without trimming them. I imported those files using qiime tools import, checked their quality again, and found this:

This is quite a problem for me because I usually truncate where the quality score falls to 20 and then merge the paired reads using dada2. Here, for the first time, I have to truncate at ~180bp and won't get a big enough overlap for successful merging. Of course I could include a few more bp of lower quality, but I don't really understand where and how my sequences got so bad all of a sudden.
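For readers following along, the overlap concern is plain arithmetic: the truncated forward and reverse reads together have to span the amplicon with some bases left over. Here is a minimal sketch. The 390 bp amplicon length is purely hypothetical (the thread does not state the real one), and the 12 bp minimum is an assumption based on what I believe is the default minOverlap of dada2's mergePairs.

```python
def merge_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the truncated forward and reverse reads.

    A negative value means the two reads no longer meet at all.
    """
    return trunc_f + trunc_r - amplicon_len


AMPLICON_LEN = 390  # hypothetical amplicon length, NOT taken from the thread
MIN_OVERLAP = 12    # assumed minimum overlap required for merging

# Truncation lengths mentioned in the thread: ~180 bp forward, then 200/220.
for trunc_f, trunc_r in [(180, 220), (200, 220)]:
    overlap = merge_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    status = "should merge" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"forward {trunc_f} + reverse {trunc_r}: overlap {overlap} bp ({status})")
```

Plugging in your own amplicon length (including primers, if they are still on the reads) shows how many low-quality bases you would have to keep to reach the minimum.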

Hi @mfbeuq!

Cutadapt doesn’t change the sequence quality scores, so I don’t think you’re seeing a “change” or a jump - my guess is that the demux summarize viz is showing a subset of data that has this particular quality characteristic. Let’s take a step back - have you reviewed the q2-cutadapt --verbose logs from when you demultiplexed? What did it have to say about reads matched vs filtered? What does the “Overview” tab on the demux summarize viz say?

Finally, I’ll point out that the fastqc plot only has ~58 boxplots on it - I’m not really sure how the base pairs have been grouped or collapsed in that plot, but I don’t think you can directly compare with demux summarize.

Thanks for the quick reply.

I used standalone cutadapt for demultiplexing. It finds my barcodes in the forward reads as expected (around 2% of reads for each barcode, as I have a library with 48 samples). I imported those sequences with qiime tools import as SampleData[PairedEndSequencesWithQuality].

I wanted to try subsetting my data to see whether the quality differs between libraries.

So I've tried increasing the number of reads included in the demux summary, but it still looks the same: always a drop in quality quite early.

I had thought the FastQC plot would always include all sequences, and since all my FastQC reports look similar to the one shown above, I doubt that a few bad sequences were picked by chance as the representative subset.

With the subsets, too, the quality always decreased after ~180bp for the forward reads. For now I've truncated at 200 and 220 bp for the forward and reverse reads, respectively, and am relying on the dada2 algorithm.

Are there any other ideas about what could have caused the quality to drop?

Like I said before, cutadapt won’t change your quality scores, and neither will importing. I am 99% sure that the FastQC report is misleading you:

  • The FastQC plot is showing you ~58 boxplots
  • qiime demux summarize is showing you ~245 boxplots

I’m not sure how, but the FastQC plot is grouping many of the bases in the plot together, which I think is a bit misleading.
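If you would rather not rely on either plot, one way to check is to compute the mean quality at every position straight from the FASTQ file, before and after cutadapt, with no grouping and no subsampling. This is a rough sketch, assuming Phred+33 encoding and plain four-line FASTQ records; the function names are my own and are not part of FastQC or QIIME 2.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def per_position_mean_quality(fastq_lines, offset=33):
    """Return the mean Phred score at each read position (index 0 = first base).

    Assumes four lines per record and Phred+33 quality encoding.
    """
    sums, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only every fourth line holds the quality string
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(sums):  # first time we see a read this long
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / n for total, n in zip(sums, counts)]


# Tiny in-memory example; a real run would pass open_fastq("reads.fastq.gz").
records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II5#"]
print(per_position_mean_quality(records))  # [40.0, 40.0, 30.0, 21.0]
```

Running this on the multiplexed file and on one demultiplexed sample gives two per-base curves you can compare directly. As far as I know, FastQC also has a --nogroup option that disables the base grouping for longer reads, which should make its plot more comparable too.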

One option, if you really want to convince yourself, is to import your multiplexed reads into QIIME 2 using a manifest format (which assumes reads are already demuxed). Doing this, you would basically just have one sample. Then you could run that imported Artifact through demux summarize - you should see something very similar to what you are seeing post-cutadapt.
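As a sketch of what that single-sample manifest could look like: the snippet below builds a tab-separated manifest with the column names that, to my knowledge, the PairedEndFastqManifestPhred33V2 format expects. The file names and the sample ID are hypothetical placeholders, and older QIIME 2 releases use a different (CSV) manifest layout, so treat this as an assumption to check against the import docs for your version.

```python
from pathlib import Path


def single_sample_manifest(forward, reverse, sample_id="all-reads"):
    """Build manifest text that treats the whole multiplexed run as one sample.

    Column names are assumed from the PairedEndFastqManifestPhred33V2 format.
    Paths are made absolute, since the manifest requires absolute file paths.
    """
    header = "\t".join(
        ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
    )
    row = "\t".join(
        [sample_id, str(Path(forward).resolve()), str(Path(reverse).resolve())]
    )
    return f"{header}\n{row}\n"


# Hypothetical file names; substitute your own multiplexed R1/R2 files.
print(single_sample_manifest("multiplexed_R1.fastq.gz", "multiplexed_R2.fastq.gz"), end="")
```

Save the output as manifest.tsv; the import should then look something like qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path manifest.tsv --input-format PairedEndFastqManifestPhred33V2 --output-path all-reads.qza, followed by qiime demux summarize on the resulting artifact.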

Finally, if you want to share a private link to your data, I am happy to take a look for you.