Quality in reads suddenly decreases for no apparent reason

mfbeuq · July 17, 2020, 1:41pm

Hey guys,

I recently started a new project with 2x250bp Illumina Reads. My FastQC files for the forward reads looked all pretty decent:

I then split my reads into their specific libraries using cutadapt and demultiplex them by barcode without trimming the barcodes off. I import those files with qiime tools import, check their quality again, and find this:

This is quite a problem for me because I usually truncate at a quality score of 20 and then merge the paired reads using dada2. Here, for the first time, I would have to truncate at ~180bp and won't get a big enough overlap for successful merging. Of course I could include a few more bp of lower quality, but I don't really understand where and how my sequences got so bad all of a sudden.
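The overlap concern can be sketched as simple arithmetic. This is only an illustration: the amplicon length below is a hypothetical placeholder (the thread does not state it), and 12 nt is the default minimum overlap used by DADA2 when merging pairs.

```python
# Hypothetical amplicon length in bp, as present in the reads.
# NOT taken from the thread; substitute the value for your own primers.
AMPLICON_LEN = 400

# Default minimum overlap DADA2 requires to merge a read pair.
MIN_OVERLAP = 12


def overlap(trunc_f, trunc_r, amplicon_len=AMPLICON_LEN):
    """Number of bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len=AMPLICON_LEN, min_overlap=MIN_OVERLAP):
    """True if the truncated reads still overlap enough to be merged."""
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


# Compare a few truncation choices (forward, reverse).
for trunc_f, trunc_r in [(180, 180), (200, 220), (250, 250)]:
    print(trunc_f, trunc_r, overlap(trunc_f, trunc_r), can_merge(trunc_f, trunc_r))
```

With a longer or shorter amplicon the same truncation lengths can flip between mergeable and not, so the real amplicon length is the number to check first.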

thermokarst · July 17, 2020, 3:27pm

Hi @mfbeuq!

Cutadapt doesn't change the sequence quality scores, so I don't think you're seeing a "change" or a jump - my guess is that the demux summarize viz is showing a subset of data that has this particular quality characteristic. Let's take a step back - have you reviewed the q2-cutadapt --verbose logs from when you demultiplexed? What did it have to say about reads matched vs filtered? What does the "Overview" tab on the demux summarize viz say?

Finally, I'll point out that the fastqc plot only has ~58 boxplots on it - I'm not really sure how the base pairs have been grouped or collapsed in that plot, but I don't think you can directly compare with demux summarize.
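One possible explanation for the ~58 boxplots, offered as an assumption rather than something confirmed in this thread: if the plot keeps the first 9 positions separate and pools the remaining positions into fixed 5-base windows, a 250bp read ends up with exactly 58 groups. A minimal sketch of that grouping (the function name and defaults are illustrative):

```python
def position_groups(read_len, ungrouped=9, window=5):
    """Return (start, end) position ranges, 1-based and inclusive.

    Assumption: the first `ungrouped` positions are shown individually and
    every later position is pooled into windows of `window` bases.
    """
    groups = [(p, p) for p in range(1, min(ungrouped, read_len) + 1)]
    start = ungrouped + 1
    while start <= read_len:
        end = min(start + window - 1, read_len)
        groups.append((start, end))
        start = end + 1
    return groups


groups = position_groups(250)
print(len(groups))   # number of boxplots under this assumption
print(groups[8:12])  # where the per-base positions give way to windows
```

Under this grouping each boxplot past position 9 summarizes five positions at once, which is one reason the two plots are hard to compare position by position.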

mfbeuq · July 20, 2020, 3:07pm

Thanks for the quick reply.

I used standalone cutadapt for demultiplexing. It finds my barcodes in the forward reads as expected (around 2% of reads per barcode, since the library has 48 samples). I import those sequences with qiime tools import as SampleData[PairedEndSequencesWithQuality].

I wanted to try subsetting my data and see if there are differences in quality within different libraries.

mfbeuq · July 30, 2020, 3:27pm

So I've tried increasing the number of reads included in the demux summary, but it still looks the same: there is always a drop in quality quite early.

I had thought the FastQC plot would always include all sequences, and since all my FastQC reports look similar to the one shown above, I would not expect that a few bad sequences were chosen by chance as the representation.

Also, when subsetting, the quality always decreased after ~180bp for the forward reads. For now I've truncated at 200 and 220 bp for the forward and reverse reads, respectively, and rely on the dada2 algorithm.

Are there any other ideas what could have caused the quality to drop?

thermokarst · July 30, 2020, 4:17pm

Like I said before, cutadapt won't change your quality scores, and neither will importing. I am 99% sure that the FastQC report is misleading you:

FastQC plot is showing you ~58 histograms
qiime demux summarize is showing you ~245 histograms

I'm not sure how, but the FastQC plot is grouping many of the bases in the plot together, which I think is a bit misleading.
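The grouping question can be sidestepped by computing per-position quality directly from the FASTQ files before and after demultiplexing. Below is a minimal sketch, assuming Phred+33 encoding and standard 4-line FASTQ records (no wrapped sequence lines); the file name in the comment is hypothetical.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def mean_quality_by_position(lines, offset=33):
    """Mean Phred score at each read position, ungrouped.

    `lines` is any iterable of FASTQ lines; every 4th line starting from
    the 4th one is treated as a quality string.
    """
    sums, counts = [], []
    for qual in islice(lines, 3, None, 4):
        for i, ch in enumerate(qual.rstrip("\r\n")):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Tiny in-memory example: two reads of different length.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "AC", "+", "##"]
print(mean_quality_by_position(example))

# With real data (hypothetical file name):
# with open_fastq("forward_reads.fastq.gz") as fh:
#     means = mean_quality_by_position(fh)
```

Running this on the multiplexed file and on a demultiplexed file gives one number per position in both cases, so the two curves can be compared directly.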

One option, if you really want to convince yourself, is to import your multiplexed reads into QIIME 2 using a manifest format (which assumes reads are demuxed). Doing this you would basically just have one sample. Then, you could run that imported Artifact through this viz - you should see something very similar to what you are seeing post-cutadapt.
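A minimal sketch of preparing such a single-sample manifest, assuming the paired-end V2 manifest layout (tab-separated, with the header shown in the code) and Phred+33 reads. The FASTQ file names and the sample ID are hypothetical placeholders.

```python
import csv
from pathlib import Path


def write_manifest(manifest_path, sample_id, forward_fastq, reverse_fastq):
    """Write a one-row paired-end manifest that treats all reads as one sample."""
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        # The manifest expects absolute paths, so resolve them here.
        writer.writerow(
            [sample_id, Path(forward_fastq).resolve(), Path(reverse_fastq).resolve()]
        )


# Hypothetical file names for the still-multiplexed reads.
write_manifest(
    "manifest.tsv",
    "multiplexed",
    "multiplexed_R1.fastq.gz",
    "multiplexed_R2.fastq.gz",
)
print(Path("manifest.tsv").read_text())

# The manifest can then be imported and summarized on the command line:
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.tsv \
#     --output-path multiplexed.qza \
#     --input-format PairedEndFastqManifestPhred33V2
#   qiime demux summarize \
#     --i-data multiplexed.qza \
#     --o-visualization multiplexed.qzv
```

The resulting visualization covers the whole run as a single sample, which makes it directly comparable to the post-cutadapt summary.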

Finally, if you want to share a private link to your data, I am happy to take a look for you.

mfbeuq · August 20, 2020, 8:03pm

I experimented quite a bit, and it appears that, as you suggested, FastQC is giving me a misleading picture of read quality. Thanks for your input, I really appreciate it!

Cheers!
