Removing primers and quality control

Hello everyone.

I have a question about primer removal and quality control. I would appreciate it if you could help me!

The quality in my quality plots drops off rapidly, especially toward the ends of the reads.
Before Cutadapt:


After Cutadapt:

I don't understand why the quality at the ends of my sequences gets worse after removing the primers.

The cutadapt command:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ./F_rawdata.qza \
  --p-cores 20 \
  --p-front-f GTGARTCATCGAATCTTTG \
  --p-front-r TCCTCCGCTTATTGATATGC \
  --p-match-read-wildcards TRUE \
  --p-match-adapter-wildcards TRUE \
  --o-trimmed-sequences F_TrimPri.qza \
  --verbose

Thanks for your help!
After cutadapt.qzv (333.2 KB)
Before cutadapt.qzv (326.6 KB)

Hello @lishaoran0917,

Welcome to the forums!

I hope I can provide some clues as to what is happening before and after running cutadapt on your data.

This is normal for Illumina sequencing, and your quality is still pretty good! Remember that a Q score of 20 is a prediction of 99% accuracy per-base.
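To make the Q score remark concrete, here is a minimal sketch of the standard Phred relationship, where the error probability is 10^(-Q/10). The helper name `phred_accuracy` is made up for this illustration and is not part of QIIME 2 or Cutadapt.

```shell
# Convert a Phred quality score Q into per-base accuracy (percent).
# Error probability: P = 10^(-Q/10); accuracy = (1 - P) * 100.
# phred_accuracy is a hypothetical helper name used only in this sketch.
phred_accuracy() {
  awk -v q="$1" 'BEGIN { p = 10 ^ (-q / 10); printf "%.4f\n", (1 - p) * 100 }'
}

# Print the accuracy for a few common quality thresholds.
for q in 10 20 30 40; do
  printf 'Q%s: %s%% accuracy\n' "$q" "$(phred_accuracy "$q")"
done
```

So a drop from Q30 to Q20 at the read ends means going from roughly 1 error in 1000 bases to 1 error in 100, which is still usable for most denoising workflows.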

After running Cutadapt, the adapter sequences it could detect will have been trimmed from the reads. So the reads that are left untrimmed either

  1. Do not have adapter sequences in them
  2. DO have adapter sequences in them, but have so many errors that the adapters could not be detected by Cutadapt

I suspect #2 is happening in your data: the untrimmed reads still have adapters in them, but they have too many errors to be detected and removed. These error-filled reads also have low Q scores, as you noticed.

When you move on to the DADA2 step, you can truncate the reads to remove that low-quality region.

EDIT: Consider using --p-discard-untrimmed. This will remove the spurious sequences in which Cutadapt is unable to find and trim the primers, which should drastically reduce the trailing low-quality bits. Also, if you leave the untrimmed reads in, you will have variable-length data, which may negatively affect denoising and inflate ESV counts.
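As a rough sketch, the command from the question with --p-discard-untrimmed added might look like the following. Two assumptions to check against `qiime cutadapt trim-paired --help` for your release: the output name `F_TrimPri_discard.qza` is a placeholder I made up, and the boolean options are written as bare flags, whereas the original command passes TRUE explicitly. The `DRY_RUN` switch and the function name are only conveniences for this example.

```shell
# Sketch: the original trim-paired call plus --p-discard-untrimmed.
# F_TrimPri_discard.qza is a hypothetical output name, not from the thread.
# Booleans are written as bare flags (an assumption about your QIIME 2 version).
trim_discard_untrimmed() {
  # DRY_RUN=1 prints the command instead of running it.
  ${DRY_RUN:+echo} qiime cutadapt trim-paired \
    --i-demultiplexed-sequences ./F_rawdata.qza \
    --p-cores 20 \
    --p-front-f GTGARTCATCGAATCTTTG \
    --p-front-r TCCTCCGCTTATTGATATGC \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences F_TrimPri_discard.qza \
    --verbose
}

# Preview the command first; drop DRY_RUN=1 to actually run it.
DRY_RUN=1 trim_discard_untrimmed
```

Comparing the read counts in the summaries before and after this step should show how many reads were dropped for lacking a detectable primer.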

Thank you very much! I have truncated the low-quality region in the DADA2 step.

I don't understand why adapter sequences would be present at the ends of the reads. Shouldn't adapter sequences only appear at the beginning of the forward or reverse reads?

If your region is short and your reads are long, the reads could cover the entire amplicon in both directions and then extend into the primer at the opposite end.

In this case, you would find the reverse complement of the R2 start primer at the end of the R1 reads. Near the end of the R2 reads, you would find the reverse complement of the R1 start primer.
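To see what those read-through sequences would look like, here is a small sketch that reverse-complements the two primers from the cutadapt command above, including the IUPAC ambiguity codes (R pairs with Y, and so on). The function name `revcomp` is just a helper for this example, and it assumes the common `rev` and `tr` utilities are available.

```shell
# Reverse-complement a DNA sequence, handling IUPAC ambiguity codes.
# revcomp is a hypothetical helper name used only in this sketch.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# Primers taken from the cutadapt command in the question.
fwd=GTGARTCATCGAATCTTTG
rev_primer=TCCTCCGCTTATTGATATGC

# If reads run through the whole amplicon, this may appear near the end of R1:
echo "End of R1: $(revcomp "$rev_primer")"
# ...and this may appear near the end of R2:
echo "End of R2: $(revcomp "$fwd")"
```

These are the strings one would search for near the 3' ends of a few reads to check whether read-through is actually happening in this dataset.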

This is option #2 in Overlap Status.

Try it and find out!
