I am working with paired-end ITS data and while I have no problem removing forward adapters, no matter what I put for my reverse primer after the -A flag, I always get primers left over.
And here is my general pipeline where I first just look at how the primers are distributed in the reads, followed by running cutadapt, and then running my sanity check again to check that all primers are removed. While I do see that some primers are removed for the reverse reads in the RevComp column, it is still not all 0…
FWD <- "CTTGGTCATTTAGAGGAAGTAA"
REV <- "GCTGCGTTCTTCATCGATGC"
> FWD.orients <- allOrients(FWD); FWD.orients
Forward Complement Reverse RevComp
"CTTGGTCATTTAGAGGAAGTAA" "GAACCAGTAAATCTCCTTCATT" "AATGAAGGAGATTTACTGGTTC" "TTACTTCCTCTAAATGACCAAG"
> REV.orients <- allOrients(REV); REV.orients
Forward Complement Reverse RevComp
"GCTGCGTTCTTCATCGATGC" "CGACGCAAGAAGTAGCTACG" "CGTAGCTACTTCTTGCGTCG" "GCATCGATGAAGAACGCAGC"
# Check primers in original uncut/unfiltered files; not filtering because I lose way too many reads
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs[[1]]),
FWD.ReverseReads = sapply(FWD.orients,primerHits, fn = fnRs[[1]]),
REV.ForwardReads = sapply(REV.orients, primerHits,fn = fnFs[[1]]),
REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs[[1]]))
# Output before any trimming is done:
Forward Complement Reverse RevComp
FWD.ForwardReads 27211 0 0 0
FWD.ReverseReads 1720 1715 1707 18415
REV.ForwardReads 0 0 0 26660
REV.ReverseReads 26766 1826 1728 1744
FWD <- FWD.orients[["Forward"]]; FWD # keep the same as is
REV <- REV.orients[["RevComp"]]; REV # because it matches to the revcomp
> REV.orients <- allOrients(REV); REV.orients
Forward Complement Reverse RevComp
"GCTGCGTTCTTCATCGATGC" "CGACGCAAGAAGTAGCTACG" "CGTAGCTACTTCTTGCGTCG" "GCATCGATGAAGAACGCAGC"
Run cutadapt (in terminal). I have noticed that removing the anchors makes this work better for forward reads, because if I use ^ or &, there is always a few primers left in forward reads in the forward orientation. Linking without anchoring has given me the best results thus far but I need them to all show 0 primers present at the end.
Cutadapt code:
for r1 in *_R1.fastq; do r2=${r1/_R1.fastq/_R2.fastq}; sample=${r1%%_R1.fastq}; /home/chauk/.local/bin/cutadapt \
-a CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC -A GCTGCGTTCTTCATCGATGC...TACTTCCTCTAAATGACCAAG \
-n 2 -o original/${sample}_R1_cut1.fastq -p original/${sample}_R2_cut1.fastq $r1 $r2; done
Sanity check these cut files and here’s the output:
Forward Complement Reverse RevComp
FWD.ForwardReads 0 0 0 0
FWD.ReverseReads 1720 1715 1707 1518
REV.ForwardReads 0 0 0 0
REV.ReverseReads 1834 1826 1728 1744
As you can see, forward reads are all good and reverse reads have most of the primers removed in the RevComp orientation…but they still exist at every orientation throughout. I don’t get it.
One more thing to add in case this matters…the sequencing place did use 4 length variations of adaptor + primer sequence as shown below, and I don’t know if this matters. I did try running cutadapt on just the adaptor sequence as well and that made virtually no difference. Also, does not explain why the forward reads are completing fine after trimming but the reverse reads are essentially contaminated with primers after trimming,
Forward1 = ACACTCTTTCCCTACACGACGCTCTTCCGATCTCTTGGTCATTTAGAGGAAGTAA
Forward2 = ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTTGGTCATTTAGAGGAAGTAA
Forward3 = ACACTCTTTCCCTACACGACGCTCTTCCGATCTACCTTGGTCATTTAGAGGAAGTAA
Forward4 = ACACTCTTTCCCTACACGACGCTCTTCCGATCTCAACTTGGTCATTTAGAGGAAGTAA
Reverse1 = GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCTGCGTTCTTCATCGATGC
Reverse2 = GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTGCTGCGTTCTTCATCGATGC
Reverse3 = GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTACGCTGCGTTCTTCATCGATGC
Reverse4 = GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCATGCTGCGTTCTTCATCGATGC