Hi @colinvwood,
Okay, so yesterday I used grep to do a pretty comprehensive search for the V3–V4 primers and the two adapter trimming sequences in my forward (F) and reverse (R) reads. I searched for each primer and adapter trimming sequence separately, and in every paired combination of "primer plus adapter trimming sequence".
V3–V4 primers:
- F primer: CCTAYGGGRBGCASCAG
- R primer: GGACTACNNGGGTATCTAAT
Adapter trimming sequences:
- Read 1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
- Read 2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
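Both primers contain IUPAC degenerate bases (Y, R, B, S, N), so a literal grep for the primer string would miss most copies. Here is a minimal sketch of one way to run this kind of search, assuming uncompressed FASTQ files with four lines per record. The function names iupac_to_regex and count_copies_per_read and the file name in the usage comment are hypothetical, and this is not necessarily the exact procedure behind the counts in this post.

```shell
# Expand IUPAC degenerate bases into bracket expressions usable by grep -E or awk.
# The replacement text only contains A, C, G, T, so the order of the rules does not matter.
iupac_to_regex() {
  printf '%s' "$1" | sed \
    -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'  -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Print "<copies> <reads>" lines: how many reads contain exactly that many
# non-overlapping copies of the pattern.
# Usage: count_copies_per_read REGEX FASTQ
count_copies_per_read() {
  awk -v re="$1" '
    NR % 4 == 2 {            # sequence line of each 4-line FASTQ record
      n = gsub(re, "&")      # gsub returns the number of matches it replaced
      if (n > 0) hist[n]++
    }
    END { for (n in hist) print n, hist[n] }
  ' "$2" | sort -n
}

# Example (hypothetical file name):
#   count_copies_per_read "$(iupac_to_regex CCTAYGGGRBGCASCAG)" forward_reads.fastq
```

Note that this only counts exact matches to the expanded pattern; a primer copy with a sequencing error, or a read with an N inside the primer region, would not be counted.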
Here's a summary of the results:
Separate searches:
- F primer in F reads:
  - 18376 total hits across 17414 reads
  - 776 reads containing two or more copies of the F primer
  - 106 reads containing three or more copies of the F primer
  - 36 reads containing four or more copies of the F primer
  - 20 reads containing five or more copies of the F primer
  - 15 reads containing six or more copies of the F primer
  - 4 reads containing seven or more copies of the F primer
  - 3 reads containing eight or more copies of the F primer
  - 2 reads containing nine copies of the F primer
- R primer in F reads:
  - 7 total hits across 7 reads (those 7 reads each contain one copy of the R primer)
- F primer in R reads:
  - 11 total hits across 10 reads (1 read had two copies of the F primer)
- R primer in R reads:
  - 11449 total hits across 9969 reads
  - 834 reads containing two or more copies of the R primer
  - 360 reads containing three or more copies of the R primer
  - 169 reads containing four or more copies of the R primer
  - 77 reads containing five or more copies of the R primer
  - 27 reads containing six or more copies of the R primer
  - 9 reads containing seven or more copies of the R primer
  - 4 reads containing eight copies of the R primer
- Read 1 adapter trimming sequence in F reads:
  - 136 total hits across 136 reads (those 136 reads each contain one copy of the Read 1 adapter trimming sequence)
- Read 2 adapter trimming sequence in F reads:
  - 109 total hits across 109 reads (those 109 reads each contain one copy of the Read 2 adapter trimming sequence)
- Read 1 adapter trimming sequence in R reads:
  - 33 total hits across 33 reads (those 33 reads each contain one copy of the Read 1 adapter trimming sequence)
- Read 2 adapter trimming sequence in R reads:
  - 31 total hits across 31 reads (those 31 reads each contain one copy of the Read 2 adapter trimming sequence)
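A quick consistency check on the two long tallies: a read with k copies shows up in every "j or more" line from j = 2 up to k, so the number of reads with at least one copy plus all of the "or more" counts should equal the total number of hits. Here is a small sketch of that arithmetic (the helper name sum_all is hypothetical, and it treats the last line of each tally as the maximum copy number):

```shell
# Hypothetical helper: add up all of its arguments.
sum_all() {
  total=0
  for x in "$@"; do total=$((total + x)); done
  echo "$total"
}

# F primer in F reads: 17414 reads with at least one copy, then the
# "two or more" ... "nine copies" lines.
f_hits=$(sum_all 17414 776 106 36 20 15 4 3 2)

# R primer in R reads: 9969 reads with at least one copy, then the
# "two or more" ... "eight copies" lines.
r_hits=$(sum_all 9969 834 360 169 77 27 9 4)

echo "F primer hits: $f_hits"   # prints: F primer hits: 18376
echo "R primer hits: $r_hits"   # prints: R primer hits: 11449
```

Both sums match the reported totals (18376 and 11449), so the per-read breakdowns are internally consistent.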
Combined searches:
- F primer and Read 1 adapter trimming sequence in F reads:
  - 16 reads contain at least one copy of both
  - Read 1 adapter trimming sequence always present in only one copy, whereas multiple copies of F primer sometimes present (max: 4 in one read)
  - F primer sequence(s) always upstream of the Read 1 adapter trimming sequence
- F primer and Read 2 adapter trimming sequence in F reads:
  - 12 reads contain at least one copy of both
  - Read 2 adapter trimming sequence always present in only one copy, whereas multiple copies of F primer sometimes present (max: 3 in one read)
  - F primer sequence(s) always upstream of the Read 2 adapter trimming sequence
- R primer and Read 1 adapter trimming sequence in F reads:
  - no hits
- R primer and Read 2 adapter trimming sequence in F reads:
  - no hits
- F primer and Read 1 adapter trimming sequence in R reads:
  - no hits
- F primer and Read 2 adapter trimming sequence in R reads:
  - no hits
- R primer and Read 1 adapter trimming sequence in R reads:
  - 1 read contains one copy of both
  - R primer upstream of the Read 1 adapter trimming sequence
- R primer and Read 2 adapter trimming sequence in R reads:
  - no hits
Now, to answer your question...
Answer: In my reads, no adapter trimming sequences are ever found upstream of either one of the V3–V4 primers.
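For anyone wanting to check the "primer always upstream of the adapter trimming sequence" observation programmatically rather than by eye, here is a rough sketch, assuming uncompressed FASTQ with four lines per record. The function name check_order and the file name in the usage comment are hypothetical. It compares the start of the last primer copy with the start of the first adapter copy in each read that contains both.

```shell
# Usage: check_order PRIMER_REGEX ADAPTER_SEQ FASTQ
# PRIMER_REGEX is the primer with degenerate bases already expanded,
# e.g. 'CCTA[CT]GGG[AG][CGT]GCA[CG]CAG' for the F primer.
check_order() {
  awk -v primer="$1" -v adapter="$2" '
    NR % 4 == 2 {
      a = index($0, adapter)            # 1-based start of first adapter copy, 0 if absent
      if (a == 0 || !match($0, primer)) next
      # Walk through every primer copy to find where the last one starts.
      rest = $0; offset = 0; last = 0
      while (match(rest, primer)) {
        last = offset + RSTART
        offset += RSTART + RLENGTH - 1
        rest = substr(rest, RSTART + RLENGTH)
      }
      if (last < a) up++; else down++
    }
    END { printf "primer_upstream=%d adapter_upstream_or_mixed=%d\n", up, down }
  ' "$3"
}

# Example (hypothetical file name):
#   check_order 'CCTA[CT]GGG[AG][CGT]GCA[CG]CAG' AGATCGGAAGAGCACACGTCTGAACTCCAGTCA forward_reads.fastq
```

A result with adapter_upstream_or_mixed=0 would agree with the answer above for that primer/adapter pair.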
So, how to proceed...
If both the F reads and R reads contain hits to the F primer, R primer, Read 1 adapter trimming sequence, and Read 2 adapter trimming sequence, I guess I need to remove all of these sequences from both F reads and R reads.
Given that the V3–V4 primers are sometimes found in multiple copies (max: 9), I should probably use --p-times 10 in my qiime cutadapt trim-paired commands, to ensure that all copies get removed.
Adapter trimming sequences only ever appear as a single copy, so I guess I don't need to use --p-times on them.
For now, I'm thinking of this approach:
Removing the Read 1 adapter trimming sequence from F and R reads:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-f AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  --p-adapter-r AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  --o-trimmed-sequences demux2.qza \
  --verbose > cutadapt_output1.txt
qiime demux summarize \
  --i-data demux2.qza \
  --o-visualization demux2.qzv
Removing the Read 2 adapter trimming sequence from F and R reads:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux2.qza \
  --p-adapter-f AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  --p-adapter-r AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  --o-trimmed-sequences demux3.qza \
  --verbose > cutadapt_output2.txt
qiime demux summarize \
  --i-data demux3.qza \
  --o-visualization demux3.qzv
Removing the F primer from F and R reads:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux3.qza \
  --p-front-f CCTAYGGGRBGCASCAG \
  --p-front-r CCTAYGGGRBGCASCAG \
  --p-times 10 \
  --o-trimmed-sequences demux4.qza \
  --verbose > cutadapt_output3.txt
qiime demux summarize \
  --i-data demux4.qza \
  --o-visualization demux4.qzv
Removing the R primer from F and R reads:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux4.qza \
  --p-front-f GGACTACNNGGGTATCTAAT \
  --p-front-r GGACTACNNGGGTATCTAAT \
  --p-times 10 \
  --o-trimmed-sequences demux5.qza \
  --verbose > cutadapt_output4.txt
qiime demux summarize \
  --i-data demux5.qza \
  --o-visualization demux5.qzv
Does this seem reasonable to you, @colinvwood?
Also, two other questions:
- Have you ever encountered a situation like this before, where some of the F reads and R reads contain hits to the F primer, R primer, Read 1 adapter trimming sequence, and Read 2 adapter trimming sequence? I'm surprised that all of these sequences are found in both F and R reads.
- Do you know why some of my reads have multiple internal copies of the F or R primer in them? I mean, obviously it's an artefact of some sort. Is it common to find multiple internal copies of amplicon primers in 16S reads?
Thanks, as always, for the help!
EDIT:
I just also searched for the reverse complements of the V3–V4 primers, and the reverse complements of the two adapter trimming sequences:
- Reverse-complemented F primer: CTGSTGCVYCCCRTAGG
- Reverse-complemented R primer: ATTAGATACCCNNGTAGTCC
- Reverse-complemented Read 1 adapter trimming sequence: TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
- Reverse-complemented Read 2 adapter trimming sequence: ACACTCTTTCCCTACACGACGCTCTTCCGATCT
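The reverse complements listed above can be derived with a short helper that also handles the IUPAC degenerate bases (R and Y swap, K and M swap, B and V swap, D and H swap; S, W and N stay the same). This is a sketch that assumes the rev and tr utilities are available; revcomp is a hypothetical name.

```shell
# Reverse the sequence, then complement each base (IUPAC-aware, upper case only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYSWKMBVDHN' 'TGCAYRSWMKVBHDN'
}

revcomp CCTAYGGGRBGCASCAG                   # F primer  -> CTGSTGCVYCCCRTAGG
revcomp GGACTACNNGGGTATCTAAT                # R primer  -> ATTAGATACCCNNGTAGTCC
revcomp AGATCGGAAGAGCACACGTCTGAACTCCAGTCA   # Read 1    -> TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
revcomp AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT   # Read 2    -> ACACTCTTTCCCTACACGACGCTCTTCCGATCT
```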
No hits anywhere for either of the reverse-complemented adapter trimming sequences.
No hits for the reverse-complemented F primer in the R reads.
No hits for the reverse-complemented R primer in the F reads.
But...
Reverse-complemented F primer detected in F reads (8 hits across 7 reads; one read contains two copies).
Reverse-complemented R primer detected in R reads (7 hits across 6 reads; one read contains two copies).
I suppose these reverse-complemented primers should be removed as well? Another cutadapt step needed?