I fell into the swamp of 273bp after trimming adapters and primers through cutadapt...

Hello, everyone! I'm new to QIIME 2.

I sequenced V4-region amplicons (515F/806R) on an Illumina iSeq 100 with the cycle setting at 1x300. I used Nextera XT Index v2 (S502, N701...) to barcode the sequences.

Now I'm having trouble removing the adapters and primers. After removing the primers, the expected size of the V4 region is 252 bp, but that is not what I got.

I amplified the V4 region using primers that carry the Illumina Nextera transposase adapter sequence, like this.

1) 16S V4 FW primer: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG (Nextera transposase) + GTGCCAGCMGCCGCGGTAA (515F)

2) 16S V4 BW primer: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG (Nextera transposase) + GGACTACHVGGGTWTCTAAT (806R)

Then I tried the following trimming commands.

**First, I tried trimming with the adapter sequences that are part of the primers.**
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 32 \
    --p-front GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT \
    --p-adapter TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG \
    --p-error-rate 0.1 \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences abxmouse-trimmed-demux2.qza \
    --verbose > cutadapt-log.txt
-> Most of them ended up with 273 bp left over.

**Second, I tried the adapter trimming sequence given on the Illumina webpage.**
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 32 \
    --p-front GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT \
    --p-adapter CTGTCTCTTATACACATCT \
    --p-error-rate 0.1 \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences abxmouse-trimmed-demux2.qza \
    --verbose > cutadapt-log.txt
-> Most of them ended up with 273 bp left over too.

**Third, I tried the --p-anywhere parameter. From this step on, I left all options at their defaults except the minimum length.**
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 32 \
    --p-anywhere TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT \
    --p-minimum-length 250 \
    --o-trimmed-sequences abxmouse-trimmed-demux2.qza \
    --verbose > cutadapt-log.txt
-> Most of them ended up with 273 bp left over too.

**Fourth, I tried using both the --p-front and --p-adapter parameters.**
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 32 \
    --p-front TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT \
    --p-adapter TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT \
    --p-minimum-length 250 \
    --o-trimmed-sequences abxmouse-trimmed-demux2.qza \
    --verbose > cutadapt-log.txt
-> Most of them ended up with 273 bp left over too.

I searched the trimmed FASTQ file for the sequences I had trimmed, and they were no longer there. So I am confused about why most of the forward reads are 273 bp.

I'm really wondering what the most appropriate command is for Illumina 1x300 single-end sequencing, and why most of the sequences are 273 bp rather than 252 bp.

Thank you for reading this long and chaotic post.

Hi @ehdud0505,

Simply provide the primer(s); there is often no need to provide the adapter sequence. In fact, you are likely to get spurious output if you search for the combined adapter and primer sequence. The adapters occur before the forward primer or after the reverse primer (on the same strand), so trimming at the primers removes them as well. I'd suggest providing just the forward primer and the reverse complement of the reverse primer.

However, given that the quality towards the end of a single long forward read is often poor, you may not be able to find the reverse primer due to sequencing error. So it might be best to run cutadapt twice: once with only the forward primer and --p-discard-untrimmed, and a second time with the reverse primer and without --p-discard-untrimmed.

First, let's use both primers at once. Note: we pass the reverse complement of the reverse primer to the --p-adapter flag, as that searches the 3' end, whereas --p-front searches the 5' end. You can use this handy tool to obtain your reverse complement.

qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 8 \
    --p-front GTGCCAGCMGCCGCGGTAA \
    --p-adapter ATTAGAWACCCBDGTAGTCC \
    --p-error-rate 0.1 \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences abxmouse-trimmed-demux.qza \
    --verbose > cutadapt-log.txt

Or run in two steps:

# Step 1: we're likely to find the forward primer, so discard reads where we can't find it.
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-demux.qza \
    --p-cores 8 \
    --p-front GTGCCAGCMGCCGCGGTAA \
    --p-error-rate 0.1 \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences abxmouse-trimmed-01-demux.qza \
    --verbose > cutadapt-log-01.txt

# Step 2: we're less likely to find the reverse primer at the 3' end, so keep reads
# whether or not the primer is found. Probably rely on truncation / trimming.
# Use the reverse complement of the reverse V4 primer (as it is on the same strand).
qiime cutadapt trim-single \
    --i-demultiplexed-sequences abxmouse-trimmed-01-demux.qza \
    --p-cores 8 \
    --p-adapter ATTAGAWACCCBDGTAGTCC \
    --p-error-rate 0.1 \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --o-trimmed-sequences abxmouse-trimmed-02-demux.qza \
    --verbose > cutadapt-log-02.txt
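
If you'd rather not rely on an online tool, the reverse complement passed to --p-adapter in the commands above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `reverse_complement` is made up for illustration, and it assumes the primer is written in uppercase IUPAC nucleotide codes.

```python
# Complement of each IUPAC nucleotide code. Degenerate codes map to the
# code that represents the complementary set of bases (e.g. M=[AC] -> K=[GT]).
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTMRWSYKVHDBN",
    "TGCAKYWSRMBDHVN",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase IUPAC DNA sequence."""
    return seq.translate(IUPAC_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # 806R primer from this thread
    print(reverse_complement("GGACTACHVGGGTWTCTAAT"))  # ATTAGAWACCCBDGTAGTCC
```

The same function applied to the last 19 bases of the Nextera transposase sequence (AGATGTGTATAAGAGACAG) gives CTGTCTCTTATACACATCT, the adapter trimming sequence quoted earlier in the thread.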

Try these commands or any other iteration and let us know what you get.

-Cheers!

Given you're consistently getting 273bp as the dominant amplicon length from several variations in the processing, I suspect your expected length of 252bp is wrong. Maybe do an in-silico PCR with your primers on a reference sequence to confirm the expected amplicon sequence?
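
As a rough illustration of the in-silico PCR idea, here is a minimal Python sketch. The function names (`primer_to_regex`, `in_silico_pcr`) are made up for this example, it only reports exact matches (no mismatches allowed, unlike a real PCR or cutadapt's error rate), and it assumes the reference is a single uppercase string without line breaks. The reference in the usage part is a toy sequence, not real 16S data.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "W": "[AT]", "S": "[CG]",
    "Y": "[CT]", "K": "[GT]", "V": "[ACG]", "H": "[ACT]",
    "D": "[AGT]", "B": "[CGT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")


def primer_to_regex(primer: str) -> str:
    """Turn an uppercase IUPAC primer into a regular expression."""
    return "".join(IUPAC_REGEX[base] for base in primer)


def in_silico_pcr(reference: str, fwd_primer: str, rev_primer: str):
    """Return (amplicon, insert) for the first exact primer-pair hit, or None.

    The amplicon includes both primers; the insert excludes them.
    """
    # The reverse primer appears as its reverse complement on the forward strand.
    rev_rc = rev_primer.translate(COMPLEMENT)[::-1]
    pattern = re.compile(
        "(" + primer_to_regex(fwd_primer) + ")(.*?)(" + primer_to_regex(rev_rc) + ")"
    )
    match = pattern.search(reference)
    if match is None:
        return None
    return match.group(0), match.group(2)


if __name__ == "__main__":
    # Toy reference: flank + 515F site + 20 bp insert + 806R site (reverse complement) + flank
    toy_reference = (
        "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "ACGT" * 5 + "ATTAGAAACCCCAGTAGTCC" + "GGGG"
    )
    hit = in_silico_pcr(toy_reference, "GTGCCAGCMGCCGCGGTAA", "GGACTACHVGGGTWTCTAAT")
    if hit is not None:
        amplicon, insert = hit
        print(len(amplicon), len(insert))
```

Running this against a real reference 16S sequence would show the insert length to expect once both primers are removed.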

I would also take the top few unique (273 bp) sequences by read count, and put them into NCBI BLAST to see what they are. Do they match V4, perhaps from a slightly different species than you expected?
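
One way to pull out those top sequences is sketched below in Python. It assumes plain four-line FASTQ records (sequence on the second line of each record); the function names and the file name in the comment are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def top_sequences(lines: Iterable[str], length: int = 273, n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most common sequences that are exactly `length` bases long."""
    counts = Counter(seq for seq in fastq_sequences(lines) if len(seq) == length)
    return counts.most_common(n)


# Usage with an exported, gzipped FASTQ file (hypothetical file name):
#
#     import gzip
#     with gzip.open("sample_trimmed.fastq.gz", "rt") as handle:
#         for seq, count in top_sequences(handle, length=273, n=5):
#             print(count, seq)
```

The printed sequences can then be pasted into NCBI BLAST one at a time.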

Thank you so much @SoilRotifer!

I tried both of the approaches above to trim the primers with cutadapt, and I got results from each.

First, when I trimmed both primers in a single cutadapt run, it looked like this.

Second, when I used the two-step cutadapt method, I got a result like this.

I'm not sure my reasoning is correct, but I think running cutadapt only once is more appropriate for us, because when I kept sequences of more than 250 bp, only 1800 bp were left in total.

I'll proceed to the next step. Thank you again!

-ehdud0505-

Hi, @peterjc!

Thank you for your response.

Following your suggestion, I ran NCBI BLAST, and there was no difference between what I expected and the BLAST result.

It was really helpful, thank you so much!

-ehdud0505-

Hi @ehdud0505,

I'm glad one of those approaches worked! Good luck on the rest of your analyses!

-Mike

Hi @ehdud0505,

I forgot to mention: your reads are longer than you expect because you are running a 1x300 sequencing run, which means you might be getting "read-through" into the reverse primer and adapters at the 3' end. So you should consider truncating those reads to ~250 bases when running DADA2 or Deblur. Otherwise you'll retain spurious adapter sequence at the 3' end of your reads, which can lead to erroneous conclusions in your analyses.
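
Before settling on a truncation length, it can help to look at the read-length distribution and at how many reads would survive a given cutoff, since (as far as I know) truncation in DADA2 discards reads shorter than the truncation length. Below is a minimal Python sketch; the function names are made up for illustration and the reads in the usage part are toy data.

```python
from collections import Counter
from typing import Iterable


def length_histogram(sequences: Iterable[str]) -> Counter:
    """Count how many reads have each length."""
    return Counter(len(seq) for seq in sequences)


def fraction_retained(sequences: Iterable[str], trunc_len: int) -> float:
    """Fraction of reads at least `trunc_len` bases long.

    Shorter reads are the ones a truncation step would be expected to drop.
    """
    sequences = list(sequences)
    if not sequences:
        return 0.0
    kept = sum(1 for seq in sequences if len(seq) >= trunc_len)
    return kept / len(sequences)


if __name__ == "__main__":
    # Toy data: three long reads and one short read
    reads = ["A" * 273, "A" * 273, "A" * 273, "A" * 120]
    print(length_histogram(reads).most_common())
    print(fraction_retained(reads, 250))
```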

Also, your second image, from the two-step cutadapt approach, does not seem correct to me. The output should contain just as many reads as the output of the first step, since the second step should not discard any reads, only trim them at the 3' end. Perhaps double-check that...

Good morning, @SoilRotifer.

Thanks to your kind response, I was able to finish the rest of the analysis. It left exactly 252 bp, without any adapters or primers.

I appreciate your help :slight_smile:

-Doyoung
