Cutadapt primer removal questions

Hello everyone!
I was trying to remove the adapter and primer sequences from my FASTQ files (V3-V4 region).

I usually use fastp for adapter removal, but this time I wanted to use cutadapt just to learn one more tool.

So I used 2 different commands that I think can both work, but I'm not sure if they are 100% correct.

Considering that the adapter+primer sequences are:

16S Amplicon PCR Forward Primer = 5'

TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG

16S Amplicon PCR Reverse Primer = 5'

GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

(The primer is the 3' portion of each sequence: CCTACGGGNGGCWGCAG and GACTACHVGGGTATCTAATCC, respectively.)

I used these 2 commands:
cutadapt -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -a CTGCWGCCNCCCGTAGG -A GGATTAGATACCCBDGTAGTC -Q 0 --discard-untrimmed --pair-filter=any -m 50 -j 30 -o cleaned_4_20/test1_r1 -p cleaned_4_20/test1_r2 r1.fastq.gz r2.fastq.gz

cutadapt -a ^CCTACGGGNGGCWGCAG...GGATTAGATACCCBDGTAGTC -A ^GACTACHVGGGTATCTAATCC...CTGCWGCCNCCCGTAGG -Q 0 --discard-untrimmed --pair-filter=any -m 50 -j 30 -o cleaned_4_20/r1_trimmed_prova2.fastq.gz -p cleaned_4_20/r2_trimmed_prova2.fastq.gz r1.fastq.gz r2.fastq.gz
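The 3' sequences given to -a/-A in the commands above are the reverse complements of the opposite primers, which is what shows up at the end of a read when read-through occurs. A minimal sketch for deriving them in the shell follows; the helper name revcomp is hypothetical, and it assumes upper-case IUPAC input and that the standard rev and tr utilities are available.

```shell
# revcomp is a hypothetical helper, not part of cutadapt.
# It reverses the sequence, then complements each IUPAC code.
# W and S are their own complements, so they are left unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTRYKMBDHVN' 'TGCAYRMKVHDBN'
}

revcomp CCTACGGGNGGCWGCAG        # prints CTGCWGCCNCCCGTAGG
revcomp GACTACHVGGGTATCTAATCC    # prints GGATTAGATACCCBDGTAGTC
```

The two outputs match the 3' sequences used in both commands, so the reverse complements there appear to be correct.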

My considerations are:

  • Assuming that read-through always happens, I think the 2nd command is more correct. But read-through does not always happen, and for that reason the 1st command should also be fine. Am I missing something?
    What would you use and why?

Also, is primer removal mandatory? I've read that with DADA2 primers can cause problems if they contain ambiguous characters, but in my case there shouldn't be any ambiguous characters in my reads.
Thank you for your time and suggestions.

Hi @Phoe,

Yes, you should remove the primers. It is not only about the presence of ambiguous IUPAC bases, but also the fact that those sequences are still primer sequences that have been incorporated into the PCR / sequencing product. Thus, they are not actually sequences from the target organism. Remember that PCR primers are "leaky" and can still bind to and amplify the template despite a few mismatches, which are then masked by the incorporated primer.

Also, you normally do not need to add the adapter sequence to the cutadapt command unless you are getting substantial read-through. You should be able to get away with simply:

cutadapt -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC ...

Also, you can simply use cutadapt via QIIME 2.

Thank you for your answer @SoilRotifer !
So the final command would be:

cutadapt -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -Q 0 --discard-untrimmed --pair-filter=any -m 50 -j 30 -o cleaned_4_20/test1_r1 -p cleaned_4_20/test1_r2 r1.fastq.gz r2.fastq.gz

Without the 3' primer sequences? If so, is the reason the read-through issue we were discussing earlier?

Also I read that the best practice would be:

  • Adapter removal
  • Merging
  • Quality trimming
  • Import in qiime2 and analysis
    Does it change anything if I do everything inside qiime2?

Thank you again!

Looks okay to me. You can probably omit -Q. All the steps you've outlined can be done within QIIME 2. Check out the tutorials. Nothing should change within QIIME 2, as many of the steps use generally accepted approaches and wrap the tools in question.

Thank you so much for your help!

I will look into doing everything in QIIME 2 next time for sure!