cutadapt / trim-paired / option "front" and "adapter"

Hello,

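I have paired-end sequences in which both primers are present in the R1 and R2 reads.
For example, here is a read from the forward (R1) file. The two primer sites are GTACACACCGCCCGTC (forward primer, at the start) and GTAGGTGAACCTGCAGAAGGATCA (reverse complement of the reverse primer, further downstream):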

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

And a read from the reverse (R2) file:

TGATCCTTCTGCAGGTTCACCTACGGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTACTGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

I want to remove the primers and keep only the sequence between them, so I ran cutadapt like this:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux-paired-end.qza \
--p-front-f GTACACACCGCCCGTC \
--p-adapter-f GTAGGTGAACCTGCAGAAGGATCA \
--p-front-r TGATCCTTCTGCAGGTTCACCTAC \
--p-adapter-r GACGGGCGGTGTGTAC \
--p-discard-untrimmed \
--verbose \
--o-trimmed-sequences trimmed_remove_primers.qza
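The two 3' adapters in this command are the reverse complements of the opposite primers (--p-adapter-f is the reverse complement of --p-front-r, and --p-adapter-r of --p-front-f), which matches what the reads above show. A small shell sketch to check this; `revcomp` is a hypothetical helper (not part of QIIME 2 or cutadapt) that handles IUPAC ambiguity codes but assumes uppercase input:

```shell
# Reverse-complement a DNA sequence (uppercase, IUPAC codes allowed; S, W, N map to themselves).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

echo "RC of --p-front-f: $(revcomp GTACACACCGCCCGTC)"          # compare with --p-adapter-r
echo "RC of --p-front-r: $(revcomp TGATCCTTCTGCAGGTTCACCTAC)"  # compare with --p-adapter-f
```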

Then I extracted the trimmed sequences to inspect them with this command:

qiime tools extract \
--output-path trimmed_remove_primers \
--input-path trimmed_remove_primers.qza
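To check all reads rather than one example, here is a sketch that counts how many reads in each extracted R1 file still start with the forward primer. The directory layout (`<output-path>/<uuid>/data/*_R1_001.fastq.gz`) and the helper name are assumptions; adjust the glob to whatever `qiime tools extract` actually wrote:

```shell
# Count reads whose sequence line (line 2 of each 4-line FASTQ record) starts with a primer.
count_reads_starting_with() {
  gzip -dc "$1" | awk -v p="$2" 'NR % 4 == 2 && index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Assumed layout written by qiime tools extract: <output-path>/<uuid>/data/*.fastq.gz
for f in trimmed_remove_primers/*/data/*_R1_001.fastq.gz; do
  [ -e "$f" ] || continue   # the glob matched nothing
  echo "$f: $(count_reads_starting_with "$f" GTACACACCGCCCGTC) reads still start with the forward primer"
done
```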

In the forward (R1) file, unfortunately, the forward primer is still there, although the reverse primer (and the downstream sequence) has been correctly removed:

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC

In the reverse (R2) file, the reverse primer has been correctly removed, but the forward primer and the downstream sequence are still there:

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

I don't understand why only the reverse primer is removed from both the forward (R1) and reverse (R2) files.
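A likely explanation: by default cutadapt removes at most one adapter per read, even when several are given (cutadapt's own `-n`/`--times` option raises that limit; check `qiime cutadapt trim-paired --help` to see whether your version exposes it). With both a front and an adapter sequence supplied for each read, only the best-matching one gets cut. The toy below uses exact matching only (cutadapt also tolerates errors and partial overlaps), and `trim_adapter`/`trim_front` are hypothetical helpers. Applying just the 3' adapter reproduces the R1 output posted above; applying both gives the intended primer-free insert:

```shell
# Toy, exact-match stand-ins for cutadapt's two adapter types (hypothetical helpers).
trim_adapter() { printf '%s\n' "${1%%"$2"*}"; }  # 3' adapter: cut it and everything after
trim_front()   { printf '%s\n' "${1#*"$2"}"; }   # 5' adapter: cut it and everything before

R1='GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'

# Only one adapter removed (the 3' one): the forward primer remains at the start.
only_one=$(trim_adapter "$R1" GTAGGTGAACCTGCAGAAGGATCA)
# Both removed: only the sequence between the primers is left.
both=$(trim_front "$only_one" GTACACACCGCCCGTC)
printf '%s\n' "$only_one" "$both"
```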

After this primer-removal step I will run DADA2. Is that the right order?

Thank you for any advice, and have a nice day!
Jérémy Tournayre

@JeremyTournayre,

I think you just need to add an anchor (^) to your primer to indicate that it should be found at the very start of the sequence: --p-front-f ^GTACACACCGCCCGTC. See here in the docs.
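To illustrate what the anchor changes, an exact-match toy with hypothetical helper names (real cutadapt also allows mismatches): an anchored 5' primer (^PRIMER) is removed only when the read starts with it, while a non-anchored one is removed together with everything before it, wherever it occurs in the read:

```shell
# Anchored (^PRIMER): strip the primer only if the read begins with it.
trim_front_anchored() { printf '%s\n' "${1#"$2"}"; }
# Non-anchored (PRIMER): strip the first occurrence and everything before it.
trim_front_anywhere() { printf '%s\n' "${1#*"$2"}"; }

trim_front_anchored NNNNGTACACACCGCCCGTCGCTCC GTACACACCGCCCGTC   # primer not at start: unchanged
trim_front_anywhere NNNNGTACACACCGCCCGTCGCTCC GTACACACCGCCCGTC   # cut at the internal match
```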


I am not really sure what is going on here, as I think you have a copy-paste error in your post. It looks like the R2 output you posted is identical to the R1 input:


It looks like you ran the command with the --verbose flag enabled; if you could post that log here, it would give us a lot more to go on.


Exactly! You definitely want to make sure there are no non-biological sequences left in your reads when you give them to DADA2. If you ever have workflow questions, this is a great resource.


Unfortunately, I pasted the wrong sequence for the R2 file after the cutadapt step; thanks for spotting that mistake. Here is the correct sequence (the problem is the same):

GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTACTGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

In the end, I think I solved my problem by running cutadapt in two steps:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux-paired-end.qza \
--p-adapter-f GTAGGTGAACCTGCAGAAGGATCA \
--p-adapter-r GACGGGCGGTGTGTAC \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--verbose \
--o-trimmed-sequences trimmed_remove_primers_wild.qza

qiime cutadapt trim-paired \
--i-demultiplexed-sequences trimmed_remove_primers_wild.qza \
--p-front-f GTACACACCGCCCGTC \
--p-front-r TGATCCTTCTGCAGGTTCACCTAC \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--verbose \
--o-trimmed-sequences trimmed_remove_primers_wild_2.qza
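The logic of these two passes, as an exact-match shell toy on the R2 read from the first post (hypothetical helper names; cutadapt's error-tolerant matching is not modelled). Pass 1 cuts the 3' adapter and everything after it; pass 2 cuts the 5' primer and everything before it and, like --p-discard-untrimmed, drops the read when the primer is missing:

```shell
# Toy, exact-match version of the two cutadapt passes (hypothetical helpers).
trim_adapter() { printf '%s\n' "${1%%"$2"*}"; }   # 3' adapter: cut it and everything after
trim_front_or_discard() {                         # 5' primer with discard-untrimmed
  case $1 in
    *"$2"*) printf '%s\n' "${1#*"$2"}" ;;         # cut the primer and everything before it
    *) return 1 ;;                                # primer not found: the read would be dropped
  esac
}

R2='TGATCCTTCTGCAGGTTCACCTACGGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTACTGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG'

step1=$(trim_adapter "$R2" GACGGGCGGTGTGTAC)                         # pass 1: --p-adapter-r
if step2=$(trim_front_or_discard "$step1" TGATCCTTCTGCAGGTTCACCTAC); then  # pass 2: --p-front-r
  printf '%s\n' "$step2"
else
  echo "read discarded"
fi
```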

I added:

--p-match-read-wildcards and --p-match-adapter-wildcards:
because some of my other datasets use primers with IUPAC ambiguity codes.

--p-discard-untrimmed:
only in the second step (the primers), because in other datasets the amplicon can be too long for the second primer to appear in the read.
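On the wildcard options: --p-match-adapter-wildcards lets an IUPAC code in a primer (R = A or G, N = any base, and so on) match any of the bases it stands for, and --p-match-read-wildcards does the same for ambiguity codes in the reads. A sketch of the primer side only, expanding a hypothetical degenerate primer (GTRCAN) into an extended regular expression; `iupac_to_regex` is an illustrative helper, not how cutadapt implements matching:

```shell
# Expand IUPAC ambiguity codes into bracket expressions usable with grep -E.
# Replacements only introduce A, C, G, T, [ and ], so later rules never re-expand them.
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
    -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# A read starting with GTACAC matches the degenerate primer GTRCAN (R = A, N = C here).
printf '%s\n' GTACACTT | grep -E "^$(iupac_to_regex GTRCAN)"
```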

With this, I got correctly trimmed sequences:

R1:

GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC

R2:

GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGC

Can you confirm that my two-step cutadapt solution is correct?

Hi there @JeremyTournayre - @Keegan-Evans is out of the office for the rest of the week, he'll get back to you some time next week. Thanks!
