cutadapt / trim-paired / option "front" and "adapter"

JeremyTournayre · November 17, 2021, 2:45pm

Hello,

I have paired-end sequences in which both primers are present in the R1 and R2 files.
For example, here is a sequence from the forward (R1) file
(the two primers were marked in bold):

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

A sequence from the reverse (R2) file:

TGATCCTTCTGCAGGTTCACCTACGGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTACTGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

I want to remove the primers and keep only the sequence between them, so I ran cutadapt like this:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-front-f GTACACACCGCCCGTC \
  --p-adapter-f GTAGGTGAACCTGCAGAAGGATCA \
  --p-front-r TGATCCTTCTGCAGGTTCACCTAC \
  --p-adapter-r GACGGGCGGTGTGTAC \
  --p-discard-untrimmed \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers.qza
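
A quick sanity check on the command above: the sequences passed to --p-adapter-f and --p-adapter-r appear to be the reverse complements of the reverse and forward primers, respectively. The sketch below confirms this with standard Unix tools. `revcomp` is a hypothetical helper name, not part of QIIME 2 or Cutadapt, and it assumes `rev` and `tr` are available on the system.

```shell
#!/bin/sh
# revcomp: hypothetical helper (not part of QIIME 2 or Cutadapt).
# Prints the reverse complement of a DNA sequence, including IUPAC codes.
revcomp() {
  # Reverse the string, then swap each base with its complement.
  # S, W and N are their own complements, so they are left unchanged.
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBDHV' 'TGCAYRMKVHDB'
}

revcomp GTACACACCGCCCGTC           # forward primer -> GACGGGCGGTGTGTAC
revcomp TGATCCTTCTGCAGGTTCACCTAC   # reverse primer -> GTAGGTGAACCTGCAGAAGGATCA
```

The two printed sequences match the values given to --p-adapter-r and --p-adapter-f, so the four options are at least consistent with each other.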

After that, I can inspect my trimmed sequences with this command:

qiime tools extract \
  --output-path trimmed_remove_primers \
  --input-path trimmed_remove_primers.qza
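
To check the extracted files quickly, one option is to count how many reads still begin with a given primer. The sketch below is a plain-shell check, not a QIIME 2 feature. `count_primer_starts` is a hypothetical helper; it expects uncompressed four-line FASTQ records on standard input and does exact matching only (no mismatches, no IUPAC codes).

```shell
#!/bin/sh
# count_primer_starts: hypothetical helper (not part of QIIME 2 or Cutadapt).
# Usage: count_primer_starts PRIMER < reads.fastq
# Counts reads whose sequence line starts with PRIMER (exact match only).
count_primer_starts() {
  # In four-line FASTQ records the sequence is line 2 of each record.
  awk -v primer="$1" '
    NR % 4 == 2 && index($0, primer) == 1 { n++ }
    END { print n + 0 }
  '
}
```

For a gzipped file from the extracted directory, something like `gzip -dc path/to/sample_R1.fastq.gz | count_primer_starts GTACACACCGCCCGTC` should work; the path is a placeholder. A count of 0 suggests the primer was removed from the start of every read.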

In the forward (R1) file, unfortunately, the forward primer is still present, but the reverse primer (and the sequence downstream of it) has been correctly removed:

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC

In the reverse (R2) file, the reverse primer has been correctly removed, but the forward primer and the sequence downstream of it are still there:

GTACACACCGCCCGTCGCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCAGAAGGATCAAGACCAAGTCTCTGCTACCGTACGTCTTCTTAATCTCGTATGCCGTCTTCTGCTTGAAAATTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

I don't understand why only the reverse primer is removed in the forward (R1) and reverse (R2) files.

After this primer-removal step I will use DADA2. Is that correct?

Thank you for any advice and have a nice day
Jérémy Tournayre

Keegan-Evans · November 19, 2021, 9:29pm

@JeremyTournayre,

I think you just need to add an anchor (^) to your primer to indicate that it should be found at the start of the read: --p-front-f ^GTACACACCGCCCGTC. See here in the docs.

I am not really sure what is going on here, as I think there is a copy-paste error in your post: the R2 output that you posted looks identical to the R1 input.

It looks like you ran the command with the --verbose flag enabled. If you could post that log here, it would give us a lot more to go on.

Exactly! You definitely want to make sure there are no non-biological sequences in your samples when you give them to DADA2. If you ever have workflow questions, this is a great resource.

JeremyTournayre · November 22, 2021, 2:48pm

Unfortunately, I copy-pasted the wrong sequence for the R2 file after the cutadapt step; thanks for spotting that mistake. The correct sequence is below (the problem is unchanged):

GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGCGACGGGCGGTGTGTACTGTAGAACCATGTCGTCAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAACGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

In the end, I think I solved my problem by running cutadapt in two steps:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-adapter-f GTAGGTGAACCTGCAGAAGGATCA \
  --p-adapter-r GACGGGCGGTGTGTAC \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers_wild.qza

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences trimmed_remove_primers_wild.qza \
  --p-front-f GTACACACCGCCCGTC \
  --p-front-r TGATCCTTCTGCAGGTTCACCTAC \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers_wild_2.qza

I added:

--p-match-read-wildcards and --p-match-adapter-wildcards:
Because in other datasets I have IUPAC (degenerate) primers.

--p-discard-untrimmed:
Only in the front-primer step, because in other datasets the amplicon can be too long for the second primer to appear within a read.

With this, I got correctly trimmed sequences:

R1:

GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC

R2:

GGAAACCTTGTTACGACTTCTCCTTCCTCTAGGTGATATGGTTTACTTATTTTTCCGAAAAACGGTCCAAAAGGTTCACCGGATCACCCGGTATCGGTAGGAGC
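
Cutadapt's documentation notes that, by default, at most one adapter is removed from each read (its `--times` option controls this), which may explain why a single call that supplies both a front and an adapter sequence left one primer behind. The toy sketch below mimics the two passes on the start of the example R1 read using exact string matching in plain shell. It is not how Cutadapt matches adapters (no mismatches, partial matches, or IUPAC codes), and `trim_adapter` and `trim_front` are hypothetical helper names.

```shell
#!/bin/sh
# Toy illustration only: exact-match trimming, not Cutadapt's algorithm.
# trim_adapter and trim_front are hypothetical helper names.

# Pass 1: drop the 3' adapter and everything after its first occurrence.
trim_adapter() {
  trimmed=${1%%"$2"*}
  printf '%s\n' "$trimmed"
}

# Pass 2: drop everything up to and including the first occurrence
# of the 5' primer.
trim_front() {
  trimmed=${1#*"$2"}
  printf '%s\n' "$trimmed"
}

fwd=GTACACACCGCCCGTC              # value of --p-front-f
rev_rc=GTAGGTGAACCTGCAGAAGGATCA   # value of --p-adapter-f
insert=GCTCCTACCGATACCGGGTGATCCGGTGAACCTTTTGGACCGTTTTTCGGAAAAATAAGTAAACCATATCACCTAGAGGAAGGAGAAGTCGTAACAAGGTTTCC
# Beginning of the raw R1 example: primer, insert, adapter, readthrough.
read_r1="${fwd}${insert}${rev_rc}AGACCAAGTCTCTGCTACCG"

step1=$(trim_adapter "$read_r1" "$rev_rc")   # adapter pass
step2=$(trim_front "$step1" "$fwd")          # front-primer pass
printf '%s\n' "$step2"                       # prints the insert only
```

After the first pass the read still starts with the forward primer, as in the one-step result reported earlier in the thread; the second pass leaves only the insert, matching the R1 sequence above.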

Can we approve my two-step cutadapt solution?

thermokarst · November 23, 2021, 9:21pm

Hi there @JeremyTournayre - @Keegan-Evans is out of the office for the rest of the week, he'll get back to you some time next week. Thanks!

Keegan-Evans · November 29, 2021, 6:43pm

@JeremyTournayre,

I think your two-step solution is the correct one! Ideally this would not be necessary, but Cutadapt often needs to be told exactly what to do, and in your case it seems to need to be told in two separate steps. You can see the same thing occurring in this recent post and the referenced, older post.

Unfortunately we are not the developers of Cutadapt and q2-cutadapt is just a wrapper. At this point we are going to recommend exactly what you did!

JeremyTournayre · December 7, 2021, 3:29pm

Hello,

I don't know if it's too late, but I found the topic "Fungal ITS analysis tutorial".

In fact, one section of that tutorial describes exactly the problem from this topic:
"One issue with ITS (and other marker genes with vast length variability) is readthrough, which occurs when read lengths are longer than the amplicon itself!"

The tutorial uses the same kind of command as mine, but in a single step:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-f AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-f AACTTTYRRCAAYGGATCWCT \
  --p-adapter-r AGWGATCCRTTGYYRAAAGTT \
  --p-front-r AGCCTCCGCTTATTGATATGCTTAART \
  --o-trimmed-sequences demux-trimmed.qza

I tried downloading the data from this tutorial to see whether my problem also occurs with them, but the forward primers have already been trimmed from the raw reads.

So I think the command in the Fungal ITS analysis tutorial has the same problem that I had, and that the two-step cutadapt solution is needed there too.

Moreover, I detected an error in this tutorial: the forward primer and the reverse primer are swapped.

So instead of the tutorial command shown above, the correct command is (with the wildcard options enabled!):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-adapter-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-front-r AACTTTYRRCAAYGGATCWCT \
  --p-adapter-f AGWGATCCRTTGYYRAAAGTT \
  --p-front-f AGCCTCCGCTTATTGATATGCTTAART \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --o-trimmed-sequences demux-trimmed_swapped.qza

So, to get the expected results, use the two-step solution:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-adapter-f AGWGATCCRTTGYYRAAAGTT \
  --p-adapter-r AYTTAAGCATATCAATAAGCGGAGGCT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers_wild.qza

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences trimmed_remove_primers_wild.qza \
  --p-front-f AGCCTCCGCTTATTGATATGCTTAART \
  --p-front-r AACTTTYRRCAAYGGATCWCT \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --verbose \
  --o-trimmed-sequences trimmed_remove_primers_wild_2.qza

How can I report this error in the Fungal ITS analysis tutorial?

system · January 9, 2022, 10:20pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.