Cutadapt for single-end sequences

Hello everyone,

Can anyone please help me with this?

I have single-end demultiplexed sequences that still contain primers. I am using cutadapt with the following command, but it is not trimming the primers.

qiime cutadapt trim-single \
  --i-demultiplexed-sequences pe-33-single-end-demux.qza \
  --p-cores 500 \
  --p-adapter ATTAGAWACCCBNGTAGTCC \
  --p-front GTGCCAGCMGCCGCGGTAA \
  --p-error-rate 0 \
  --o-trimmed-sequences trimmed-seqs.qza \
  --verbose

The actual primer sequences are as follows:
Forward: GTGCCAGCMGCCGCGGTAA
Reverse: GGACTACNVGGGTWTCTAAT

Here is an example of a sequence that cutadapt produced after trimming:

GTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTCTGTTAAGTCAGATGTGAAATCCCCGGGCTTAACCTGGGAACTGCATTTGAAACTGGCAGGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGAAACCCTGGTAGTCC

Your help with this will be greatly appreciated. Thanks

Kind Regards
Ankush

Hi @Ankush_1,

If you search the forum you'll see that others have come across this issue. The secret is:

There are other good suggestions in that thread too.

Also, you'll likely only need to enter the forward primer, i.e. --p-front. You often won't need to enter the adapter sequence unless you expect read-through.
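For reference, the --p-adapter value in the command above appears to be the reverse complement of the reverse primer, which is what you would see at the 3' end of a read in the read-through case. Here is a rough shell sketch to check that; revcomp is a hypothetical helper I made up for illustration, not part of QIIME 2 or cutadapt:

```shell
# Hypothetical helper: reverse-complement a primer, including IUPAC codes.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' \
    | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

# Reverse primer from the post above.
revcomp GGACTACNVGGGTWTCTAAT   # prints ATTAGAWACCCBNGTAGTCC
```

The example read in the original post ends in ATTAGAAACCCTGGTAGTCC, which is consistent with that pattern.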

-Cheers!
-Mike

Hi Mike,

Thanks so much for the quick reply and guidance.

I tried it this way and it worked fine, though I can still see many sequences with the primer at the front. I'm not sure why they are not being trimmed. Attached is a snapshot of the rep seq table.

Kind Regards
Ankush

I just noticed that you had set --p-error-rate 0.

I would not recommend doing this, as you will still "see" your primers in the output. The problem is that you are not allowing for any errors within the span of the primer sequence. Because you are requiring an exact match, primers will not be trimmed unless you allow for some errors, i.e. mismatches or indels (the default error rate is 10%). For example, the second sequence has a 1 bp mismatch to the primer, and sequences 5 and 6 are each missing a base:

P: GTGCCAGCMGCCGCGGTAA
2: GTGCCGGCAGCCGCGGTAA
5: GTGCCAGC-GCCGCGGTAA
6: GTG-CAGCAGCCGCGGTAA
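To put rough numbers on this: as far as I understand the cutadapt documentation, the number of errors tolerated is the error rate multiplied by the length of the matched primer, rounded down. A small sketch of that arithmetic (allowed_errors is a hypothetical helper, and the rate is given in percent to stay with integer math):

```shell
# Sketch: allowed errors = floor(error_rate * primer_length).
# allowed_errors is a hypothetical helper; $1 = length, $2 = rate in percent.
allowed_errors() {
  echo $(( $1 * $2 / 100 ))
}

primer=GTGCCAGCMGCCGCGGTAA
len=${#primer}   # 19 nt
for rate_pct in 0 10 20; do
  echo "error rate ${rate_pct}%: up to $(allowed_errors "$len" "$rate_pct") error(s) over ${len} nt"
done
```

Under that assumption, an error rate of 0 tolerates nothing, while the default of 10% tolerates one error over this 19 nt primer, which is enough to cover each of the three sequences shown above.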

Note: M aligned to A is not a mismatch when --p-match-adapter-wildcards is set, as M represents A or C.
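If you want to convince yourself of the wildcard behaviour, here is a minimal sketch that expands the IUPAC codes in a primer into a regular expression and tests reads against it. It only does exact matching (roughly what an error rate of 0 asks for) and is not how cutadapt aligns internally; iupac_to_regex is a hypothetical helper:

```shell
# Hypothetical helper: expand IUPAC ambiguity codes into a regex.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Anchor the forward primer at the 5' end of the read.
pattern="^$(iupac_to_regex GTGCCAGCMGCCGCGGTAA)"

# First read: M resolved to A. Second read: sequence 2 above, with one mismatch.
for read in GTGCCAGCAGCCGCGGTAATACGG GTGCCGGCAGCCGCGGTAATACGG; do
  if printf '%s\n' "$read" | grep -Eq "$pattern"; then
    echo "$read: exact primer match"
  else
    echo "$read: no exact match"
  fi
done
```

The first read matches because M covers A; the second does not, because of the single A-to-G mismatch.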

As I mentioned, there are other tricks in the thread I linked. For example, you can set --p-discard-untrimmed, as outlined in that same thread.

So leave --p-error-rate as default and set --p-discard-untrimmed.
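Putting that together, the adjusted command might look something like the sketch below. The file names are the ones from your post, and trim_primers is just a hypothetical wrapper function so the command is easy to reuse. I have left out --p-adapter and --p-cores, which is an assumption on my part; add them back if you expect read-through or want multithreading.

```shell
# Hypothetical wrapper: $1 = demultiplexed input artifact, $2 = output artifact.
# --p-error-rate is omitted on purpose so the default (10%) applies.
trim_primers() {
  qiime cutadapt trim-single \
    --i-demultiplexed-sequences "$1" \
    --p-front GTGCCAGCMGCCGCGGTAA \
    --p-discard-untrimmed \
    --o-trimmed-sequences "$2" \
    --verbose
}

# Usage:
#   trim_primers pe-33-single-end-demux.qza trimmed-seqs.qza
```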

-Mike

Hi Mike,

Thanks a lot for correcting me, and sorry I didn't go through that thread thoroughly. I was rushing a bit. It was really helpful.

I used the parameters you recommended and it worked well.

Also, when I increased the error rate to 0.2, I was able to retain slightly more sequences as discussed in that thread.

Once again, thanks for your help. Cheers

Kind Regards
Ankush
