Unsure which part of a primer sequence (and what formatting) to use for DADA2

Hello!

I'm conducting a meta-analysis, so I'm working with data that I did not generate myself. I would like to trim primers from the sequences using DADA2. What confuses me is that one of my papers lists its primers as:

"Primers used to amplify the V4 region were as follows: AATGATACGGCGACCACCGAGATC TACAC [8bp -i5 index] ATGGTAATTGTGT- GCCAGCMGCCGCGGTAA and CAAGCA GAAGACGGCATACGAGAT [8bp -i7 index] AGTCAGTCAGCCGGACTACHVGGGTWTCTAAT (Illumina, 2016a)."

I think the first chunk of each primer is the Illumina adapter and the second part is the amplification portion. Should I use the whole sequence (e.g. AATGATACGGCGACCACCGAGATC TACAC ATGGTAATTGTGT- GCCAGCMGCCGCGGTAA) as the forward primer? And will the program tolerate the dashes and spaces?

Thanks for your help!

Hello again Makaylee,

I've been there! Getting all the data to match up just right can be a challenge.

Depending on the sequencing method, primers may not appear in your data at all.
(This trick works by starting the Illumina sequencing step with the same primer that was used to amplify the amplicon, instead of the standard Illumina sequencing primer. The read then begins right after the primer. Pretty cool!)

Often, I skip primer removal entirely and run DADA2.
(You could say that my null hypothesis H0 is: there are no primers in my data.)
If the different runs don't combine well, I investigate why, and sometimes have to loop back to the primer removal step.
(Reject the null hypothesis: there ARE primers in my data!!)
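
One way to test that null hypothesis directly is to count how many reads begin with the primer. The sketch below is only an illustration in Python (DADA2 itself is an R package), and several things in it are my assumptions: that the gene-specific forward primer is the final `GT- GCCAGCMGCCGCGGTAA` stretch of the sequence quoted above, that the reads are already loaded as plain strings, and that the four toy reads stand in for real data. The helper names are hypothetical. It also shows that spaces and dashes should be stripped before matching, and that degenerate bases such as `M` need to be expanded.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def clean_primer(raw):
    """Remove spaces, dashes, and any other non-letter characters; uppercase."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def primer_regex(primer):
    """Compile a cleaned primer (IUPAC codes allowed) into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer))


def fraction_starting_with(reads, raw_primer):
    """Return the fraction of reads that begin with the primer."""
    if not reads:
        return 0.0
    pattern = primer_regex(clean_primer(raw_primer))
    hits = sum(1 for read in reads if pattern.match(read))
    return hits / len(reads)


# ASSUMPTION: the gene-specific part is the tail of the quoted forward primer,
# copied here with its dash and space to show that they are cleaned away.
FWD_PRIMER = "GT- GCCAGCMGCCGCGGTAA"

# Toy reads invented for illustration; the first two start with the primer.
toy_reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGG",
    "GTGCCAGCCGCCGCGGTAATACGTAGG",
    "TACGGAGGGTGCAAGCGTTAATCGGAA",
    "TACGTAGGTGGCAAGCGTTATCCGGAT",
]

print(clean_primer(FWD_PRIMER))
print(fraction_starting_with(toy_reads, FWD_PRIMER))
```

A fraction near zero is consistent with "no primers in my data"; a fraction near one suggests the primers are still there and need to be removed.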

Once I see that there is an unwanted part in my reads, I can choose how best to remove it. Sometimes the basic DADA2 trimming steps are enough.
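
When the unwanted part is a primer sitting at the very start of every read, the basic fix is to cut a fixed number of leading bases, which is what the `trimLeft` argument of DADA2's `filterAndTrim` does. The Python sketch below only illustrates that idea and is not DADA2 code. It assumes a 19-base forward primer (`GTGCCAGCMGCCGCGGTAA`, my reading of the gene-specific part), and the read, quality string, and function name are made up for the example.

```python
def trim_left(seq, qual, n):
    """Drop the first n bases and the matching n quality characters."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if n < 0:
        raise ValueError("n must be non-negative")
    return seq[n:], qual[n:]


# ASSUMPTION: the primer to remove is this 19-base forward primer.
ASSUMED_PRIMER = "GTGCCAGCMGCCGCGGTAA"
primer_length = len(ASSUMED_PRIMER)

# Toy FASTQ-style record invented for illustration.
toy_seq = "GTGCCAGCAGCCGCGGTAATACGGAGG"
toy_qual = "I" * len(toy_seq)

trimmed_seq, trimmed_qual = trim_left(toy_seq, toy_qual, primer_length)
print(trimmed_seq, trimmed_qual)
```

Fixed-length trimming only works if the primer starts at the same position in every read. If it does not, a dedicated primer-removal tool is the safer choice.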

I like this approach! Thanks so much 🙂
