Trimming sequences with cut-adapt to prep for both DADA2 and deblur

kida_miska · January 3, 2023, 5:52pm

Hello! This is my first time seeking assistance on this forum and my first time conducting a 16S sequencing analysis entirely on my own. There are no other individuals in my lab to ask for help, so I am using a folder of scripts/logs left behind by the last bioinformatics postdoc to guide my analysis, but they are not well annotated. I am new to this, so I apologize in advance if I didn't provide the right details or gave too much detail below!

Context: I built a 16S rRNA library targeting the V1-V2 region in the wet lab and got 16S sequencing data back. The samples are human stool, and we're interested in their bacterial profile (gut microbiome study), for a total of 104 samples. I followed our lab protocols and used the same 27F universal primer (forward primer) in every sample, but used a unique index primer with a specific barcode for each sample (as a tag to tell them apart). I have an Excel file that shows the 5'-3' direction of the 27F and index primer DNA sequences, but I am lost on how to correctly apply this to the cutadapt code for my sequences. The script left behind was written for qiime2/2020.11, and while it ran without error on my sequences, I'm not sure whether the trimming is correct or whether I need to adjust it. I have paired-end sequences (fastq files with read 1 and read 2 for each sample) and below is the old script:

My question is: why are --p-front-f and --p-front-r each listed twice? I understand that I need to trim the forward primer and the reverse index primer off of each sample before I can begin the denoise steps, but I don't understand why the old post-doc wrote it this way, and many examples include lines that specifically say p-adapter and p-front. The above script was used on an older project where I built the library but he did the data processing, and it used the exact same 27F and index primers as this current project. I'm confused about why he doesn't specify the rest of the DNA sequence for either primer, and why the barcodes aren't accidentally trimmed off by telling qiime to target the ACTCCT sequences (on the right side of the image). My forward primer has the 5' Illumina adapter, forward primer pad, CC (forward primer linker) and then the forward primer sequence. The index primer has an AA linker, and everything is the same for each sample except for a small segment in the middle, but this isn't what's listed in the script. (I'm assuming I don't want to lose the barcodes, since they should be needed to link to the metadata and identify which data belongs to which sample during the denoise steps, right?)

If it helps here are my primers (listed as 5'-3' for both):
27F sequence below:
AATGATACGGCGACCACCGAGATCTACAC (5' illumina adapter) TATGGTAATT (pad) CC (linker) AGMGTTYGATYMTGGCTCAG (forward primer)

**Index primer (barcode #1 for sample #1), 5' to 3'** sequence below. The bold segment is the only part that changes between each sequence for each of the samples since that's the unique barcode tag:
CAAGCAGAAGACGGCATACGAGAT **ACGAGACTGATT** AGTCAGTCAG AA GCTGCCTCCCGTAGGAGT
(the reverse complement of GCTGCCTCCCGTAGGAGT is what is in the script)
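
For readers who want to check the reverse-complement relationship mentioned above, here is a minimal Python sketch. The function name `reverse_complement` and the complement table are illustrative helpers written for this example, not part of QIIME 2 or cutadapt; the input is the last segment of the index primer quoted in the post.

```python
# Complement table for A/C/G/T plus the IUPAC ambiguity codes
# (e.g. M and Y, which appear in the 27F primer).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (IUPAC-aware)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Last segment of the index primer, as written 5'->3' in the post.
index_primer_tail = "GCTGCCTCCCGTAGGAGT"

print(reverse_complement(index_primer_tail))  # ACTCCTACGGGAGGCAGC
```

The result begins with ACTCCT, which matches the sequence the post says is targeted in the old script.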

How should I adjust the --p-front part of the cutadapt trim script for this project? (I'm also using the updated qiime2 2022.8 version.) Why is it listed twice, and why is only the last portion of the primer sequences specified in the script? Any help is greatly appreciated!

Keegan-Evans · January 3, 2023, 10:25pm

@kida_miska,

Welcome to the forum! Thanks for including so much detail; it is a lot easier to answer questions when there is more detail rather than less.

It looks like the previous post-doc used 2 different primers. You can supply as many as you would like in this manner.

From what I can see, all you should need to do is remove your forward primer. Cutadapt will remove all other upstream bases too, which takes care of your Illumina adapter, pad, and linker.
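
The idea that trimming at the primer also drops everything upstream of it can be sketched in a few lines of Python. This is a simplified model and not cutadapt's actual algorithm: it uses exact matching with IUPAC expansion, whereas cutadapt tolerates mismatches and applies overlap rules. The helper `trim_front` and the insert sequence are made up for illustration; the pad, linker, and 27F primer come from the original post.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def trim_front(read: str, primer: str) -> str:
    """Remove the first primer match and all bases before it.

    Simplification (assumption): exact matching only, no error tolerance.
    Reads without a match are returned unchanged.
    """
    pattern = "".join(IUPAC[base] for base in primer.upper())
    match = re.search(pattern, read.upper())
    if match is None:
        return read
    return read[match.end():]


PRIMER_27F = "AGMGTTYGATYMTGGCTCAG"      # degenerate forward primer from the post
upstream = "TATGGTAATT" + "CC"           # pad + linker from the post
primer_in_read = "AGAGTTTGATCCTGGCTCAG"  # one concrete sequence matching 27F
insert = "GACGAACGCTGGCGGC"              # made-up biological sequence

read = upstream + primer_in_read + insert
print(trim_front(read, PRIMER_27F))  # GACGAACGCTGGCGGC
```

Only the primer is given to the function, yet the pad and linker disappear with it, because everything to the left of the match is discarded.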

Hope this helps and that I answered the correct question.

PS, you might find the relevant q2-cutadapt documentation helpful as well in the future.

Keegan-Evans · January 4, 2023, 3:43pm

@kida_miska,

I was chatting with another mod, and they pointed out that the previous post-doc might have been dealing with mixed-orientation reads, as discussed in this post.

kida_miska · January 4, 2023, 5:03pm

Thank you so much for your reply & help, it does clear some of this up! I have a follow-up question based on your reply, if that's okay?

When you mention two different primers, do you mean a forward primer and a reverse primer, or two different sets (n=4)? I think my confusion stems from the way the post-doc structured his script. I'm getting lost on why the p-front-f sequence (first one listed) is identical to the p-front-r sequence directly below it in the script, and why the other p-front-f is a different sequence but also has an identical p-front-r beneath it.

For the cutadapt function, it makes sense that trimming at the location of the forward primer (which starts after the CC linker) will cut off everything before it. But for the reverse primer, wouldn't it cut off the barcode, since that would be on the other side of the AA linker from the sequence listed? Is that a problem for the next steps if the barcodes are cut off?

Thank you!!

Keegan-Evans · January 4, 2023, 5:53pm

@kida_miska, ah yes, I see how the way I wrote that could be confusing. I meant arbitrarily many sets of primers: if you had 5 different forward-strand primers, you could provide each of them as a separate --p-front-f. If I remember correctly, this also works for reverse-strand primers (--p-front-r). I think the post-doc was using it like this as a workaround for having mixed-orientation reads, using demux-paired to get them all "pointed the right way".
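
To illustrate why listing the same primers for both read directions can help with mixed-orientation reads, here is a toy Python sketch. It is not how cutadapt is implemented, and the helper `trim_front_any`, the two primers, and the read sequences are all made up for the example. The assumption is simply that each read is checked against every supplied primer, so it does not matter which primer a given read happens to start with.

```python
from typing import Sequence


def trim_front_any(read: str, primers: Sequence[str]) -> str:
    """Trim the first listed primer found in the read, plus everything upstream.

    Simplification (assumption): exact matching, first listed primer wins.
    """
    for primer in primers:
        pos = read.find(primer)
        if pos != -1:
            return read[pos + len(primer):]
    return read


# Hypothetical toy primers, not the primers from this thread.
PRIMER_X = "ACGTACGTAC"
PRIMER_Y = "TTGGCCAATT"
candidates = [PRIMER_X, PRIMER_Y]  # same list offered to both reads

# (R1, R2) pairs: one in the expected orientation, one flipped.
pair_expected = (PRIMER_X + "AAAACCCC", PRIMER_Y + "GGGGTTTT")
pair_flipped = (PRIMER_Y + "GGGGTTTT", PRIMER_X + "AAAACCCC")

for r1, r2 in (pair_expected, pair_flipped):
    print(trim_front_any(r1, candidates), trim_front_any(r2, candidates))
# AAAACCCC GGGGTTTT
# GGGGTTTT AAAACCCC
```

In both orientations the primers are removed from both reads, which is the effect of supplying each primer under both --p-front-f and --p-front-r. Note that this sketch only trims; it does not reorient the reads.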

You can see in the base cutadapt docs (very handy if you want to understand exactly what it is doing; q2-cutadapt simply wraps the most commonly used functions for easy use in the QIIME 2 ecosystem) that reverse primer trimming works exactly the same as forward primer trimming. To clarify, it is still reading the reverse strands "forwards" (5'->3'), but only on the strands marked as reverse reads. It is not reading or operating on the reverse reads in the reverse direction (3'->5'); it is just changing which strands it is actually looking at.

kida_miska · January 5, 2023, 12:26pm

Thank you so much for taking the time to walk me through this!! It makes more sense now why I have to input the reverse primer as its reverse complement in order for cutadapt to correctly find and trim it off of all my R2 sequences.

Thank you again!!

system · February 5, 2023, 6:27pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.