Hello! This is my first time asking for help on this forum and my first time running a 16S sequencing analysis entirely on my own. There is no one else in my lab to ask, so I am working from a folder of scripts and logs left behind by our last bioinformatics postdoc, but they are not well annotated. I am new to this, so I apologize in advance if I have left out important details or included too much below!
Context: I built a 16S rRNA library targeting the V1-V2 region in the wet lab and have received the sequencing data. The samples are human stool (104 in total), and we are interested in their bacterial profiles for a gut microbiome study. Following our lab protocols, I used the same 27F universal primer (forward primer) for every sample, paired with a unique index primer carrying a sample-specific barcode (a tag to tell the samples apart). I have an Excel file that lists the 27F and index primer sequences in the 5'-3' direction, but I am lost as to how to apply them correctly in the cutadapt step for my sequences. The script left behind was written for qiime2/2020.11, and although it ran without error on my data, I'm not sure whether it is trimming correctly or whether I need to adjust it. I have paired-end sequences (FASTQ files with read 1 and read 2 for each sample), and below is the old script:
My question is: why are --p-front-f and --p-front-r each listed twice? I understand that I need to trim the forward primer and the reverse index primer off each sample before I can begin the denoising steps, but I don't understand why the old postdoc wrote it this way, and many examples I've seen include lines that specify both --p-adapter and --p-front options. The script above was used on an older project for which I built the library and he did the data processing, using exactly the same 27F and index primers as this project. I'm confused about why he doesn't specify the rest of either primer sequence, and about why the barcodes aren't accidentally trimmed off when qiime is told to target the ACTCCT sequences (on the right side of the image). My forward primer consists of the 5' Illumina adapter, the forward primer pad, a CC forward primer linker, and then the forward primer sequence. The index primer has an AA linker, and everything in it is the same for every sample except a small segment in the middle, but that is not what is listed in the script. (I'm assuming I don't want to lose the barcodes, since they should be needed to link the reads to the metadata and identify which data belong to which sample during the denoising steps, right?)
If it helps here are my primers (listed as 5'-3' for both):
27F sequence below:
AATGATACGGCGACCACCGAGATCTACAC (5' Illumina adapter) TATGGTAATT (pad) CC (linker) AGMGTTYGATYMTGGCTCAG (forward primer)
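In case it helps anyone reading, here is a small Python sketch of how I understand the degenerate bases in the 27F primer (M = A or C, Y = C or T, per the IUPAC nucleotide codes). It only checks whether a read begins with something the primer could match. The example read is made up for illustration and is not from my data, and the helper name `primer_to_regex` is my own.

```python
import re

# IUPAC codes that appear in the 27F primer: M = A or C, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "[AC]", "Y": "[CT]"}

def primer_to_regex(primer):
    """Turn a primer with degenerate bases into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))

PRIMER_27F = "AGMGTTYGATYMTGGCTCAG"
pattern = primer_to_regex(PRIMER_27F)

# Hypothetical read start, invented for this example (not real data).
example_read = "AGAGTTTGATCCTGGCTCAGGATGAACGCT"
match = pattern.match(example_read)

if match:
    print("primer found, spans read positions 0 to", match.end())
else:
    print("primer not found at the start of the read")
```

This is only meant to show what the degenerate letters stand for; it does not reproduce how cutadapt itself scores or trims matches.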
**Index primer (barcode #1 for sample #1), 5' to 3',** sequence below. The bold segment is the only part that changes between samples, since that's the unique barcode tag:
CAAGCAGAAGACGGCATACGAGAT **ACGAGACTGATT** AGTCAGTCAG AA GCTGCCTCCCGTAGGAGT
(the reverse complement of GCTGCCTCCCGTAGGAGT is what is in the script)
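To double-check the reverse complement myself, I used a minimal Python sketch like the one below. The function name `reverse_complement` is just my own helper, and it assumes the sequence contains only A, C, G and T (no degenerate bases).

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

# The last segment of the index primer, as written 5' to 3' above.
INDEX_PRIMER_3PRIME = "GCTGCCTCCCGTAGGAGT"

rc = reverse_complement(INDEX_PRIMER_3PRIME)
print(rc)  # ACTCCTACGGGAGGCAGC
```

The result starts with ACTCCT, which matches the sequence I mentioned seeing in the old script.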
How should I adjust the --p-front part of the cutadapt trim script for this project? (I'm also using the updated qiime2 2022.8 version.) Why is each option listed twice, and why is only the last portion of each primer sequence specified in the script? Any help is greatly appreciated!