Seeking Advice on Primer and Adapter Trimming for MiSeq Amplicon Sequencing Data Using Cutadapt

Miseon · June 4, 2024, 6:23am

Hello,

I am a microbiology student teaching myself bioinformatics. I have no way to verify whether my analysis is appropriate, so I am asking this forum for help.

My question is about trimming adapters and primers from demultiplexed sample-specific fastq files.

Here is some information about my analysis: I performed amplicon sequencing on a MiSeq instrument, targeting the V3-V4 region of the 16S rRNA gene. The primer sequences I used are as follows:

341F: CCTACGGGNGGCWGCAG
805R: GACTACHVGGGTATCTAATCC

I used the Nextera XT kit for library preparation, and the sequencing primers, indexes, and adapter sequences are as follows:

5'- AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus-specific sequence]-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC[i7]ATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTG[i5]AGCAGCCGTCGCAGTCTACACATATTCTCTGTC-[locus-specific sequence]-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG[i7]TAGAGCATACGGCAGAAGACGAAC -5'
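
One way to double-check the adapter strings derived from this construct is to reverse-complement the relevant segments. The sketch below is only an illustration: `revcomp` is a hypothetical helper (not part of cutadapt) and it assumes plain A/C/G/T input. Read 2 is read from the bottom strand, so a short insert should run into the reverse complement of the segment that sits between [i5] and the locus-specific sequence on the top strand.

```shell
# Hypothetical helper: reverse-complement a DNA string (A/C/G/T only).
revcomp() {
    printf '%s\n' "$1" |
        # Reverse the string character by character.
        awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
        # Complement each base.
        tr 'ACGTacgt' 'TGCAtgca'
}

# Top-strand segment between [i5] and the locus-specific sequence.
revcomp TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
# prints CTGTCTCTTATACACATCTGACGCTGCCGACGA
```

The printed string can be compared character by character with the sequence passed to cutadapt's -A option.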

When I checked the fastq files for one sample, most of the sequences in the _1 (forward) file started with the 341F primer, and most of the sequences in the _2 (reverse) file started with the 805R primer (though not all reads did). Additionally, a few reads had primer and adapter sequences at the ends. Therefore, I planned to use cutadapt with the following command:
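
The "most reads start with the primer" observation can be turned into a number. The following is a minimal sketch, not a validated tool: `iupac_to_regex` and `count_primer_starts` are hypothetical helpers, matching is exact (no mismatches allowed, unlike cutadapt's error-tolerant search), and the FASTQ is assumed to use four lines per record.

```shell
# Hypothetical helper: turn IUPAC degenerate bases into a regular expression.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Hypothetical helper: read FASTQ on stdin and print
# "<reads starting with the primer> <total reads>".
count_primer_starts() {
    awk -v re="^$(iupac_to_regex "$1")" \
        'NR % 4 == 2 { total++; if ($0 ~ re) hits++ } END { print hits + 0, total + 0 }'
}

# Tiny inline FASTQ so the sketch runs without any files:
# r1 starts with 341F, r2 does not.
printf '@r1\nCCTACGGGAGGCAGCAGTT\n+\nIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' |
    count_primer_starts CCTACGGGNGGCWGCAG
# prints: 1 2
```

On real data, the decompressed _1 file (for example via `gzip -dc`) would be piped into `count_primer_starts` with the 341F sequence, and the _2 file with the 805R sequence.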

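# INPUT_DIR is an assumed placeholder for the folder holding the demultiplexed _1/_2 files.
cutadapt \
    -j 14 \
    -a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC \
    -g CCTACGGGNGGCWGCAG \
    -A CTGTCTCTTATACACATCTGACGCTGCCGACGA \
    -G GACTACHVGGGTATCTAATCC \
    -o "${OUTPUT_DIR}/primer_${base_name}_1.fastq.gz" \
    -p "${OUTPUT_DIR}/primer_${base_name}_2.fastq.gz" \
    -m 50 \
    -q 20 \
    "${INPUT_DIR}/${base_name}_1.fastq.gz" \
    "${INPUT_DIR}/${base_name}_2.fastq.gz"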

In summary, I would like to know:

1. Is my script suitable for trimming my data?
2. How should a researcher decide on the parameters for minimum length (-m) and quality score (-q)?
3. Are there any other considerations I should be aware of?
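
One input that might help with the minimum-length question is the distribution of read lengths left after trimming. The sketch below only illustrates the idea: `length_histogram` is a hypothetical helper, it assumes four-line FASTQ records on stdin, and it does not by itself say which -m value is right.

```shell
# Hypothetical helper: print "<read length> <number of reads>", shortest first.
length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' | sort -n
}

# Tiny inline FASTQ so the sketch runs without any files.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n' |
    length_histogram
# prints:
# 4 1
# 8 2
```

Running this on the trimmed output with and without the -m filter would show how many reads fall below a candidate cutoff.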

If more information is needed to address these questions, I am happy to provide it.
I will also provide images of some of my fastq files for reference.

Thank you in advance!