Analysing V4/V5 16S rRNA MiSeq paired-end reads (2x300). I amplified my eDNA with the 515F-Y and 926R primer set and sent the samples for Illumina MiSeq sequencing. I received the sequences as FASTQ.gz files, already split into separate samples, i.e. the sequencing facility did the demultiplexing. Do I need to trim the primers or the adapters? I do not know which adapters they used (probably the common Illumina adapters). The FastQC report they shared indicates that all reads, both forward and reverse, are 301 bp (Illumina added one extra base).

However, I tried to run this script:
# Primer sequences used during PCR (515F-Y / 926R)
FWD_PRIMER="GTGYCAGCMGCCGCGGTAA"
REV_PRIMER="CCGYCAATTYMTTTRAGTT"

# Input/output directories (placeholders here; set to the real paths)
INPUT_DIR="/path/to/raw_reads"
OUTPUT_DIR="/path/to/trimmed_reads"

# Loop through all forward reads (_R1.fastq.gz)
for R1 in "$INPUT_DIR"/*_R1.fastq.gz; do
    R2="${R1/_R1.fastq.gz/_R2.fastq.gz}"      # Find corresponding R2 read
    SAMPLE=$(basename "$R1" _R1.fastq.gz)     # Extract sample name
    echo "Processing sample: $SAMPLE"

    # Run Cutadapt (-a/-A treat the primer sequences as regular 3' adapters)
    cutadapt \
        -j 4 \
        -a "$FWD_PRIMER" -A "$REV_PRIMER" \
        --minimum-length 100 \
        -o "$OUTPUT_DIR/${SAMPLE}_trimmed_R1.fastq.gz" \
        -p "$OUTPUT_DIR/${SAMPLE}_trimmed_R2.fastq.gz" \
        "$R1" "$R2"
done
echo "Trimming complete!"
I get the following output:
=== Summary ===
Total read pairs processed: 1,156,333
Read 1 with adapter: 536,849 (46.4%)
Read 2 with adapter: 552,446 (47.8%)
== Read fate breakdown ==
Pairs that were too short: 551,224 (47.7%)
Pairs written (passing filters): 605,109 (52.3%)
Total basepairs processed: 696,112,466 bp
Read 1: 348,056,233 bp
Read 2: 348,056,233 bp
Total written (filtered): 364,232,687 bp (52.3%)
Read 1: 182,134,893 bp
Read 2: 182,097,794 bp
=== First read: Adapter 1 ===
Sequence: GTGYCAGCMGCCGCGGTAA; Type: regular 3'; Length: 19; Trimmed: 536849 times
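To check whether the PCR primers are actually present at the start (5' end) of the reads, rather than only as read-through near the 3' end, I put together this rough count for a single sample (the file name is a placeholder; the degenerate bases Y and M are expanded into character classes):

# Count how many of the first 100,000 R1 reads begin with the 515F-Y forward primer
zcat sample_R1_001.fastq.gz | head -n 400000 | awk 'NR % 4 == 2' \
    | grep -c -E '^GTG[CT]CAGC[AC]GCCGCGGTAA'

The analogous check on the R2 files would use the 926R sequence with R expanded to [AG].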
However, when I run this script instead, using the Illumina adapter sequences rather than the primers used during the initial PCR to amplify the region of interest:
# Load necessary modules
module load Miniconda3/23.5.2-0

# Activate the conda environment with Cutadapt installed
source activate cutadapt_env

# Loop over all *_R1_001.fastq.gz files found under the project directory
find /cluster/projects/nn8999k/hosea/ACTINOMETABA/ -name "*_R1_001.fastq.gz" | while read -r r1; do
    r2="${r1/_R1_001.fastq.gz/_R2_001.fastq.gz}"   # R2 files have the same base name
    sample=$(basename "$r1" _R1_001.fastq.gz)
    out_r1="/cluster/projects/nn8999k/hosea/ACTINOMETABA/trimmed_reads/${sample}_trimmed_R1.fastq.gz"
    out_r2="/cluster/projects/nn8999k/hosea/ACTINOMETABA/trimmed_reads/${sample}_trimmed_R2.fastq.gz"

    # Run Cutadapt with the standard Illumina TruSeq adapter sequences
    cutadapt -j 4 \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
        --minimum-length 100 \
        -o "$out_r1" -p "$out_r2" \
        "$r1" "$r2"
done
I get this output instead:
Total read pairs processed: 1,156,333
Read 1 with adapter: 18,279 (1.6%)
Read 2 with adapter: 18,126 (1.6%)
== Read fate breakdown ==
Pairs that were too short: 5,731 (0.5%)
Pairs written (passing filters): 1,150,602 (99.5%)
Total basepairs processed: 696,112,466 bp
Read 1: 348,056,233 bp
Read 2: 348,056,233 bp
Total written (filtered): 690,074,068 bp (99.1%)
Read 1: 345,032,569 bp
Read 2: 345,041,499 bp
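For the adapters, I assume the equivalent rough check would be something like this (again a placeholder file name; AGATCGGAAGAGC is the shared prefix of the two adapter sequences in the second script):

# Count how many of the first 100,000 R1 reads contain the start of the TruSeq adapter
zcat sample_R1_001.fastq.gz | head -n 400000 | awk 'NR % 4 == 2' \
    | grep -c AGATCGGAAGAGC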
I am confused. How should I go about this?
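For reference, this is the primer-anchored variant of the first script's Cutadapt call that I have been considering but have not run yet. It is only a sketch: it assumes the primers really do sit at the 5' ends of R1/R2 and reuses the variables defined in the first script.

# Untested sketch: anchor the PCR primers at the 5' end of each read (-g/-G)
# and keep only pairs where both primers were found (--discard-untrimmed)
cutadapt \
    -j 4 \
    -g "$FWD_PRIMER" -G "$REV_PRIMER" \
    --discard-untrimmed \
    --minimum-length 100 \
    -o "$OUTPUT_DIR/${SAMPLE}_trimmed_R1.fastq.gz" \
    -p "$OUTPUT_DIR/${SAMPLE}_trimmed_R2.fastq.gz" \
    "$R1" "$R2"

Is this the right direction?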