Wei_Zhang
(Wei Zhang)
April 18, 2024, 2:12am
1
Hi all,
I encountered a bizarre problem when I processed my ITS data in qiime2-2023.2.
Firstly, I trimmed the sequencing reads to remove the primers using cutadapt
, as follows
cutadapt -j 10 \
-g CTTGGTCATTTAGAGGAAGTAA \
-G GCTGCGTTCTTCATCGATGC\
-n 2 \
-m 100 \
-e 0.1 \
--discard-untrimmed \
-o ${col1}.R1.fq \
-p ${col1}.R2.fq \
${R1} ${R2}
I checked and confirmed that all primers have been trimmed in my sequencing reads.
Following, these trimmed data were loaded and processed in the qiimme2.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest_2.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2
qiime metadata tabulate \
--m-input-file metadata.tsv \
--o-visualization metadata.qzv
qiime dada2 denoise-paired \
--p-n-threads 40 \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trunc-len-f 220 \
--p-trunc-len-r 210 \
--o-table ./ASVs/dada2_table.qza \
--o-representative-sequences ./ASVs/dada2_rep_set.qza \
--o-denoising-stats ./ASVs/dada2_stats.qza
qiime metadata tabulate \
--m-input-file ./ASVs/dada2_stats.qza \
--o-visualization ./ASVs/dada2_stats.qzv
qiime feature-table tabulate-seqs \
--i-data ./ASVs/dada2_rep_set.qza \
--o-visualization ./ASVs/rep-seqs.qzv
However, I was surprised to found many merged reads still contained primers in the middle of sequence. So what happened?
Why do the primers appear again even though they have been removed initially? Any suggestions would be greatly appreciated!
Hello @Wei_Zhang ,
This is difficult for us to troubleshoot because you used cutadapt outside of qiime2. Consider importing the raw sequences into qiime2, using our cutadapt
plugin, and then denoising. If the problem persists we can then look into it further.
Wei_Zhang
(Wei Zhang)
April 20, 2024, 5:01am
5
Hi @colinvwood ,
Thanks for the response. I got the same results even when I loaded the raw data and used the cutadapt
plugin.
I attached a sample here for your reference (qiime2 - Google Drive ).
The forward and reverse primers are "CTTGGTCATTTAGAGGAAGTAA " and "GCTGCGTTCTTCATCGATGC ".
rep-seqs.qzv (242.8 KB)
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2
mkdir clean_data
qiime cutadapt trim-paired \
--p-cores 40 \
--i-demultiplexed-sequences paired-end-demux.qza \
--p-anywhere-f CTTGGTCATTTAGAGGAAGTAA \
--p-anywhere-r GCTGCGTTCTTCATCGATGC \
--p-minimum-length 150 \
--p-error-rate 0.01 \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--p-times 2 \
--o-trimmed-sequences ./clean_data/paired-end-demux_de_primer.qza
mkdir ASVs
qiime dada2 denoise-paired \
--p-n-threads 40 \
--i-demultiplexed-seqs ./clean_data/paired-end-demux_de_primer.qza \
--p-trunc-len-f 220 \
--p-trunc-len-r 210 \
--o-table ./ASVs/dada2_table.qza \
--o-representative-sequences ./ASVs/dada2_rep_set.qza \
--o-denoising-stats ./ASVs/dada2_stats.qza
#---- denoising statistics
qiime metadata tabulate \
--m-input-file ./ASVs/dada2_stats.qza \
--o-visualization ./ASVs/dada2_stats.qzv
qiime feature-table tabulate-seqs \
--i-data ./ASVs/dada2_rep_set.qza \
--o-visualization ./ASVs/rep-seqs.qzv
Hi @Wei_Zhang ,
Check out the latter part of this post .
Wei_Zhang
(Wei Zhang)
April 22, 2024, 12:43pm
7
Hi @SoilRotifer ,
The previous post still did not work for me, even though I have already included the parameter --p-times 2
. I totally have no idea about what happened.
qiime cutadapt trim-paired \
--p-cores 40 \
--i-demultiplexed-sequences paired-end-demux.qza \
--p-front-f CTTGGTCATTTAGAGGAAGTAA \
--p-front-r GCTGCGTTCTTCATCGATGC \
--p-minimum-length 150 \
--p-error-rate 0.01 \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--p-times 2 \
--o-trimmed-sequences ./clean_data/paired-end-demux_de_primer.qza
qiime demux summarize \
--i-data ./clean_data/paired-end-demux_de_primer.qza \
--o-visualization ./clean_data/paired-end-demux_de_primer.qzv
mkdir ASVs
qiime dada2 denoise-paired \
--p-n-threads 40 \
--i-demultiplexed-seqs ./clean_data/paired-end-demux_de_primer.qza \
--p-trunc-len-f 220 \
--p-trunc-len-r 210 \
--o-table ./ASVs/dada2_table.qza \
--o-representative-sequences ./ASVs/dada2_rep_set.qza \
--o-denoising-stats ./ASVs/dada2_stats.qza
#---- denoising statistics
qiime metadata tabulate \
--m-input-file ./ASVs/dada2_stats.qza \
--o-visualization ./ASVs/dada2_stats.qzv
qiime feature-table tabulate-seqs \
--i-data ./ASVs/dada2_rep_set.qza \
--o-visualization ./ASVs/rep-seqs.qzv
Hi @Wei_Zhang ,
My intent was for you to read the latter part of the linked post. That is:
Sometimes, there are a few almost identical sub-sequences within 16S that can make it appear that a primer is still present, when it is just a neighboring, yet very similar sub-sequence that is not actually the primer.
Can you paste an example of a sequence before and after primer trimming?
@Wei_Zhang
I forgot to mention that you should be using
--p-front-f CTTGGTCATTTAGAGGAAGTAA \
--p-front-r GCTGCGTTCTTCATCGATGC \
and not:
--p-anywhere-f CTTGGTCATTTAGAGGAAGTAA \
--p-anywhere-r GCTGCGTTCTTCATCGATGC \
Wei_Zhang
(Wei Zhang)
April 23, 2024, 2:16am
10
Hi @SoilRotifer ,
Thanks for the suggestion. I exported the trimmed the FASTQ file.
qiime tools export \
--input-path clean_data/paired-end-demux_de_primer.qza \
--output-path clean_data/
Interestingly, no primer sequence was found in the trimmed FASTQ file, indicating all primers should be correctly trimmed.
I checked those sequences but still found they were not a subsequence and did not even appear in the raw FASTQ data nor trimmed FASTQ file.
For instance, I grep the sequence below in the raw and trimmed FASTQ data, but nothing could be found out.
Also, the sequences that have been trimmed start with AAGTCGTAACAA
(red box), which is identical to the sequence downstream of the primer (green box). Thus, I think those sequences are not subsequences, but those that have not been trimmed. What surprised me most was that I could not find those sequences in the raw and trimmed FASTQ file. I wonder whether these sequences are newly generated by DADA2
.
Wei_Zhang
(Wei Zhang)
April 23, 2024, 2:18am
11
Definitely, I have already replaced the p-anywhere-f
with p-front-f
.
Hello @Wei_Zhang ,
Were you able to resolve this issue?