Primer show up again after removal using cutadapt in qiime2 DADA2

Wei_Zhang · April 18, 2024, 2:12am

Hi all,

I encountered a bizarre problem when I processed my ITS data in qiime2-2023.2.

Firstly, I trimmed the sequencing reads to remove the primers using cutadapt, as follows

cutadapt -j 10 \
-g CTTGGTCATTTAGAGGAAGTAA \
-G GCTGCGTTCTTCATCGATGC\
 -n 2 \
-m 100 \
-e 0.1 \
--discard-untrimmed \
-o ${col1}.R1.fq \
-p ${col1}.R2.fq \
${R1} ${R2}

I checked and confirmed that all primers have been trimmed in my sequencing reads.

Following, these trimmed data were loaded and processed in the qiimme2.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest_2.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime metadata tabulate \
  --m-input-file metadata.tsv \
  --o-visualization metadata.qzv

qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 210 \
  --o-table ./ASVs/dada2_table.qza \
  --o-representative-sequences ./ASVs/dada2_rep_set.qza \
  --o-denoising-stats ./ASVs/dada2_stats.qza

qiime metadata tabulate \
  --m-input-file ./ASVs/dada2_stats.qza \
  --o-visualization ./ASVs/dada2_stats.qzv

qiime feature-table tabulate-seqs \
  --i-data ./ASVs/dada2_rep_set.qza \
  --o-visualization ./ASVs/rep-seqs.qzv

However, I was surprised to found many merged reads still contained primers in the middle of sequence. So what happened?

Why do the primers appear again even though they have been removed initially? Any suggestions would be greatly appreciated!

colinvwood · April 19, 2024, 6:32pm

Hello @Wei_Zhang,

This is difficult for us to troubleshoot because you used cutadapt outside of qiime2. Consider importing the raw sequences into qiime2, using our cutadapt plugin, and then denoising. If the problem persists we can then look into it further.

Wei_Zhang · April 20, 2024, 5:01am

Hi @colinvwood,

Thanks for the response. I got the same results even when I loaded the raw data and used the cutadapt plugin.

I attached a sample here for your reference (qiime2 - Google Drive).

The forward and reverse primers are "CTTGGTCATTTAGAGGAAGTAA " and "GCTGCGTTCTTCATCGATGC ".
rep-seqs.qzv (242.8 KB)

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

mkdir clean_data
qiime cutadapt trim-paired \
        --p-cores 40 \
        --i-demultiplexed-sequences paired-end-demux.qza \
        --p-anywhere-f CTTGGTCATTTAGAGGAAGTAA \
        --p-anywhere-r GCTGCGTTCTTCATCGATGC \
        --p-minimum-length 150 \
        --p-error-rate 0.01 \
        --p-match-read-wildcards \
        --p-match-adapter-wildcards \
        --p-discard-untrimmed \
        --p-times 2 \
        --o-trimmed-sequences ./clean_data/paired-end-demux_de_primer.qza

mkdir ASVs
qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs ./clean_data/paired-end-demux_de_primer.qza \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 210 \
  --o-table ./ASVs/dada2_table.qza \
  --o-representative-sequences ./ASVs/dada2_rep_set.qza \
  --o-denoising-stats ./ASVs/dada2_stats.qza

#---- denoising statistics
qiime metadata tabulate \
  --m-input-file ./ASVs/dada2_stats.qza \
  --o-visualization ./ASVs/dada2_stats.qzv
qiime feature-table tabulate-seqs \
  --i-data ./ASVs/dada2_rep_set.qza \
  --o-visualization ./ASVs/rep-seqs.qzv

SoilRotifer · April 20, 2024, 4:54pm

Hi @Wei_Zhang,

Check out the latter part of this post.

Wei_Zhang · April 22, 2024, 12:43pm

Hi @SoilRotifer ,

The previous post still did not work for me, even though I have already included the parameter --p-times 2 . I totally have no idea about what happened.

qiime cutadapt trim-paired \
        --p-cores 40 \
        --i-demultiplexed-sequences paired-end-demux.qza \
        --p-front-f CTTGGTCATTTAGAGGAAGTAA \
        --p-front-r GCTGCGTTCTTCATCGATGC \
        --p-minimum-length 150 \
        --p-error-rate 0.01 \
        --p-match-read-wildcards \
        --p-match-adapter-wildcards \
        --p-discard-untrimmed \
        --p-times 2 \
        --o-trimmed-sequences ./clean_data/paired-end-demux_de_primer.qza

qiime demux summarize \
        --i-data ./clean_data/paired-end-demux_de_primer.qza \
        --o-visualization ./clean_data/paired-end-demux_de_primer.qzv

mkdir ASVs
qiime dada2 denoise-paired \
  --p-n-threads 40 \
  --i-demultiplexed-seqs ./clean_data/paired-end-demux_de_primer.qza \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 210 \
  --o-table ./ASVs/dada2_table.qza \
  --o-representative-sequences ./ASVs/dada2_rep_set.qza \
  --o-denoising-stats ./ASVs/dada2_stats.qza

#---- denoising statistics
qiime metadata tabulate \
  --m-input-file ./ASVs/dada2_stats.qza \
  --o-visualization ./ASVs/dada2_stats.qzv
qiime feature-table tabulate-seqs \
  --i-data ./ASVs/dada2_rep_set.qza \
  --o-visualization ./ASVs/rep-seqs.qzv

SoilRotifer · April 22, 2024, 2:20pm

Hi @Wei_Zhang ,

My intent was for you to read the latter part of the linked post. That is:

Sometimes, there are a few almost identical sub-sequences within 16S that can make it appear that a primer is still present, when it is just a neighboring, yet very similar sub-sequence that is not actually the primer.

Can you paste an example of a sequence before and after primer trimming?

SoilRotifer · April 22, 2024, 2:26pm

@Wei_Zhang

I forgot to mention that you should be using

 --p-front-f    CTTGGTCATTTAGAGGAAGTAA \
 --p-front-r    GCTGCGTTCTTCATCGATGC \

and not:

 --p-anywhere-f CTTGGTCATTTAGAGGAAGTAA \
 --p-anywhere-r GCTGCGTTCTTCATCGATGC \

Wei_Zhang · April 23, 2024, 2:16am

Hi @SoilRotifer ,

Thanks for the suggestion. I exported the trimmed the FASTQ file.

qiime tools export \
  --input-path clean_data/paired-end-demux_de_primer.qza \
  --output-path clean_data/

Interestingly, no primer sequence was found in the trimmed FASTQ file, indicating all primers should be correctly trimmed.

I checked those sequences but still found they were not a subsequence and did not even appear in the raw FASTQ data nor trimmed FASTQ file.

For instance, I grep the sequence below in the raw and trimmed FASTQ data, but nothing could be found out.

Also, the sequences that have been trimmed start with AAGTCGTAACAA (red box), which is identical to the sequence downstream of the primer (green box). Thus, I think those sequences are not subsequences, but those that have not been trimmed. What surprised me most was that I could not find those sequences in the raw and trimmed FASTQ file. I wonder whether these sequences are newly generated by DADA2.

Wei_Zhang · April 23, 2024, 2:18am

Definitely, I have already replaced the p-anywhere-f with p-front-f.

colinvwood · May 6, 2024, 4:33pm

Hello @Wei_Zhang,

Were you able to resolve this issue?