DADA2 - low percentage of input non-chimeric reads

I am analysing human salivary microbiota using Qiime2 (MiSeq v3, 2x300 bp), with primers targeting the V3-V4 region. However, after the DADA2 step, I noticed a very low percentage of input non-chimeric reads (~30%).
Just to note, the sequences were already demultiplexed, and the primers were removed beforehand.
The truncation parameters used in the analysis are as follows:

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs paired-end-demux_oral_Baby1000.qza \
    --p-trim-left-f 6 \
    --p-trim-left-r 6 \
    --p-trunc-len-f 257 \
    --p-trunc-len-r 274 \
    --o-representative-sequences oral_rep-seqs-DADA2.qza \
    --o-table oral_table.qza \
    --o-denoising-stats oral_DADA2-stats.qza
I am trying to understand the cause of this issue. If I reduce the truncation point of the reverse reads, would that increase the percentage of sequences retained after merging?
Thank you for your support.
DADA2_stats.tsv (9.4 KB)
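As a sanity check on the merge question: with 2x300 bp reads over V3-V4, the expected overlap after truncation can be estimated with simple arithmetic. A sketch; the ~427 bp amplicon length after primer removal (341F-805R) is an assumed typical value, not taken from this thread:

```shell
# Back-of-envelope overlap check for DADA2 merging (assumed numbers).
amplicon=427            # assumed V3-V4 length after primer removal
trunc_f=257; trim_f=6   # forward: --p-trunc-len-f / --p-trim-left-f
trunc_r=274; trim_r=6   # reverse: --p-trunc-len-r / --p-trim-left-r
overlap=$(( (trunc_f - trim_f) + (trunc_r - trim_r) - amplicon ))
echo "expected overlap: ${overlap} bp"   # DADA2 needs roughly >= 12 bp
```

With these settings the overlap is ample, which suggests the truncation points themselves are unlikely to be the main cause of the read loss.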


Hi!
I don't like the first 20 nt of the reads. It looks like the primers are still attached. Are you sure the primers were removed? If you received the sequences already demultiplexed by the sequencing center, and they wrote that the primers had already been removed, I would still try to remove primers in Qiime2 with cutadapt. If you are sure that there are no primers in the sequences, I would try trimming the first 20 nt to see if it improves the output.
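One quick way to check for leftover primers is to grep the start of the reads for the V3 forward primer with its IUPAC codes expanded. A sketch with mock reads; on real data you would sample sequence lines from the FASTQ instead (e.g. `zcat sample_R1.fastq.gz | head -400 | awk 'NR%4==2'`):

```shell
# Expand IUPAC codes in the forward primer CCTACGGGNGGCWGCAG:
# N -> [ACGT], W -> [AT]
primer='CCTACGGG[ACGT]GGC[AT]GCAG'
# Two mock reads for illustration: one with the primer attached, one without.
reads='CCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGG
TGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGC'
hits=$(printf '%s\n' "$reads" | grep -cE "^$primer" || true)
echo "reads starting with the primer: $hits"
```

If a large fraction of sampled reads begin with the primer pattern, the primers were not removed.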

Best,


Thank you very much for your input. I think they mentioned that the primers would be removed, but I want to double-check. I agree with trimming the first 20 nt.

In my experience, they remove the primers or adapters that were used for sequencing in their facility, but not the primers that were used for 16S library preparation.


Thank you very much for the insight. I will double-check with the sequencing facility. If I remove any primers before DADA2 using cutadapt, would that affect the subsequent taxonomic analysis step?

qiime feature-classifier extract-reads \
    --i-sequences HOMD_16S_rRNA_RefSeq_V15.23.fasta.qza \
    --p-f-primer CCTACGGGNGGCWGCAG \
    --p-r-primer GACTACHVGGGTATCTAATCC \
    --o-reads HOMD_V3-V4_oral_ref-seqs.qza

I do not expect any negative consequences of primer removal on taxonomy annotation, considering the command you provided. In general, it is considered good practice to remove primers before DADA2.


Thank you so much for the insight.

Thanks for the insights, that was very useful. The sequencing facility confirmed that the FASTQ files are raw reads, so the Illumina adapters and 16S V3-V4 primers were not removed.

They used the 16S_341f PCR primer sequence as the forward primer and the 16S_805r PCR primer sequence as the reverse primer (adapter first, then primer):

Forward: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG

Reverse: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

In this case, can I use the following command to remove both the adapter and the primer at once, or should I remove them in two steps, as below?

  1. If removing both (adapter and primer) at once:

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux.qza \
    --p-front-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG \
    --p-front-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC \
    --o-trimmed-sequences trimmed-paired-end-demux.qza

Or should I remove the primers first:

  1. Remove the primers:
    qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --o-trimmed-sequences trimmed-paired-end-demux.qza

  2. Then remove the adapters:
    qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux-paired-end.qza \
    --p-adapter-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG \
    --p-adapter-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG \
    --o-trimmed-sequences trimmed-demux-paired-end.qza \
    --verbose

Any insight is highly appreciated.
Thank you very much.

For me, with Illumina, this command usually works well:

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences demux.qza \
    --o-trimmed-sequences cutad.qza \
    --p-cores 6 \
    --p-front-f FORWARDPRIMER \
    --p-front-r REVERSEPRIMER \
    --p-discard-untrimmed

It will discard sequences in which no primers are found, and the adapters are trimmed together with the primers.
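The reason a single pass is enough: `--p-front` uses cutadapt's 5' adapter semantics, so the primer match and everything upstream of it (including the Illumina adapter) are removed together. A minimal sketch of that behaviour with made-up sequences, simulated with shell prefix stripping:

```shell
# Simulate cutadapt's 5' --front trimming: the match and everything
# before it are removed, so the upstream adapter goes too.
adapter='TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
primer='CCTACGGGAGGCAGCAG'   # one concrete expansion of CCTACGGGNGGCWGCAG
raw="${adapter}${primer}TGGGGAATATTGCACAATGGG"
trimmed="${raw#*$primer}"    # drop the shortest prefix ending in the primer
echo "$trimmed"              # only the biological insert remains
```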

trimmed_demux_oral_Baby1000_v3.qzv (320.6 KB)
trimmed_demux_oral_Baby1000_v2.qzv (320.9 KB)
Thank you, Timur. I followed your suggested command, and it appears that 62 out of 178 samples now have a very low number of sequences (please see the attached "v3" file).

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux_oral_Baby1000_v2.qza \
    --o-trimmed-sequences trimmed-paired-end-demux_oral_Baby1000_v3.qza \
    --p-cores 6 \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --p-discard-untrimmed

I have also attached the previous (v2) file, produced by this command:
qiime cutadapt trim-paired \
    --i-demultiplexed-sequences paired-end-demux_oral_Baby1000_v2.qza \
    --p-front-f CCTACGGGNGGCWGCAG \
    --p-front-r GACTACHVGGGTATCTAATCC \
    --o-trimmed-sequences trimmed-paired-end-demux_oral_Baby1000_v2.qza

I'm concerned about why this new script led to fewer sequences for these 62 samples. Could there be something wrong with the way I'm executing it?
I appreciate your help.


So, the only difference between the two commands was the line:

--p-discard-untrimmed

which resulted in the difference in retained reads?

That means that primers were not found in the missing sequences.
Did you perform any quality control before cutadapt? If yes, that may be the reason (it is better to work with the raw data directly). If not, try increasing the error rate to see if it recovers more reads.

PS. Can't check the files right now.
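For scale: cutadapt allows floor(match_length × error_rate) mismatches, so for the 17 nt forward primer the two rates tried below translate as follows. A sketch; the floor rule is my reading of cutadapt's error-tolerance documentation:

```shell
# Allowed mismatches = floor(primer_length * error_rate); shell integer
# division by 10 floors the result for rates 0.1 and 0.2.
primer_len=17               # CCTACGGGNGGCWGCAG
for e10 in 1 2; do          # error rates 0.1 and 0.2, scaled by 10
  allowed=$(( primer_len * e10 / 10 ))
  echo "error rate 0.${e10} -> up to ${allowed} mismatches"
done
```

If even 3 mismatches in 17 nt recover nothing, the primer is most likely absent from those reads entirely.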


Thank you for the reply. I used the raw data for cutadapt. I ran cutadapt again with error rates of 0.1 and 0.2 separately, but it did not improve the reads (only 1 or 2 reads, and 0 for most of those 68 samples). Now I am thinking I will separate those 68 samples out, proceed with cutadapt for the samples that have more reads, and later run DADA2 on them together. Any insight is appreciated. Thank you.
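To pick out which samples to set aside, a per-file read count over the per-sample FASTQs is enough. A sketch; the directory and file names are made up, and a tiny mock FASTQ is created here only to show the counting logic:

```shell
# Count reads per FASTQ file: 4 lines per record.
mkdir -p trimmed
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n' | gzip > trimmed/S1_R1.fastq.gz
for f in trimmed/*_R1.fastq.gz; do
  n=$(( $(gzip -cd "$f" | wc -l) / 4 ))
  echo "$f: $n reads"
done
```

Sorting that output numerically would give the cutoff list of low-read samples directly.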


Filtering out these failed samples is reasonable, but it is still a large number of samples to discard.
I am still puzzled by it.
I would go back to the library preparation step and double-check that both groups of samples (those that retained most of their reads and those that lost them) were processed in the same way. If you are not the person who prepared the 16S libraries, I would contact the responsible person with that question.


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.