DADA2 flags a high proportion of reads as chimeric

First, thanks a lot for the support with QIIME 2. I have read many similar questions but still cannot move forward.

I work with demultiplexed V3-V4 region data. I got worried because DADA2 identifies between 50 and 70% of my sequences as chimeras. Why? I have changed my truncation parameters many times and merging is already good. My expected amplicon length is 460 bp and I have enough overlap (66 bp).


My questions: How can I improve my chimera results with DADA2? Should I filter chimeras before DADA2? If yes, which plugin should I use? The genomics center said my samples are already demultiplexed. Does this mean that there are also no primer residues left that could be ruining my analysis? How can I make sure of that? Here are their exact words:

I used the following command with DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /mnt/d/16S_WinterData_Files/H_atra/dada2/dada2_1/atra-demux.qza \
  --o-table /mnt/d/16S_WinterData_Files/H_atra/dada2/dada2_2/atra--table.qza \
  --o-representative-sequences /mnt/d/16S_WinterData_Files/H_atra/dada2/dada2_2/atra-rep-seqs.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 297 \
  --p-trunc-len-r 229 \
  --o-denoising-stats /mnt/d/16S_WinterData_Files/H_atra/dada2/dada2_2/atra-stats-dada2.qza
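As a quick sanity check on the "66 bp overlap" figure, the arithmetic can be sketched as below. This is only an illustration: `merge_overlap` is a hypothetical helper, not part of QIIME 2 or DADA2, and it assumes every amplicon is exactly the expected 460 bp, which real data will not satisfy.

```python
def merge_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Number of bases shared by the forward and reverse read after truncation.

    Hypothetical helper for illustration only. A result that is small or
    negative means the pair cannot be merged.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Truncation lengths from the command above, expected amplicon length from the post.
overlap = merge_overlap(trunc_len_f=297, trunc_len_r=229, amplicon_len=460)
print(overlap)
```

Note that the same calculation changes once primers are trimmed, because both the reads and the amplicon get shorter by the primer lengths.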

demux.qzv (309.9 KB)
atra-rep-seqs.qzv (1.1 MB)
atra-stats-dada2.qzv (1.2 MB)
atra-table.qzv (572.4 KB)

I have been reading old forum posts for hours but still cannot move forward. Please advise.

When I run DADA2 on the forward reads only (single-end), most of my reads are retained as non-chimeric, so what is going on with the paired-end run?


Hello!
It looks like the primers are still in the reads. Could you try to remove them with q2-cutadapt first?

But when I search for the primers using grep, I get 0 as output!
grep -c CCTACGGGAGGCAGCAG /mnt/d/Winter_data/Data_repeats/21Nov74-DL027_S27_L001_R1_001.fastq.gz
0
grep -c GGACTACHVGGGTWTCTAAT /mnt/d/Winter_data/Data_repeats/21Nov74-DL027_S27_L001_R1_001.fastq.gz
0

Does this mean the primers are already removed?
What is the difference between the Illumina primer and the primer here?

Which one of those should I use to train my classifier, or to cut with cutadapt if I still have to?

Based on your quality plots, the primers are still attached to the sequences. I would try to remove them with cutadapt first, discarding any sequences without primers. If the output file is much smaller than the original, that means either there is an error in the command or the primers have indeed already been removed.
The grep command will not work properly with some special symbols in the sequence (the degenerate bases H, V and W in your reverse primer).
To train the classifier, use the same primers that you use in cutadapt.
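One way to check for primers that contain degenerate bases is to expand the IUPAC codes into a regular expression and search the decompressed reads. The sketch below is a minimal illustration under stated assumptions: `primer_regex` and `count_reads_with_primer` are hypothetical names, matching is exact (no mismatches allowed, unlike cutadapt), and the file is assumed to be a standard four-line-per-record FASTQ. It also reads the file through `gzip`, since a plain `grep` on a `.fastq.gz` file searches the compressed bytes rather than the sequences.

```python
import gzip
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer with degenerate bases into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def count_reads_with_primer(fastq_gz_path: str, primer: str) -> int:
    """Count reads in a gzipped FASTQ whose sequence line contains the primer."""
    pattern = primer_regex(primer)
    hits = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a four-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1 and pattern.search(line):
                hits += 1
    return hits
```

A call such as `count_reads_with_primer("sample_R1_001.fastq.gz", "CCTACGGGAGGCAGCAG")` (placeholder file name) would then report how many forward reads carry the forward primer. In a typical paired-end amplicon layout the reverse primer would be looked for in the R2 file rather than in R1.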


Which is the correct primer to trim: the Illumina primer on the left or the primer on the right?

I think you should use primers from the right side.

I did it like this:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/atra-demux.qza \
  --p-front-f CCTACGGGAGGCAGCAG \
  --p-front-r GGACTACHVGGGTWTCTAAT \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences paired-end-demux-PrimerTrimmed.qza

paired-end-demux-PrimerTrimmed.qzv (316.3 KB)

What is going on in my new trimmed .qzv file?

Is it normal that the forward and reverse reads were trimmed differently in each sample, meaning there is no pattern and the number of bases trimmed is not equal across samples?
After Primer removal


Before primer removal

After that, before proceeding with further analysis, I checked whether adapters are also still present. Or does demultiplexing mean that the adapters are removed along with the barcodes? I used the following:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/atra-demux.qza \
  --p-front-f CCTACGGGAGGCAGCAG \
  --p-front-r GGACTACHVGGGTWTCTAAT \
  --p-adapter-f ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  --p-adapter-r GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences

When I add --p-adapter-f and --p-adapter-r, the result ranges from a few bases less trimmed in some samples to the same amount trimmed in others, compared with using only --p-front-f and --p-front-r. Additionally, the end of the interactive quality plot looks different, as follows:

Can both runs still be considered equivalent?
paired-end-demux-PrimerTrimmed.qzv (316.3 KB)
paired-end-demux-PrimerAdapterTrimmed.qzv (316.3 KB)
atra-demux.qzv (310.0 KB)


Looks like the right command to me.

I see quality plots like this quite often after primer removal. I may be mistaken, but I think it is related to the change in read lengths after clipping the primers. You will truncate the reads anyway.

Based on your screenshots, the forward and reverse files contain the same number of reads as each other, both before and after cutadapt. I can only see differences between samples, and there is nothing wrong with that.

Cutadapt will remove the primers and any subsequent or preceding bases (depending on the end of the read to which the primers were attached). Since you ran cutadapt with the option to discard reads without primers, I would not worry about adapters. Your first cutadapt run is good enough to proceed with DADA2 and check whether it helps with the chimeras.
The differences between your two cutadapt runs are minor.
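The effect of front-primer trimming with untrimmed reads discarded can be illustrated with a simplified model. This is a sketch only, not how cutadapt is implemented: `trim_front_exact` is a hypothetical function, it uses exact string matching on the non-degenerate forward primer (cutadapt tolerates mismatches and partial matches), and the example reads are made up.

```python
from typing import Optional

FORWARD_PRIMER = "CCTACGGGAGGCAGCAG"  # value passed to --p-front-f in the thread


def trim_front_exact(read: str, primer: str = FORWARD_PRIMER) -> Optional[str]:
    """Remove the primer and everything before it; return None if it is absent.

    Returning None models discarding untrimmed reads.
    """
    start = read.find(primer)
    if start == -1:
        return None
    return read[start + len(primer):]


# Made-up reads: primer at the start, primer after two extra bases, no primer.
reads = [
    FORWARD_PRIMER + "TGGGGAATATTGCACAATGG",
    "TA" + FORWARD_PRIMER + "TGGGGAATATTGCACAAT",
    "TGGGGAATATTGCACAATGGGCGAAAGCCTGATGCAG",
]
trimmed = [trim_front_exact(read) for read in reads]
lengths = [len(t) if t is not None else None for t in trimmed]
print(lengths)
```

Because the primer can sit at slightly different positions in different reads, the trimmed reads no longer share a single length, which is one plausible reason the tail of the quality plot changes after primer removal.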

Thank you so much.

When I proceeded, Deblur generated a set of empty files. I believe my trim length was wrong.
demux-joined.qzv (299.8 KB)

(qiime2-2022.2) sabdelghany@LAPTOP-DUNNEENC:/mnt/d/16S_WinterData_Files/H_atra$ qiime deblur denoise-16S \
  --i-demultiplexed-seqs /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/QualityControl/atra-demux-joined-filtered.qza \
  --p-trim-length 435 \
  --p-sample-stats \
  --o-representative-sequences /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-rep-seqs.qza \
  --o-table /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-table.qza \
  --o-stats /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-deblur-stats.qza
Saved FeatureTable[Frequency] to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-table.qza
Saved FeatureData[Sequence] to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-rep-seqs.qza
Saved DeblurStats to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-deblur-stats.qza
I chose 435 based on these statistics of my joined reads:

What is so wrong with 435?
Does trimming at 435 mean that all reads shorter than that are discarded? This makes no sense to me!

You are right! All reads shorter than 435 will be discarded. The same will happen in Dada2 for each of the reads before merging.

So if I want to keep all my reads, as I do not have many, would it be good to trim below the 2nd percentile, that is below 404, to keep all of them?

But if I trim at 404, it will drop the shorter reads, keep the reads longer than 404, and cut those longer reads down to 404. Is this correct?

My joined reads span a small range, from 404 to 431!

Yes, it will be better to trim at position 400/404 to recover most of the reads.

And yes, it will drop all reads that are shorter than this value and trim the longer ones.
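The behaviour described above (drop shorter reads, cut longer reads down to the trim length) can be sketched as follows. This is an illustration of the rule only, not Deblur's actual code: `apply_trim_length` is a hypothetical helper, and the read lengths are made-up values inside the 404 to 431 range mentioned in the thread.

```python
from typing import List


def apply_trim_length(reads: List[str], trim_length: int) -> List[str]:
    """Discard reads shorter than trim_length and cut the rest to that length."""
    return [read[:trim_length] for read in reads if len(read) >= trim_length]


# Made-up joined reads with lengths inside the reported 404-431 range.
joined_lengths = [404, 410, 425, 431]
reads = ["A" * n for n in joined_lengths]

kept_at_404 = apply_trim_length(reads, 404)
kept_at_435 = apply_trim_length(reads, 435)

print(len(kept_at_404), {len(read) for read in kept_at_404})
print(len(kept_at_435))
```

With a trim length of 435 no read in this range is long enough to survive, which is consistent with the empty output files, while a trim length of 404 keeps every read at a uniform length.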



Hi again, one question just to be sure: is it normal that the forward and reverse reads are no longer the same length after primer removal?

Is it because the forward and reverse primers are not the same length?

Hi! Yes, that's normal: the forward and reverse primers are not the same length.
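The length difference follows directly from the two primer sequences used in the cutadapt command. In the sketch below, `RAW_READ_LENGTH` is an assumption (the thread does not state the sequencing read length, although the 297 bp truncation suggests reads of roughly 300 bp), and `length_after_trimming` is a hypothetical helper that assumes the primer sits at the very start of the read.

```python
FORWARD_PRIMER = "CCTACGGGAGGCAGCAG"     # --p-front-f in the thread
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"  # --p-front-r in the thread
RAW_READ_LENGTH = 300                    # assumption, not stated in the thread


def length_after_trimming(raw_length: int, primer: str) -> int:
    """Read length left after removing a primer located at the start of the read."""
    return raw_length - len(primer)


forward_length = length_after_trimming(RAW_READ_LENGTH, FORWARD_PRIMER)
reverse_length = length_after_trimming(RAW_READ_LENGTH, REVERSE_PRIMER)
print(len(FORWARD_PRIMER), len(REVERSE_PRIMER))
print(forward_length, reverse_length)
```

Under these assumptions the reverse reads end up three bases shorter than the forward reads, simply because the reverse primer is three bases longer.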

