First, thanks a lot for the support with QIIME 2. I have read many similar questions but still cannot move forward.
I work with demultiplexed V3-V4 region data. I got worried because DADA2 identifies between 50 and 70% of my sequences as chimeric. Why? I have changed my truncation parameters many times and merging already works well: my expected amplicon length is 460 bp and I have enough overlap (66 bp).
Question: How can I improve my chimera output with DADA2? Should I filter chimeras before DADA2? If yes, which plugin should I use? The genomic center said my samples are already demultiplexed. Does this mean I also have no primer residues that are ruining my analysis? How can I make sure of that? Here are their exact words:
But when I search for the primers using grep, I get 0 as output:
grep -c CCTACGGGAGGCAGCAG /mnt/d/Winter_data/Data_repeats/21Nov74-DL027_S27_L001_R1_001.fastq.gz
0
grep -c GGACTACHVGGGTWTCTAAT /mnt/d/Winter_data/Data_repeats/21Nov74-DL027_S27_L001_R1_001.fastq.gz
0
Does this mean the primers are already removed?
What is the difference between the Illumina primer and the primer here?
Based on your quality plots, primers are still attached to the sequences. I would try to remove them with cutadapt first, discarding any sequences without primers. If the output file is too small compared to the original, that means either there is an error in the command or the primers are indeed already removed.
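One way to judge whether the cutadapt output is "too small" is to count reads before and after trimming. The following is a minimal sketch, assuming the reads are available as gzipped FASTQ files (for example, exported from the .qza artifacts). The file names, the make_fastq helper, and the demo data are hypothetical and exist only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count reads in a gzipped FASTQ file: each record spans four lines.
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical demo data (not real reads): write $1 dummy records to $2.
make_fastq() {
  local i
  for i in $(seq 1 "$1"); do
    printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
  done | gzip -c > "$2"
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
make_fastq 4 "$tmpdir/before_R1.fastq.gz"   # stands in for the original reads
make_fastq 3 "$tmpdir/after_R1.fastq.gz"    # stands in for the cutadapt output

before=$(count_reads "$tmpdir/before_R1.fastq.gz")
after=$(count_reads "$tmpdir/after_R1.fastq.gz")
retained=$(awk -v a="$after" -v b="$before" 'BEGIN { printf "%.1f", 100 * a / b }')

echo "reads before: $before, after: $after, retained: ${retained}%"
```

Inside QIIME 2, the per-sample counts reported by qiime demux summarize for the input and the trimmed artifact allow the same comparison without exporting anything.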
The grep command will not work properly with some special symbols in the sequence (for example, degenerate bases such as H, V, and W).
Use the same primers that you use in cutadapt to train the classifier.
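Two things can make a plain grep report 0 on these files. The file is gzip-compressed, so grep sees compressed bytes rather than sequence text, and degenerate IUPAC codes such as H, V, and W are matched as literal letters. The following is a minimal sketch that works around both, assuming a standard four-line FASTQ layout. The iupac_to_regex and count_primer_hits helpers and the demo reads are hypothetical and only for illustration. The demo puts both primers in one file for brevity; in paired-end data the reverse primer would normally be looked for in the R2 file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: expand IUPAC degenerate codes into bracket expressions.
iupac_to_regex() {
  printf '%s' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Count sequence lines (line 2 of every 4-line record) that contain the primer.
count_primer_hits() {
  local pattern
  pattern=$(iupac_to_regex "$2")
  # grep -c exits non-zero when the count is 0, so tolerate that case.
  gzip -dc "$1" | awk 'NR % 4 == 2' | grep -c -E "$pattern" || true
}

# Hypothetical demo data: read1 starts with the forward primer, read2 with
# one concrete variant of the degenerate reverse primer, read3 with neither.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
fastq="$tmpdir/demo.fastq.gz"
printf '%s\n' \
  '@read1' 'CCTACGGGAGGCAGCAGTTTT' '+' 'IIIIIIIIIIIIIIIIIIIII' \
  '@read2' 'GGACTACAGGGGTATCTAATCC' '+' 'IIIIIIIIIIIIIIIIIIIIII' \
  '@read3' 'ACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIII' \
  | gzip -c > "$fastq"

fwd_hits=$(count_primer_hits "$fastq" CCTACGGGAGGCAGCAG)
rev_hits=$(count_primer_hits "$fastq" GGACTACHVGGGTWTCTAAT)
# For comparison: the degenerate primer searched as a literal string.
literal_hits=$(gzip -dc "$fastq" | grep -c 'GGACTACHVGGGTWTCTAAT' || true)

echo "forward: $fwd_hits, reverse: $rev_hits, reverse as literal: $literal_hits"
```

The pattern is deliberately not anchored to the start of the read, since primers can be preceded by a few extra bases.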
Is it normal that, after primer removal, the forward and reverse reads were trimmed differently per sample, meaning there is no pattern and an unequal number of bases is trimmed per sample?
After that, I checked whether adapters are also still present before proceeding with further analysis. Or does demultiplexing mean that the adapters are removed along with the barcodes? I used the following command:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/atra-demux.qza \
  --p-front-f CCTACGGGAGGCAGCAG \
  --p-front-r GGACTACHVGGGTWTCTAAT \
  --p-adapter-f ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  --p-adapter-r GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences
When I add --p-adapter-f and --p-adapter-r, the result ranges from a few fewer bases trimmed in some samples to the same number trimmed in others, compared with using only --p-front-f and --p-front-r. Additionally, the end of the interactive quality plot looks different, as follows:
I see quality plots like this quite often after primer removal. I may be mistaken, but I think it is related to the change in read lengths after clipping the primers. You will truncate the reads anyway.
Based on your screenshots, the forward and reverse files have the same number of reads as each other, both before and after cutadapt. I can see differences only between samples, and there is nothing wrong with that.
Cutadapt will remove the primers and any subsequent or preceding bases (depending on the end of the read to which the primers were attached). Since you ran cutadapt with the option to discard reads without primers, I would not worry about adapters. Your first cutadapt run is good enough to proceed with DADA2 and check whether it helps with the chimeras.
The differences between your two cutadapt runs are minor.
(qiime2-2022.2) sabdelghany@LAPTOP-DUNNEENC:/mnt/d/16S_WinterData_Files/H_atra$ qiime deblur denoise-16S \
  --i-demultiplexed-seqs /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/QualityControl/atra-demux-joined-filtered.qza \
  --p-trim-length 435 \
  --p-sample-stats \
  --o-representative-sequences /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-rep-seqs.qza \
  --o-table /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-table.qza \
  --o-stats /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-deblur-stats.qza
Saved FeatureTable[Frequency] to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-table.qza
Saved FeatureData[Sequence] to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-rep-seqs.qza
Saved DeblurStats to: /mnt/d/16S_WinterData_Files/H_atra/PrimerRemoved/DenoisingDeblur/atra-deblur-stats.qza
I chose 435 based on the summary statistics of my joined reads.
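Deblur truncates reads to the trim length and, as far as I know, drops reads that are shorter, so it can help to see how many joined reads a candidate value would keep. The following is a minimal sketch, assuming the joined reads are available as a gzipped FASTQ file. The length_histogram and reads_kept helpers, the demo reads, and the cutoff of 12 are hypothetical stand-ins; real data would use a value such as 435.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print one "count length" line per distinct read length, shortest first.
length_histogram() {
  gzip -dc "$1" | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
}

# Print "kept total": reads at least $2 bases long, and all reads.
reads_kept() {
  gzip -dc "$1" | awk -v min_len="$2" '
    NR % 4 == 2 { total++; if (length($0) >= min_len) kept++ }
    END { print kept + 0, total + 0 }'
}

# Hypothetical demo data: four reads of length 10, 12, 12, and 15.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
fastq="$tmpdir/joined.fastq.gz"
awk 'BEGIN {
  n = split("10 12 12 15", len, " ")
  for (i = 1; i <= n; i++) {
    seq = ""; qual = ""
    for (j = 0; j < len[i]; j++) { seq = seq "A"; qual = qual "I" }
    printf "@read%d\n%s\n+\n%s\n", i, seq, qual
  }
}' | gzip -c > "$fastq"

length_histogram "$fastq"
summary=$(reads_kept "$fastq" 12)
echo "kept/total at trim length 12: $summary"
```

A trim length near the lower end of the main length peak keeps most reads while discarding only the short outliers.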