I lost lots of data after running dada2 (denoise)

Hi everyone,

I used qiime2-2021.8 (installed with conda).
I have the paired-end demultiplexed fastq files for 16S metabarcoding data.

I ran the denoising with this command:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-18.qza --p-trunc-len-f 301 --p-trunc-len-r 301 --p-trim-left-f 0 --p-trim-left-r 0 --o-table table-dada2-18.qza --o-representative-sequences rep-seqs-dada2-18.qza --o-denoising-stats denoise-dada2-18.qza

Before denoising, I had on average 1×10^5–2×10^5 reads per sample for both the forward and reverse sequences. After denoising, I only got on average 2×10^4–5×10^4 feature counts per sample.

If I understand correctly, I lost around 70% of my reads. Is this normal? How can I find out what the problem with these reads is?

Thanks in advance.

Welcome to the forum!
I would guess the problem lies in the

parameters. Setting them to lower values, or disabling them, may increase your output. See this post for a relevant discussion.


Thank you Timur!

I think there could be more problems during dada2, so I have pasted my denoising stats table here for your consideration. Could you please give me some hints on updating my command?
(I think something might have gone wrong during merging, because I only got about half of the filtered sequences as output.)


Could you also provide the quality plots from before denoising, along with the expected amplicon size or the rRNA region you targeted?
A low percentage of merged reads can be caused by bad quality scores at the ends of the reads (in which case you need to truncate them) or by too short an overlap region (in which case you will need to decrease the minimum overlap in dada2 or disable truncation).
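As a sketch of the second option: recent QIIME 2 releases (including 2021.8, as far as I know) expose a `--p-min-overlap` parameter on `qiime dada2 denoise-paired` (default 12). Using the file names from this thread, lowering it might look like this — the truncation values here are placeholders that you should pick from your own quality plots:

```shell
# Sketch only: lowers the minimum merge overlap from the default of 12 to 8.
# File names are taken from this thread; trunc-len values are placeholders.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end-18.qza \
  --p-trunc-len-f 260 \
  --p-trunc-len-r 180 \
  --p-min-overlap 8 \
  --o-table table-dada2-18.qza \
  --o-representative-sequences rep-seqs-dada2-18.qza \
  --o-denoising-stats denoise-dada2-18.qza
```

This only helps when the truncated reads barely reach each other; if the quality tails are the problem, truncation is the better lever.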

Hi Timur, Thanks for your reply!

I have attached the quality plots here:

My target fragment is the V4 region of 16S rRNA, which I amplified with the 515F and 806R primers (~290 nt).

I did some research on this topic and learned how to choose the truncation lengths, so I updated my command and ran the denoising again.

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end-18.qza --p-trunc-len-f 260 --p-trunc-len-r 180 --p-trim-left-f 0 --p-trim-left-r 0 --p-n-threads 4 --o-table table-dada2-18new.qza --o-representative-sequences rep-seqs-dada2-18new.qza --o-denoising-stats denoise-dada2-18new.qza
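As a quick sanity check on these truncation settings (the ~290 nt amplicon size is taken from above; this is just arithmetic, not a guarantee the reads will merge), the expected overlap is roughly trunc-len-f + trunc-len-r minus the amplicon length:

```shell
# Rough overlap estimate for the truncation settings above.
AMPLICON=290   # approximate V4 amplicon length stated in the post
TRUNC_F=260
TRUNC_R=180
OVERLAP=$((TRUNC_F + TRUNC_R - AMPLICON))
echo "Expected overlap: ${OVERLAP} nt"   # prints: Expected overlap: 150 nt
```

At ~150 nt the overlap is far above dada2's default minimum of 12, so insufficient overlap should not be the limiting factor with these settings.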

However, the results show that there might be a problem during chimera filtering. Here are the results:

I am quite confused by this step. Do you have any ideas about it?

In case you need the sequence information, I have it here as well:

Thank you so much!

Just one quick question: did you already remove the primers with Cutadapt? Primers left in the sequences before dada2 can cause losses during chimera removal. If not, could you remove the primers first with `--p-discard-untrimmed` enabled and then rerun dada2 (the truncation parameters will need to be adjusted again)?
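For reference, a sketch of that primer-trimming step. The 515F/806R sequences below are the standard Earth Microbiome Project versions and are an assumption on my part — substitute the exact primers you actually used:

```shell
# Sketch only: trims 515F/806R primers from read 5' ends and discards
# read pairs where a primer was not found (--p-discard-untrimmed).
# Primer sequences are assumed, not confirmed in this thread.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end-18.qza \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --p-discard-untrimmed \
  --o-trimmed-sequences trimmed-paired-end-18.qza
```

Note that after trimming, the reads are shorter, which is why the dada2 truncation parameters need to be chosen again from the new quality plots.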


Oh! No, I haven't removed the primers or the barcodes yet.
I will report back after I have tried Cutadapt.

Thank you so much!


Hi Timur!

Thanks for your help!

After running Cutadapt, I got the following results:

Then I ran the denoising again with this command:
qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed-paired-end-18.qza --p-trunc-len-f 250 --p-trunc-len-r 200 --p-trim-left-f 0 --p-trim-left-r 0 --p-n-threads 4 --o-table table-dada2-new18-2.qza --o-representative-sequences rep-seqs-dada2-new18-2.qza --o-denoising-stats denoise-dada2-new18-2.qza

The result looks better now:

Do you think this is good enough to proceed with the downstream analysis, or should I change the truncation parameters to retain more reads?

Thanks a lot!

Congratulations! It looks good to me. You can play with the truncation parameters to check whether this dataset yields better output, then pick the best settings, or simply proceed as it is.
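If you do compare several truncation settings, one way to inspect each run (a sketch, using the output name from this thread) is to tabulate the denoising-stats artifact and compare the percentages of input reads that passed filtering, merged, and survived chimera removal:

```shell
# Sketch: turn the denoising stats into a viewable table (open the .qzv
# at https://view.qiime2.org), then compare the "percentage of input
# passed filter" and "percentage of input merged" columns across runs.
qiime metadata tabulate \
  --m-input-file denoise-dada2-new18-2.qza \
  --o-visualization denoise-dada2-new18-2.qzv
```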

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.