I have some questions about the pre-denoising step, the quality scan of my sequences, and the best choice of where to trim my reads.
First: I'm using qiime2-amplicon-2024.10 and analyzing Illumina paired-end sequencing data (2x250 bp).
Second: I have run two previous analyses with a single-end approach because of bad quality in my reverse reads, so now I'm trying to go further with a paired-end analysis.
Here I have my paired-end data imported: paired-end-demux.qzv (320.6 KB)
And the same file after applying cutadapt: trimmed-seq.qzv (326.3 KB)
Questions:
After applying cutadapt, my guess for a good denoising (keeping a Phred score of >25) is to retain my forward reads at full length and truncate my reverse reads at 173 nt.
Is that too stringent? Might I run into problems merging during the DADA2 step? What read length is sufficient for a good merge?
Are there other issues with my reads that I may not be noticing? Or is this a good run with a sufficient number of reads?
I don't know of a perfect fix when the sequencing run itself went poorly. It may be easiest to also process this run with only R1.
You can always move on to the next step, like DADA2. Issues with read quality will appear there as reads that do not pass the quality filter or reads that are not able to join. Because DADA2 does its own quality control, you can try it and see how well it works! It will tell you about the problems.
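For reference, a paired-end DADA2 run from this point might look like the following sketch. The input/output filenames and the truncation lengths here are placeholders, not your actual settings; adjust them to match your artifacts and quality plots:

```shell
# Hypothetical DADA2 denoising command; filenames and truncation
# lengths are placeholders -- substitute your own values.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-seq.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 173 \
  --o-table table-paired.qza \
  --o-representative-sequences rep-seqs-paired.qza \
  --o-denoising-stats denoising-stats.qza
```

The resulting denoising-stats artifact is where the filtered/merged counts discussed below will show up.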
Might I run into problems merging during the DADA2 step?
Thanks for your insights! I've just run three different tests with DADA2:
Test 1: This gave me the best percentage of input passed filter, although I'm not satisfied with the values; some samples were around 15-30%, and the highest was around 60%. denoising-stats.qzv (1.2 MB)
Test 3: Finally, in this test I decreased the number of nt truncated from my reverse reads. The percentage of input passed filter improved a bit compared with test 2, but it's still not good enough. However, it gave me the best merging results, which were full of zeros in the other tests. denoising-stats_3.qzv (1.2 MB)
If you trim so the reads are very short, this will increase quality, but if they're too short then there is not enough overlap for the reads to merge.
If you trim so the reads are a little longer, then a few reads will no longer pass the filter, but more will merge overall.
If you trim so the reads are much, much longer, then even more will be removed by the quality filter, and errors can also prevent merging!
So it's a tradeoff. One simple thing to do is to look for settings that maximize the total reads merged. We have to optimize for something, and this seems reasonable to me.
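As a rough illustration of the tradeoff, expected overlap is forward truncation length plus reverse truncation length minus amplicon length, and paired reads need a minimum overlap (12 nt by DADA2's default) to merge. The amplicon length below is a hypothetical V4-like value, not your actual region; a longer region such as V3-V4 (~465 nt) would leave no overlap at all with these same settings:

```python
# Sketch of the truncation/overlap tradeoff. AMPLICON_LEN is an
# assumed, V4-like value -- replace it with your region's real length.
AMPLICON_LEN = 253   # hypothetical amplicon length after primer removal
FWD_TRUNC = 250      # forward reads kept at full length
MIN_OVERLAP = 12     # DADA2's default minimum overlap for merging

def overlap(rev_trunc, fwd_trunc=FWD_TRUNC, amplicon=AMPLICON_LEN):
    """Expected overlap between a merged read pair, in nt."""
    return fwd_trunc + rev_trunc - amplicon

for rev_trunc in (150, 173, 200, 240):
    ov = overlap(rev_trunc)
    status = "merges" if ov >= MIN_OVERLAP else "too short to merge"
    print(f"reverse truncated at {rev_trunc} nt -> {ov} nt overlap ({status})")
```

With a ~465 nt amplicon instead, `overlap(173, amplicon=465)` is negative, which is exactly the "too short to merge" failure mode described above.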
I like your approach of trying many settings and comparing the results!
Have you calculated your expected amplicon length and expected read overlap?
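That calculation only needs the primer coordinates on the reference gene. The sketch below assumes the hypothetical 515F/806R (V4) primer pair purely as an example; substitute the coordinates and lengths of your own primers:

```python
# Back-of-the-envelope expected amplicon length and read overlap,
# assuming hypothetical 515F/806R (V4) primers -- not necessarily
# the primers used in this study.
FWD_PRIMER_START, FWD_PRIMER_LEN = 515, 19   # 515F
REV_PRIMER_END, REV_PRIMER_LEN = 806, 20     # 806R
READ_LEN = 250                               # 2x250 bp run

amplicon_with_primers = REV_PRIMER_END - FWD_PRIMER_START + 1
amplicon_after_trimming = amplicon_with_primers - FWD_PRIMER_LEN - REV_PRIMER_LEN
expected_overlap = 2 * READ_LEN - amplicon_after_trimming

print(amplicon_with_primers)    # 292
print(amplicon_after_trimming)  # 253
print(expected_overlap)         # 247
```

If the expected overlap comes out well above DADA2's 12 nt minimum, there is room to truncate the noisy reverse reads; if it is already small, aggressive truncation will cost merged reads.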
But looking at my feature table, it seems I have fewer ASVs than I would like; in my past analyses I had between 4,000-5,000 ASVs, and here I have just 1,433 ASVs. table-paired_3.qzv (471.8 KB)
Maybe a single-end approach would be better, but I'm confused about that. Is it common for reverse reads to show a decrease in quality, or is this a condition of my sequencing runs? In my last two sequencing runs, the analyses were conducted with a single-end approach due to bad quality of the reverse reads.
I found the issue! There was an error: after cutadapt, my sequences were automatically trimmed to 237 nt: trimmed_seq_single.qzv (300.1 KB)
So it makes sense that truncating at 245 nt removed all my data.
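That matches DADA2's truncation behavior: reads shorter than the truncation length are discarded entirely rather than padded, so a 245 nt truncation applied to reads that cutadapt had already cut to 237 nt leaves nothing. A toy model of that filter:

```python
# Toy model of DADA2's truncation filter: reads shorter than the
# truncation length are discarded, not padded.
def truncate(reads, trunc_len):
    """Keep only reads at least trunc_len long, cut down to trunc_len."""
    return [r[:trunc_len] for r in reads if len(r) >= trunc_len]

reads = ["A" * 237] * 5   # reads already trimmed to 237 nt by cutadapt

print(len(truncate(reads, 245)))  # 0 -> every read is dropped
print(len(truncate(reads, 230)))  # 5 -> all reads survive, cut to 230 nt
```

This is why the truncation length must always be checked against the actual post-cutadapt read lengths shown in the demux summary.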
With that observation, I re-ran my denoising, and I think my results are good enough to move forward with my analysis (a reasonable-to-good percentage of input passing the filter, plus a good number of ASVs, ~4,500). denoising-stats.qzv (1.2 MB) table-single.qzv (597.9 KB)