I am running an analysis on 16S rRNA sequence data from iHMP samples. I imported paired-end reads for 705 samples using the Casava 1.8 format. Here is the demux.qzv file: Abeni-demux.qzv (368.2 KB)
When I denoised the samples using DADA2, it kept only 126 of the 705 imported samples. The highest percentage of non-chimeric sequences in any sample is around 54%.
I tried several strategies suggested in previous threads on this forum, and the outcome was similar each time. Here are the commands I used in my last attempt at denoising the samples:
qiime dada2 denoise-paired
--o-denoising-stats Abeni-denoising-stats.qza
Here is the denoise-stats.qzv
In previous threads it was suggested to run only the forward reads to improve the outcome of denoising using DADA2. I just want to make sure I am not missing anything in my dataset that is unusual. I am particularly surprised as these are samples downloaded from iHMP.
Thank you for your help
Even with generous maximum expected error thresholds (max-ee), the quality is probably too low for many of these reads to pair, given that q2-dada2 allows no mismatches in the overlap (it calls mergePairs(..., maxMismatch = 0), for now...).
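For anyone unsure what the max-ee threshold measures: the expected errors of a read are the sum of the per-base error probabilities, 10^(-Q/10) for each Phred score Q. Here is a minimal sketch of that arithmetic. The function name `expected_errors` and the quality scores are made up for illustration and are not taken from this dataset.

```shell
# Expected errors for one read: EE = sum over bases of 10^(-Q/10).
# Input: one read per line, as space-separated Phred scores (made-up values).
expected_errors() {
  awk '{ ee = 0; for (i = 1; i <= NF; i++) ee += 10 ^ (-$i / 10); printf "%.2f\n", ee }'
}

# Two good bases followed by a low-quality tail; the tail dominates the total.
echo "30 30 20 10" | expected_errors
```

A read is dropped at the filtering step when this total exceeds the threshold, so a few low-quality bases at the end of a long read can matter more than everything before them.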
Do you know if this is the V4 16S amplicon? If so, you might get better results using much, much shorter truncation settings, so that reads are able to join with exact overlap. Try something like this!
--p-trunc-len-r 140 # <- or make this even shorter!
If these reads target the V4 region, this should still give plenty of overlap in an area of high quality, resulting in fewer mismatches and more joined reads.
That is awesome that you worked on the iHMP/HMP. I couldn't get better advice than yours. According to the article published on the MOMS-PI, they target the V1-V3 region and the amplicon size is approximately 540 bp. That is why I used the longer truncation, to make sure I had enough overlap.
Got it. OK, with maxMismatch = 0, these reads are never going to join.
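To make the trade-off concrete, here is a rough sketch of the overlap arithmetic. The overlap left after truncation is roughly the forward truncation length plus the reverse truncation length minus the amplicon length. The helper name `estimate_overlap`, the truncation lengths, and the ~253 nt V4 amplicon length are assumptions for illustration. Only the ~540 nt V1-V3 figure comes from this thread. As far as I know, DADA2's mergePairs needs at least 12 nt of overlap by default.

```shell
# Rough overlap estimate for paired-end reads spanning one amplicon:
#   overlap = forward truncation + reverse truncation - amplicon length
# All numbers passed below are illustrative assumptions, not settings from this dataset.
estimate_overlap() {
  trunc_f=$1    # forward truncation length (nt)
  trunc_r=$2    # reverse truncation length (nt)
  amplicon=$3   # approximate amplicon length (nt)
  echo $(( trunc_f + trunc_r - amplicon ))
}

# ~540 nt V1-V3 amplicon with hypothetical 280/280 truncation
estimate_overlap 280 280 540
# ~253 nt V4 amplicon (assumed length) with hypothetical 200/140 truncation
estimate_overlap 200 140 253
```

With these made-up numbers the long amplicon leaves only 20 nt of overlap, all of it in the low-quality read tails. The short amplicon leaves 87 nt even after aggressive truncation. That is why short truncation helps for V4 but not for a ~540 nt V1-V3 amplicon.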
One option is to join using another program, say vsearch or DADA2 directly in R, where we have more control of our settings and mismatches. Another option is to use just one of the reads, so we don't have to pair at all.
I'm not familiar with the MOMS-PI cohort and the data they published... If they published joined reads, you could import those and go from there.
If I merge the reads using DADA2 in R, how do I bring the DADA2 output (the table, rep-seqs and stats) back into QIIME 2 as qza artifacts once the merge is complete, so I can continue the analysis in QIIME 2?
I would also appreciate it if you could guide me a bit on which DADA2 parameters I should experiment with in R. I am going to run their tutorial as well.
You could also do all your analysis in R, keeping track of your work using R Markdown. (I've seen people do upstream processing in QIIME 2, then export their data so they can do downstream analysis in R. I guess this depends on how much you like R...)
Sure thing! Their tutorials are awesome, but if you have any questions, feel free to open a post in Other Bioinformatics Tools and @ me.