I have a demultiplexed 16S rRNA V4 dataset (2 × 250 bp paired-end) from 1,621 stool samples, collected longitudinally at multiple time points from about 700 unique subjects. Altogether, I obtained 261,819,393 raw reads. Below is the FastQC plot.
- The quality of the reverse reads drops sharply after 150 bp.
I processed the data in DADA2 with the following parameters:
truncation Q-score = 2, maximum expected errors (EE) = 1, and reverse reads truncated at 150 bp. With this truncation, the retained portion of the reads looks to be of fairly good quality in the FastQC plot.
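To make the interaction of these settings concrete, here is a simplified sketch of DADA2-style filtering for a single read: truncate at the first base with Q ≤ the truncation Q-score, discard reads that end up shorter than the truncation length, then discard reads whose expected errors exceed the maximum. The function names are hypothetical; in QIIME 2's `dada2 denoise-paired` these settings roughly correspond to `--p-trunc-q`, `--p-trunc-len-r` and `--p-max-ee-r`, and the real implementation performs additional steps not shown here.

```python
def expected_errors(quals):
    """Expected number of errors in a read: EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(seq, quals, trunc_q=2, trunc_len=150, max_ee=1.0):
    """Simplified DADA2-style filter for one read (hypothetical helper).

    Returns the (sequence, qualities) pair that survives filtering,
    or None if the read is discarded.
    """
    # 1. Truncate at the first base with quality <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    # 2. Reads now shorter than trunc_len are discarded;
    #    longer ones are cut to exactly trunc_len.
    if len(seq) < trunc_len:
        return None
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    # 3. Discard the read if its expected errors exceed max_ee.
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals


read = "A" * 200
print(filter_read(read, [30] * 200) is not None)  # True: EE = 150 * 0.001 = 0.15
print(filter_read(read, [20] * 200))              # None: EE = 150 * 0.01 = 1.5
```

With a maximum EE of 1, a 150 bp read at a uniform Q20 already accumulates 1.5 expected errors and is discarded, whereas the same read at Q30 carries only 0.15.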
After merging the paired-end reads, I obtained 408,017 ASVs. However, 87.7% of these ASVs were flagged as chimeras; they account for 28.7% of the merged reads (52.6% of raw reads). After chimera removal, 50,001 ASVs remain.
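As a quick consistency check, the ASV counts and sequencing depth stated above can be reconciled in a few lines; the only inputs are the figures given in this post.

```python
raw_reads = 261_819_393
n_samples = 1_621
asvs_after_merging = 408_017
asvs_after_chimera_removal = 50_001

chimeric_asvs = asvs_after_merging - asvs_after_chimera_removal
chimeric_asv_fraction = chimeric_asvs / asvs_after_merging
mean_raw_depth = raw_reads / n_samples

print(f"chimeric ASVs: {chimeric_asvs:,} ({chimeric_asv_fraction:.1%})")
# chimeric ASVs: 358,016 (87.7%)
print(f"mean raw reads per sample: {mean_raw_depth:,.0f}")
# mean raw reads per sample: 161,517
```

The 87.7% therefore refers to the number of ASVs. Because those chimeric ASVs hold only 28.7% of the merged reads, they are on average much less abundant than the ASVs that were retained.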
Of these, 94% are present in fewer than 1% of samples, and the remaining 6% have a prevalence of 2-10%.
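For reference, prevalence here means the fraction of samples in which an ASV has a nonzero count. The sketch below assumes a hypothetical table layout (ASV id mapped to a list of per-sample counts) and uses made-up toy counts; it also shows that, with 1,621 samples, "<1% prevalence" means an ASV is detected in at most 16 samples.

```python
def prevalence(table):
    """Fraction of samples in which each ASV has a nonzero count.

    `table` is a hypothetical layout: ASV id -> list of per-sample counts,
    with every list covering the same samples in the same order.
    """
    return {asv: sum(c > 0 for c in counts) / len(counts)
            for asv, counts in table.items()}


def max_samples_below(threshold, n_samples):
    """Largest sample count an ASV can occupy while prevalence stays below threshold."""
    return max(k for k in range(n_samples + 1) if k / n_samples < threshold)


# Toy table with four samples (made-up counts, not from this dataset).
toy = {"asv_a": [5, 0, 3, 0], "asv_b": [0, 0, 0, 1]}
print(prevalence(toy))                 # {'asv_a': 0.5, 'asv_b': 0.25}
print(max_samples_below(0.01, 1621))   # 16
```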
For your reference, below is the number of reads retained at each step.
I have three questions:
1. Is this number of ASVs reasonable for such a large dataset? (It is far more than I get with OTU clustering.)
2. Is it reasonable for more than 87% of ASVs to be classified as chimeras? Why are so many ASVs flagged as chimeric?
3. ASV prevalence is very low, which seems quite unusual for gut microbiome samples.
By the way, `--p-min-fold-parent-over-abundance` is set to 2.
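To illustrate what this parameter controls, here is a deliberately simplified, exact-match two-parent ("bimera") check: only sequences more than `min_fold` times as abundant as the candidate are eligible parents, and the candidate is flagged if a prefix of one parent joined to the suffix of another reproduces it exactly. `is_bimera` is a hypothetical helper, not DADA2's code; the real algorithm aligns the candidate to its potential parents and can tolerate small differences, which this sketch does not.

```python
def is_bimera(candidate, cand_abund, pool, min_fold=2.0):
    """Simplified exact-match two-parent chimera test (hypothetical helper).

    `pool` maps sequence -> abundance. Only same-length sequences more than
    `min_fold` times as abundant as the candidate are potential parents,
    mirroring the role of --p-min-fold-parent-over-abundance.
    """
    n = len(candidate)
    parents = [s for s, a in pool.items()
               if s != candidate and len(s) == n and a > min_fold * cand_abund]
    for left in parents:
        for right in parents:
            if left == right:
                continue
            # Try every breakpoint: the left parent supplies the prefix,
            # the right parent supplies the suffix.
            for cut in range(1, n):
                if candidate[:cut] == left[:cut] and candidate[cut:] == right[cut:]:
                    return True
    return False


pool = {"AAAAACCCCC": 100, "GGGGGTTTTT": 100}
print(is_bimera("AAAAATTTTT", 10, pool))  # True: both parents are > 2x as abundant
print(is_bimera("AAAAATTTTT", 60, pool))  # False: 100 is not > 2 * 60
```

Raising `min_fold` shrinks the pool of eligible parents, so fewer candidates can be explained as chimeras; lowering it does the opposite.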
May I have your comments and suggestions on this output?