Hello everyone, I used DADA2 to denoise full-length PacBio 16S reads, and only about 52% of the reads remain, meaning roughly half of the reads were filtered out. Is there any way to increase the proportion of non-chimeric reads?
Below are my run parameters (almost all defaults, since the sequence quality is good): --p-min-len 1000 \ --p-max-len 1600 \ --p-max-ee 2 \ --p-chimera-method consensus \ --p-trunc-len 0 \ --p-front AGRGTTYGATYMTGGCTCAG \ --p-adapter RGYTACCTTGTTACGACTT \ --p-pooling-method pseudo
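For anyone who wants to reproduce this, the parameters above would sit in a full denoise-ccs call roughly as sketched here. The flags are the ones listed in the post; the input name demux.qza, the output names, and the denoise_ccs wrapper itself are placeholders I made up, not files or commands from this thread. The wrapper only exists so that the minimum length can be changed between runs without retyping everything.

```shell
# Sketch only: demux.qza and the output names are hypothetical placeholders.
denoise_ccs() {
  # $1 = minimum read length to keep, $2 = prefix for the output artifacts
  min_len="$1"
  prefix="$2"
  qiime dada2 denoise-ccs \
    --i-demultiplexed-seqs demux.qza \
    --p-front AGRGTTYGATYMTGGCTCAG \
    --p-adapter RGYTACCTTGTTACGACTT \
    --p-min-len "$min_len" \
    --p-max-len 1600 \
    --p-max-ee 2 \
    --p-trunc-len 0 \
    --p-chimera-method consensus \
    --p-pooling-method pseudo \
    --o-table "${prefix}-table.qza" \
    --o-representative-sequences "${prefix}-rep-seqs.qza" \
    --o-denoising-stats "${prefix}-stats.qza"
}
```

Calling it as denoise_ccs 1000 run1 would correspond to the settings in the post, with the outputs written to run1-table.qza, run1-rep-seqs.qza and run1-stats.qza.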
Hi, thanks for your reply. However, setting --p-min-fold-parent-over-abundance to 8 or 16 does not improve the results. Nearly half of the reads still do not pass the chimera filter.
Any suggestions?
Sadly, I've never worked with PacBio data before, so that was my only quick suggestion. Not sure what the quality plots look like, but would truncation help at all? Perhaps someone in the forum with more experience working with PacBio data can help here?
Thank you for your quick reply and for the suggestion about truncation. I really appreciate you taking the time to think about it. I’ll keep experimenting and also see if anyone else with PacBio experience can offer additional advice.
Can you please run qiime demux summarize on your data and post the resulting visualization here? That will help us figure out where your parameters should be set.
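In case it helps, the summarize step could look something like the sketch below. The summarize_demux wrapper and both file names are placeholders of mine; point it at whatever your demultiplexed artifact is actually called.

```shell
# Sketch only: the wrapper name and file names are hypothetical.
summarize_demux() {
  # $1 = demultiplexed sequences (.qza), $2 = visualization to write (.qzv)
  qiime demux summarize \
    --i-data "$1" \
    --o-visualization "$2"
}
```

For example, summarize_demux demux.qza demux.qzv should produce a demux.qzv that can be opened at view.qiime2.org and attached to a post here.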
I’m using dada2 denoise-ccs. Given the expected amplicon length of the 16S rRNA sequence, I’m currently using: --p-min-len 1000 \ --p-max-len 1600 \ --p-max-ee 2 \ --p-chimera-method consensus \ --p-trunc-len 0
Any suggestions are greatly appreciated—thanks!
Hi, I'm sorry to bother you, but I really need your help.
I am not very familiar with running DADA2 on PacBio sequences. What I notice in your post-DADA2 stats is that the filtering step is where we lose the most sequences (~85 to ~60). I wonder whether adjusting your --p-min-len might help?
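To see which step is responsible, it may help to look at retention stage by stage rather than as a percentage of the input. Below is a rough sketch that reads an exported denoising-stats TSV and prints, for each sample, the share of reads surviving each stage relative to the stage before it. I am assuming the header contains columns named input, primer-removed, filtered, denoised and non-chimeric, and that the second row starts with "#q2:types"; please check this against your own file. The stage_retention name is made up.

```shell
# Sketch only: column names are assumed, stage_retention is a hypothetical helper.
stage_retention() {
  # $1 = path to the exported stats TSV
  awk -F '\t' '
    BEGIN   { n = split("primer-removed filtered denoised non-chimeric", stage, " ") }
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header name -> column
    /^#/    { next }                                         # skip the "#q2:types" row
    {
      line = $1
      prev = $(col["input"])
      for (s = 1; s <= n; s++) {
        if (!(stage[s] in col)) continue     # tolerate a missing stage column
        cur = $(col[stage[s]])
        pct = (prev > 0) ? 100 * cur / prev : 0
        line = line sprintf("\t%s=%.1f%%", stage[s], pct)
        prev = cur
      }
      print line
    }
  ' "$1"
}
```

As far as I know, the TSV can be obtained with qiime tools export --input-path stats.qza --output-path exported-stats, which should write a stats.tsv into that directory; then run stage_retention exported-stats/stats.tsv. A low value for filtered points at the length and expected-error settings, while a low value for non-chimeric points at the chimera step.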
Hi @cherman2
Thanks for your suggestion!
Despite repeated attempts, we were unable to improve this outcome. With the same raw data and otherwise identical parameter settings, widening the allowed sequence-length range to 800–1600 bp did not change the results; approximately 40% of sequences were still removed during filtering. We therefore proceeded with the subsequent analyses.