A general usage question for DADA2. I'd like to be able to run many sequencing projects with the same parameters to streamline and remove potential user bias. Specifically, I'd like to use trunc-q 15 or so to procedurally truncate reads based on quality. Is there a reason to avoid this approach? In certain cases, I end up losing ~50-70% of sequences to chimera detection when using this approach. Thanks for any help!
I generally don't like using that parameter on single-end data, because then your sequences get truncated at different sites and hence you get sequences of different lengths that get called as separate features (even if the full-length sequences are identical).
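To make that concrete, here is a toy sketch (not DADA2's actual code) of why quality-based truncation can split identical sequences into separate features. It assumes the read is cut at the first base whose quality score is at or below the threshold; the function names, the sequence, and the quality scores are all made up for illustration.

```python
def truncate_at_quality(seq, quals, trunc_q):
    """Cut the read at the first base whose quality is <= trunc_q (assumed rule)."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i]
    return seq


def truncate_at_length(seq, trunc_len):
    """Cut every read at the same fixed position."""
    return seq[:trunc_len]


# Two reads with the identical sequence but different (made-up) quality profiles.
seq = "ACGTACGTACGT"
quals_a = [38, 38, 37, 36, 35, 34, 30, 28, 25, 20, 14, 10]
quals_b = [38, 37, 36, 35, 30, 25, 20, 15, 12, 10, 8, 5]

# Quality-based truncation: each read is cut wherever its own quality drops.
by_quality = {truncate_at_quality(seq, q, 15) for q in (quals_a, quals_b)}

# Fixed-length truncation: both reads are cut at the same site.
by_length = {truncate_at_length(seq, 7) for _ in (quals_a, quals_b)}

print(len(by_quality), "distinct sequences after quality-based truncation")
print(len(by_length), "distinct sequence(s) after fixed-length truncation")
```

The two reads come out at different lengths under the quality rule, so they would be counted as different features, while the fixed cut keeps them identical.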
On paired-end data this is less of a concern, though it gives less control over attempting to get sequences to join (e.g., if you need just 10 more bases to achieve sufficient overlap but mean quality gets a little dodgy... manually set truncation lets you consciously balance these decisions).
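The overlap trade-off is just arithmetic, so a rough sketch may help. The amplicon length, truncation lengths, and minimum overlap below are hypothetical numbers; check the overlap requirement for your own DADA2 version, and note that this ignores primers and natural length variation in the amplicon.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap):
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical run: 420 nt amplicon, and we assume 20 nt of overlap is required.
amplicon_len = 420
min_overlap = 20

# Truncating the reverse read at 190 leaves only 10 nt of overlap: too short.
print(expected_overlap(240, 190, amplicon_len), can_merge(240, 190, amplicon_len, min_overlap))

# Keeping 10 more (lower-quality) bases on the reverse read reaches the minimum.
print(expected_overlap(240, 200, amplicon_len), can_merge(240, 200, amplicon_len, min_overlap))
```

With fixed truncation lengths you choose where to sit on that trade-off; with quality-based truncation the overlap varies from read to read.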
If you plan to merge these sequencing projects together to compare samples across runs, it will be very important to make sure your sequences are all the same length and were processed with the exact same trim lengths. So manually setting the truncation length is better in this case.
I don't see why this parameter would lead to more/less chimeric seqs than trimming manually based on quality. Sounds like you may just have lots of chimeric sequences...
I hope that helps!