Optimal parameters for using q2 deblur denoise-other on 18S and CO1 amplicons

Hi there,

I am working on a meta-analysis of both prokaryotic and eukaryotic diversity in coral reefs, analyzing hundreds of samples each of 16S, 18S, and CO1 amplicon data. I’m using q2 deblur denoise-16S for the 16S data, and I’m confident that the default parameters for that method are good to go out of the box. I’m using q2 deblur denoise-other for the 18S and CO1 amplicon data. My collaborators and I are a bit concerned about whether the default parameters for q2 deblur denoise-other are optimal for these amplicons.

Specifically, I’m wondering if I need to worry about optimizing and changing the --p-indel-prob and --p-indel-max parameters for each specific amplicon that I’m working with. I also wonder if the default error profile (the --error-dist option in the standalone version of deblur) is sub-optimal for amplicons other than 16S. I assume that all of these deblur parameters were optimized for use with 16S and that they might not be perfect for the other amplicons. It seems like the probability of indels (and maybe the error profile) would depend on how highly conserved the amplicon is and on the rate of evolution of the amplicon region. I want to confirm that I’m not violating any assumptions of the algorithm or the default error profile in using deblur on 18S and CO1.

If these are things that I really need to worry about in applying deblur to non-16S amplicons, could you provide some insight on the best way to approach optimizing the deblur parameters for my needs? I looked through the deblur paper and supplementary materials, but I don’t see an explicit description of how the default settings were chosen.

I am assuming that doing a multiple sequence alignment and examining the frequency of indels could help in optimizing the --p-indel-prob and --p-indel-max parameters, but I’m not sure how to go about optimizing the --error-dist parameter (or whether that is even necessary). It looks like the --error-dist parameter isn’t accessible through the qiime2 interface anyway.
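
To make the indel idea concrete, here is a rough sketch of the kind of summary I have in mind. It assumes I already have pairwise-aligned (reference, read) strings with `-` as the gap character; the function names are made up for illustration, and I’m not sure whether deblur counts indels per event or per base, so this may not map directly onto --p-indel-prob and --p-indel-max.

```python
def count_indel_events(aligned_ref, aligned_read):
    """Count runs of gaps (insertions or deletions) in one pairwise alignment."""
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have the same length")
    events = 0
    state = None  # None, "ref_gap" (insertion in read) or "read_gap" (deletion)
    for r, q in zip(aligned_ref, aligned_read):
        if r == "-" and q != "-":
            new_state = "ref_gap"
        elif q == "-" and r != "-":
            new_state = "read_gap"
        else:
            new_state = None
        # A new event starts whenever we enter a gap run of a different kind.
        if new_state is not None and new_state != state:
            events += 1
        state = new_state
    return events


def indel_summary(pairs):
    """Return (fraction of pairs with >= 1 indel event, max events in any pair)."""
    if not pairs:
        raise ValueError("need at least one aligned pair")
    counts = [count_indel_events(ref, read) for ref, read in pairs]
    with_indel = sum(1 for c in counts if c > 0)
    return with_indel / len(counts), max(counts)
```

The first number would be a crude empirical stand-in for an indel probability and the second for a maximum indel count, but both would mix real biological variation with sequencing error unless the pairs are restricted to reads and their presumed source sequence.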

This question is probably best suited for @wasade or one of the other deblur devs, but I welcome any feedback from community members who have experience applying deblur to 18S and/or CO1 amplicons. Apologies for the super long post!

Best,
Taylor

Hi Taylor,
the short answer is that you probably don’t need to change any of the parameters :)

a bit longer:
deblur works by removing reads that are suspected PCR/sequencing errors, and assuming that all the reads left after that are real sequences present in your sample.
The error removal is based on taking a real sequence (i.e. the sequence with the highest number of reads, so it cannot be a read error), and removing all sequences that could arise from it due to PCR/sequencing errors. This is done by taking an upper bound on the various possible errors (i.e. N substitutions/indels). Since the error probabilities depend on a lot of things (specific machine/run, PCR reagents/protocol, local sequence, etc.), we take an upper bound which should work for most cases.
Of the error sources mentioned above, based on our experience (mostly with 16S sequences), the local sequence should have only a small effect. So the error profile should not depend on what you are sequencing, and the default error profile should be fine.
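
To illustrate the idea, here is a toy sketch in Python. It is not the actual deblur code: it only handles substitutions (Hamming distance) on equal-length sequences, ignores indels, and the function names and the numbers in `error_bound` are made up for the example rather than deblur's defaults.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def subtract_expected_errors(counts, error_bound):
    """Toy version of the 'upper bound' error removal.

    counts:      dict mapping sequence -> read count (equal-length sequences)
    error_bound: list where error_bound[d] is an assumed upper bound on the
                 fraction of a sequence's reads that show up as an erroneous
                 sequence at Hamming distance d
    """
    remaining = dict(counts)
    # Go from the most abundant sequence down: the top one is trusted as real.
    for seq in sorted(counts, key=counts.get, reverse=True):
        parent_reads = remaining[seq]
        if parent_reads <= 0:
            continue  # already explained as an error of something else
        for other in remaining:
            if other == seq:
                continue
            d = hamming(seq, other)
            if d < len(error_bound):
                # Remove the reads that could be errors derived from `seq`.
                remaining[other] -= parent_reads * error_bound[d]
    # Whatever is still positive is treated as a real sequence.
    return {s: c for s, c in remaining.items() if c > 0}


example = {"ACGT": 1000, "ACGA": 5, "TTTT": 50}
print(subtract_expected_errors(example, error_bound=[1.0, 0.05, 0.02]))
```

In this example "ACGA" is one mismatch away from the dominant "ACGT" and has fewer reads than the bound allows for errors, so it is dropped, while "TTTT" is too far away to be explained as an error and is kept.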

Having said that, the best way to know whether the algorithm is working well with your data is to look at the output and check for “shadow” sequences (low-frequency sequences that behave similarly across samples to a close (sequence-wise) high-frequency sequence). If you see many such cases, it could indicate that the error rate in your PCR/sequencing run is higher than what deblur expects, and by looking at the exact differences between these shadow sequences and the original sequence, you can see whether it is due to indels, mismatches, etc.
A tool we are developing, which I like to use to see this behavior, is Calour. By loading the biom table, clustering the features, and looking at the resulting clusters, you can see what’s going on in the biom table.
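
If you prefer a quick script for a first pass, something like the following could flag candidate shadow pairs. This is a plain Python sketch, not Calour and not part of deblur: it assumes the table has already been loaded into a dict of sequence -> per-sample counts, only compares equal-length sequences by Hamming distance (so it will miss indel shadows), and the function name and thresholds are arbitrary choices for illustration.

```python
from itertools import combinations
from math import sqrt


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def find_shadow_candidates(table, max_dist=2, max_ratio=0.05, min_corr=0.9):
    """Return (high, low, distance, abundance_ratio) for suspicious pairs.

    table: dict mapping sequence -> list of counts, one entry per sample
    """
    hits = []
    for a, b in combinations(table, 2):
        if len(a) != len(b):
            continue
        # Order the pair so that `high` is the more abundant sequence.
        high, low = (a, b) if sum(table[a]) >= sum(table[b]) else (b, a)
        total_high = sum(table[high])
        if total_high == 0:
            continue
        ratio = sum(table[low]) / total_high
        dist = hamming(high, low)
        # Close in sequence, much rarer, and tracking the parent over samples.
        if (dist <= max_dist and ratio <= max_ratio
                and pearson(table[high], table[low]) >= min_corr):
            hits.append((high, low, dist, ratio))
    return hits
```

Pairs returned by `find_shadow_candidates` are only candidates: closely related real organisms can also co-occur, so it is worth inspecting the actual differences between the two sequences before concluding that the error rate is too high.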

But again, based on our experience with deblur, the default error upper bound profile should be fine.

Does this make sense?
Let me know how it goes
amnon
