Weird Classification for a Functional Gene

Hi all,
I succeeded in plotting the taxonomy for a functional gene, but the classification looks incomplete. Can you suggest a way to improve it? I need classifications down to species level, but many taxa are labelled unknown, unassigned, or just Bacteria. The plot looks strange, whereas the 16S gene already gave full lineages from kingdom to species for sulfate-reducing bacteria. I worked hard to reach this step, but the plot is frustrating. Is there a way to fix this?

The classifier is VSEARCH consensus.
QIIME 2 version 2020.8.
The gene is dsrB.

Thanks a lot.
Qiimer.

Hi @TurboQiimer,
This sounds like an issue with your reference database (incomplete? poor specificity?), the gene itself (lack of taxonomic resolution?), or the primers (poor specificity?). You should go back and check all three to troubleshoot. Provided the reference database fully covers your amplicons, the vsearch-based classifier in q2-feature-classifier should work well, so we can rule out the classifier.

One thing is certainly true: many of the unhelpful classifications you are getting are actually how the reference sequences in your database are annotated, so it is clear that at least part of the issue is your database (missing annotations).

I am unfamiliar with this functional gene, and it sounds like you are using a custom database, so unfortunately I cannot offer any more insight, only this:

This is one reason why 16S is so popular: it has been widely used for a long time, so the reference databases and protocols are well designed and validated, and tutorials and support are easy to find. When you start using other targets, you are entering mostly uncharted territory.

Good luck!

Thanks for the time, dear leader!
Thanks for the possible causes you laid out.
One parameter rescued me at the last minute.
--p-maxaccepts
Have a great time.
Qiimer

If what you mean is that you are setting maxaccepts to 1, you should be aware that this is probably not a good idea. By doing so, you are just taking the top hit, and ignoring possibly equally good top hits. This will lead to overclassification issues and false hits.
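To illustrate the difference, here is a minimal Python sketch of consensus assignment over several hits. It is a simplified illustration, not the actual q2-feature-classifier code: the function name `consensus_lineage`, the lineage strings, and the assumption that all hits are equally good matches are made up for the example, and the 0.51 threshold is only meant to resemble a majority-consensus cutoff.

```python
from collections import Counter


def consensus_lineage(hit_lineages, min_consensus=0.51):
    """Deepest lineage prefix shared by at least min_consensus of the hits."""
    hits = [lineage.split(";") for lineage in hit_lineages]
    if not hits:
        return "Unassigned"
    consensus = []
    depth = 1
    while True:
        # Count each distinct lineage prefix of the current depth.
        prefixes = Counter(tuple(h[:depth]) for h in hits if len(h) >= depth)
        if not prefixes:
            break
        prefix, count = prefixes.most_common(1)[0]
        # Stop at the first rank where the hits no longer agree strongly enough.
        if count / len(hits) < min_consensus:
            break
        consensus = list(prefix)
        depth += 1
    return ";".join(consensus) if consensus else "Unassigned"


# Hypothetical top hits for one query, all assumed to match equally well.
top_hits = [
    "d__Bacteria;g__GenusA;s__species_1",
    "d__Bacteria;g__GenusA;s__species_2",
    "d__Bacteria;g__GenusA;s__species_3",
]

only_top_hit = consensus_lineage(top_hits[:1])  # analogous to maxaccepts=1
several_hits = consensus_lineage(top_hits)      # analogous to a larger maxaccepts
print(only_top_hit)
print(several_hits)
```

With a single hit the query is labelled down to species even though two other species match equally well; with all three hits the label stops at genus, which is the more defensible call.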

If you are setting it to a value other than 1, that is probably okay. The defaults have been tested for 16S and ITS sequences, so for other marker genes some optimization is needed, and it would be best to test using a proper benchmark. Otherwise, I recommend sticking to the defaults unless rational settings can be chosen (i.e., not maxaccepts=1).
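As a rough idea of what scoring such a benchmark could look like, the sketch below compares predicted lineages against known ones at a chosen rank and reports precision, recall, and F-measure. The scoring rules here are a simplified assumption (a correct label counts as a true positive, a wrong label as a false positive, and a label that stops short of the rank as a false negative), and the function name and sequence IDs are hypothetical. You would run it once per parameter setting, using sequences whose true taxonomy is known, such as a mock community or held-out reference sequences.

```python
def f_measure_at_level(expected, predicted, level):
    """Score predictions against known lineages at a given rank depth.

    expected / predicted map sequence IDs to ';'-separated lineages.
    Simplified scoring (an assumption for this sketch):
      true positive  = prediction reaches `level` and matches
      false positive = prediction reaches `level` but differs
      false negative = prediction stops before `level` (or is missing)
    """
    tp = fp = fn = 0
    for seq_id, exp_lineage in expected.items():
        exp = exp_lineage.split(";")[:level]
        pred = [r for r in predicted.get(seq_id, "").split(";") if r][:level]
        if len(pred) < level:
            fn += 1
        elif pred == exp:
            tp += 1
        else:
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical known lineages and the output of one parameter setting.
expected = {
    "seq1": "d__Bacteria;g__GenusA;s__species_1",
    "seq2": "d__Bacteria;g__GenusA;s__species_2",
    "seq3": "d__Bacteria;g__GenusB;s__species_3",
}
predicted = {
    "seq1": "d__Bacteria;g__GenusA;s__species_1",  # correct to species
    "seq2": "d__Bacteria;g__GenusA",               # stops at genus
    "seq3": "d__Bacteria;g__GenusB;s__species_9",  # wrong species
}

species_scores = f_measure_at_level(expected, predicted, level=3)
genus_scores = f_measure_at_level(expected, predicted, level=2)
print(species_scores, genus_scores)
```

Comparing these scores across a handful of maxaccepts values (and related settings) is one way to pick a setting on evidence rather than on how a single plot looks.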

I did not use this parameter previously. This time it was used with its default value (the default is 10); I did not change it. Thanks again for your consideration.

Excuse me, one more question on this point. How can I optimize the value of this parameter for a specific gene? And how did you arrive at the value 10 for the 16S and ITS genes? I would like to apply the same approach to my gene, although I already got a better result with 10 than before, as you can see in the attached image.
Thanks

See the citations listed for the q2-feature-classifier plugin. Benchmarks using mock communities and cross-validation are here:
