Taxonomy Assignment for Full-Length 16S (Already Denoised)

Do you want species level at any cost or the right answer? The two aren't always the same, alas, though with full-length the species labels should be somewhat more reliable than with a single subdomain :slight_smile:

You can see the results in this article: we benchmarked full-length 16S as well as subdomains. Table 2 shows different settings to optimize recall vs. precision. Those settings were optimized on V4, I believe, but full-length might not be too dissimilar.

I've improved the vsearch classifier since then. Use the top-hits-only option, and maybe try adjusting min-consensus a bit to see what happens. It depends which way you want to go: more species-level classifications (higher recall) or fewer incorrect classifications (higher precision).
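
To make the trade-off concrete, here is a rough Python sketch of how a consensus threshold and a top-hits filter interact. This is not the actual QIIME 2 code, just a simplified illustration; the function names, the example lineages, and the identity values are all made up for the example.

```python
from collections import Counter


def top_hits_only(hits):
    """Keep only the hits tied for the best identity (hypothetical helper)."""
    best = max(identity for _, identity in hits)
    return [(tax, identity) for tax, identity in hits if identity == best]


def consensus_taxonomy(lineages, min_consensus=0.51):
    """Return the deepest lineage shared by at least min_consensus of the hits.

    lineages: list of semicolon-delimited taxonomy strings.
    """
    if not lineages:
        return []
    split = [lineage.split(";") for lineage in lineages]
    depth = max(len(ranks) for ranks in split)
    consensus = []
    for rank in range(depth):
        # Count full lineage prefixes down to this rank, so that identical
        # species names under different genera are never merged.
        prefixes = Counter(
            tuple(ranks[: rank + 1]) for ranks in split if len(ranks) > rank
        )
        prefix, count = prefixes.most_common(1)[0]
        if count / len(split) < min_consensus:
            break  # not enough agreement: stop at the previous rank
        consensus = list(prefix)
    return consensus


# Made-up hits for one query: (taxonomy, fractional identity).
hits = [
    ("d__Bacteria;g__Lactobacillus;s__crispatus", 0.998),
    ("d__Bacteria;g__Lactobacillus;s__crispatus", 0.991),
    ("d__Bacteria;g__Lactobacillus;s__gasseri", 0.991),
]
labels = [tax for tax, _ in hits]

permissive = consensus_taxonomy(labels, min_consensus=0.51)  # reaches species
strict = consensus_taxonomy(labels, min_consensus=0.9)       # stops at genus
top_only = consensus_taxonomy([tax for tax, _ in top_hits_only(hits)])
```

With these made-up hits, the permissive threshold yields a species label (2 of 3 hits agree), the strict threshold backs off to genus, and the top-hits filter yields a species label from the single best hit. In QIIME 2 itself, the corresponding knobs should be the `--p-top-hits-only` and `--p-min-consensus` parameters of `qiime feature-classifier classify-consensus-vsearch`, but check the help text of your installed version to confirm.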

What environment are you profiling? You could check this article out too:
https://www.nature.com/articles/s41467-019-12669-6

and the tutorial:

I'd like to believe that we are not database biased here. Both Greengenes and SILVA (and GTDB, which I would recommend trying out if you want something brand new, curated, and species level) happen to be in formats compatible with QIIME 2, so we link to these on the QIIME 2 website and they get used by most Q2 users. This does not mean that we specifically recommend these databases, or that we are biased against RDP or any other database, just that RDP doesn't come Q2 compatible out of the box.

RDP does not release a Q2-compatible format as far as I am aware (and we get enough questions on this forum about how to re-format the RDP database for use with QIIME 2). Also, RDP does not have species-level information, as far as I recall. So by all means, use RDP if you can format it appropriately... but for your use case I'd recommend taking a look at GTDB and seeing what you think! :exclamation::eyes::exclamation:
