Specificity and Sensitivity at Different Abundances?

Has anyone done work to show the specificity and sensitivity of taxa assignments in whole-genome shotgun sequencing, as a function of a) the number of reads and b) the relative abundance of the taxa? Some commercial vendors - like TinyHealth - use an abundance of 0.05% as the minimum they will allow to be presented in the final CSV file, presumably because smaller abundances quickly become less accurate. I want a finer-grained understanding of how much accuracy I might expect at an abundance of 1%, 0.1%, 0.01%, 0.001%, etc.
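One back-of-envelope consideration before any benchmark: read depth alone bounds what is detectable at a given abundance. A minimal sketch (my own illustration, not from any vendor or paper), modeling the read count for one taxon as Binomial(total reads, abundance) and asking how likely it is to clear a hypothetical minimum-read threshold:

```python
from math import comb

def detection_probability(total_reads, abundance, min_reads=10):
    """P(a taxon yields >= min_reads reads), modeling its read count as
    Binomial(total_reads, abundance). Ignores classifier error entirely,
    so this is an upper bound on sensitivity from sampling alone."""
    p_below = sum(
        comb(total_reads, k) * abundance**k * (1 - abundance)**(total_reads - k)
        for k in range(min_reads)
    )
    return 1 - p_below

# With 1M reads, a taxon at 0.001% abundance averages only ~10 reads,
# so even perfect classification would miss it a large fraction of runs.
for abundance in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"{abundance:.3%}: P(detect) = {detection_probability(1_000_000, abundance):.3f}")
```

The threshold of 10 reads is an arbitrary stand-in; the point is that below some abundance the binomial sampling itself, before any classifier error, starts dominating sensitivity.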


Hello pone,

Of the genomic assembly, the taxonomy, or the relative abundance?

For example, the abundance could be accurate to 2 significant digits yet the genus level is incorrectly assigned. So each metric will have its own accuracy.

Each step in the pipeline has a signal-to-noise ratio and quantifying signal loss and noise gain throughout a full pipeline is a worthy goal.
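To make "each metric has its own accuracy" concrete, here is a minimal sketch (with made-up genus names, not real benchmark data) that scores presence/absence calls at one rank against a known mock community:

```python
def precision_recall(expected, observed):
    """Precision and recall of presence calls at one taxonomic rank.
    `expected` and `observed` are sets of taxon names (e.g. genera)."""
    tp = len(expected & observed)   # correctly reported taxa
    fp = len(observed - expected)   # spurious taxa
    fn = len(expected - observed)   # missed taxa
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical mock community vs. classifier output:
expected = {"Bacteroides", "Prevotella", "Faecalibacterium", "Akkermansia"}
observed = {"Bacteroides", "Prevotella", "Faecalibacterium", "Escherichia"}
print(precision_recall(expected, observed))  # (0.75, 0.75)
```

The same profile could score differently at species level, and an abundance-accuracy metric (e.g. L1 distance between expected and observed fractions) would be yet another number, which is why each metric needs its own benchmark.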

Are you familiar with the CAMI challenge? They are interested in similar questions:

CAMI2 - Microbiome COSI


Thanks for adding that nuance. I would like to know the specificity and sensitivity of genus and species assignment from shotgun sequencing when the abundance of the genus/species is 1%, 0.1%, 0.01%, 0.001%, etc.

Understood.

QIIME 2 does not do this directly, but this kind of benchmark has been done using QIIME 2.

Please see: Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin, especially this part:

We also developed tax-credit (GitHub - caporaso-lab/tax-credit-code), an extensible computational framework for evaluating taxonomy classification accuracy. This framework streamlines the process of methods benchmarking by compiling multiple different test data sets, including mock communities [14] and simulated sequence reads. It additionally stores pre-computed results from previously evaluated methods, including the results presented here, and provides a framework for parameter sweeps and method optimization. Tax-credit could be used as an evaluation framework by other research groups in the future or its raw data could be easily extracted for integration in another evaluation framework.
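Tax-credit's actual API is documented in its repository; purely as an illustration of the kind of per-abundance evaluation such a framework automates, here is a hypothetical sketch (invented taxa and abundance tables) that bins recall by expected abundance:

```python
# Hypothetical expected/observed abundance tables (taxon -> fraction);
# in practice these would come from a mock community and a classifier's output.
expected = {"g__A": 0.60, "g__B": 0.30, "g__C": 0.01, "g__D": 0.001, "g__E": 0.0001}
observed = {"g__A": 0.58, "g__B": 0.33, "g__C": 0.008, "g__E_wrong": 0.0002}

def recall_by_abundance(expected, observed, bins=(1e-4, 1e-3, 1e-2, 1.0)):
    """Fraction of expected taxa recovered, grouped by expected-abundance bin."""
    results = {}
    lo = 0.0
    for hi in bins:
        taxa = [t for t, a in expected.items() if lo < a <= hi]
        if taxa:
            found = sum(t in observed for t in taxa)
            results[f"({lo:g}, {hi:g}]"] = found / len(taxa)
        lo = hi
    return results

print(recall_by_abundance(expected, observed))
```

Sweeping this over many mock communities and classifier parameter settings is essentially what a benchmarking framework like tax-credit systematizes.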

This sounds like the perfect tool!

Let us know what you try next.
