Using tax-credit for 18S rRNA classifier evaluation

Hi Nicholas,

Just read through the preprint and found it super helpful for understanding the usage of different classifiers!

I am hoping to use tax-credit to optimize classifier performance for an 18S study. Are there, or will there be, any tutorials available for tax-credit, similar to the super user-friendly QIIME tutorials?

Thanks!

Awesome! Glad to hear tax-credit will be useful for you.

I am not really planning any tutorials at the moment... but see the instructions here for using the jupyter notebooks included in tax-credit. Here are a few tips:

  1. All mock communities used in tax-credit are derived from mockrobiota, which does contain one 18S mock community (it's a bit old and small, so yours are probably better! If so, please consider contributing them to mockrobiota so others can re-use them). Format your mock community datasets the same way as this 18S mock community (mock-11) and they will be ready for tax-credit.
  2. Check out the jupyter notebooks for processing/analyzing 16S mock communities in tax-credit. You can analyze 18S datasets that are in the same format using the same notebooks just by changing the paths.

One reason I am not making a formal tutorial is that the jupyter notebooks are sort of one part reproducible analysis, one part tutorial for anyone who can read a bit of python code. If that's not you, don't be too discouraged: most of the heavy lifting is done by other code behind the scenes, and the code in the notebooks should be reasonably easy to follow and mostly just requires altering file paths to match those on your system.

If following the tax-credit notebooks is too daunting, you may want to check out the methods in q2-quality-control for evaluating mock communities. These methods are not as well suited as tax-credit to large-scale method testing/optimization, but they expose many of the same functions in a much more user-friendly way (i.e., via QIIME 2), so they should get the job done, particularly if you are not planning on doing the same parameter tuning that we did in that preprint.

Thanks for your interest in tax-credit! I hope that it or q2-quality-control is useful for you. Please let me know if you run into any trouble with tax-credit / q2-quality-control.

I hope that helps!

Great! Thank you. I looked through the jupyter notebooks and they look really easy to follow.

My study involves describing parasite communities in natural systems, so generating a mock community in vitro would be super challenging. I plan on generating a mock community in silico using a curated database. Is the simulated mock community workflow in tax-credit similar to Grinder (https://omictools.com/grinder-tool) in that it introduces errors similar to those observed when sequencing on an Illumina MiSeq? Should I trim my curated database to my primers before using it in tax-credit?

Thanks!

No, none of the simulated datasets contain simulated sequence errors, because we wanted to distinguish performance decreases caused by sequencing error (measured in mock communities) from idealized classifier performance (tested by cross-validation). It would be fairly easy, though (I think), to use something like Grinder to incorporate sequencing errors in the cross-validated query sequences. In your case, since you can't use mock communities, that would be useful.
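
To give a rough idea of what "incorporating sequencing errors" could look like, here is a minimal sketch in plain python. It is not Grinder and not part of tax-credit: the function name `add_substitution_errors` and the uniform per-base substitution model are assumptions made purely for illustration. Real Illumina error profiles are position-dependent and include indels, so a dedicated read simulator would be more realistic.

```python
import random

BASES = "ACGT"


def add_substitution_errors(seq, error_rate, rng):
    """Return a copy of seq in which each A/C/G/T base is replaced by a
    different base with probability error_rate (hypothetical helper).

    Characters other than A, C, G, T (e.g. N) are left untouched.
    """
    out = []
    for base in seq:
        if base in BASES and rng.random() < error_rate:
            # Pick one of the three other bases uniformly at random.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Example: perturb a toy query sequence at a 1% per-base error rate.
rng = random.Random(42)  # seeded so the perturbation is reproducible
query = "ACGT" * 25
noisy = add_substitution_errors(query, 0.01, rng)
n_changed = sum(a != b for a, b in zip(query, noisy))
print(f"{n_changed} of {len(query)} bases changed")
```

You would apply something like this only to the query sequences and leave the reference sequences untouched, so the classifier is tested against reads that no longer match the reference exactly.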

You should check out the "cross-validated" and "novel-taxa" methods instead of "simulated communities". The "simulated communities" are sort of a prototype that isn't actually used in that particular study (we are currently working on something like this, to simulate a mock community, but it's not ready yet).

As for trimming: for simulated sequences, yes, definitely. Whether the reference sequences need to be trimmed to the primer region used for the query sequences is always an interesting question, though (we tested this in the preprint), so it's worth testing both trimmed and untrimmed references.
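
For illustration, trimming a reference sequence to the region between two primers can be sketched like this. This is a simplified stand-in, not the code used in tax-credit or the preprint: `extract_amplicon` is a hypothetical helper that only handles exact primer matches on unambiguous A/C/G/T sequences. Real 18S primers usually contain degenerate bases and tolerate mismatches, so in practice you would use a dedicated tool (for example, the `extract-reads` method in q2-feature-classifier).

```python
def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def extract_amplicon(reference, fwd_primer, rev_primer):
    """Return the region between the primers with both primers removed,
    or None if either primer is not found (exact matching only).

    The reverse primer is given 5'->3' as it would be ordered, so it is
    reverse-complemented before searching the reference strand.
    """
    start = reference.find(fwd_primer)
    if start == -1:
        return None
    start += len(fwd_primer)
    end = reference.find(reverse_complement(rev_primer), start)
    if end == -1:
        return None
    return reference[start:end]


# Toy example with made-up primers and a made-up reference sequence.
fwd = "ACGTAC"
rev = "TTGACA"
ref = "TTT" + fwd + "GGGGCCCC" + reverse_complement(rev) + "AAA"
print(extract_amplicon(ref, fwd, rev))
```

Running the evaluation once on the trimmed references and once on the full-length ones gives you the comparison directly.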
