Clawback classifier cross validation

mike_kratz · May 22, 2023, 4:49pm

I would like to compare the standard feature-classifier with a clawback-weighted "water-non-saline" classifier in terms of genus- and species-level accuracy. Is it possible to use RESCRIPt's evaluate-cross-validate function to cross-validate the clawback-made classifier? I do not see a way to incorporate the clawback-generated weights.

Thank you for your help,

Mike

Nicholas_Bokulich · May 22, 2023, 7:04pm

Hi @mike_kratz ,

Great question. No, weights cannot be used with evaluate-cross-validate, because properly evaluating taxonomic weights requires a much more complicated cross-validation scheme (including simulation of realistic-looking samples, as classification of reference sequences is sort of meaningless with weighted taxonomic classifiers). If you really want to use cross-validation with weighted taxonomic classifiers, check out the citation for q2-clawback; there should be a link in there to the code used for classifier evaluation.

But if you just want to evaluate the classifier without cross-validation (i.e., simulating the situation when all query sequences have exact matches in the reference database, so the correct answer is known and you are just evaluating how effectively you can resolve species from their near neighbors), you can (a) train the classifiers with and without weights, (b) classify the reference sequences with those same classifiers, and (c) run evaluate-classifications on the outputs to directly compare the classification accuracies.
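To make the comparison in step (c) concrete, here is a minimal Python sketch of a per-rank accuracy check on toy data. It is not RESCRIPt's implementation (evaluate-classifications does the real work and reports its own metrics); the function names, the toy taxonomy strings, and the simple exact-match accuracy are all assumptions made for illustration.

```python
from typing import Dict


def truncate(taxonomy: str, depth: int) -> str:
    """Keep the first `depth` ranks of a semicolon-delimited taxonomy string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return ";".join(ranks[:depth])


def accuracy_at_depth(expected: Dict[str, str],
                      observed: Dict[str, str],
                      depth: int) -> float:
    """Fraction of shared sequence IDs whose observed label matches the
    expected label exactly down to `depth` ranks (hypothetical helper)."""
    shared = expected.keys() & observed.keys()
    if not shared:
        raise ValueError("expected and observed share no sequence IDs")
    hits = sum(
        truncate(expected[seq_id], depth) == truncate(observed[seq_id], depth)
        for seq_id in shared
    )
    return hits / len(shared)


# Toy data: 7 ranks, so depth 6 is genus and depth 7 is species.
PREFIX = "d__Bacteria;p__P;c__C;o__O;f__F"
expected = {
    "seq1": PREFIX + ";g__A;s__A_one",
    "seq2": PREFIX + ";g__A;s__A_two",
    "seq3": PREFIX + ";g__B;s__B_one",
    "seq4": PREFIX + ";g__B;s__B_two",
}
# Hypothetical output of the classifier trained without weights.
uniform = {
    "seq1": PREFIX + ";g__A;s__A_one",
    "seq2": PREFIX + ";g__A;s__A_one",  # wrong species, right genus
    "seq3": PREFIX + ";g__B;s__B_one",
    "seq4": PREFIX + ";g__B",           # classified to genus only
}
# Hypothetical output of the classifier trained with clawback weights.
weighted = {
    "seq1": PREFIX + ";g__A;s__A_one",
    "seq2": PREFIX + ";g__A;s__A_two",
    "seq3": PREFIX + ";g__B;s__B_one",
    "seq4": PREFIX + ";g__A;s__A_one",  # pulled toward a heavily weighted taxon
}

for name, observed in [("uniform", uniform), ("weighted", weighted)]:
    genus = accuracy_at_depth(expected, observed, depth=6)
    species = accuracy_at_depth(expected, observed, depth=7)
    print(f"{name}: genus={genus:.2f} species={species:.2f}")
```

In practice the expected and observed dictionaries would come from the reference taxonomy and each classifier's output, keyed by the same sequence IDs.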

The weighted classifiers might not perform all that well for classification of reference sequences, though, as the species distribution in the reference database will be really skewed and look nothing like a natural distribution. So this approach probably will not answer the question you actually care about (i.e., how do the classifiers compare for classification of real samples). To answer that question, you should simulate realistic-looking samples (with known composition) and check the evaluation code used in the q2-clawback paper.
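As a rough illustration of what "a simulated sample with known composition" means, the sketch below draws reads from a set of taxon relative abundances. This is not the procedure used in the q2-clawback paper's evaluation code; the function name, the abundance values, and the plain multinomial-style draw are assumptions chosen only to show the idea.

```python
import random
from collections import Counter
from typing import Dict


def simulate_sample(abundances: Dict[str, float],
                    n_reads: int,
                    seed: int = 0) -> Counter:
    """Draw `n_reads` taxon labels in proportion to `abundances`
    (hypothetical helper; weights need not sum to 1)."""
    rng = random.Random(seed)  # seeded so the simulated sample is reproducible
    taxa = list(abundances)
    weights = [abundances[taxon] for taxon in taxa]
    draws = rng.choices(taxa, weights=weights, k=n_reads)
    return Counter(draws)


# Hypothetical habitat-specific abundances; real ones would come from
# empirical samples of the habitat of interest.
abundances = {"g__A": 0.7, "g__B": 0.25, "g__C": 0.05}
sample = simulate_sample(abundances, n_reads=1000, seed=42)

# Because the true label of every simulated read is known, classifier
# output can be scored against it directly.
for taxon, count in sample.most_common():
    print(taxon, count)
```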

Good luck!

mike_kratz · May 22, 2023, 8:03pm

Hey @Nicholas_Bokulich!

Thank you for your reply, that makes perfect sense! Since I want to know how the classifiers compare on real samples, I will use the simulation approach that you've recommended. Is there a specific software/function you would recommend to simulate realistic data? I have downloaded the supplementary code from the article under @BenKaehler's GitHub profile and will try to work on that once I can simulate a proper dataset.

Thank you,

Mike

Nicholas_Bokulich · May 22, 2023, 8:35pm

I believe the simulation code is included in the supplementary code from the paper. But let us know if you can't find it.

Good luck!

mike_kratz · May 23, 2023, 4:44pm

@Nicholas_Bokulich I was able to find the simulation code (I think it is the "empirical-samples" option on the "paycheck_cv" command). The only issue I am currently facing is running the cross-validation code, where this error pops up:

paycheck_cv \
  --empirical-samples ./clawback/cross.validate/feature-table.biom \
  --ref-seqs ./clawback/readytowear/data/silva_138_1/515f-806r/ref-seqs.qza \
  --ref-taxa ./clawback/readytowear/data/silva_138_1/515f-806r/ref-tax.qza \
  --results-dir ./clawback/cross.validate \
  --intermediate-dir ./clawback/temp \
  --k 5 \
  --log-file ./clawback/cross.validate/log \
  --log-level DEBUG

Traceback (most recent call last):
File "/home/me/anaconda3/envs/qiime2-2023.2/bin/paycheck_cv", line 8, in <module>
sys.exit(cross_validate())
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/paycheck/cross_validate.py", line 778, in cross_validate
taxonomy_samples = map_svs_to_taxa(
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/paycheck/cross_validate.py", line 1576, in map_svs_to_taxa
ref_taxa = Artifact.import_data(
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 321, in import_data
return cls.from_view(type, view, view_type, provenance_capture,
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 349, in _from_view
result = transformation(view, validate_level)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 68, in transformation
self.validate(view, level=validate_level)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/core/transform.py", line 143, in validate
view.validate(level)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/plugin/model/file_format.py", line 34, in validate
if not self.sniff():
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_types/feature_data/_format.py", line 50, in sniff
line = fh.readline()
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
File "/home/me/anaconda3/envs/qiime2-2023.2/lib/python3.8/encodings/utf_8_sig.py", line 69, in _buffer_decode
return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 10: invalid continuation byte

I am not sure how to resolve this error, so any insight would be great!

Thank you,

Mike

mike_kratz · May 23, 2023, 6:02pm

@Nicholas_Bokulich Actually, I may have figured out the issue. I had not exported my reference sequences and taxonomy from QIIME 2 to FASTA and .tsv files, respectively, before running paycheck_cv. Now that I've exported these files, paycheck_cv is running; I will report back on whether it produces a results file or fails with errors.
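For anyone hitting the same traceback: a .qza artifact is a zip archive, so a tool that tries to read it as UTF-8 text (as the sniffing step in the traceback does) can fail with exactly this kind of UnicodeDecodeError. The sketch below is a hypothetical pre-flight check, not part of paycheck or QIIME 2; the function name and the toy file contents are assumptions for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def looks_like_qiime_archive(path) -> bool:
    """Return True if `path` is a zip archive, which is how .qza/.qzv
    artifacts are stored on disk (hypothetical helper)."""
    return zipfile.is_zipfile(str(path))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an unexported artifact: any zip archive will do here.
    archive = Path(tmp) / "ref-seqs.qza"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data/dna-sequences.fasta", ">seq1\nACGT\n")

    # Stand-in for the exported plain-text FASTA file.
    exported = Path(tmp) / "dna-sequences.fasta"
    exported.write_text(">seq1\nACGT\n", encoding="utf-8")

    archive_is_zip = looks_like_qiime_archive(archive)
    exported_is_zip = looks_like_qiime_archive(exported)

print("ref-seqs.qza is a zip archive:", archive_is_zip)
print("dna-sequences.fasta is a zip archive:", exported_is_zip)
```

If the check reports a zip archive, exporting the artifact first (as described above) gives the tool the plain FASTA or TSV file it can read.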

Thank you,

Mike Kratz
