Meaning behind rescript evaluate-cross-validate warning

After running the following rescript command:

qiime rescript evaluate-cross-validate --verbose \
   --i-sequences input_seqs.qza \
   --i-taxonomy input_taxa.qza \
   --p-reads-per-batch 4000 \
   --p-n-jobs 4 \
   --output-dir CVoutput

I received this warning:

UserWarning: The lists of input taxonomies and labels are different lengths. Additional taxonomies will be labeled numerically by their order in the inputs. Note that if these numbers match existing labels, those data will be grouped in the visualization.
  warnings.warn(msg, UserWarning)

I had thought that perhaps the number of features in the input_seqs.qza file was different from the number of features in the input_taxa.qza file. This isn’t the case: they are the same length. Perhaps this warning is concerned with the number of features in the expected/observed output .qza files? If so, what would cause such an issue?

Thanks for your help with the detective work :mag: !

You can ignore that warning… evaluate-cross-validate is a pipeline running a few actions under the hood, and that particular warning is coming from qiime rescript evaluate-classifications, which (when run on its own) allows a user to input a list of labels for labeling the inputs, e.g., if comparing multiple classifications. It has no bearing on evaluate-cross-validate (there is only one input, so there is no reason to label anything internally), but the warning still appears. Just ignore it and move on!

But if you are interested in learning how to use this with evaluate-classifications, see qiime rescript evaluate-classifications --help

Thanks @devonorourke!

Thanks! Any chance that rescript has a hidden feature that lets me compare the number of shared labels between lists of taxonomies (for a given taxonomic level)? For example, if I had two taxonomy datasets composed like:

## Taxonomy_file_1
Feature ID    Taxon
0001    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;mellifera
0002    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;cerana
0003    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;dorsata

and

## Taxonomy_file_2
Feature ID    Taxon
0001    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;
0002    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;cerana
0003    Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;

The function would count the number of shared non-redundant taxonomy labels at a given level. So there would be 3 shared features at the Kingdom, Phylum, Class, Order, Family, and Genus levels, but just a single shared feature at the Species level (Feature 0002’s).

That would be sweet :honeybee:

Thanks!

@devonorourke,
No, not yet. I’ve mulled this for a bit so I will consider your question a feature request :smile:

I’ve hesitated on this because it seems like a bit of an error-prone process, especially if comparing across taxonomies, so have been mulling solutions. Contributions are welcome! :wink:
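As a starting point, here is a minimal sketch of the per-level comparison described above. It is not part of rescript; the function names are hypothetical, and it assumes each taxonomy is available as a plain dict of feature ID to semicolon-delimited taxon string, with seven ranks. It counts the features whose label matches (and is non-empty) at each level, which is one reading of "shared labels" and the one that reproduces the example counts.

```python
def split_ranks(taxon, sep=";"):
    """Split a taxonomy string into rank labels, stripping whitespace."""
    return [rank.strip() for rank in taxon.split(sep)]


def shared_features_per_level(tax_a, tax_b, n_levels=7):
    """Count features whose labels match, and are non-empty, at each level.

    tax_a and tax_b map feature IDs to semicolon-delimited taxonomy strings.
    Only feature IDs present in both mappings are compared.
    """
    counts = [0] * n_levels
    for feature_id in tax_a.keys() & tax_b.keys():
        ranks_a = split_ranks(tax_a[feature_id])
        ranks_b = split_ranks(tax_b[feature_id])
        for level in range(n_levels):
            # Treat a missing rank the same as an empty label.
            label_a = ranks_a[level] if level < len(ranks_a) else ""
            label_b = ranks_b[level] if level < len(ranks_b) else ""
            if label_a and label_a == label_b:
                counts[level] += 1
    return counts


# The two example taxonomies from the post above.
prefix = "Animalia;Arthropoda;Insecta;Hymenoptera;Apidae;Apis;"
taxonomy_1 = {
    "0001": prefix + "mellifera",
    "0002": prefix + "cerana",
    "0003": prefix + "dorsata",
}
taxonomy_2 = {
    "0001": prefix,
    "0002": prefix + "cerana",
    "0003": prefix,
}

print(shared_features_per_level(taxonomy_1, taxonomy_2))
# prints [3, 3, 3, 3, 3, 3, 1]
```

Note that this compares each level independently rather than the full lineage up to that level, so two features with the same species epithet in different genera would still count as a match at the species level. That is one of the ways this comparison can go wrong across taxonomies.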

Could you point me in the direction of any documentation (or perhaps just an example file) that demonstrates how to import taxonomy and sequence files in QIIME?
As usual, I have an R implementation to solve this problem, but I’d like to start tackling these in Python. One of the first roadblocks I face is understanding how to import the QZA files in a Python framework (with R, I’d just use the qiime2R package).
Thanks

CLI instructions:
https://docs.qiime2.org/2020.8/tutorials/feature-classifier/#obtaining-and-importing-reference-data-sets

Python API:

import qiime2

# sequences_object and taxonomy_object are placeholders for your own data
seqs = qiime2.Artifact.import_data("FeatureData[Sequence]", sequences_object)
taxa = qiime2.Artifact.import_data("FeatureData[Taxonomy]", taxonomy_object)

Your sequence and taxonomy objects need to be in an appropriate format, e.g., a pandas.Series.
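For reading an existing .qza rather than creating one, the supported route with QIIME 2 installed is the Python API (qiime2.Artifact.load, then .view with a type such as pandas.DataFrame). As a rough alternative that needs only the standard library, the sketch below relies on a .qza being a zip archive whose taxonomy table is stored as <uuid>/data/taxonomy.tsv. The function name is hypothetical, and it assumes the table has "Feature ID" and "Taxon" columns like the examples above.

```python
import csv
import io
import zipfile


def read_taxonomy_from_qza(qza):
    """Return {feature_id: taxon} from the taxonomy.tsv inside a .qza archive.

    `qza` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(qza) as archive:
        # Archive members look like <uuid>/data/taxonomy.tsv
        members = [name for name in archive.namelist()
                   if name.endswith("/data/taxonomy.tsv")]
        if len(members) != 1:
            raise ValueError("expected exactly one data/taxonomy.tsv member")
        with archive.open(members[0]) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t",
                                    quoting=csv.QUOTE_NONE)
            # Extra columns (e.g. a confidence column) are ignored.
            return {row["Feature ID"]: row["Taxon"] for row in reader}
```

For example, read_taxonomy_from_qza("input_taxa.qza") would return a plain dict that can be compared or converted to a pandas.Series as needed.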