qiime sample-classifier classify-samples relative frequency

Hello,

I am trying to run qiime sample-classifier classify-samples with a FeatureTable[RelativeFrequency] as input. Even though the documentation says this table type is accepted as input, I get the following error message:

Plugin error from sample-classifier:

Parameter 'table' requires an argument of type FeatureTable[Frequency]. An argument of type FeatureTable[RelativeFrequency] was passed.

This is the part of the documentation stating that FeatureTable[RelativeFrequency] can be used as input:
Usage: qiime sample-classifier classify-samples [OPTIONS]

Predicts a categorical sample metadata column using a supervised
learning classifier. Splits input data into training and test sets.
The training set is used to train and test the estimator using a
stratified k-fold cross-validation scheme. This includes optional
steps for automated feature extraction and hyperparameter
optimization. The test set validates classification accuracy of the
optimized estimator. Outputs classification results for test set.
For more details on the learning algorithm, see
http://scikit-learn.org/stable/supervised_learning.html

Inputs:
--i-table ARTIFACT FeatureTable[Frequency | RelativeFrequency |
PresenceAbsence | Composition]
Feature table containing all features that
should be used for target prediction.
[required]

I am using QIIME 2 version 2024.10 installed with conda.

Thank you for your help!


Hello @elmavanwieren. Apologies for the somewhat delayed response.

Can you please run the following two commands in your conda environment and post the output here:

qiime info

qiime sample-classifier classify-samples --help

Thank you

Hello @Oddant1,

The output for the qiime info command is as follows:

System versions
Python version: 3.10.14
QIIME 2 release: 2024.10
QIIME 2 version: 2024.10.1
q2cli version: 2024.10.1

Installed plugins
alignment: 2024.10.0
composition: 2024.10.0
cutadapt: 2024.10.0
dada2: 2024.10.0
deblur: 2024.10.0
demux: 2024.10.0
diversity: 2024.10.0
diversity-lib: 2024.10.0
emperor: 2024.10.0
feature-classifier: 2024.10.0
feature-table: 2024.10.0
fragment-insertion: 2024.10.0
longitudinal: 2024.10.0
metadata: 2024.10.0
phylogeny: 2024.10.0
quality-control: 2024.10.0
quality-filter: 2024.10.0
rescript: 2024.10.0
sample-classifier: 2024.10.0
stats: 0+unknown
taxa: 2024.10.0
types: 2024.10.0
vizard: 0.0.1.dev0
vsearch: 2024.10.0

Application config directory
/home/elwie/miniconda3/envs/qiime2-amplicon-2024.10/var/q2cli

Config
Config Source: /home/elwie/miniconda3/envs/qiime2-amplicon-2024.10/etc/qiime2_config.toml

Getting help
To get help with QIIME 2, visit https://qiime2.org
To get help with configuring and/or understanding QIIME 2 parallelization, visit Parallel Pipeline configuration — Using QIIME 2

And here is the output of the qiime sample-classifier classify-samples --help command:

Usage: qiime sample-classifier classify-samples [OPTIONS]

Predicts a categorical sample metadata column using a supervised
learning classifier. Splits input data into training and test sets.
The training set is used to train and test the estimator using a
stratified k-fold cross-validation scheme. This includes optional
steps for automated feature extraction and hyperparameter
optimization. The test set validates classification accuracy of the
optimized estimator. Outputs classification results for test set.
For more details on the learning algorithm, see
http://scikit-learn.org/stable/supervised_learning.html

Inputs:
--i-table ARTIFACT FeatureTable[Frequency | RelativeFrequency |
PresenceAbsence | Composition]
Feature table containing all features that
should be used for target prediction.
[required]
Parameters:
--m-metadata-file METADATA
--m-metadata-column COLUMN MetadataColumn[Categorical]
Categorical metadata column to use as
prediction target. [required]
--p-test-size PROPORTION
Range(0.0, 1.0) Fraction of input samples to exclude from
training set and use for classifier testing.
[default: 0.2]
--p-step PROPORTION Range(0.0, 1.0, inclusive_start=False)
If optimize-feature-selection is True, step
is the percentage of features to remove at
each iteration. [default: 0.05]
--p-cv INTEGER Number of k-fold cross-validations to
Range(1, None) perform. [default: 5]
--p-random-state INTEGER
Seed used by random number generator.
[optional]
--p-n-jobs NTHREADS Number of jobs to run in parallel.
[default: 1]
--p-n-estimators INTEGER
Range(1, None) Number of trees to grow for estimation.
More trees will improve predictive accuracy
up to a threshold level, but will also
increase time and memory requirements. This
parameter only affects ensemble estimators,
such as Random Forest, AdaBoost, ExtraTrees,
and GradientBoosting. [default: 100]
--p-estimator TEXT Choices('RandomForestClassifier',
'ExtraTreesClassifier', 'GradientBoostingClassifier',
'AdaBoostClassifier[DecisionTree]',
'AdaBoostClassifier[ExtraTrees]', 'KNeighborsClassifier',
'LinearSVC', 'SVC') Estimator method to use for sample
prediction.
[default: 'RandomForestClassifier']
--p-optimize-feature-selection / --p-no-optimize-feature-selection
Automatically optimize input feature
selection using recursive feature
elimination. [default: False]
--p-parameter-tuning / --p-no-parameter-tuning
Automatically tune hyperparameters using
random grid search. [default: False]
--p-palette TEXT Choices('YellowOrangeBrown', 'YellowOrangeRed',
'OrangeRed', 'PurpleRed', 'RedPurple', 'BluePurple', 'GreenBlue',
'PurpleBlue', 'YellowGreen', 'summer', 'copper', 'viridis',
'cividis', 'plasma', 'inferno', 'magma', 'sirocco', 'drifting',
'melancholy', 'enigma', 'eros', 'spectre', 'ambition',
'mysteriousstains', 'daydream', 'solano', 'navarro', 'dandelions',
'deepblue', 'verve', 'greyscale')
The color palette to use for plotting.
[default: 'sirocco']
--p-missing-samples TEXT Choices('error', 'ignore')
How to handle missing samples in metadata.
"error" will fail if missing samples are
detected. "ignore" will cause the feature
table and metadata to be filtered, so that
only samples found in both files are
retained. [default: 'error']
Outputs:
--o-sample-estimator ARTIFACT SampleEstimator[Classifier]
Trained sample estimator. [required]
--o-feature-importance ARTIFACT FeatureData[Importance]
Importance of each input feature to model
accuracy. [required]
--o-predictions ARTIFACT SampleData[ClassifierPredictions]
Predicted target values for each input
sample. [required]
--o-model-summary VISUALIZATION
Summarized parameter and (if enabled)
feature selection information for the
trained estimator. [required]
--o-accuracy-results VISUALIZATION
Accuracy results visualization. [required]
--o-probabilities ARTIFACT SampleData[Probabilities]
Predicted class probabilities for each
input sample. [required]
--o-heatmap VISUALIZATION
A heatmap of the top 50 most important
features from the table. [required]
--o-training-targets ARTIFACT SampleData[TrueTargets]
Series containing true target values of
train samples [required]
--o-test-targets ARTIFACT SampleData[TrueTargets]
Series containing true target values of
test samples [required]
Miscellaneous:
--output-dir PATH Output unspecified results to a directory
--verbose / --quiet Display verbose output to stdout and/or
stderr during execution of this action. Or
silence output if execution is successful
(silence is golden).
--recycle-pool TEXT Use a cache pool for pipeline resumption.
QIIME 2 will cache your results in this pool
for reuse by future invocations. These pool
are retained until deleted by the user. If
not provided, QIIME 2 will create a pool
which is automatically reused by invocations
of the same action and removed if the action
is successful. Note: these pools are local
to the cache you are using.
--no-recycle Do not recycle results from a previous
failed pipeline run or save the results from
this run for future recycling.
--parallel Execute your action in parallel. This flag
will use your default parallel config.
--parallel-config FILE Execute your action in parallel using a
config at the indicated path.
--example-data PATH Write example data and exit.
--citations Show citations and exit.
--use-cache DIRECTORY Specify the cache to be used for the
intermediate work of this action. If not
provided, the default cache under
$TMP/qiime2/ will be used. IMPORTANT
FOR HPC USERS: If you are on an HPC system
and are using parallel execution it is
important to set this to a location that is
globally accessible to all nodes in the
cluster.
--help Show this message and exit.

Thank you

OK, that all checks out. Interesting. Would it be possible for you to send me the FeatureTable you are trying to use by direct message (DM), so I can take a closer look at it?

Yes, I would like to send you my FeatureTable, but unfortunately I am not allowed to send personal messages.

However, I tried running qiime sample-classifier classify-samples with the table from the Moving Pictures tutorial, and I get the same error message:

qiime sample-classifier classify-samples \
  --i-table results/table-dada2-rel.qza \
  --m-metadata-file doc/test-sample-metadata.tsv \
  --m-metadata-column body-site \
  --p-estimator RandomForestClassifier \
  --output-dir results/test

sample-metadata.tsv (2.0 KB)

table-dada2-rel.qza (74.0 KB)

Thanks @elmavanwieren! It looks like you uncovered a bug in the pipeline: a relative frequency table is accepted as input by every step of the pipeline except the heatmap visualizer at the end. I raised an issue here to get it fixed, in case you would like to track it:

In the meantime, you can work around this bug by running the individual steps of the pipeline yourself: split-table, fit-classifier, predict-classification, and confusion-matrix. This lets you use a relative frequency table while skipping the heatmap step, which is the only step in this pipeline that breaks when a relative frequency table is passed. The downside is that you need to run four commands instead of one.
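A minimal Python sketch of that four-step workaround follows. It only builds the qiime sample-classifier command lines (as argument lists) from the file names used earlier in this thread and prints them as a dry run; run_all executes them with subprocess. The helper names workaround_commands and run_all are hypothetical, and the sketch assumes that each --output-dir does not exist yet and that q2cli names the files inside it after the action's outputs (for example training_table.qza and sample_estimator.qza). Please double-check the options against each action's --help for your QIIME 2 version.

```python
# Hypothetical helper (not part of QIIME 2): builds the four q2cli commands that
# replace classify-samples for a relative frequency table, skipping the heatmap.
import shlex
import subprocess


def workaround_commands(table, metadata_file, column, out_dir,
                        estimator="RandomForestClassifier"):
    """Return the commands as argv lists, in the order they must run."""
    # Assumption: each --output-dir does not exist yet, and q2cli names the
    # files inside it after the action's outputs (e.g. training_table.qza).
    split, fit = f"{out_dir}/split", f"{out_dir}/fit"
    pred, conf = f"{out_dir}/predict", f"{out_dir}/confusion"
    return [
        # 1. Split samples into training and test tables.
        ["qiime", "sample-classifier", "split-table",
         "--i-table", table,
         "--m-metadata-file", metadata_file,
         "--m-metadata-column", column,
         "--output-dir", split],
        # 2. Train the estimator on the training table only.
        ["qiime", "sample-classifier", "fit-classifier",
         "--i-table", f"{split}/training_table.qza",
         "--m-metadata-file", metadata_file,
         "--m-metadata-column", column,
         "--p-estimator", estimator,
         "--output-dir", fit],
        # 3. Predict the class of each held-out test sample.
        ["qiime", "sample-classifier", "predict-classification",
         "--i-table", f"{split}/test_table.qza",
         "--i-sample-estimator", f"{fit}/sample_estimator.qza",
         "--output-dir", pred],
        # 4. Compare the predictions with the true metadata values.
        ["qiime", "sample-classifier", "confusion-matrix",
         "--i-predictions", f"{pred}/predictions.qza",
         "--i-probabilities", f"{pred}/probabilities.qza",
         "--m-truth-file", metadata_file,
         "--m-truth-column", column,
         "--output-dir", conf],
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = workaround_commands("results/table-dada2-rel.qza",
                               "doc/test-sample-metadata.tsv",
                               "body-site", "results/test-workaround")
    # Dry run: print the shell commands. Call run_all(cmds) to execute them.
    for cmd in cmds:
        print(shlex.join(cmd))
```

Running the printed commands one after another should yield the trained estimator, the test-set predictions and probabilities, and a confusion-matrix visualization, without ever calling the heatmap step.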

Thanks for your patience!
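To see why the documented signature and the error message in this thread can both be true, here is a toy Python model. It is not QIIME 2's actual type system, only an illustration: classify-samples is a pipeline whose own table input accepts the union shown in the help text, so the outer check passes, but one internal step (the heatmap visualizer) accepts only FeatureTable[Frequency] and rejects the table when it is called. The step list and the helper check_table are hypothetical; the other steps are assumed to accept the full union, as described in the reply above.

```python
# Toy model only: NOT QIIME 2's real type system. It mimics why a pipeline whose
# own table input accepts a union type can still fail on one inner step.
PIPELINE_INPUT = {"Frequency", "RelativeFrequency", "PresenceAbsence", "Composition"}

# Table-consuming steps in pipeline order. Per the diagnosis in this thread, only
# the heatmap step is narrower; the others are assumed to accept the full union.
STEPS = [
    ("split-table", PIPELINE_INPUT),
    ("fit-classifier", PIPELINE_INPUT),
    ("predict-classification", PIPELINE_INPUT),
    ("heatmap", {"Frequency"}),
]


def check_table(table_type):
    """Return None if every step accepts table_type, else an error message."""
    if table_type not in PIPELINE_INPUT:
        # Rejected up front by the pipeline's own signature.
        return f"classify-samples: FeatureTable[{table_type}] is not accepted"
    for step, accepted in STEPS:
        if table_type not in accepted:
            # The outer check passed; the narrower inner step rejects the input.
            expected = " | ".join(sorted(accepted))
            return (f"{step}: Parameter 'table' requires an argument of type "
                    f"FeatureTable[{expected}]. An argument of type "
                    f"FeatureTable[{table_type}] was passed.")
    return None


if __name__ == "__main__":
    print(check_table("RelativeFrequency"))
```

Running it prints the same kind of message reported at the top of this thread, attributed to the heatmap step, while check_table("Frequency") returns None because every step accepts a plain frequency table.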


Good to know that it is a bug in the pipeline. Hopefully it can be fixed easily!

Thank you for the workaround!