distinction between classify-samples and classify-samples-ncv

I am struggling to appreciate the differences :face_with_monocle: between the outputs of qiime sample-classifier classify-samples and qiime sample-classifier classify-samples-ncv. Both produce a number of similarly named outputs, like feature_importance.qza and predictions.qza. However, the values in the feature_importance.qza artifact differ even when the same table and metadata are used as input.

From what I can tell from the descriptions in the classify-samples-ncv documentation, the feature_importance.qza output provides a (relative) estimate of how much each individual sequence feature contributes to classification importance (accuracy?)…

Outputs predicted values for each input sample, and relative importance of each feature for model accuracy.

With classify-samples, I also generate a feature_importance.qza file, and the documentation's description of this file seems to be the same as for the classify-samples-ncv output:

--o-feature-importance ARTIFACT FeatureData[Importance]
Importance of each input feature to model accuracy.

Yet something must be different, as I noticed that the particular value for a given feature varies between the two qiime sample-classifier operations. For example, I ran an experiment that collected samples at different locations and different time points. The classify-samples-ncv output feature_importance.qza file for location results in 35 features contributing towards half of the (cumulative) overall classifier importance. The same file with classify-samples using location metadata requires only 17 features to gather half of the overall classifier importance. Why the difference?
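To make the comparison concrete, here is a minimal sketch of how the "number of features needed to reach half of the cumulative importance" can be counted. It assumes the importance scores have already been exported from feature_importance.qza into a plain mapping of feature ID to score; the function name and the toy scores below are hypothetical and are not taken from the actual experiment.

```python
def features_to_reach(importances, fraction=0.5):
    """Count the top-ranked features needed to reach `fraction` of total importance."""
    # Rank scores from most to least important.
    ranked = sorted(importances.values(), reverse=True)
    total = sum(ranked)
    running = 0.0
    for count, value in enumerate(ranked, start=1):
        running += value
        if running >= fraction * total:
            return count
    # Only reached when there are no features at all.
    return len(ranked)


# Hypothetical toy scores (not real data): one profile is concentrated
# in a single feature, the other is spread evenly.
concentrated = {"f1": 0.6, "f2": 0.2, "f3": 0.1, "f4": 0.1}
spread = {"f1": 0.25, "f2": 0.25, "f3": 0.25, "f4": 0.25}

print(features_to_reach(concentrated), features_to_reach(spread))
```

A smaller count means the importance is concentrated in fewer features, which is the kind of difference described above (17 versus 35).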

Thanks for any help you can offer distinguishing these two programs and their respective outputs!

Different models, different training data.

NCV is training/testing N times over (where N = the number of cross-validations performed), and then averaging importance scores across each fold.

classify-samples is only training on a portion of the samples (since the remaining samples are held back as a test set). Hence, the training data look substantially different from NCV and may yield quite different results.
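The two training schemes can be sketched roughly as follows. This is a toy illustration only: it uses synthetic data and a stand-in importance measure (the gap between class means per feature), not the estimators that q2-sample-classifier actually fits, and it skips the inner tuning loop of nested cross-validation. All names here (`toy_importance`, `make_data`, the 80/20 split, 5 folds) are assumptions chosen for the example.

```python
import random
from statistics import mean


def toy_importance(rows, labels):
    """Stand-in importance: per-feature gap between class means, scaled to sum to 1."""
    gaps = []
    for j in range(len(rows[0])):
        class0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        class1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        gaps.append(abs(mean(class0) - mean(class1)))
    total = sum(gaps) or 1.0
    return [g / total for g in gaps]


def make_data(n_samples=40, seed=0):
    """Synthetic samples: feature 0 separates the classes, features 1 and 2 are noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    rows = [[rng.gauss(3.0 * y, 1.0), rng.gauss(0, 1.0), rng.gauss(0, 1.0)]
            for y in labels]
    return rows, labels


rows, labels = make_data()

# Single split: importance is computed once, on the training portion only.
n_train = 32
single = toy_importance(rows[:n_train], labels[:n_train])

# K-fold: each fold is held out once; importance is computed on the remaining
# samples each time and then averaged across folds.
k, fold_size = 5, 8
fold_importances = []
for f in range(k):
    held_out = set(range(f * fold_size, (f + 1) * fold_size))
    train_rows = [r for i, r in enumerate(rows) if i not in held_out]
    train_labels = [y for i, y in enumerate(labels) if i not in held_out]
    fold_importances.append(toy_importance(train_rows, train_labels))
averaged = [mean(column) for column in zip(*fold_importances)]

print("single split:", [round(v, 3) for v in single])
print("fold average:", [round(v, 3) for v in averaged])
```

In this sketch the single split sees one particular subset of samples, while the averaged scores draw on every sample across the folds, so the two sets of scores will generally not match exactly even on identical input.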

The model performance and feature importances will depend quite strongly on the training data, so it is not very surprising that these pipelines yield different importance scores (hopefully the top features should appear in both lists, if sample size is high enough — if not, it could be due to small sample size, noisy data, etc).

I hope that helps clarify!

Thanks. When seeking to resolve which particular features are important, would the NCV method be preferable?

What I’m still unclear about is the motivation for using each technique, so I think I need to read more on how these models work - staring at this documentation is probably the start. Nevertheless, it seems like both classify-samples and classify-samples-ncv use a lot of the same terms, so differentiating their actions to understand why you’d use one over the other is still giving me trouble :confused:.

As @Nicholas_Bokulich describes above, one difference seems to be how many times a feature might be evaluated for its relative importance in training a model. With NCV, you’re going to have all features assessed for feature importance at least one time (or more, depending on how many folds, correct?). With classify-samples, is the difference relative to NCV that your training set is never tested for importance (that is, only the samples in the testing set are)?

Ideally, it would be great if someone could share an example of how they used both classify-samples and classify-samples-ncv in action. Why would you use one over the other? Can you use both to tease apart different aspects of your data?

Thanks again

In my opinion NCV is generally better, since it trains and tests across multiple iterations, so it gives a sense of the variance in accuracy as well as predictions for all samples.

Yes, because indeed these are using the same methods but different training/testing schemes.

Not exactly — it’s which samples are included in training, and how many times training occurs. This could indeed be a matter of which features are tested, but not necessarily.

No — importance is determined on the training set, so the difference is that importance is determined on only some of the samples and only once (whereas for NCV importance is averaged across each iteration).

classify-samples is a complete pipeline and easier to use. It also outputs a trained classifier that can be re-used to predict other samples (NCV does not, since it trains K classifiers!). The motivations for NCV I mentioned above.

Both are perfectly valid to use, so it just depends on which is more suited to your use case… and the type of training/testing scheme you want to use.
