We would like to repeat the feature volatility analysis in q2-longitudinal multiple times, using paired samples, to obtain more accurate machine learning models. To check the overall accuracy of the final accumulated model, I would like to have, for each feature volatility run, the sample information used for training and the coordinate data used to calculate the accuracy results, but I cannot find a way to output them.
Could you tell me how to output the following data from the feature volatility analysis in q2-longitudinal?
1) Sample information used for training and testing
2) The coordinate data used to calculate the accuracy results
Hi @Shimpei,
feature-volatility is a pipeline that runs several different actions under the hood. It only outputs some of the results and discards the intermediate data (e.g., which samples were used for training and testing). You should instead:
1) Use q2-sample-classifier directly. The regress-samples pipeline will give you what you need.
2) Then pass its outputs to q2-longitudinal's plot-feature-volatility action to obtain the feature volatility and importance plots.
Regarding 1) (sample information used for training and testing): regress-samples will output this information. You can also use q2-sample-classifier's split-table action directly if you want a little more control over this step.
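Because the goal is to repeat the analysis several times with paired samples, it helps to record, for every repeat, exactly which sample IDs went into training and which into testing. The plain-Python sketch below illustrates that bookkeeping under stated assumptions: `grouped_split` and `repeated_splits` are hypothetical helpers (not part of QIIME 2), the sample-to-subject mapping is assumed to come from your metadata, and keeping both samples of a pair on the same side of the split is a design choice of this sketch, not a description of how split-table works internally.

```python
import random
from collections import defaultdict


def grouped_split(sample_to_subject, test_size=0.2, seed=None):
    """Hypothetical helper: split sample IDs into train/test lists so that all
    samples from the same subject (e.g., both members of a pair) land on the
    same side. Returns (train_ids, test_ids) as sorted lists."""
    # Collect the samples belonging to each subject.
    by_subject = defaultdict(list)
    for sample_id, subject in sample_to_subject.items():
        by_subject[subject].append(sample_id)

    # Shuffle subjects (not samples) with a seeded RNG so each repeat is reproducible.
    subjects = sorted(by_subject)
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Send roughly test_size of the subjects (at least one) to the test set.
    n_test = max(1, round(len(subjects) * test_size))
    test_subjects = set(subjects[:n_test])

    train_ids, test_ids = [], []
    for subject in subjects:
        target = test_ids if subject in test_subjects else train_ids
        target.extend(by_subject[subject])
    return sorted(train_ids), sorted(test_ids)


def repeated_splits(sample_to_subject, n_repeats=5, test_size=0.2):
    """Record which samples were used for training and testing in each repeat."""
    records = []
    for repeat in range(n_repeats):
        train_ids, test_ids = grouped_split(sample_to_subject, test_size, seed=repeat)
        records.append({"repeat": repeat, "train": train_ids, "test": test_ids})
    return records


if __name__ == "__main__":
    # Hypothetical paired design: each subject contributes a "pre" and a "post" sample.
    samples = {f"S{i}-{t}": f"S{i}" for i in range(1, 6) for t in ("pre", "post")}
    for rec in repeated_splits(samples, n_repeats=3):
        print(rec["repeat"], "test:", rec["test"])
```

Using the repeat index as the seed makes every split reproducible, so the recorded train/test lists can later be matched to the corresponding run's outputs.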
Regarding 2) (coordinate data used to calculate the accuracy results): the true values are in your sample metadata file, in whichever column holds the target variable (e.g., time). But regress-samples will also output this information: the true and predicted target values for each test sample.
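Since the accuracy of a regressor is computed from pairs of true and predicted target values for the test samples, the per-run pairs can be pooled to judge the overall accuracy across repeats. The sketch below assumes you have already loaded the true targets (from the metadata) and each run's predictions into dictionaries keyed by sample ID; the helper names are hypothetical, and mean squared error and Pearson's r are shown only as common examples, not necessarily the exact metrics q2-sample-classifier reports.

```python
import math


def pair_values(true_targets, predictions):
    """Join true and predicted values on sample ID (only samples present in
    both dictionaries, i.e. the test samples of one run, are kept)."""
    ids = sorted(set(true_targets) & set(predictions))
    return [true_targets[s] for s in ids], [predictions[s] for s in ids]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Correlation is undefined when either variable is constant.
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def pooled_accuracy(true_targets, runs):
    """Pool (true, predicted) pairs from several runs and summarize them.
    A sample tested in several runs contributes one pair per run."""
    all_true, all_pred = [], []
    for predictions in runs:
        y_true, y_pred = pair_values(true_targets, predictions)
        all_true.extend(y_true)
        all_pred.extend(y_pred)
    return {
        "n": len(all_true),
        "mse": mean_squared_error(all_true, all_pred),
        "r": pearson_r(all_true, all_pred),
    }


if __name__ == "__main__":
    # Hypothetical data: true time points and the predictions from two runs.
    truth = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
    runs = [{"A": 0.4, "B": 1.2}, {"C": 1.7, "D": 3.1}]
    print(pooled_accuracy(truth, runs))
```

Counting a sample once per run in which it was tested keeps each run's test set intact in the pooled summary; if you prefer one value per sample, you could average its predictions across runs before computing the metrics.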
In short, q2-longitudinal wraps q2-sample-classifier to perform the machine learning. What I outlined above runs the underlying steps directly in q2-sample-classifier so that you can access the intermediate files you need. The outputs can then be passed to q2-longitudinal to generate the same visualization.