Subsample - option to subsample samples in a feature-table by a metadata category

Lichen · February 20, 2019, 8:22am

Hello,

When making group comparisons, I have often found it useful to randomly subsample a feature table so that each group of interest (e.g., each state of a metadata variable) has a similar number of samples.

It would be useful to have an argument to 'feature-table subsample' that would subsample within states of a desired metadata variable, or intersections between states of multiple metadata variables, to a desired number of samples (x).

A useful default would be to subsample to the number of samples in the state (or intersection among states) with the fewest samples (y). If x exceeds y, another useful option would be an argument that controls whether to include or exclude samples from states whose sample size is less than x.
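
To make the request concrete, here is a rough sketch in plain Python of the behaviour I have in mind. It is not QIIME 2 code: the function name `stratified_subsample`, its parameters, and the dict-based metadata layout are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(metadata, columns, n=None, keep_small=False, seed=None):
    """Pick sample IDs so that each metadata group contributes n samples.

    metadata   : dict mapping sample ID -> dict of metadata values
                 (hypothetical layout, not a QIIME 2 object)
    columns    : metadata variables to stratify on; with several columns,
                 groups are the intersections of their states
    n          : samples to draw per group (x); defaults to the size of
                 the smallest group (y)
    keep_small : if True, groups with fewer than n samples are kept whole;
                 if False, they are dropped entirely
    """
    rng = random.Random(seed)

    # Group sample IDs by their combination of states.
    groups = defaultdict(list)
    for sample_id, row in metadata.items():
        groups[tuple(row[c] for c in columns)].append(sample_id)

    # Default depth: the smallest group size.
    if n is None:
        n = min(len(ids) for ids in groups.values())

    kept = []
    for ids in groups.values():
        if len(ids) >= n:
            kept.extend(rng.sample(ids, n))  # draw without replacement
        elif keep_small:
            kept.extend(ids)
    return kept


# Toy metadata: 5 gut, 3 skin and 2 oral samples.
sites = ["gut"] * 5 + ["skin"] * 3 + ["oral"] * 2
metadata = {f"S{i}": {"site": site} for i, site in enumerate(sites)}

# With n left unset, every site is subsampled to 2 samples (the smallest group).
balanced = stratified_subsample(metadata, ["site"], seed=0)
```

The returned IDs could then be used to filter the feature table down to the retained samples.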

Best wishes,

Justin

Nicholas_Bokulich · February 20, 2019, 1:23pm

Thanks @Lichen! You can use feature-table subsample to subsample, though you cannot stratify on a specific metadata variable. I have opened this feature request to add that eventually — contributions are always welcome to QIIME 2 and its plugins!

Negin · February 2, 2021, 12:24am

Hi, is this available yet?

Nicholas_Bokulich · February 2, 2021, 7:51am

No — you can follow the link to the github issue above, which will be the best place to keep track of any progress.