Include_ids argument in filter_samples method?

Franck_Lejzerowicz · October 1, 2018, 11:25pm

Hi!

Subsetting a feature table to a given set of samples is a common task, and direct support for it would come in very handy in the API.

With Yoshiki, we worked out a way to do this by creating an intermediate pandas DataFrame (example below). That step could be avoided fairly easily with an additional argument to the filter_samples method: for example, include_ids (positive subsetting) as the inverse of exclude_ids (negative subsetting). Thanks!

Franck

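import pandas as pd
import qiime2 as q2
from qiime2.plugins.feature_table.methods import filter_samples
# ID-only metadata: the index holds the sample IDs to keep; there are no columns.
x = pd.DataFrame(index=pd.Series(name='#SampleID', data=<list_of_sample_names>))
filter_samples(a_bt, metadata=q2.Metadata(x))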

ebolyen · October 1, 2018, 11:28pm

Hey @Franck_Lejzerowicz,

I'm not sure I follow your suggestion. The code snippet provided will work as described. Are you interested in an include_ids option to make it clearer how the filtering will behave?

Franck_Lejzerowicz · October 2, 2018, 10:58pm

Oh I get it now! exclude_ids changes the default behaviour, which is to keep the samples in the metadata.
In fact, the suggestion was really to ask whether a list of sample IDs could be passed directly, rather than a q2 metadata table.
Thanks!
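
To make the distinction concrete, here is a toy model in plain Python of the behaviour described above: by default the samples listed in the metadata are kept, and exclude_ids inverts that. The function name filter_sample_ids and the ID_1 to ID_4 values are made up for illustration. This is only a sketch of the semantics as discussed in this thread, not the q2-feature-table implementation.

```python
def filter_sample_ids(table_ids, metadata_ids, exclude_ids=False):
    """Toy model of ID-based sample filtering (hypothetical helper, not QIIME 2 code)."""
    listed = set(metadata_ids)
    if exclude_ids:
        # Inverted behaviour: drop every sample that appears in the metadata.
        return [sid for sid in table_ids if sid not in listed]
    # Default behaviour: keep only the samples that appear in the metadata.
    return [sid for sid in table_ids if sid in listed]


table_ids = ["ID_1", "ID_2", "ID_3", "ID_4"]
metadata_ids = ["ID_1", "ID_3"]

kept_default = filter_sample_ids(table_ids, metadata_ids)
kept_inverted = filter_sample_ids(table_ids, metadata_ids, exclude_ids=True)

print(kept_default)   # ['ID_1', 'ID_3']
print(kept_inverted)  # ['ID_2', 'ID_4']
```

Seen this way, the default already plays the role of an "include" filter, so a separate include_ids flag would mostly be a matter of naming.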

ebolyen · October 5, 2018, 7:26pm

I suppose it would be possible, although with how lists are implemented in the CLI it would look something like this:

--p-include-ids ID_1 --p-include-ids ID_2 --p-include-ids ID_3 ...

which would be super tedious and one probably has these IDs in a file to begin with.

If they are already in a file, then the only thing you need to turn it into QIIME 2 metadata would be an ID header, which we already support (QIIME 2 metadata doesn't require there to be any real columns, so IDs by themselves are fine).
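
For anyone starting from a Python list rather than a file, a minimal sketch of producing such an ID-only file with the standard library might look like the following. The helper name write_id_only_metadata, the file name ids-to-keep.tsv, and the sample IDs are all assumptions for illustration; the #SampleID header is the one used in the snippet earlier in this thread.

```python
import os
import tempfile


def write_id_only_metadata(sample_ids, path, header="#SampleID"):
    """Write an ID header followed by one sample ID per line (hypothetical helper)."""
    with open(path, "w", newline="") as fh:
        fh.write(header + "\n")
        for sample_id in sample_ids:
            fh.write(f"{sample_id}\n")
    return path


ids = ["ID_1", "ID_2", "ID_3"]

# A temporary directory keeps the example self-contained; use any real path in practice.
with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_path = write_id_only_metadata(ids, os.path.join(tmp_dir, "ids-to-keep.tsv"))
    with open(metadata_path) as fh:
        written_lines = fh.read().splitlines()

print(written_lines)  # ['#SampleID', 'ID_1', 'ID_2', 'ID_3']
```

A file like this should then be usable as the metadata for filter-samples, for example via --m-metadata-file on the CLI or qiime2.Metadata.load(path) in Python.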