Feature-table filter-samples


(Esra Mescioglu) #1

Filtering unwanted samples out of a table can leave lots of features that are no longer present in any sample, so your data stays really large and full of zeros. It would be nice if feature-table filter-samples could also remove all features that don’t show up in at least one sample. I know you can do this afterwards with feature-table filter-features by using --p-min-samples, but it might not be intuitive to some users.
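To make the two-step workflow described above concrete, here is a minimal sketch in plain Python. It is not QIIME 2 code: the toy table, the sample and feature IDs, and the two helper functions are all made up for illustration, and the helpers only imitate the idea of filtering samples first and then dropping features that appear in fewer than a minimum number of samples.

```python
# Toy feature table: sample ID -> {feature ID: count}. All IDs are hypothetical.
table = {
    "gut-1": {"featA": 10, "featB": 0, "featC": 3},
    "gut-2": {"featA": 4, "featB": 0, "featC": 0},
    "soil-1": {"featA": 0, "featB": 7, "featC": 1},
}


def filter_samples(table, keep):
    """Keep only the listed samples; feature columns are left untouched."""
    return {s: dict(counts) for s, counts in table.items() if s in keep}


def filter_features(table, min_samples=1):
    """Drop features observed (count > 0) in fewer than min_samples samples."""
    n_observed = {}
    for counts in table.values():
        for feature, count in counts.items():
            n_observed[feature] = n_observed.get(feature, 0) + (count > 0)
    kept = {f for f, n in n_observed.items() if n >= min_samples}
    return {
        s: {f: c for f, c in counts.items() if f in kept}
        for s, counts in table.items()
    }


# Step 1: remove the unwanted (soil) sample. featB is now zero everywhere.
gut_only = filter_samples(table, keep={"gut-1", "gut-2"})
# Step 2: remove features that are not observed in at least one sample.
trimmed = filter_features(gut_only, min_samples=1)

print(sorted(gut_only["gut-1"]))  # featB is still a column, but all zeros
print(sorted(trimmed["gut-1"]))   # featB has been dropped
```

In this sketch featB survives the first step only as a column of zeros, which is the "dead weight" that the second step removes.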

Best,

Esra


(Nicholas Bokulich) #3

Thanks for the suggestion, @emescioglu! Keeping the functionality of filter-samples and filter-features separate has been intentional: filter-samples should not silently drop zero-abundance features, and filter-features should not silently drop empty samples. We have had similar discussions in the past (and just revisited the question in light of your suggestion), and the consensus has generally been that these filtering functions should stay separate; adding a --p-min-samples parameter to filter-samples would simply duplicate functionality from filter-features.

Having zero-abundance features in a feature table is not necessarily a problem, either. Of course, they are “dead weight”, especially when running something like ANCOM, but we do have tutorials for filter-samples and filter-features, and recommendations elsewhere in the QIIME 2 documentation, that should make these concepts (and the motivations for removing low-abundance features) reasonably clear.

That said, we are here to serve the needs of QIIME 2 users. This is the first time we have been asked for this feature, but if we get more requests from users, we can get the wheels in motion to add such a parameter.

I hope that makes sense! Thanks for the suggestion.


(Nicholas Bokulich) #5

Hi @emescioglu,
I must correct my prior statement: according to the docs, it looks like empty features are dropped by default after filtering out samples. If you are seeing otherwise, please share an example table so we can check it out! Either way, since that behavior already exists, I agree that having a parameter to control it would be useful, and I have raised an issue to track this.
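As a rough illustration of what such a controlling parameter could look like, here is a plain-Python sketch. It is an assumption-laden toy, not the QIIME 2 implementation: the function, the table layout, and the `drop_empty_features` flag name are all hypothetical and do not refer to any actual QIIME 2 parameter.

```python
def filter_samples(table, keep, drop_empty_features=True):
    """Keep the listed samples from a {sample: {feature: count}} table.

    drop_empty_features is a hypothetical flag: when True, features with a
    zero count in every retained sample are removed as well.
    """
    kept = {s: dict(counts) for s, counts in table.items() if s in keep}
    if not drop_empty_features:
        return kept
    # Features observed (count > 0) in at least one retained sample.
    present = {
        f for counts in kept.values() for f, c in counts.items() if c > 0
    }
    return {
        s: {f: c for f, c in counts.items() if f in present}
        for s, counts in kept.items()
    }


# Toy table with made-up IDs: featB only occurs in the sample being removed.
toy = {
    "s1": {"featA": 5, "featB": 0},
    "s2": {"featA": 2, "featB": 0},
    "s3": {"featA": 0, "featB": 9},
}

default_result = filter_samples(toy, keep={"s1", "s2"})
keep_empty = filter_samples(toy, keep={"s1", "s2"}, drop_empty_features=False)

print(sorted(default_result["s1"]))  # empty feature dropped by default
print(sorted(keep_empty["s1"]))      # empty feature retained on request
```

With the default, the empty feature disappears along with the removed sample; turning the flag off keeps the original set of features, zeros included.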

Thank you for the suggestion!