A slightly different request than a previous post of mine.
I have a .qza object of sequence data (type ‘FeatureData[AlignedSequence]’).
I’d like to remove a subset of these sequences based on their sequence identifiers. This would work exactly like what @SoilRotifer has set up in step #6 of this pipeline, which is based on this Python script.
I’m trying to use QIIME-specific features rather than external scripts wherever possible. My data don’t have a feature table to filter against, and I’ve struggled to find an equivalent tool in QIIME that performs the same task as the above Python script. Perhaps metadata-based filtering would work? I’m wondering what the structure of that metadata input file would look like. Maybe I can fake it with a two-column file listing only the sequences I want to keep, plus a second column holding some value for the SQL search to work with?
SeqID Status
00001 keep
00002 keep
... ...
10000 keep
The docs on metadata filtering all point to filtering samples, not sequences though. And the docs on sequence filtering use taxonomic information or a frequency table. I just want to use the simple case where I know what the sequence identifier is to include/exclude.
Thanks for any info as to whether this is possible. It seems trivial, which makes me think it must be in the documentation somewhere and I just can’t find it!
You can toggle the --p-exclude-ids / --p-no-exclude-ids depending on what is in your metadata file. Which, in this case, should look something like this:
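A minimal Python sketch of building a file along those lines, in case it helps. It assumes the first column header is `feature-id` (one of the ID headers QIIME 2 metadata accepts, as far as I know) and keeps the `Status` column from the question; `write_id_metadata` is a hypothetical helper, not a QIIME 2 function.

```python
import csv
import io


def write_id_metadata(seq_ids, handle, id_header="feature-id"):
    """Write a tab-separated table: one ID column plus a 'Status' column.

    Hypothetical helper for illustration; the header name is an assumption.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header, "Status"])
    for seq_id in seq_ids:
        # IDs are written as strings so leading zeros are preserved.
        writer.writerow([seq_id, "keep"])


# Written to an in-memory buffer here; swap in
# open("seqs-to-keep.tsv", "w", newline="") to write a real file
# (that file name is made up for this example).
buffer = io.StringIO()
write_id_metadata(["00001", "00002", "10000"], buffer)
metadata_text = buffer.getvalue()
print(metadata_text)
```

The resulting file would then be passed as the metadata input to the filtering command.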
Awesome.
I suppose if the list is the features I want to keep, then the switch is --p-no-exclude-ids? How is that not just named --p-include-ids?
Thanks!
QIIME 2 user interfaces are generated dynamically, based on the plugin registration. So, the relevant bit of plugin code is here:
You'll see, the parameter is a boolean (True/False) named "exclude_ids".
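To make the True/False behaviour concrete, here is a toy Python sketch. It is not the plugin's code: `filter_by_ids` and the sample sequences are hypothetical, and it only illustrates how a boolean named `exclude_ids` might flip between keeping and dropping the listed IDs.

```python
def filter_by_ids(sequences, ids, exclude_ids=False):
    """Keep sequences whose ID is listed, or drop them when exclude_ids is True.

    Hypothetical helper for illustration; not part of QIIME 2.
    """
    listed = set(ids)
    return {
        seq_id: seq
        for seq_id, seq in sequences.items()
        # Keep listed IDs by default; invert the test when excluding.
        if (seq_id in listed) != exclude_ids
    }


sequences = {"00001": "ACGT-", "00002": "AC-TT", "00003": "A-GTT"}
kept = filter_by_ids(sequences, ["00001", "00002"])
dropped = filter_by_ids(sequences, ["00001", "00002"], exclude_ids=True)
```

So if the metadata lists the sequences to keep, the False case is the one that fits.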
When q2cli gets its hands on this method, it automatically generates a pair of parameter flags, --p-exclude-ids and --p-no-exclude-ids, to cover the True and False cases, respectively. Admittedly, that terminology is a bit weird and often surprising, so about a year ago @ebolyen extended q2cli to support --p-exclude-ids True / --p-exclude-ids False, which can be a bit clearer.
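The flag-generation idea can be sketched with Python's standard `argparse` module. This is only an analogy under stated assumptions: q2cli has its own implementation, and `add_boolean_flags` is a hypothetical helper that mimics the naming pattern described above.

```python
import argparse


def add_boolean_flags(parser, name, default=False):
    """Register --p-<name> / --p-no-<name> flags for one boolean parameter.

    Hypothetical helper; it imitates the flag naming, not q2cli's internals.
    """
    flag = name.replace("_", "-")
    group = parser.add_mutually_exclusive_group()
    group.add_argument(f"--p-{flag}", dest=name, action="store_true")
    group.add_argument(f"--p-no-{flag}", dest=name, action="store_false")
    # Apply the parameter's registered default to both flags.
    parser.set_defaults(**{name: default})


parser = argparse.ArgumentParser(prog="filter-sketch")
add_boolean_flags(parser, "exclude_ids", default=False)

print(parser.parse_args(["--p-exclude-ids"]).exclude_ids)
print(parser.parse_args(["--p-no-exclude-ids"]).exclude_ids)
print(parser.parse_args([]).exclude_ids)
```

Because both flags are derived mechanically from the one parameter name, the "no" variant ends up named after the parameter rather than after what a user might expect.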
Hope that explains the weird naming convention a bit.
That’s way too thorough an explanation for my complaint - appreciate the insight.
Maybe programmers think better in the double negative then? My brain would have thought you’d have these two flags as:
--p-include-ids
--p-no-include-ids
Once you extend support to include TRUE/FALSE statements, that seems even simpler. But then again, perhaps this all comes down to what users are doing more often. Maybe they are excluding more often than including, which is what I’m guessing is the case. Which of course then makes what you’ve set up perfectly natural!