Quick question: filtering a large number of samples out of a merged data table is straightforward based on sample ID (filtering table.qza with a new metadata table).
Do we also need to filter the merged rep-seqs.qza sequences file? I was trying to figure this out, since it would seem that the sequences would just filter appropriately given that they're already assigned per sample?
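To make the question concrete, here is a toy sketch of what sample filtering does to a feature table, and why it does not automatically touch the sequences. It uses plain Python dicts as hypothetical stand-ins for table.qza and rep-seqs.qza (this is not the QIIME 2 API), and it assumes features left with zero counts are dropped after the samples are removed.

```python
# Hypothetical in-memory models, NOT the QIIME 2 API:
#   table:    FeatureTable[Frequency] as {feature_id: {sample_id: count}}
#   rep_seqs: FeatureData[Sequence]   as {feature_id: sequence}

def filter_samples(table, keep_samples):
    """Keep only the requested samples, then drop features that have
    no counts left in those samples."""
    keep = set(keep_samples)
    filtered = {}
    for feature_id, counts in table.items():
        kept = {s: c for s, c in counts.items() if s in keep}
        if sum(kept.values()) > 0:
            filtered[feature_id] = kept
    return filtered

merged_table = {
    "featA": {"run1_s1": 10, "run2_s1": 0, "run7_s3": 4},
    "featB": {"run1_s1": 0, "run2_s1": 25, "run7_s3": 0},
    "featC": {"run1_s1": 3, "run2_s1": 1, "run7_s3": 0},
}
rep_seqs = {"featA": "TACGGAGG", "featB": "TACGTAGG", "featC": "TACGGAGA"}

subset_table = filter_samples(merged_table, ["run1_s1", "run7_s3"])
print(sorted(subset_table))  # ['featA', 'featC']

# The sequences live in a separate artifact keyed by feature ID, so
# filtering the table leaves them untouched: featB is still there.
print(sorted(rep_seqs))      # ['featA', 'featB', 'featC']
```

The sample and feature IDs above are made up; the point is only that the sequences carry no sample information of their own.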
I have a merged table.qza and rep-seqs.qza file for 7 runs. I now want to pull out only the samples from the table that I want to work on, in order to: a) make a tree, b) compute more diversity metrics, and c) export data to phyloseq to make graphs/charts.
The table should be filtered by sample name, but should the rep-seqs.qza file be filtered for the same samples as well? I think my main confusion is that, since we merged the rep-seqs together, should those be pulled out for the samples we want to work with? We want to generate a tree only for the samples of interest, not for the whole merged table.
This is my main question: should I filter the rep-seqs.qza file to create a new tree for the subset of samples? Maybe what I really mean to ask is: can I filter a rep-seqs.qza file based on sample ID rather than feature ID?
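Since the sequences are keyed by feature ID, the way to get "sequences for these samples" is indirect: filter the table by sample first, then keep only the sequences whose feature IDs survive in that filtered table. A minimal sketch with hypothetical dict stand-ins (again not the QIIME 2 API). If your QIIME 2 version supports it, `qiime feature-table filter-seqs` with the filtered table passed via `--i-table` is the built-in way to do this step.

```python
# Hypothetical stand-ins, NOT the QIIME 2 API:
#   rep_seqs:       FeatureData[Sequence] as {feature_id: sequence}
#   filtered_table: an already sample-filtered FeatureTable[Frequency]
#                   as {feature_id: {sample_id: count}}

def filter_seqs_by_table(rep_seqs, filtered_table):
    """Keep only sequences whose feature ID is still in the table.
    Sample IDs never appear here: the link to samples goes through the table."""
    present = set(filtered_table)
    return {fid: seq for fid, seq in rep_seqs.items() if fid in present}

rep_seqs = {
    "featA": "TACGGAGGGTGCAAGCG",
    "featB": "TACGTAGGTGGCAAGCG",
    "featC": "TACGGAGGATCCGAGCG",
}
filtered_table = {
    "featA": {"run1_s1": 10, "run7_s3": 4},
    "featC": {"run1_s1": 3, "run7_s3": 0},
}

subset_seqs = filter_seqs_by_table(rep_seqs, filtered_table)
print(sorted(subset_seqs))  # ['featA', 'featC']
```

The filtered sequences would then be the input for building a tree that only covers the samples of interest.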
Thanks for the examples, that makes a lot of sense!
I think either approach (filtering your FeatureData[Sequence] or not) makes sense. If you were staying within QIIME 2, that wouldn't cause any kind of mechanical issues with things like diversity metrics or taxonomic assignment, since features that aren't present in your FeatureTable[Frequency] would just be dropped from the tree or sequences. I am not 100% sure how this will work in phyloseq (whether extra tree tips will cause a problem or not). Perhaps it is worth running things both ways for a subsample and comparing the results?

The other aspect that might be worth looking into is the actual tree-building process: you will most likely see different trees depending on the features present or the tree-building method used. Right now the q2-phylogeny plugin uses FastTree, but @SoilRotifer added RAxML support, which should be out later this month; and @Stefan has a q2-fragment-insertion plugin, which builds a tree by fragment insertion instead of de novo.

Anyway, it looks like you found a few resources that will help you with the actual process of filtering FeatureData[Sequence] using a FeatureTable[Frequency], so thanks for linking to those here! Keep us posted, and let us know if you have any more questions!
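On the phyloseq point, a cheap first check is whether the "extra tips" situation even arises: compare the tree's tip labels with the feature IDs in the filtered table. This is a rough sketch that assumes a plain Newick string with unquoted, space-free labels; it is not a full Newick parser (scikit-bio's `TreeNode` would be more robust), and the tree and IDs are made up.

```python
import re

def tip_labels(newick):
    """Rough tip extraction: treats any label right after '(' or ',' as a
    tip; internal node labels (which follow ')') are ignored.
    Assumes unquoted labels without spaces."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))

def extra_tips(newick, table_feature_ids):
    """Tips in the tree that have no row in the feature table."""
    return sorted(tip_labels(newick) - set(table_feature_ids))

tree = "((featA:0.1,featB:0.2)0.95:0.05,featC:0.3);"
filtered_features = ["featA", "featC"]
print(extra_tips(tree, filtered_features))  # ['featB']
```

An empty list would mean the tree and table already agree, so the "both ways" comparison only matters when this returns something.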