When using the filter-seqs command, where do we get the information to put in the metadata file of sequences to exclude?
I'm guessing the process of filtering out sequences found in blank samples would start off like this:
#Create a sequences artifact with only the blank samples
qiime feature-table filter-seqs --i-data rep_seqs.qza --m-metadata-file map.txt --p-where "Control='Y'" --o-filtered-data control_seqs.qza
And then once you have your artifact of sequences from the control samples, you could tabulate the sequences to get them into a format you can view ...
What's the equivalent of downloading the frequency per feature .csv in @thermokarst's recommended workflow above? Would you download the fasta file, save it as a TSV and use that as your metadata file for removing the 'blank' sequences from the original rep seqs file?
Hi @Matilda_H-D,
Thanks for posting your question! The metadata input to filter-seqs can only consist of feature metadata, e.g., a sequence file or taxonomy file. We do not yet have functionality for removing sequences that are found in a specific sample in a single command (multi-step details are below).
For now, this forum post still describes the best workaround for removing features from a feature table that are detected in a specific sample. What filter-seqs allows us to do is to also filter our sequences file by passing in features-to-filter.tsv (see the forum post for how that file is generated) as metadata.
Note, however, that removing every sequence found in a blank may not be a good approach: many of these sequences may in fact be cross-contaminants (reads from real samples that leaked into the blank) rather than exogenous contaminants, and removing them could eliminate valid features from other samples.
In the future we plan to add methods for contaminant detection that more directly address this issue.
I hope that answers your questions! Please let us know if you have additional questions/concerns.
What filter-seqs allows us to do is to also filter our sequences file by passing in features-to-filter.tsv (see the forum post for how that file is generated) as metadata.
So do I understand correctly that when you use the feature-table filter-features command, both the feature table and the representative sequences file are filtered according to the metadata file that is passed? I had another look at the filter-features documentation page and it only mentions a filtered table as output, not a filtered sequences file.
What I am interested in is what Fernando Stuart mentioned in this post: filtering the rep seqs file in order to build a phylogenetic tree containing only the sequences that remain after filtering out those found in lab controls.
Looking at the feature-table filter-seqs documentation, it seems this could be achieved with that command by passing in the same metadata file that would be used in filter-features (i.e. a list of features, not of actual sequences, to exclude). Is that right?
No — filter-features will only remove those features from the feature table, not from the sequences file.
Correct. The process of generating a list of features that you can pass as metadata is mentioned further down in the same thread. The features-to-filter.tsv file described there (the same list of features to filter from the feature table) can be passed to filter-seqs as the metadata file, together with --p-exclude-ids, to remove those same features (sequences) from your FeatureData[Sequence] artifact.
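A minimal sketch of that command, assuming the sequences artifact is named rep_seqs.qza as in the original question and that features-to-filter.tsv is in the working directory. The output name filtered_rep_seqs.qza is a placeholder, and the guard around the call is only there so the snippet fails gently outside a QIIME 2 environment:

```shell
# File names are assumptions; swap in your own artifacts.
rep_seqs="rep_seqs.qza"                      # FeatureData[Sequence] artifact
features_to_filter="features-to-filter.tsv"  # feature IDs detected in the blanks
filtered_seqs="filtered_rep_seqs.qza"        # placeholder output name

if command -v qiime >/dev/null 2>&1; then
  # --p-exclude-ids drops the listed features instead of keeping only them.
  qiime feature-table filter-seqs \
    --i-data "$rep_seqs" \
    --m-metadata-file "$features_to_filter" \
    --p-exclude-ids \
    --o-filtered-data "$filtered_seqs"
else
  echo "qiime not found; activate your QIIME 2 environment first" >&2
fi
```

The filtered artifact can then be used to build the phylogenetic tree, so that the tree only contains sequences that survived the filtering.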
Generating features-to-filter.tsv is described in this post. In the future we may support a more direct method for generating a file that lists the features found in a single sample or collection of samples; I have raised an issue here to track progress.
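For reference, the last step of that workflow (turning the downloaded frequency-per-feature CSV into a metadata file) can be sketched as below. This assumes the CSV has no header row and holds the feature ID in its first column, and that an ID-only file with a feature-id header is accepted as feature metadata by your QIIME 2 version. The CSV file name and the two feature IDs are made up for illustration:

```shell
# Stand-in for the downloaded "frequency per feature" CSV (made-up IDs).
cat > feature-frequency-detail.csv <<'EOF'
featA,120.0
featB,3.0
EOF

# Write the metadata header, then keep only the first column (the feature ID).
{
  printf 'feature-id\n'
  awk -F',' 'NF > 0 { print $1 }' feature-frequency-detail.csv
} > features-to-filter.tsv

cat features-to-filter.tsv
```

If your download does include a header row, skip it by changing the awk condition to `NR > 1 && NF > 0`.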