After using qiime feature-table filter-features, generating a .tsv file that contained all the features to retain, and applying qiime feature-table filter-seqs, I used the filtered data to obtain core metrics. However, the command resulted in an error that indicated an inconsistency between my filtered feature table and the filtered sequences. It turned out that qiime feature-table filter-seqs ignored the first line after the header in the .tsv file, which was also the most abundant feature of my sample! Why did it do that?
Here is the command I used to convert the .csv from the frequency-per-feature .qzv page into the .tsv file listing which sequences to keep (frequency < 5 in my case). All other sequences were filtered correctly; only one was wrongly thrown out (the one with the highest frequency), and this was consistent across different sample sets.
Would you mind sharing your features-to-retain.tsv and or-seqs.qza files to help us troubleshoot? You can send those directly to me in a direct message if you do not want your files to be shared openly on the forum.
Note that in the future (probably this month’s release) it will be possible to filter FeatureData[Sequence] artifacts directly using a feature table (e.g., filtered feature table) as input, making it unnecessary to use this workaround for filtering sequences that are not found in a feature table.
It looks like the issue is that your features-to-retain.tsv file contains a # character at the start of the header line. This causes that line to be interpreted as a comment line instead of a header, and hence it is ignored; the next line (the missing feature!) winds up being interpreted as the header line, and hence is not retained in your filtered sequences! Remove the # in the first line and all is well.
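If it helps, here is a minimal Python sketch of that fix: it strips a leading # from the first line of the file and leaves every other line untouched. The function names are mine, and the column names in the usage example ("Feature ID", "Frequency") are only placeholders, since I am guessing at what your header looks like. It assumes the header is the first line of the file.

```python
from pathlib import Path


def strip_leading_hash(header_line: str) -> str:
    """Return the header line without a leading '#' (and any spaces after it)."""
    if header_line.startswith("#"):
        return header_line[1:].lstrip(" ")
    return header_line


def fix_header(lines):
    """Return a copy of lines where only the first line has its '#' removed."""
    if not lines:
        return []
    return [strip_leading_hash(lines[0])] + list(lines[1:])


def fix_file(path: str) -> bool:
    """Rewrite the file in place if its header starts with '#'.

    Returns True if the file was changed, False otherwise.
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    fixed = fix_header(lines)
    changed = fixed != lines
    if changed:
        p.write_text("".join(fixed))
    return changed


if __name__ == "__main__":
    # Hypothetical header and feature IDs, for illustration only.
    example = ["#Feature ID\tFrequency\n", "abc123\t500\n", "def456\t7\n"]
    print(fix_header(example)[0], end="")
```

Calling fix_file("features-to-retain.tsv") before rerunning qiime feature-table filter-seqs should then leave the most abundant feature in place, since the first line will be read as the header again.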
I assume you are reading another forum post or tutorial describing how to filter sequences using a metadata file; if this guide states that a # character should be in the first line, please give me the link so that we can correct that information.
The pending issue that I linked to in my previous post will make the process of filtering sequences found in a feature table much easier and clearer! In the meantime (and until we have better documentation of these actions), thank you for your brave efforts!
Ouch, it looks like @steff1088 ran into the same metadata filtering issue reported here (other users have also run into this problem):
The issue stems from the fact that QIIME 2 doesn’t have a required header for metadata files, making it impossible to detect and recover from the situation @steff1088 and @Chris_Hemmerich ran into. I am working on overhauling metadata this month (including the file format), and this issue will be addressed then (no more unexpected filtering behavior!). I’ll follow up here when the issue is fixed in the 2017.12 release!