Missing feature in feature-table filter-seqs output

Hi everybody,

After running qiime feature-table filter-features, generating a .tsv file that listed all the features to retain, and applying qiime feature-table filter-seqs, I used the filtered data to compute core metrics. However, the command failed with an error indicating an inconsistency between my filtered feature table and the filtered sequences. It turned out that qiime feature-table filter-seqs had ignored the first line after the header in the .tsv file, which was also the most abundant feature in my sample! Why did it do that?

Here is the command I used to convert the .csv from the frequency-per-feature page of the .qzv into the .tsv file listing which sequences to keep (frequency < 5 in my case). All other sequences are filtered correctly; just one was wrongly thrown out (the one with the highest frequency), and this was consistent across different sample sets.

$ echo '#FeatureID','Frequency' | cat - feature-frequency-detail.csv | tr "," "\\t" > features-to-retain.tsv

The qiime feature-table filter-seqs command looked like this:

$ qiime feature-table filter-seqs \
--i-data or-seqs.qza \
--m-metadata-file features-to-retain.tsv \
--o-filtered-data or-seqs_filtered.qza

Any suggestions or explanations would be great!

cheers,
steffen

Hi @steff1088,
Would you mind sharing your features-to-retain.tsv and or-seqs.qza files to help us troubleshoot? You can send them to me in a direct message if you do not want your files shared openly on the forum.

Note that in the future (probably this month’s release) it will be possible to filter FeatureData[Sequence] artifacts directly using a feature table (e.g., filtered feature table) as input, making it unnecessary to use this workaround for filtering sequences that are not found in a feature table.

Thanks!

Hi @Nicholas_Bokulich,

No problem, here are the files:
16S_SJ1_features-to-retain.tsv (70.1 KB)

I just noticed that the forum won't let me upload the .qza file, which is 29 MB. Can I send it as a direct message?

-steffen

Hi @steff1088,
It looks like the issue is that your features-to-retain.tsv file contains a # character at the start of the header line. This causes that line to be interpreted as a comment line instead of a header, and hence it is ignored; the next line (the missing feature!) winds up being interpreted as the header line, and hence is not retained in your filtered sequences! Remove the # in the first line and all is well.
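For reference, here is a sketch of the conversion command from the question with the # dropped from the header line. It assumes, as in the original post, that feature-frequency-detail.csv has no header row of its own and is comma-separated (feature ID, frequency); the two-line CSV created first is made-up stand-in data so the snippet runs on its own.

```shell
# Hypothetical two-feature stand-in for the CSV exported from the .qzv page
# (no header row, comma-separated: feature ID, frequency).
printf 'a1b2c3,1500\nd4e5f6,12\n' > feature-frequency-detail.csv

# Same pipeline as in the question, but the header is "FeatureID" without a
# leading "#", so it is read as the header instead of being skipped as a comment.
echo 'FeatureID,Frequency' | cat - feature-frequency-detail.csv | tr ',' '\t' > features-to-retain.tsv

# The first line should now be the header and the second line the first feature.
head -n 2 features-to-retain.tsv
```

With the real CSV, the most abundant feature stays on line 2 and is kept as a feature ID.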

I assume you are reading another forum post or tutorial describing how to filter sequences using a metadata file; if this guide states that a # character should be in the first line, please give me the link so that we can correct that information.

The pending issue that I linked to in my previous post will make the process of filtering sequences found in a feature table much easier and clearer! In the meantime (and until we have better documentation of these actions), thank you for your brave efforts! :sailboat:

Ouch, it looks like @steff1088 ran into the same metadata filtering issue reported here (other users have also run into this problem):

The issue stems from the fact that QIIME 2 doesn't have a required header for metadata files, making it impossible to detect and recover from the situation @steff1088 and @Chris_Hemmerich ran into :-1: I am working on overhauling metadata this month (including the file format), and this issue will be addressed then (no more unexpected filtering behavior!). I'll follow up here when the issue is fixed in the 2017.12 release!
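To make the failure mode concrete, here is a small awk sketch that mimics the behaviour described in this thread: lines starting with # are skipped as comments, the first remaining line is taken as the header, and only the lines after it are treated as feature IDs. It is an illustration of that rule, not QIIME 2's actual parser, and the feature IDs are made up.

```shell
# Made-up example file with the problematic "#FeatureID" header line.
printf '#FeatureID\tFrequency\na1b2c3\t1500\nd4e5f6\t12\ng7h8i9\t7\n' > with-hash.tsv

# Mimic the rule: skip lines starting with "#" as comments, take the first
# remaining line as the header, and treat the first column of later lines as IDs.
awk -F '\t' '
  /^#/ { next }
  !seen_header { seen_header = 1; print "header: " $1; next }
  { print "id: " $1 }
' with-hash.tsv > parsed.txt

cat parsed.txt
```

This prints "header: a1b2c3" followed by "id: d4e5f6" and "id: g7h8i9": the first feature (the most abundant one in the original data) is consumed as the header and never treated as an ID, and nothing signals that anything went wrong.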

Thank you @Nicholas_Bokulich and @jairideout.

This is exactly what was wrong. I mistakenly included the # symbol in my header line…
I checked, and the feature now appears in the filtered sequences.

Thanks for your quick help; I keep learning about QIIME 2 every day!

cheers,
steffen

QIIME 2 2017.12 is now out, and it includes the option to filter FeatureData[Sequence] using the feature IDs found in a FeatureTable!

In the QIIME 2 2018.2 release, Metadata now has a very minimal required header to avoid the issues reported in this topic.
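A quick pre-flight check in the same spirit can catch a bad header before running any QIIME 2 command. The sketch below assumes the 2018.2 rule is that the first cell of the header line must be one of a set of recognized ID column names; it checks only a few of them (id, featureid, feature id, feature-id, compared case-insensitively), and check_header is a hypothetical helper, so please consult the updated Metadata tutorial for the authoritative list.

```shell
# Hypothetical helper: report whether the first cell of the first line of a TSV
# matches one of a few feature ID header names (a subset, case-insensitive).
check_header() {
  local first_cell
  first_cell=$(head -n 1 "$1" | cut -f 1 | tr '[:upper:]' '[:lower:]')
  case "$first_cell" in
    id|featureid|'feature id'|feature-id)
      echo "$1: OK (header '$first_cell')" ;;
    *)
      echo "$1: unrecognized header '$first_cell'" >&2
      return 1 ;;
  esac
}

# Made-up files: one with the "#FeatureID" header from this thread, one without.
printf '#FeatureID\tFrequency\na1b2c3\t1500\n' > bad.tsv
printf 'FeatureID\tFrequency\na1b2c3\t1500\n' > good.tsv

check_header good.tsv
check_header bad.tsv || echo "fix the header of bad.tsv before filtering"
```

The check only looks at the literal first line, so a leading #FeatureID line is flagged rather than silently skipped.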

There are a number of other changes to QIIME 2 Metadata in the 2018.2 release. See this forum announcement for details on what changed, as well as the updated Metadata tutorial. :sun_with_face: