Identifier-based filtering of features

I want to remove some ASVs from table-filt.qza. This table has 141 samples, 1908 features, and a total frequency of 3,797,638. I want to retain 1559 of the features.

I created a metadata file, ASV-Ctrl-10.txt, containing the feature IDs of the 1559 features to retain under the column header "#OTU ID", and ran:

qiime feature-table filter-features \
  --i-table table-filt.qza \
  --m-metadata-file ASV-Ctrl-10.txt \
  --o-filtered-table table-filt-1.qza
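Before running a filter like this, it may help to sanity-check the ID file against the table's feature IDs. The sketch below is a minimal, stdlib-only example. It assumes the ID file is tab-separated, that its first non-empty line is the header (here "#OTU ID"), and that the feature IDs sit in the first column. It also assumes you already have the table's feature IDs as a list (for example, from a TSV exported with `biom convert --to-tsv`). The function names are hypothetical.

```python
import csv
import io


def read_id_file(handle):
    """Return the feature IDs listed in the first column of a TSV ID file.

    Simplified assumption: the first non-empty line is the header
    (e.g. '#OTU ID'); later lines starting with '#' are skipped as
    comments or '#q2:' directives.
    """
    ids = []
    header_seen = False
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # blank line
        if not header_seen:
            header_seen = True  # skip the ID header row
            continue
        if row[0].startswith("#"):
            continue
        ids.append(row[0].strip())
    return ids


def check_ids(ids, table_feature_ids):
    """Summarize how an ID list relates to the features actually in the table."""
    wanted = set(ids)
    present = set(table_feature_ids)
    return {
        "n_ids": len(ids),
        "n_unique": len(wanted),
        "missing_from_table": sorted(wanted - present),
        "would_be_removed": sorted(present - wanted),
    }


# Toy example: one duplicated ID and one ID that is not in the table.
id_file = io.StringIO("#OTU ID\nasv1\nasv3\nasv3\nasv9\n")
report = check_ids(read_id_file(id_file), ["asv1", "asv2", "asv3"])
print(report)
# {'n_ids': 4, 'n_unique': 3, 'missing_from_table': ['asv9'], 'would_be_removed': ['asv2']}
```

If `n_unique` is not the number of features you meant to keep, or `missing_from_table` is non-empty, the file is not the list you intended. `would_be_removed` shows exactly which features `filter-features` will drop when it is run without `--p-exclude-ids`.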

The new table summary shows 141 samples, 1559 features, and a total frequency of 3,783,781.
Feature filtering is supposed to retain only the features present in the samples and remove the features present in the environmental controls (9 controls). For example, Control SampleC5 has a feature count of 40528 in table-filt.qza, so after filtering, the feature count of ControlC5 should be less than 10. Any idea what is going on? Thanks.

Hi @kindergarten,
So, from what I understand, you are filtering out features from your controls, and you filtered down to 1559 unique features.

It sounds like feature-table filter-features did filter down to those 1559 features, but one of your controls has more features than you'd expect? Why do you think Control SampleC5 should have fewer than 10 features? It looks to me like Control SampleC5 has 40528 occurrences of the features you filtered down to.

Let me know!

I have 141 samples: 9 environmental controls and 132 test samples. In the initial analysis, QIIME 2/DADA2 identified 1908 features. Of these, 1559 were present in the test samples and were absent (or had fewer than 10 reads) in the environmental controls. I want to retain these 1559 features and remove the 349 features that are present mainly in the environmental controls.
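The selection rule described here can be expressed directly against the counts. This is a hypothetical sketch, not the procedure actually used to build the list. It assumes the table was exported to the TSV layout produced by `biom convert --to-tsv`, which is a `# Constructed from biom file` line followed by an `#OTU ID` header row. It also reads "fewer than 10 reads in the controls" as fewer than 10 reads summed across all control samples; a per-control threshold would be an equally valid reading.

```python
import csv
import io


def read_biom_tsv(handle):
    """Parse TSV output of `biom convert --to-tsv`.

    Returns (sample_ids, table) where table maps feature ID -> list of
    counts in the same order as sample_ids.
    """
    sample_ids, table = [], {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            sample_ids = row[1:]
        elif row[0].startswith("#"):
            continue  # e.g. '# Constructed from biom file'
        else:
            table[row[0]] = [float(x) for x in row[1:]]
    return sample_ids, table


def split_by_control_reads(sample_ids, table, control_ids, max_control_reads=10):
    """Split features into (keep, drop).

    Assumed rule: keep a feature if it has reads in at least one test
    sample and fewer than `max_control_reads` reads summed over all
    control samples; drop everything else.
    """
    control_idx = [i for i, s in enumerate(sample_ids) if s in control_ids]
    keep, drop = [], []
    for feature_id, counts in table.items():
        in_controls = sum(counts[i] for i in control_idx)
        in_tests = sum(counts) - in_controls
        if in_tests > 0 and in_controls < max_control_reads:
            keep.append(feature_id)
        else:
            drop.append(feature_id)
    return keep, drop


example = io.StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\tControlC5\n"
    "asv1\t100.0\t50.0\t0.0\n"
    "asv2\t5.0\t0.0\t40000.0\n"
    "asv3\t20.0\t30.0\t3.0\n"
)
samples, counts = read_biom_tsv(example)
keep, drop = split_by_control_reads(samples, counts, {"ControlC5"})
print(keep, drop)  # ['asv1', 'asv3'] ['asv2']
```

Run on the real export with the nine control IDs, this reading of the rule would be expected to give 1559 kept and 349 dropped features. If it does not, the hand-built ID file and the stated rule disagree somewhere.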

So I prepared a metadata file, ASV-Ctrl-10.txt, with the desired 1559 features and filtered the table. When I viewed the output table-filt-1.qza in QIIME 2 View, the table summary showed 141 samples and 1559 features, as I expected. However, under Interactive Sample Detail, the feature counts of the environmental control samples are almost the same as before filtering. For example, ControlC5's feature count was 40528 before and is 40516 after. Filtering should have removed 40520 reads from the 349 undesired features, leaving only 12 reads. There is a mismatch between the Table Summary and Interactive Sample Detail.
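The summary numbers quoted in this thread already point at the cause. The arithmetic below uses only figures reported above: the total frequencies before and after filtering, and ControlC5's counts. The whole table lost fewer reads than ControlC5 alone was expected to lose, so most of ControlC5's reads must come from features that are still on the retain list.

```latex
\begin{aligned}
\text{reads removed from the whole table} &= 3{,}797{,}638 - 3{,}783{,}781 = 13{,}857 \\
\text{reads removed from ControlC5} &= 40{,}528 - 40{,}516 = 12 \\
\text{reads expected to be removed from ControlC5} &\approx 40{,}520 > 13{,}857
\end{aligned}
```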

Hi @kindergarten,
Hmm, that is weird. Could you DM me the table you are trying to filter, the filtered table, and the metadata so I can get a better idea of what is going on?

Thank you!

Hello @kindergarten,
Looking into your data, ControlC5 has only one feature that isn't in your list of features to retain, so your command filtered out the occurrences of only that one unique sequence in ControlC5.

I would:

  1. Make sure that your list of features to retain contains exactly the features you want to keep.
  2. Consider removing the 349 features you do not want by using the --p-exclude-ids flag. That list might be easier to manage than the list of 1559 features.
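Item 2 can be sketched as follows: build the complement of the retain list, write it as an ID-only metadata file, and pass that file to `filter-features` with `--p-exclude-ids`. This is a minimal sketch. The helper name and the output filename `ASV-exclude.txt` are hypothetical, and it assumes you have the full list of the table's feature IDs (for example, the first column of a `biom convert --to-tsv` export).

```python
import io


def write_exclude_file(all_feature_ids, keep_ids, handle):
    """Write the IDs NOT in keep_ids as a one-column QIIME 2 metadata file.

    Returns the excluded IDs, in table order.
    """
    keep = set(keep_ids)
    exclude = [fid for fid in all_feature_ids if fid not in keep]
    handle.write("#OTU ID\n")  # same ID column header as the retain file
    for fid in exclude:
        handle.write(fid + "\n")
    return exclude


# Toy example: 4 features in the table, 2 to keep -> 2 to exclude.
out = io.StringIO()
excluded = write_exclude_file(["asv1", "asv2", "asv3", "asv4"], ["asv1", "asv3"], out)
print(excluded)  # ['asv2', 'asv4']

# With a real file (hypothetical name), the filtering step would then be:
#   qiime feature-table filter-features \
#     --i-table table-filt.qza \
#     --m-metadata-file ASV-exclude.txt \
#     --p-exclude-ids \
#     --o-filtered-table table-filt-1.qza
```

For this table the exclude file should hold 1908 - 1559 = 349 IDs. A shorter list of the unwanted features is also easier to check by eye against the controls than the list of 1559 features to keep.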

Side note: you can't really see which unique features make up a sample using table.qzv, so I had to do some finagling in Python.
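The Python used in the thread isn't shown, so the following is only a hypothetical way to do that kind of per-sample check. It assumes the table is held as a dict mapping feature ID to a list of counts in sample order (for example, parsed from a `biom convert --to-tsv` export). For a single sample, it splits the reads by whether each feature is on the retain list.

```python
def sample_breakdown(sample_ids, table, sample, keep_ids):
    """For one sample, split its nonzero features by retain-list membership.

    Returns (kept, removed): dicts of feature ID -> read count.
    """
    idx = sample_ids.index(sample)
    keep = set(keep_ids)
    kept, removed = {}, {}
    for feature_id, counts in table.items():
        n = counts[idx]
        if n > 0:
            target = kept if feature_id in keep else removed
            target[feature_id] = n
    return kept, removed


# Toy table: ControlC5's dominant feature (asv2) is, by mistake, on the retain list.
sample_ids = ["S1", "ControlC5"]
table = {
    "asv1": [100, 0],
    "asv2": [5, 40000],
    "asv4": [0, 12],
}
kept, removed = sample_breakdown(sample_ids, table, "ControlC5", ["asv1", "asv2"])
print(kept)     # {'asv2': 40000}
print(removed)  # {'asv4': 12}
print(sum(kept.values()), "reads stay in ControlC5 after filtering")
```

Summing `kept` for each control gives its read count after `filter-features`. A large value means the retain list still contains that control's dominant features.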

Thank you very much for looking at the data and for the suggestions.