error running qiime feature-table filter-seqs

anna-schrecengost · May 23, 2022, 10:29pm

Hi, I have looked around the forum and didn't find this question posted before, apologies if it has been.

I am having trouble filtering my DNA sequences using my filtered feature table. When I run the following command:

qiime feature-table filter-seqs --i-data rep-seqs.qza --i-table arch_asv_25filtered.qza --o-filtered-data repseqsfiltered25.qza

I get this error message: Plugin error from feature-table: All features were filtered out of the data.

These are the files I used
arch_asv_25filtered.qza (11.5 KB)
rep-seqs.qza (123.8 KB)

I have checked, and I know that the features in the table are present in the sequence file, so I'm not sure why this is happening. There are about 300 sequences in the rep-seqs file and about 30 features in the feature table.
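One way to double-check the ID overlap is sketched below. The error message suggests that no feature IDs matched between the two files, so it may help to compare the IDs as plain text. This is only a minimal sketch: it assumes the table has been exported to a tab-separated file with feature IDs in the first column and the sequences to FASTA, and the function names and toy data are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def table_ids(tsv_text):
    """Return the set of feature IDs (first column) of a tab-separated table.

    Lines starting with '#' are treated as header lines and skipped.
    """
    ids = set()
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def compare_ids(tsv_text, fasta_text):
    """Return (shared IDs, IDs found only in the table) as sorted lists."""
    in_table = table_ids(tsv_text)
    in_fasta = fasta_ids(fasta_text)
    return sorted(in_table & in_fasta), sorted(in_table - in_fasta)


# Hypothetical toy data standing in for the exported table and sequences.
example_table = "#OTU ID\tsampleA\tsampleB\nasv1\t10\t0\nasv2\t0\t5\n"
example_fasta = ">asv1\nACGT\n>asv3\nGGCC\n"

shared, only_in_table = compare_ids(example_table, example_fasta)
print("shared:", shared)
print("only in table:", only_in_table)
```

If the "only in table" list contains every row, it is worth looking at what those IDs actually are (for example, whether they look like sample names or have been altered somewhere along the way).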

In case it's relevant, I obtained these files by merging output from the same DADA2 runs. I exported the feature table into R, filtered out the bottom ~25% of features per sample, then reimported it after converting it to a biom file (because I couldn't find a way to filter based on relative frequency per sample in QIIME). The commands are below:

biom convert -i arch_asv_25filtered.tsv -o arch_asv_25filtered.biom --to-json

qiime tools import --input-path arch_asv_25filtered.biom --type 'FeatureTable[Frequency]' --input-format BIOMV100Format --output-path arch_asv_25filtered.qza

I really appreciate any help! I've been spending too much time trying to figure this out, maybe there is an obvious solution. Thanks!

Keegan-Evans · June 2, 2022, 6:46pm

@anna-schrecengost,

It looks like something may have gone wrong during filtering; I would expect to see exactly the same number of sequences in the rep-seqs file as features in the feature table. At this point I think I would import your data directly into QIIME 2 and then use the filter-features-conditionally command to perform the filtering you would like. Alternatively, if you are just trying to clean up your data rather than following a strictly prescribed procedure, you might consider performing taxonomic classification first, then filtering on the taxonomy and then on additional considerations. This is a fairly common approach, and you can see it in action in the Cancer Intervention tutorial. Hope this helps!

anna-schrecengost · June 2, 2022, 7:25pm

Hi @Keegan-Evans, thank you for your reply! I didn't know about this filtering option, but I don't think it completely fits my needs (although I may be wrong about that). Basically I have data from low-diversity symbiont communities, with generally 1 or 2 ASVs comprising ~80% of the total reads per sample. I want to pick out these top 1 or 2 ASVs per sample and construct a phylogeny with them. So it really depends on relative abundance per sample, and has nothing to do with abundance across samples. Is there a way to use filter-features-conditionally to accomplish this?
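To make the goal concrete, the per-sample selection described above could be sketched roughly as follows. This is an illustration in plain Python, not a QIIME 2 command; the function names, the nested-dictionary layout, and the toy counts are all assumptions.

```python
def top_features_per_sample(table, n=2):
    """Return {sample_id: [feature IDs of the n most abundant features]}.

    `table` maps sample_id -> {feature_id: count}. Features with a count
    of zero are never selected.
    """
    top = {}
    for sample, counts in table.items():
        # Sort by descending count; break ties on feature ID so the
        # result is reproducible.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        top[sample] = [fid for fid, count in ranked[:n] if count > 0]
    return top


def union_of_top_features(table, n=2):
    """Return the set of features that are in the top n of any sample."""
    keep = set()
    for feature_ids in top_features_per_sample(table, n).values():
        keep.update(feature_ids)
    return keep


# Hypothetical toy counts: two samples, each dominated by one ASV.
example = {
    "s1": {"asvA": 80, "asvB": 15, "asvC": 5},
    "s2": {"asvA": 0, "asvC": 90, "asvD": 10},
}

print(top_features_per_sample(example, n=1))
print(sorted(union_of_top_features(example, n=2)))
```

The union of the per-sample picks is the list of feature IDs that would then be kept in both the table and the representative sequences.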

I didn't see a way to do this within QIIME, which is why I brought the feature table outside and then re-imported it after filtering the way I wanted. I then thought I could use filter-seqs to filter the rep-seqs as well, and so obtain a feature table and rep-seqs file with the same number of ASVs. Sorry if I wasn't clear in my original post. I have noticed that this feature table behaves weirdly when I try to use it in other applications as well. For example, I wanted to use fragment-insertion to construct a phylogeny with a custom reference dataset and the symbiont sequences, which I managed to do by manually picking out the symbiont representative sequences from the rep-seqs file. But when I tried to visualize the tree with qiime empress, it did not recognize the feature table as having any of the same features as the tree, even though I checked myself and all of the features in the feature table are in the tree (a similar error to the one I described in the original post). I suspect something happens to this file when I export it and then import it back into QIIME.

Keegan-Evans · June 20, 2022, 5:27pm

@anna-schrecengost,

I am not sure that it would provide the exact filtering, but you could filter for features that are present at a somewhat high percentage (maybe 10%?) in at least some very low percentage (0.01% or so) of samples. This should give you the high-abundance features, even when they are found in a low number of samples (even a single one if you set your sample percentage low enough).
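As a rough illustration of that logic (keep a feature if its relative abundance reaches a threshold in at least a given fraction of samples), here is a small plain-Python sketch. It is not the plugin's implementation; the function name, data layout, and toy counts are assumptions, and the thresholds are written as fractions (0.10 for 10%, 0.0001 for 0.01%).

```python
def filter_conditionally(table, abundance, prevalence):
    """Return the set of feature IDs that pass both thresholds.

    `table` maps sample_id -> {feature_id: count}. A feature is kept if
    its relative abundance is >= `abundance` in at least a `prevalence`
    fraction of all samples.
    """
    n_samples = len(table)
    hits = {}  # feature_id -> number of samples where it passes `abundance`
    for counts in table.values():
        total = sum(counts.values())
        if total == 0:
            continue  # an empty sample cannot support any feature
        for fid, count in counts.items():
            if count / total >= abundance:
                hits[fid] = hits.get(fid, 0) + 1
    return {fid for fid, k in hits.items() if k / n_samples >= prevalence}


# Hypothetical toy counts (each sample sums to 100 reads).
example = {
    "s1": {"asvA": 80, "asvB": 15, "asvC": 5},
    "s2": {"asvA": 0, "asvC": 88, "asvD": 12},
}

# 10% abundance in at least 0.01% of samples.
print(sorted(filter_conditionally(example, abundance=0.10, prevalence=0.0001)))
# A stricter abundance threshold keeps only the dominant feature per sample.
print(sorted(filter_conditionally(example, abundance=0.5, prevalence=0.0001)))
```

Note that in this toy data the 10% threshold also keeps secondary features such as asvB and asvD, which is why this may not match a strict "top 1 or 2 per sample" selection.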

This unfortunately will not help with well-to-well contamination, though your contaminants would at least be "real" sequences, even if they originate in a different sample than it appears. Not sure if this will be helpful; we have had some behind-the-scenes discussion, and this is the best solution we have so far. If you find that it does not work for you for some particular reason, go ahead and post again and we can try to come up with a solution.

As for something happening to the file on export and re-import: not impossible, but there should generally be no manipulation of the data when importing. The data is basically just assigned whatever type label you specify with the import command and stored in a zip file with that type label and some extra bits of information, which are stored separately from the data itself.
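Since the artifact is a zip file, its contents can be listed to see what was actually stored. The sketch below uses only Python's standard zipfile module on a toy archive built in memory; the layout it mimics (a single root directory holding metadata.yaml and a data/ folder) and all names and values in it are illustrative assumptions, not a real artifact.

```python
import io
import zipfile


def list_archive_data(archive):
    """Return the sorted member paths found under '<root>/data/' in a zip."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    data_files = []
    for name in names:
        parts = name.split("/")
        # Expect '<root>/data/<file...>'; skip directory entries (empty tail).
        if len(parts) >= 3 and parts[1] == "data" and parts[-1]:
            data_files.append(name)
    return sorted(data_files)


# Build a toy archive in memory (hypothetical root name and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "example-root/metadata.yaml",
        "uuid: example-root\ntype: FeatureTable[Frequency]\nformat: BIOMV100DirFmt\n",
    )
    zf.writestr("example-root/data/feature-table.biom", "placeholder")
buf.seek(0)

print(list_archive_data(buf))
```

For a real file, passing the path of the .qza instead of the in-memory buffer would list the stored data file, which could then be extracted and compared against the file that was imported.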
