Error while using metadata-based filtering

Hello,

I’m trying to filter out sequence variants present in my extraction (-) and PCR (-) controls from the rest of my dataset. I noticed similar posts in this forum and I’ve gone through them to see if any of the recommended fixes worked for me. So far they haven’t, but maybe I missed something obvious.

I’ve attempted to filter my data in two ways:

1) Using metadata-based filtering:
qiime feature-table filter-features \
  --i-table table-final.qza \
  --m-metadata-file BrazilMicrobMetadata.tsv \
  --p-where "sampleid!='PCR-' OR sampleid!='Ext-'" \
  --o-filtered-table filtered_table.qza

2) Using identity-based filtering (this metadata file contains all the samples except the PCR (-) and Ext (-) controls):
qiime feature-table filter-features \
  --i-table table-final.qza \
  --m-metadata-file BrazilMicrobMetadata_filtered.tsv \
  --o-filtered-table id-filtered-table.qza
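A side note on the --p-where clause in the first approach, separate from the error reported below: a condition of the form "A != x OR A != y" is true for every row, because no single ID can equal both values at once. The sketch below uses Python's built-in sqlite3 module as a stand-in for a SQL WHERE clause; the table name and the two non-control sample IDs are made up for illustration.

```python
import sqlite3

# Hypothetical sample IDs; only 'PCR-' and 'Ext-' come from the post.
sample_ids = ["PCR-", "Ext-", "bat01", "bat02"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (sampleid TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?)", [(s,) for s in sample_ids])


def select_ids(where):
    """Return the IDs kept by a WHERE clause, in insertion order."""
    query = f"SELECT sampleid FROM metadata WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(query)]


# OR of two inequalities: every row satisfies at least one of them.
kept_or = select_ids("sampleid!='PCR-' OR sampleid!='Ext-'")

# AND of two inequalities: only rows that differ from both controls.
kept_and = select_ids("sampleid!='PCR-' AND sampleid!='Ext-'")

# Equivalent and arguably easier to read.
kept_not_in = select_ids("sampleid NOT IN ('PCR-', 'Ext-')")

print(kept_or)
print(kept_and)
print(kept_not_in)
```

If the intent is to drop both controls, the AND or NOT IN form is probably what is wanted; whether that alone resolves the error below is not something this sketch can show.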

I then try to summarize the filtered table using this command:
qiime feature-table summarize \
  --i-table id-filtered-table.qza \
  --o-visualization id-filtered-table.qzv \
  --m-sample-metadata-file BrazilMicrobMetadata_filtered.tsv

And I get this error:
Plugin error from feature-table:
All IDs were filtered out of the Metadata, resulting in an empty Metadata object.

I used ls -ls to verify that none of the files that I use in this command are empty. I’m using QIIME2-2018.2.

Thanks for the help!
Kelly

Thank you for the update Kelly,

Just for reference, here is one of the previous conversations about using positive and negative controls.

Colin


Hi @colinbrislawn!

Thanks for pointing out that thread. I’ll go ahead and ID the sequence variants in my (-) controls and filter them out using filter-seqs.

Best,
Kelly

After classifying the sequence variants in my dataset and creating a list of taxa that I believe are contaminants, I am still running into difficulty eliminating certain taxa.

Namely, I cannot filter out taxa that don’t have genus or species identified:

"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Micrococcaceae;__"
"k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Micrococcaceae;g__"
"k__Bacteria;p__Bacteroidetes;__;__;__;__"
"k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__"
"k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Methylobacteriaceae;g__"
"k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Phyllobacteriaceae;g__"
"k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Phyllobacteriaceae;__"

I’ve used the below command:
qiime taxa filter-seqs \
  --i-sequences rep-seqs-final_filtered.qza \
  --i-taxonomy /home/kspeer/array1/BrazilMicrobiomeAnalysis/training-feature-classifiers/taxonomy_unfiltered.qza \
  --p-mode exact \
  --p-exclude "k__Bacteria; p__Bacteroidetes; __; __; __; __" \
  --o-filtered-sequences rep-seqs-final_filtered.qza

qiime taxa filter-table \
  --i-table table-final_filtered.qza \
  --i-taxonomy /home/kspeer/array1/BrazilMicrobiomeAnalysis/training-feature-classifiers/taxonomy_unfiltered.qza \
  --p-mode exact \
  --p-exclude "k__Bacteria; p__Bacteroidetes; __; __; __; __" \
  --o-filtered-table table-final_filtered.qza

However, when I reclassify rep-seqs-final_filtered.qza and then build the taxa barplot, “k__Bacteria; p__Bacteroidetes; __; __; __; __” is still present in the data.

Is there another way to filter out these less specifically identified taxa that I think are contaminants?
Thanks,
Kelly

Hey @kspeer,

If you were to run qiime metadata tabulate on taxonomy_unfiltered.qza do you still see the offending entry? I suspect it will look like this instead: k__Bacteria;p__Bacteroidetes

If I’m remembering correctly, taxa barplot pads out the taxonomy strings to make rendering simpler. In other words k__Bacteria;p__Bacteroidetes;__;__;__;__ is just a side-effect of trying to make the depth match everything else, it’s still really k__Bacteria;p__Bacteroidetes.

However, if you saw k__Bacteria;p__Bacteroidetes;c__;o__;f__;g__ (with the prefixes), that would mean there’s an unnamed Greengenes OTU with that taxonomic resolution.
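The padding idea described above can be sketched roughly as follows. This is only an illustration of the concept, not the actual taxa barplot code; the function name pad_taxonomy and its parameters are hypothetical.

```python
def pad_taxonomy(taxon, depth, sep=";", filler="__"):
    """Pad a taxonomy string with bare filler levels up to `depth` levels."""
    levels = [level.strip() for level in taxon.split(sep)]
    # Append filler levels only when the string is shallower than `depth`.
    levels += [filler] * (depth - len(levels))
    return sep.join(levels)


# A phylum-level assignment padded out to six levels for display.
padded = pad_taxonomy("k__Bacteria;p__Bacteroidetes", 6)
print(padded)

# A string that already has six levels is left unchanged.
full = "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__"
unchanged = pad_taxonomy(full, 6)
print(unchanged)
```

Under this reading, the bare `__` levels exist only in the displayed string, so an exact-match filter would need the unpadded form stored in the taxonomy artifact.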

Also, unrelated to what I think is happening, there’s another issue with your command.
If we compare your search string to the list we see that they are not exactly the same:

k__Bacteria;p__Bacteroidetes;__;__;__;__          # Your list
k__Bacteria; p__Bacteroidetes; __; __; __; __     # Your --p-exclude

In particular, your query has a space after each semicolon, and an exact match treats that as a different string.

If I’m remembering correctly about the behavior of taxa barplot your --p-exclude should look more like:
--p-exclude 'k__Bacteria;p__Bacteroidetes'.
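To make the whitespace point concrete, here is a small Python sketch comparing the two strings from above. The normalize helper is hypothetical and only meant for diagnosing a mismatch; the safer fix is still to copy the taxon string exactly as it appears in the tabulated taxonomy.

```python
# The two strings compared above.
listed = "k__Bacteria;p__Bacteroidetes;__;__;__;__"
query = "k__Bacteria; p__Bacteroidetes; __; __; __; __"


def normalize(taxon):
    """Strip whitespace around each level so only the level names are compared."""
    return ";".join(level.strip() for level in taxon.split(";"))


# A plain string comparison is sensitive to the spaces after each semicolon.
exact_match = listed == query

# After normalizing both sides, the level names turn out to be identical.
normalized_match = normalize(listed) == normalize(query)

print(exact_match, normalized_match)
```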

Hope that helps!


Thanks so much @ebolyen! I tabulated the taxonomy_unfiltered and copied the taxon ID directly from the metadata. After I did this, the targeted taxa were removed from the dataset. Thanks so much for pointing that out.
:sunglasses:

As a suggestion, it might be helpful to allow people to filter based on feature ID when using --p-mode exact. Feature IDs don’t suffer from minor variations in the way they are presented, like the presence or absence of spaces. Is this perhaps already implemented?

Awesome!

It sure is! That was actually our first form of filtering: qiime feature-table filter-seqs (granted, feature-table might not be the first place one would look for this). There are other filtering operations in q2-feature-table as well that are based on IDs.
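As a rough sketch of where an ID-based filter could start, the Python snippet below collects the feature IDs whose taxon string exactly matches a contaminant list. It assumes a tab-separated taxonomy table with "Feature ID" and "Taxon" columns; the feature IDs, confidence values, and helper name are all made up for illustration.

```python
import csv
import io

# Hypothetical taxonomy table; the column names are an assumption.
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "feat1\tk__Bacteria;p__Bacteroidetes\t0.99\n"
    "feat2\tk__Bacteria;p__Firmicutes;c__Bacilli\t0.95\n"
    "feat3\tk__Bacteria;p__Bacteroidetes\t0.87\n"
)

# Taxon strings judged to be contaminants, written exactly as in the table.
contaminants = {"k__Bacteria;p__Bacteroidetes"}


def feature_ids_for_taxa(tsv_text, taxa):
    """Return the feature IDs whose Taxon value is exactly one of `taxa`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["Feature ID"] for row in reader if row["Taxon"] in taxa]


ids = feature_ids_for_taxa(taxonomy_tsv, contaminants)
print(ids)
```

Once the IDs are in hand, the filter no longer depends on how the taxon string is written.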


Hello everyone, here is a little script to show how you can use the "p-mode exact" to show the micro-organisms that are resolved:
1.) only at the kingdom level
2.) at the family level


I ran into a problem again, and if you have already done this, you probably went through the same steps that I did and then realized you were maybe going down a rabbit hole : )
I need an exact search (--p-mode exact) for ‘k__Bacteria’, but also a contains search (--p-mode contains) for all the other taxa. I am a little stuck! Do I have to use “qiime feature-table filter-features” instead and create several tables that I then merge into one? It really is getting complicated. Is there an easier way that I am just missing?

Hey there @MartinLubell!

Hopefully we can help you out!

Hidden away at the end of this section of the Filtering data tutorial is this little hint:

If your filtering query is more complex than those supported through qiime taxa filter-table, you should use qiime feature-table filter-features.

Sounds like you are a perfect candidate for this, since your query is becoming a bit complex now, and feature-table filter-features supports SQL-based queries, which are super powerful!

So, let's rewrite your command to work with this:

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file taxonomy.qza \
  --p-where "Taxon='k__Bacteria' OR Taxon LIKE 'k__Bacteria; p__Planctomycetes; c__Planctomycetia; o__Pirellulales; f__Pirellulaceae%'" \
  --o-filtered-table filtered-table.qza

(double-check my spelling in that long taxon string - I am sure I typoed somewhere...)

What this SQL clause is saying is: I want all the features that have the exact (=) taxon string of k__Bacteria. I also want (OR) all the features whose taxon string starts with k__Bacteria; p__Planctomycetes; c__Planctomycetia; o__Pirellulales; f__Pirellulaceae (LIKE, with the % at the end of the query string acting as a wildcard match).
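To try the clause outside of QIIME 2, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The table name, feature IDs, and the example taxon strings beyond those in the command are hypothetical.

```python
import sqlite3

prefix = (
    "k__Bacteria; p__Planctomycetes; c__Planctomycetia; "
    "o__Pirellulales; f__Pirellulaceae"
)

# Hypothetical features: one kingdom-only, two under the family, one unrelated.
features = [
    ("f1", "k__Bacteria"),
    ("f2", prefix + "; g__"),
    ("f3", prefix + "; g__A17"),
    ("f4", "k__Bacteria; p__Firmicutes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (id TEXT, Taxon TEXT)")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)", features)

# Same shape as the --p-where string: exact match OR prefix match.
# Note that in LIKE patterns '_' matches any single character, so the
# underscores in the prefix match slightly more loosely than a literal '_'.
where = f"Taxon='k__Bacteria' OR Taxon LIKE '{prefix}%'"

query = f"SELECT id FROM taxonomy WHERE {where} ORDER BY id"
kept = [row[0] for row in conn.execute(query)]
print(kept)
```

The kingdom-only feature is kept by the = test, the two family-level features by the LIKE prefix, and the Firmicutes feature by neither.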

Hope that helps! Let us know how it goes! :t_rex:

