Filtering part of the taxa present in PCR-blanks from the FeatureTable


I was trying to filter some of the contaminant sequences present in the PCR-blanks from the feature table. This is what I did:

#First, make a feature table containing sequences present in PCR-blanks (library-blanks)
qiime feature-table filter-samples \
  --i-table table.qza \
  --m-metadata-file metadata.tsv \
  --p-where "SampleType='blank-library'" \
  --o-filtered-table blank-library-sequences.qza

#Second step: retain sequences considered as contaminants after a careful examination of sequences detected in the PCR-blanks
qiime taxa filter-table \
  --i-table blank-library-sequences.qza \
  --i-taxonomy taxonomy.qza \
  --p-include veronii,Halomonadaceae,Shewanellaceae,Ralstonia \
  --o-filtered-table blank-library-contaminant-sequences.qza

The above step, however, didn’t work. An error message popped up, saying:
“Plugin error from taxa: Metadata is empty, there must be at least one ID associated with it.”

So, what’s the problem here?

The following step was what I intended to do, had the above steps worked. Is it correct?
#Final step: filter the contaminant sequences from the feature table.
qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file blank-library-contaminant-sequences.qza \
  --o-filtered-table table-no-chlo-mito-lowPre-conta-with-Phyla.qza

Hi @yanxianl!

This error message seems to indicate that your file taxonomy.qza is empty — if you run the following command and then view the resulting visualization, what do you see?

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

Hello @thermokarst!

Here are the input files for the commands I posted:
metadata.tsv (14.8 KB)
blank-library-sequences.qza (16.6 KB)
taxonomy.qza (253.1 KB)
table.qza (304.9 KB)

The taxonomy.qzv file looks fine.
taxonomy.qzv (1.6 MB)

Hi @yanxianl! Thanks so much for your input files, that was really helpful!

Did you happen to run feature-table summarize on blank-library-sequences.qza? If you did, you would see that the table has zero samples and zero features! The reason is that your feature-table filter-samples command filters out all of your samples. If you revise the --p-where parameter to "SampleType='Blank-library'", you will wind up with three samples in blank-library-sequences.qza!

The moral of the story is that values in SQL where clauses are compared case-sensitively: in your metadata.tsv file the value is listed as Blank-library, not blank-library.
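
One quick way to catch this kind of mismatch is to list the exact values in the metadata column before writing the --p-where clause. This is only a minimal sketch: it assumes a tab-separated file with a SampleType column, and metadata_demo.tsv with its three rows is made up for illustration (point the awk command at your own metadata.tsv instead).

```shell
# Hypothetical stand-in for metadata.tsv (invented rows, for illustration only).
printf '#SampleID\tSampleType\nS1\tBlank-library\nS2\tMock\nS3\tDigesta\n' > metadata_demo.tsv

# Locate the SampleType column by its header name, skip any further
# comment rows, then count how often each distinct value occurs.
sample_type_counts=$(awk -F'\t' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "SampleType") col = i; next }
  /^#/    { next }
  col     { print $col }
' metadata_demo.tsv | sort | uniq -c)

echo "$sample_type_counts"
```

Whatever spelling shows up in that listing is the spelling the where clause has to use, capital letters included.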

Reading on to your later commands: the feature-table filter-features call in the last step will not work, because you cannot use a feature table as metadata. Please check out this forum post for details on how to create a feature metadata file from your feature table that you can use here. There is an open issue to streamline this step so that you don’t have to do the extra work described in that post, but for now that is the best option.

Hope that helps! :t_rex:

Hi @thermokarst,

You're right! The blank-library-sequences.qza is empty! I've done what you suggested and it's working now.

There's one more problem, though.

As discussed in another thread, filtering all the sequences in the negative controls is not a good idea as there might be cross contamination from biological samples. As such, it's best to carefully check the sequences present in the negative controls before we proceed to filtering.

Initially, I decided to filter the following 4 taxa (the ones passed to --p-include above: veronii, Halomonadaceae, Shewanellaceae and Ralstonia) based on their prevalence in the negative controls, mock and biological samples:

Yet, when checking the blank-library-contaminant-sequences.qzv (333.2 KB), I found 68 features assigned to these 4 obvious contaminant taxa. In particular, some features assigned as Halomonadaceae were found in samples but not in the negative controls, indicating that I've filtered real features from the samples.

Therefore, filtering contaminant sequences by feature ID is probably a better way. I tried to fetch a feature table file (csv or tsv) displaying the count of each feature in the different samples, like the taxonomy table (DADA2-level-7.txt (155.2 KB)) that can be downloaded from the visualization of taxa-bar-plots. However, the 'Frequency per feature detail.csv' file only gives the total count of each feature. How can I get the distribution of features across different samples?

Hi @yanxianl!

Have you had a chance to review my earlier post?

The link there details a workaround for creating the feature metadata file necessary for filtering.

I don’t quite follow where or why you are trying to do this, but at this point the only way you could do that is by filtering your feature table down to one sample, once for each sample in your study (n samples => n 1-sample feature tables). Then, the summarize command will give you the distribution of features in that one sample.

Hi @thermokarst,

I’ve read your earlier post and learnt how to filter the rep-seq.qza based on the feature table.

The reason I want to know the count of features in the different samples is that I want to identify the contaminant sequences and filter them from the feature table. Previously, I did that based on the taxonomic composition at species level, but I found it inappropriate. Filtering the exact features identified as contaminant sequences is the correct way. I think it can be done by exporting the feature table as a BIOM file and then converting it to a text file.

Thank you for your help!

That is one way to do it, but the post I linked to above provides the same results, entirely within QIIME 2. Either way, you should be set!
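
For the export route, here is a rough sketch of how the per-sample counts could be read once the table is in text form. It assumes the table has already been exported from the .qza (for example with qiime tools export, whose arguments differ between releases) and converted with biom convert --to-tsv. The file feature_table_demo.tsv and its contents are invented stand-ins for that output, not real data.

```shell
# Hypothetical stand-in for the TSV written by `biom convert --to-tsv`
# (feature IDs, sample IDs and counts are invented for illustration).
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\tBlank1\nfeatA\t10.0\t0.0\t5.0\nfeatB\t0.0\t3.0\t0.0\nfeatC\t0.0\t0.0\t7.0\n' > feature_table_demo.tsv

# For every feature, list the samples in which its count is non-zero.
presence=$(awk -F'\t' '
  /^#OTU ID/ { for (i = 2; i <= NF; i++) sample[i] = $i; next }  # header row: remember sample IDs
  /^#/       { next }                                            # skip other comment rows
  {
    hits = ""
    for (i = 2; i <= NF; i++)
      if ($i + 0 > 0) hits = hits (hits == "" ? "" : ",") sample[i] "=" $i
    print $1 "\t" hits
  }
' feature_table_demo.tsv)

echo "$presence"
```

A feature that only ever shows up in blank samples (featC in the made-up table) is a candidate contaminant, while one that also appears in biological samples needs a closer look before it is filtered by feature ID.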

QIIME 2 2017.12 is now out, and filter-seqs can now filter using a feature table’s feature IDs! This makes the filtering process much simpler!
