I was trying to filter some of the contaminant sequences present in the PCR-blanks from the feature table. This is what I did:
#First, make a feature table containing sequences present in PCR-blanks (library-blanks)
qiime feature-table filter-samples
#Second step: retain sequences considered as contaminants after a careful examination of sequences detected in the PCR-blanks
qiime taxa filter-table
The above step, however, didn’t work. An error message popped up, saying:
“Plugin error from taxa: Metadata is empty, there must be at least one ID associated with it.”
So, what’s the problem here?
The following step was what I intended to do, had the above steps worked. Is it correct?
#Final step: filter the contaminant sequences from the feature table.
qiime feature-table filter-features
Hi @yanxianl! Thanks so much for your input files, that was really helpful!
Did you happen to run feature-table summarize on blank-library-sequences.qza? If you did, you would see that the table has zero samples and zero features! The reason is that your feature-table filter-samples command is filtering out all of your samples! If you revise the --p-where parameter to: "SampleType='Blank-library'", then you will wind up with three samples in blank-library-sequences.qza!
The moral of the story is that the SQL where clauses are case-sensitive — in your metadata.tsv file you have the values listed as Blank-library, not blank-library.
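To see the case-sensitivity point in isolation, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the where clause is compared the way SQLite compares text by default; the sample IDs and the extra "Sample" value are made up for illustration, and only the SampleType column and the Blank-library value come from this thread.

```python
import sqlite3

# In-memory table standing in for metadata.tsv (sample IDs are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metadata ("SampleID" TEXT, "SampleType" TEXT)')
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("blank1", "Blank-library"),
        ("blank2", "Blank-library"),
        ("blank3", "Blank-library"),
        ("fish1", "Sample"),
    ],
)


def matching_ids(where):
    """Return the sample IDs selected by a SQL where clause."""
    query = f'SELECT "SampleID" FROM metadata WHERE {where} ORDER BY "SampleID"'
    return [row[0] for row in conn.execute(query)]


# Text equality is case-sensitive by default, so the lowercase value matches nothing.
lowercase_ids = matching_ids("SampleType='blank-library'")
exact_ids = matching_ids("SampleType='Blank-library'")

print(lowercase_ids)
print(exact_ids)
```

The lowercase clause selects no samples, which is consistent with the empty table and the "Metadata is empty" error reported above.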
Reading on to your later commands — the feature-table filter-features in the last step will not work, because you cannot use a feature table as metadata. Please check out this forum post for details on how to create a feature metadata file from your feature table that you can use here. There is an open issue to streamline this step so that you don’t have to do this extra work presented in the post, but for now that is the best option.
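As a rough illustration of what a feature metadata file can look like, the sketch below writes a one-column, tab-separated file of feature IDs. This is not the procedure from the linked forum post, only a stand-in under assumptions: the feature IDs are hypothetical, and the `feature-id` header is one that current QIIME 2 metadata documentation lists as an accepted ID column name, so check what your installed version expects.

```python
import csv
import io

# Hypothetical IDs of features judged to be contaminants.
contaminant_ids = ["feature_a", "feature_b"]


def write_feature_metadata(feature_ids, handle, id_header="feature-id"):
    """Write one feature ID per row under a single ID-column header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header])  # assumed header name, see note above
    for feature_id in feature_ids:
        writer.writerow([feature_id])


# Written to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_feature_metadata(contaminant_ids, buffer)
metadata_text = buffer.getvalue()
print(metadata_text)
```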
You're right! The blank-library-sequences.qza is empty! I've done what you suggested and it's working now.
There's one more problem, though.
As discussed in another thread, filtering all the sequences in the negative controls is not a good idea as there might be cross contamination from biological samples. As such, it's best to carefully check the sequences present in the negative controls before we proceed to filtering.
Initially, I decided to filter the following 4 taxa based on their prevalence in the negative controls, mock and biological samples:
Yet, when checking blank-library-contaminant-sequences.qzv (333.2 KB), I found 68 features assigned to these 4 obvious contaminant taxa. In particular, some features assigned to Halomonadaceae were found in samples but not in the negative controls, indicating that I had filtered real features from the samples.
Therefore, filtering contaminant sequences by feature ID is probably a better way. I tried to fetch a feature table file (csv or tsv) displaying the count of features in different samples, like the taxonomy table (DADA2-level-7.txt (155.2 KB)) that can be downloaded from the visualization of taxa-bar-plots. However, the 'Frequency per feature detail.csv' file only gives the total count of each feature. How can I get the distribution of features across different samples?
The link there details a workaround for creating the feature metadata file necessary for filtering.
I don’t quite follow where or why you are trying to do this, but, at this point the only way you could do that is by filtering your feature table down to one sample, once for each sample in your study (n samples => n 1-sample feature tables). Then, the summarize command will give you the distribution of features in that one sample.
I’ve read your earlier post and learnt how to filter the rep-seq.qza based on the feature table.
The reason I want to know the count of features in the different samples is that I want to identify the contaminant sequences and filter them from the feature table. Previously, I did that based on the taxonomic composition at species level, but I found it inappropriate. Filtering the exact features identified as contaminant sequences is the correct way. I think it can be done by exporting the feature table as a BIOM file and then converting it to a text file.
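Once the table is in text form, the per-sample counts can be inspected with a few lines of Python. The sketch below assumes the tab-separated layout that the biom tool writes when converting to TSV (a comment line, then a header row starting with `#OTU ID`); the sample names, feature IDs and counts are invented for illustration.

```python
import csv
import io

# Hypothetical excerpt of a feature table converted to tab-separated text.
tsv_text = (
    "# Constructed from biom file\n"
    "#OTU ID\tblank1\tblank2\tfish1\tfish2\n"
    "feature_a\t120\t95\t0\t3\n"
    "feature_b\t0\t0\t450\t510\n"
    "feature_c\t8\t0\t300\t275\n"
)


def read_feature_table(handle):
    """Return (sample names, {feature ID: {sample: count}})."""
    samples = None
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            samples = row[1:]  # header row carries the sample names
        elif row[0].startswith("#"):
            continue  # other comment lines
        else:
            values = (float(value) for value in row[1:])
            counts[row[0]] = dict(zip(samples, values))
    return samples, counts


def summarize(counts, blank_samples):
    """Total count of each feature in the blanks versus all other samples."""
    summary = {}
    for feature_id, per_sample in counts.items():
        in_blanks = sum(v for s, v in per_sample.items() if s in blank_samples)
        in_others = sum(v for s, v in per_sample.items() if s not in blank_samples)
        summary[feature_id] = (in_blanks, in_others)
    return summary


samples, counts = read_feature_table(io.StringIO(tsv_text))
summary = summarize(counts, blank_samples=["blank1", "blank2"])
for feature_id, (in_blanks, in_others) in summary.items():
    print(feature_id, in_blanks, in_others)
```

A feature that is abundant in the blanks and rare elsewhere (like the made-up feature_a) is a candidate contaminant, while one that is rare in the blanks but abundant in biological samples may be cross-contamination and deserves a closer look before filtering.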