Confusion on filtering

Hello, in QIIME 1 I used the instructions for “Filtering OTUs observed in the control blanks from the experimental samples”, part of the larger topic “Filtering contaminant or category specific OTUs from OTU tables”, and could easily remove the OTUs found in my negative controls (only a few, and definitely contaminants) from my samples. I keep trying to do this with the “feature-table filter-seqs” options, and I just can’t figure out how to get it to work. I also looked at the tutorial “Evaluating and controlling data quality with q2-quality-control”, but since I don’t have reference sequences to use with exclude-seqs, I’m confused. I know I must be missing something that makes this work; please help.

Good evening,

This is a good question, because dealing with contamination is hard. We had a long discussion on the forum about this issue, and I think it's a great place to start.

Based on the approaches discussed there, what do you want to try with your data set? Once you have outlined your plan, we can help you find the right plugins to use.

Colin


Hi @bmillerlab,
In addition to @colinbrislawn’s advice and questions (thanks @colinbrislawn!), could you please also give us an example of the command(s) that you are attempting to use to achieve this?

You mentioned having trouble with feature-table filter-seqs… it sounds like maybe what you are attempting to use is something like:

qiime feature-table filter-seqs \
    --i-data seqs.qza \
    --i-table table.qza \
    --p-exclude-ids \
    --o-filtered-data filtered-data.qza

With --p-exclude-ids, that command removes from seqs.qza every feature that appears in the input table. Hence, if you create a feature table containing only your negative controls, and remove from it any features that are absent from those controls, you can use that table to filter those features out of your sequences file.
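If you would rather inspect the data directly, the selection described above (keep only the features that occur in the negative controls) can also be done on an exported table. The sketch below is a minimal helper under stated assumptions: control_feature_ids is hypothetical and not part of QIIME 2, the table is assumed to have been exported with qiime tools export and converted with biom convert --to-tsv, the control sample IDs are passed as arguments, and sample IDs are assumed to contain no spaces. It prints a one-column list of feature IDs under a feature-id header.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): list features observed
# (count > 0) in ANY of the given control samples.
# Assumes a tab-separated table as written by `biom convert --to-tsv`,
# with a "#OTU ID" header row, and sample IDs without spaces.
# Usage: control_feature_ids TABLE.tsv CONTROL_ID [CONTROL_ID ...]
control_feature_ids() {
  local table="$1"; shift
  # Pass the control sample IDs to awk as one space-separated string.
  awk -F'\t' -v controls="$*" '
    BEGIN {
      n = split(controls, ids, " ")
      for (i = 1; i <= n; i++) want[ids[i]] = 1
      print "feature-id"          # header line for the ID list
    }
    /^#OTU ID/ {                  # header row: remember which columns are controls
      for (c = 2; c <= NF; c++) if ($c in want) ctl[c] = 1
      next
    }
    /^#/ { next }                 # skip other comment lines
    {                             # data row: print the ID once if any control has it
      for (c in ctl) if ($c + 0 > 0) { print $1; break }
    }
  ' "$table"
}
```

Saved to a file, that list could then, we believe, be passed to qiime feature-table filter-seqs through --m-metadata-file together with --p-exclude-ids; check qiime feature-table filter-seqs --help for your QIIME 2 version before relying on it.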

But I would really recommend following @colinbrislawn’s advice first: simply filtering out every OTU present in a negative control is risky business, and, as discussed in that link, a bioinformatic tool for predicting contaminants is available (though not yet implemented in QIIME 2).

Good luck!

Hello, thanks for the suggestion to look through the discussion. I’m still not sure what to do, but I know lots of other people are confused too. :grinning:

I looked at my negative controls (I ran several), and there are 3 contaminants that I think are real, because they are in all of them and they are also Pseudomonas types, which I’ve read are common contaminants. I would like to remove those from my real samples. A couple of other features are present at low counts in only 1 or 2 of the negative controls; I think those are cross-contaminants, or maybe come from the Illumina run, so I’m thinking I should probably leave them alone.

Does that sound OK?
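The plan above (remove the features found in every negative control, keep those seen in only one or two) amounts to a prevalence cut-off on the controls. A minimal sketch of that rule on an exported table follows. drop_features_in_all_controls is a hypothetical helper, not a QIIME 2 command; it assumes a tab-separated table produced by biom convert --to-tsv, control sample IDs passed as arguments, and no spaces in sample IDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): drop features that are present
# (count > 0) in EVERY listed control sample; keep all other rows, including
# features seen in only some of the controls.
# Assumes a tab-separated table as written by `biom convert --to-tsv`.
# Usage: drop_features_in_all_controls TABLE.tsv CONTROL_ID [CONTROL_ID ...]
drop_features_in_all_controls() {
  local table="$1"; shift
  awk -F'\t' -v controls="$*" '
    BEGIN {
      n = split(controls, ids, " ")
      for (i = 1; i <= n; i++) want[ids[i]] = 1
    }
    /^#OTU ID/ {                  # header row: find control columns, keep the row
      for (c = 2; c <= NF; c++) if ($c in want) { ctl[c] = 1; nctl++ }
      print; next
    }
    /^#/ { print; next }          # keep other comment lines unchanged
    {
      hits = 0
      for (c in ctl) if ($c + 0 > 0) hits++
      # Treat a feature as a consistent contaminant only if every control has it.
      if (nctl > 0 && hits == nctl) next
      print
    }
  ' "$table"
}
```

The output still contains the control columns, and would need converting back to BIOM (for example with biom convert --to-hdf5) and re-importing with qiime tools import before any downstream QIIME 2 steps.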

We're with you. :smile: In the immortal words of @colinbrislawn, "dealing with contamination is hard".

Fortunately, it doesn't need to be. In the discussion that Colin linked to, @benjjneb mentions a statistical package that his lab developed for distinguishing true contaminants from cross-contaminants using negative controls. The bad news is that it is currently only available as an R package, so you will need to export your data, filter it, and then re-import it into QIIME 2 for downstream steps. The good news is that we plan to implement it in QIIME 2 soon.

But Pseudomonas species are also present in many other sample types, e.g., plant-associated, environmental, clinical... just something to consider before deciding these must be contaminants.

That seems reasonable, but if they are cross-contaminants, you would expect them to be more abundant in your other (non-control) samples.
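One rough way to check that expectation is to compare, for each feature, its mean relative abundance in the negative controls with its mean relative abundance in the remaining samples. The sketch below does only that: it is a crude heuristic chosen for illustration, not the statistical method from the linked discussion and not a significance test. compare_control_abundance is a hypothetical helper; it assumes a tab-separated table exported with biom convert --to-tsv, control sample IDs passed as arguments, and no spaces in sample IDs.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic (NOT the R package from the linked thread): compare
# each feature's mean relative abundance in the controls with its mean
# relative abundance in all other samples.
# Assumes a tab-separated table as written by `biom convert --to-tsv`.
# Usage: compare_control_abundance TABLE.tsv CONTROL_ID [CONTROL_ID ...]
compare_control_abundance() {
  local table="$1"; shift
  # The table is read twice: pass 1 sums per-sample totals, pass 2 reports.
  awk -F'\t' -v controls="$*" '
    BEGIN {
      n = split(controls, ids, " ")
      for (i = 1; i <= n; i++) want[ids[i]] = 1
    }
    /^#OTU ID/ {
      if (NR == FNR) {              # pass 1: note which columns are controls
        ncol = NF
        for (c = 2; c <= NF; c++) isctl[c] = ($c in want)
      } else {                      # pass 2: print the report header
        printf "feature-id\tmean_rel_controls\tmean_rel_samples\tcall\n"
      }
      next
    }
    /^#/ { next }                   # skip other comment lines
    NR == FNR {                     # pass 1: total reads per sample column
      for (c = 2; c <= NF; c++) total[c] += $c
      next
    }
    {                               # pass 2: mean relative abundance per group
      sc = ss = nc = ns = 0
      for (c = 2; c <= ncol; c++) {
        rel = (total[c] > 0) ? $c / total[c] : 0
        if (isctl[c]) { sc += rel; nc++ } else { ss += rel; ns++ }
      }
      mc = nc ? sc / nc : 0
      ms = ns ? ss / ns : 0
      call = (ms > mc) ? "higher_in_samples" : "not_higher_in_samples"
      printf "%s\t%.3f\t%.3f\t%s\n", $1, mc, ms, call
    }
  ' "$table" "$table"
}
```

Features reported as higher_in_samples fit the cross-contamination explanation; features that are relatively more abundant in the controls are the ones worth a closer look.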

This is where a programmatic method may help take out the guesswork. If you think that contamination might be a major issue in your samples (e.g., you are working with low-biomass samples, or have very noisy data and think something went wrong), check out that discussion thread and give that program a try.

