How to deal with contaminations that might be partly real

Hi @Alexandra_Bastkowska, I have indeed been thinking about this problem a lot (@thermokarst is evidently a mind reader). The short answer is no: I am not aware of any current method that can determine which contaminants are exogenous vs. cross-contamination, or accurately estimate the proportion of a contaminant in a given sample.

My advice is to leave in any Pseudomonas or other OTUs that appear to be cross-contamination from your biopsy samples, and to filter out the others that are obvious contaminants. You've already applied sound logic to determine what's likely exogenous vs. cross-contamination.

Yet there remains the question: why are only some Pseudomonas OTUs detected in the blanks, but not other, more abundant OTUs from the biopsy samples? (Or so I deduce from your description.) If this is correct, it's suspicious.

A somewhat more data-driven approach (which may not pan out, which is why I offer the simple advice above first) would be to run these data through the deblur or dada2 pipelines in QIIME 2 instead of OTU picking, to yield actual sequence variants instead of OTUs. These would be much more sensitive, and would allow you to determine the relative abundances of all sequence variants present in the biopsy vs. blank samples. Are the sequence variants in the blanks present in other samples, or only in the blanks? If they are only in the blanks (or at very low abundance in biopsy samples, while other biopsy sequence variants are absent from the blanks), then these too would appear to be exogenous contaminants.

Similarly, you could pass these sequence variant data to SourceTracker (a sourcetracker2 plugin exists and supports QIIME 2 data integration), with all blanks and a random subset of biopsy samples as source samples, and let the SourceTracker classifier decide. Both approaches (particularly the second) would require lots of mulling to interpret the best solution. But if any OTU/sequence variant appears to be an obvious contaminant, I'd recommend removing it altogether.
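To make the blank vs. biopsy comparison a bit more concrete, here is a rough Python sketch of the check I have in mind. It is not part of QIIME 2 or any plugin: the function names, the toy counts, the `sv_*` IDs, and the `low_abundance` cutoff of 0.001 are all made up for illustration. It assumes you have already exported the feature table to per-sample counts. The labels are only a starting point for the mulling, not a verdict.

```python
from statistics import mean

def relative_abundances(counts):
    """Convert {sample: {feature: count}} into per-sample relative abundances."""
    rel = {}
    for sample, feats in counts.items():
        total = sum(feats.values())
        rel[sample] = {f: (c / total if total else 0.0) for f, c in feats.items()}
    return rel

def classify_blank_features(counts, blank_ids, low_abundance=0.001):
    """Label every feature seen in a blank by how it behaves in the biopsies.

    low_abundance is a hypothetical cutoff on mean relative abundance;
    choose a value that makes sense for your own sequencing depth.
    """
    rel = relative_abundances(counts)
    blanks = [s for s in rel if s in blank_ids]
    biopsies = [s for s in rel if s not in blank_ids]
    # Only features actually observed in at least one blank are of interest.
    blank_features = {f for s in blanks for f, v in rel[s].items() if v > 0}
    labels = {}
    for feature in sorted(blank_features):
        biopsy_mean = mean(rel[s].get(feature, 0.0) for s in biopsies)
        if biopsy_mean == 0:
            labels[feature] = "blank-only"        # looks exogenous
        elif biopsy_mean < low_abundance:
            labels[feature] = "rare-in-biopsies"  # probably exogenous too
        else:
            labels[feature] = "shared"            # possible cross-contamination
    return labels

# Toy data: hypothetical sequence variant IDs and read counts.
counts = {
    "blank1": {"sv_A": 90, "sv_B": 10},
    "blank2": {"sv_A": 50, "sv_C": 50},
    "biopsy1": {"sv_B": 400, "sv_D": 600},
    "biopsy2": {"sv_B": 300, "sv_C": 1, "sv_D": 1699},
}
blank_ids = {"blank1", "blank2"}

labels = classify_blank_features(counts, blank_ids)
for feature, label in labels.items():
    print(feature, label)
```

In this toy example `sv_D` never shows up in a blank, so it is not labeled at all. That is the pattern flagged above as suspicious when an abundant biopsy variant is missing from the blanks while rarer ones are present.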
