How to deal with contaminations that might be partly real

Alexandra_Bastkowska · September 20, 2017, 4:54pm

Hi QIIME team, we have sequenced biopsy samples together with 3 blank control samples to identify contamination from the reagents. The blanks contain some OTUs that we strongly suspect are contaminants but that may also be genuinely present in the biopsies. What is the common approach to dealing with such contamination, apart from filtering these OTUs out completely?
Thanks,
Alex

thermokarst · September 20, 2017, 8:17pm

Hi @Alexandra_Bastkowska! Could you provide a bit more detail? We aren't quite sure what it is you are looking to do (sorry!). Have you performed the kind of QC you are thinking of before, with other tools? If so, could you let us know which tools, and please summarize the workflow. The only other thing that comes to mind is this recent post on filtering out blanks, but it doesn't seem to quite line up with what you are describing. Thanks so much!

Alexandra_Bastkowska · September 21, 2017, 12:16am

Hi Matthew,

sorry for not being so clear and thank you for getting back so fast!

We have sequenced 16S rDNA genes (V4) obtained from biopsy samples on Illumina. Apart from the DNA extracted from the biopsy samples, we included 3 "blank" samples, which contained just the DNA extraction kit, in the sequencing run. We used QIIME (still the old version 1.9) to join the reads, perform QC, pick OTUs, and assign taxonomy. The sequencing generated a high number of reads in the blank samples, with reads assigned to well-described contaminants (Burkholderia and Bradyrhizobium are the top 2) that are also present in some of our biopsy samples in small proportions. However, the blank samples also contain Pseudomonas OTUs, which we know are not only contaminants but also real OTUs in the biopsy samples. The Pseudomonas OTUs are clearly more abundant in our biopsy samples than in the blank microbiome. My question was whether there is another approach apart from removing the OTUs completely from the table.

Thanks so much in advance!
Alex

thermokarst · September 21, 2017, 12:19am

Hi @Alexandra_Bastkowska! Thanks for the clarification! That is an interesting situation. I wonder if @gregcaporaso or @Nicholas_Bokulich have any thoughts? The only option that comes to mind is removing that OTU (feature) entirely, but that certainly doesn't sound like something you are interested in doing. Let's wait and see what the others have to say!

Nicholas_Bokulich · September 21, 2017, 12:43am

Hi @Alexandra_Bastkowska, I have indeed been thinking about this problem a lot (@thermokarst is evidently a mind reader). The short answer is no: I am not aware of any current method that can determine which contaminants are exogenous vs. cross-contamination, or that can accurately estimate the proportion of a contaminant in a given sample.

My advice is to leave in any Pseudomonas or other OTUs that appear to be cross-contamination from your biopsy samples, while filtering out others that are obvious contaminants; you've already applied sound logic to determine what's likely exogenous contamination and what's likely cross-contamination.
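The manual filtering described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the table layout (a dict mapping feature ID to per-sample read counts), the feature IDs, the counts, and the helper name `drop_features` are all hypothetical and are not part of QIIME.

```python
# Hypothetical OTU table: feature ID -> {sample ID: read count}.
# All IDs and counts below are made up for illustration.
otu_table = {
    "Burkholderia_otu1": {"blank1": 900, "biopsy1": 12, "biopsy2": 0},
    "Bradyrhizobium_otu1": {"blank1": 700, "biopsy1": 5, "biopsy2": 3},
    "Pseudomonas_otu1": {"blank1": 40, "biopsy1": 2500, "biopsy2": 1800},
}

# Features the analyst has judged to be reagent contaminants.
# Pseudomonas is deliberately left out, so it stays in the table.
contaminants = {"Burkholderia_otu1", "Bradyrhizobium_otu1"}


def drop_features(table, to_drop):
    """Return a new table without the features listed in to_drop."""
    return {fid: counts for fid, counts in table.items() if fid not in to_drop}


filtered = drop_features(otu_table, contaminants)
print(sorted(filtered))
```

The decision about which features go into `contaminants` remains a judgment call; the code only applies it.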

Yet there remains the question: why are only some Pseudomonas OTUs detected in the blanks, but not other, more abundant OTUs from the biopsy samples? (Or so I deduce from your description.) If this is correct, it's suspicious.

A somewhat more data-driven approach (which may not pan out, which is why I offer the simple advice above first) would be to run these data through the Deblur or DADA2 pipelines in QIIME 2 (instead of OTU picking) to yield actual sequence variants instead of OTUs. These would be much more sensitive, and would allow you to determine the relative abundances of all sequence variants present in the biopsy vs. blank samples. Are the sequence variants in the blanks present in other samples, or only in the blanks? If only in the blanks (or at very low abundance in biopsy samples while other biopsy sequence variants are absent from the blanks), then these too would appear to be exogenous contaminants.

Similarly, you could pass these sequence variant data to SourceTracker (a sourcetracker2 plugin exists and supports QIIME 2 data integration), with all blanks and a random subset of biopsy samples as source samples, and let the SourceTracker classifier decide. Both approaches (particularly the second) would require lots of mulling to interpret the best solution, but if any OTU/sequence variant appears to be an obvious contaminant, I'd recommend removing it altogether.
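The comparison of relative abundances in blanks vs. biopsy samples could look roughly like the sketch below. It is plain Python working on a hypothetical count table; the sample and feature IDs, the counts, the label names, and the `ratio` threshold are all assumptions made for illustration, and the labels are a crude heuristic, not a substitute for SourceTracker or for manual interpretation.

```python
def relative_abundance(counts_by_sample):
    """Convert {sample: {feature: count}} into per-sample relative abundances."""
    rel = {}
    for sample, counts in counts_by_sample.items():
        total = sum(counts.values())
        rel[sample] = {f: (c / total if total else 0.0) for f, c in counts.items()}
    return rel


def mean_abundance(rel, samples, feature):
    """Mean relative abundance of one feature across the given samples."""
    if not samples:
        return 0.0
    return sum(rel[s].get(feature, 0.0) for s in samples) / len(samples)


def label_features(counts_by_sample, blanks, ratio=1.0):
    """Label each feature by where it is relatively more abundant.

    `ratio` is a hypothetical threshold: a feature counts as "higher in
    blanks" when its mean abundance in blanks exceeds ratio * biopsy mean.
    """
    rel = relative_abundance(counts_by_sample)
    biopsies = [s for s in counts_by_sample if s not in blanks]
    features = {f for counts in counts_by_sample.values() for f in counts}
    labels = {}
    for f in sorted(features):
        in_blank = mean_abundance(rel, blanks, f)
        in_biopsy = mean_abundance(rel, biopsies, f)
        if in_blank > 0 and in_biopsy == 0:
            labels[f] = "blank-only"
        elif in_blank > ratio * in_biopsy:
            labels[f] = "higher in blanks"
        elif in_blank > 0:
            labels[f] = "higher in biopsies"
        else:
            labels[f] = "absent from blanks"
    return labels


# Made-up sequence variant counts for one blank and two biopsies.
counts = {
    "blank1": {"sv_A": 800, "sv_B": 150, "sv_C": 50},
    "biopsy1": {"sv_B": 10, "sv_C": 900, "sv_D": 90},
    "biopsy2": {"sv_C": 700, "sv_D": 300},
}
labels = label_features(counts, blanks=["blank1"])
for feature, label in labels.items():
    print(feature, label)
```

In this toy table, a variant labeled "blank-only" or "higher in blanks" would be a candidate exogenous contaminant, while one labeled "higher in biopsies" matches the pattern described for cross-contamination from the biopsy samples into the blanks.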

gregcaporaso · September 21, 2017, 4:11pm

I agree with @Nicholas_Bokulich's points here. Good luck @Alexandra_Bastkowska!

Alexandra_Bastkowska · September 22, 2017, 12:47am

Hi @Nicholas_Bokulich Thank you for your detailed reply! I will try to install QIIME2 and run the suggested pipelines!

When I typed my post, I also wanted to ask how often you have observed cross-contamination, but I decided to keep my post simple and deleted the other questions. I have seen some cross-contamination in my data. The most abundant OTUs were also identified in the blank sequences, but with very low read counts, so I assumed that this might be cross-contamination. Nothing suspicious in the data.
It is a pity that many studies don't publish their blank control microbiome. I noticed that most of them only mention that they used blanks and removed everything, without publishing the composition or uploading the sequences...

Nicholas_Bokulich · September 22, 2017, 2:10am

Hi @Alexandra_Bastkowska,

I have seen all sorts of levels; it varies by experiment, so I can't really comment on how frequently it occurs, but it is certainly a more prevalent problem than many would like to believe. However, it is also probably only problematic with low-biomass samples (as I imagine yours may be, depending on the biopsy site).

Indeed, this would be useful information to have.

Good luck with the next steps! If you are willing to share, please post your findings back here; I will be interested to hear what you find. (And of course, please open a new forum thread if you run into any questions/issues with QIIME 2.)