How to deal with possible cross-contamination by a crucial bacterium

joaomiranda · January 4, 2026, 4:17pm

Hi everyone, happy new year!

I'm dealing with a difficult issue, so I need your help to proceed on the right path.

I'm using qiime2-amplicon-2025.10 and have a 16S dataset of 2 runs with 175 samples.

I'm analyzing microbiome data from mosquitoes, and one of the goals of my survey is to compare the microbiomes of mosquitoes infected or uninfected with one particular bacterium: Wolbachia (d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Anaplasmataceae;g__Wolbachia).
For that purpose I have field samples from mosquitoes of different species and developmental stages, and laboratory mosquito samples, including samples from colonies that we know are not infected with Wolbachia.

After the filtering steps, I plotted taxa barplots of my dataset before and after the decontamination step, for which I used decontam.

Before:
table_no_mito_chloro_no_euk.qza (457.7 KB)
table_no_mito_chloro_no_euk_summary.qzv (802.5 KB)
taxa-barplot.qzv (2.5 MB)

After:
decontaminated-table-no-controls.qza (519.2 KB)
decontaminated-table-no-controls.qzv (867.4 KB)
taxa-barplot-controls.qzv (3.3 MB)

To my surprise, Wolbachia is present in my known uninfected samples (the AEG-LAB samples). I suspect that cross-contamination may have occurred, since my control samples (Cneg-EXT-2, Cneg-EXT, POOL-BLANK, PCR-C-Neg) were contaminated with Wolbachia too.

Since decontam did not flag Wolbachia as a contaminant, I checked my taxonomy file and found 6 features labeled as Wolbachia. My strategy was therefore to check which Wolbachia feature was present in my AEG-LAB samples, and whether the same feature was present in the controls. To do this, I generated .tsv files and searched for all 6 Wolbachia features in these specific samples. I found that the feature "a1041c364108fd12b8ad83ff7d1e1ef7" was present in both my controls and my AEG-LAB samples.
taxonomy_decontaminated.qzv (1.8 MB)
feature-table-no-controls.tsv (1.0 MB)
feature-table-with-controls.tsv (1.8 MB)
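The per-feature lookup described above can be scripted rather than done by searching the .tsv by hand. The following is only a sketch: it assumes the table was exported in the tab-separated layout that `biom convert --to-tsv` produces (a header row starting with `#OTU ID`, then one row per feature), and the helper name `feature_presence` is made up for illustration.

```shell
# feature_presence FEATURE_ID TABLE.tsv
# Prints "sample<TAB>count" for every sample in which the feature has a
# non-zero count. Assumes a "#OTU ID" header row (biom-style TSV export).
feature_presence() {
  awk -F'\t' -v id="$1" '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; next }  # sample names
    /^#/       { next }                                          # other comment lines
    $1 == id   { for (i = 2; i <= NF; i++) if ($i + 0 > 0) print name[i] "\t" $i }
  ' "$2"
}

# Example call (file name taken from the attachments above):
# feature_presence a1041c364108fd12b8ad83ff7d1e1ef7 feature-table-with-controls.tsv
```

Running it once per Wolbachia feature ID shows at a glance which features occur in the controls, in the AEG-LAB samples, or in both.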

I decided to remove this feature from my dataset, since it could be contamination. My reasoning was that the other 5 features labeled as Wolbachia would still be present, so the removal might not affect my taxa distribution.

I've removed "a1041c364108fd12b8ad83ff7d1e1ef7" from my feature-table and generated a new taxa barplot.

printf 'FeatureID\na1041c364108fd12b8ad83ff7d1e1ef7\n' > feature_to_remove.tsv

qiime feature-table filter-features \
  --i-table decontamination/decontaminated-table-no-controls.qza \
  --m-metadata-file feature_to_remove.tsv \
  --p-exclude-ids \
  --o-filtered-table decontamination/decontaminated-table-no-a1041c.qza

qiime taxa barplot \
  --i-table decontamination/decontaminated-table-no-a1041c.qza \
  --i-taxonomy taxonomy_decontaminated.qza \
  --m-metadata-file filtered_metadata_merged.tsv \
  --o-visualization taxa-barplots-no-a1041c.qzv

Unfortunately, this step removed the majority of Wolbachia from my samples:
taxa-barplots-no-a1041c.qzv (2.5 MB)
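To put numbers on how much of each sample this single feature accounts for, one option is to compute its share of reads per sample before removing it. This is a diagnostic sketch only, not a decontamination method: it assumes a biom-style TSV export with a `#OTU ID` header row, and `feature_fraction` is a hypothetical helper name.

```shell
# feature_fraction FEATURE_ID TABLE.tsv
# Prints "sample<TAB>feature_count<TAB>sample_total<TAB>fraction" per sample.
# Assumes a "#OTU ID" header row (biom-style TSV export).
feature_fraction() {
  awk -F'\t' -v id="$1" '
    /^#OTU ID/ { n = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
    /^#/       { next }
    {
      for (i = 2; i <= NF; i++) {
        total[i] += $i                 # running per-sample read total
        if ($1 == id) hit[i] = $i      # count of the feature of interest
      }
    }
    END {
      for (i = 2; i <= n; i++)
        printf "%s\t%d\t%d\t%.4f\n", name[i], hit[i], total[i],
               (total[i] > 0 ? hit[i] / total[i] : 0)
    }
  ' "$2"
}

# Example call (file name taken from the attachments above):
# feature_fraction a1041c364108fd12b8ad83ff7d1e1ef7 feature-table-with-controls.tsv
```

Comparing the fractions in the controls and AEG-LAB samples against those in the field samples shows whether the feature is a trace signal or a dominant one in each group. Note that negative controls usually have very few reads overall, so a high fraction there can still correspond to a small absolute count.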

What is the best way to handle my data in this situation?
What are the possible causes of Wolbachia appearing in samples that do not contain this bacterium?

Nicholas_Bokulich · January 8, 2026, 11:21am

Hi @joaomiranda ,

I have recategorized this topic as General Discussion. This sounds like an upstream problem, and you have already gone through the rounds of potential computational solutions.

This sounds like a pre-sequencing issue, e.g., cross-contamination. Detection in control samples (negative controls, I assume) implies cross-contamination (which decontam cannot really handle) as opposed to reagent contamination (which decontam should be able to help fix).

Index hopping is another possibility, but this, too, is best fixed upstream through experimental design; there is not really a clean computational solution once the problem is there. In my lab we use a dual-indexing technique to try to detect and remove index-hopped reads. Fully unique dual indices allow detecting and removing all index hops. Combinatorial dual indices significantly reduce costs; they do not fully solve the index hopping problem, but they greatly reduce it by still allowing detection and filtering of unexpected index pairs. Our protocol is here if you are interested:
https://doi.org/10.1101/2024.10.10.617643
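The idea of filtering unexpected index pairs can be illustrated roughly as follows. This is not the protocol from the preprint above, only a toy sketch of the principle: both input layouts are hypothetical (a two-column file of index pairs assigned to samples, and a three-column file of read ID plus the two index sequences observed for that read), as is the helper name `flag_unexpected_pairs`.

```shell
# flag_unexpected_pairs EXPECTED_PAIRS.tsv READ_INDICES.tsv
# EXPECTED_PAIRS.tsv: i7<TAB>i5          (pairs actually assigned to samples)
# READ_INDICES.tsv:   read_id<TAB>i7<TAB>i5
# Prints the reads whose index pair was never assigned to any sample.
flag_unexpected_pairs() {
  awk -F'\t' '
    NR == FNR { expected[$1 FS $2] = 1; next }            # first file: load valid pairs
    !(($2 FS $3) in expected) { print $1 "\t" $2 "\t" $3 } # second file: report mismatches
  ' "$1" "$2"
}
```

With fully unique dual indices, any hopped read produces a pair that is absent from the expected list and is flagged. With combinatorial indices, a hop can produce a pair that happens to belong to another sample; such a read passes this check, which is why that design reduces but does not eliminate the problem.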

So I hate to be the bearer of bad news, but you might need to resequence or even re-extract if you think the contamination happened during DNA extraction (this is a common issue, especially with plate extractions). There are lots of other posts on the forum about handling contamination in negative controls, so you could consult these for ideas, but depending on the level of contamination and as you are running a quite sensitive experimental design (i.e., you care about presence/absence of specific ASVs) these may or may not be solutions.

Good luck!
