I really like the term 'cross-indexing' to differentiate issues on the sequencers from environmental 'cross-contamination.'
Given that it's hard to differentiate cross-contamination and true biological similarity, my lab has mostly focused on cross-indexing, as it's a tractable problem.
You can measure cross-indexing from the monoculture positive control that you included in the run.
For example, if you have a batch of human samples, you may include a saltwater microbe as a positive control. After the run has finished, you can easily evaluate cross-indexing in two different ways.
- Did my positive control end up in my human samples?
- Did my most abundant human reads end up in my positive control?
- (Bonus!) Is my ASV denoising algorithm doing a good job resolving my single, known microbe in my positive control?
Based on the observed frequencies of cross-indexing errors, you can identify any samples that are massively contaminated, and also measure the baseline cross-indexing error rate.
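To make the two checks concrete, here is a minimal Python sketch of how I would compute them from a table of read counts. Everything in it is an assumption for illustration: the toy counts, the sample and feature names (`pos_ctrl`, `saltwater_asv`, and so on), the helper function names, and especially the 1% flagging threshold, which you would need to choose for your own run.

```python
from statistics import median

# Hypothetical toy table: sample ID -> {feature ID -> read count}.
table = {
    "pos_ctrl": {"saltwater_asv": 9990, "human_asv_1": 8, "human_asv_2": 2},
    "human_A": {"saltwater_asv": 5, "human_asv_1": 6000, "human_asv_2": 3995},
    "human_B": {"saltwater_asv": 10, "human_asv_1": 4990, "human_asv_2": 5000},
    "human_C": {"saltwater_asv": 1500, "human_asv_1": 5000, "human_asv_2": 3500},
}


def leak_into_samples(table, control_sample, control_feature):
    """Fraction of each non-control sample's reads assigned to the control microbe."""
    rates = {}
    for sample, counts in table.items():
        if sample == control_sample:
            continue
        total = sum(counts.values())
        rates[sample] = counts.get(control_feature, 0) / total if total else 0.0
    return rates


def leak_into_control(table, control_sample, control_feature):
    """Fraction of the control's reads that are NOT the known control microbe."""
    counts = table[control_sample]
    total = sum(counts.values())
    foreign = total - counts.get(control_feature, 0)
    return foreign / total if total else 0.0


def summarize(rates, flag_threshold=0.01):
    """Flag samples above the (arbitrary) threshold; baseline is the median of the rest."""
    flagged = sorted(s for s, r in rates.items() if r > flag_threshold)
    clean = [r for r in rates.values() if r <= flag_threshold]
    baseline = median(clean) if clean else None
    return flagged, baseline


rates = leak_into_samples(table, "pos_ctrl", "saltwater_asv")
reverse_rate = leak_into_control(table, "pos_ctrl", "saltwater_asv")
flagged, baseline = summarize(rates)

print("control reads per sample:", rates)
print("foreign fraction in control:", reverse_rate)
print("flagged samples:", flagged, "baseline rate:", baseline)
```

Note that the foreign fraction in the control lumps together cross-indexing and any denoising artifacts, so it is an upper bound on cross-indexing into that well rather than a clean estimate.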
But how do you remove this cross-indexing error from all other samples? The solution that I think would work best is outlined by @ebolyen here:
Given that filtering is inelegant, maybe it's better to leave in contamination and control for it statistically, as @mortonjt suggested: