To Filter Controls or Not to Filter Controls

I really like the term 'cross-indexing' to differentiate issues on the sequencers from environmental 'cross-contamination.'

Because it's hard to differentiate cross-contamination from true biological similarity, my lab has mostly focused on cross-indexing, which is a more tractable problem.

You can measure it from the monoculture positive control that you included in the run.

For example, if you have a batch of human samples, you may include a saltwater microbe as a positive control. After the run has finished, you can easily evaluate cross-indexing in two different ways.

  1. Did my positive control end up in my human samples?
  2. Did my most abundant human reads end up in my positive control?
  3. (Bonus!) Is my ASV denoising algorithm doing a good job resolving my single, known microbe in my positive control?

Based on the observed frequency of cross-indexing errors, you can identify any samples that are massively contaminated and also measure the baseline cross-indexing error rate.
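
Here is a minimal sketch of checks 1 and 2 in plain Python, to make the arithmetic concrete. It is not a QIIME 2 command. Everything in it is an assumption made for illustration: the function name `cross_indexing_summary`, the sample and ASV IDs, the read counts, and the `flag_factor` cutoff. It assumes you have already exported your feature table into a dict of per-sample read counts.

```python
from statistics import median


def cross_indexing_summary(counts, control_sample, control_feature, flag_factor=10.0):
    """Summarize cross-indexing using a monoculture positive control.

    counts maps sample ID -> {feature ID: read count}.
    flag_factor is a hypothetical cutoff: samples whose control fraction
    exceeds flag_factor * baseline are flagged for a closer look.
    """
    # Check 1: fraction of each non-control sample's reads that match
    # the positive-control microbe.
    leak_into_samples = {}
    for sample, features in counts.items():
        if sample == control_sample:
            continue
        total = sum(features.values())
        hits = features.get(control_feature, 0)
        leak_into_samples[sample] = hits / total if total else 0.0

    # Check 2: fraction of the control's reads that are NOT the control microbe.
    control = counts[control_sample]
    control_total = sum(control.values())
    foreign = control_total - control.get(control_feature, 0)
    leak_into_control = foreign / control_total if control_total else 0.0

    # Baseline error: the median is robust to a few heavily contaminated samples.
    baseline = median(leak_into_samples.values()) if leak_into_samples else 0.0
    flagged = sorted(
        s for s, frac in leak_into_samples.items() if frac > flag_factor * baseline
    )
    return {
        "leak_into_samples": leak_into_samples,
        "leak_into_control": leak_into_control,
        "baseline": baseline,
        "flagged": flagged,
    }


# Made-up counts: three human samples plus one saltwater-microbe control.
counts = {
    "human_1": {"ASV_human_A": 9990, "ASV_saltwater": 10},
    "human_2": {"ASV_human_A": 4995, "ASV_human_B": 4995, "ASV_saltwater": 10},
    "human_3": {"ASV_human_A": 8000, "ASV_saltwater": 2000},
    "pos_ctrl": {"ASV_saltwater": 9980, "ASV_human_A": 20},
}
summary = cross_indexing_summary(counts, "pos_ctrl", "ASV_saltwater")
print(summary["baseline"], summary["leak_into_control"], summary["flagged"])
```

With these made-up numbers, `human_3` stands out as massively contaminated, while the other samples set the baseline. Note that if the baseline is zero, any sample with control reads gets flagged, so you may want a different rule for very clean runs.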


But how do you remove this cross-indexing error from all other samples? The solution that I think would work best is outlined by @ebolyen here:

Given that filtering is inelegant, maybe it's better to leave in contamination and control for it statistically, as @mortonjt suggested:
