Tutorial for filtering controls

mortonjt · May 9, 2017, 12:50am

Right. Given what @ebolyen suggested - I'd strongly recommend against straight up subtracting the reads. For one, the reads are unevenly distributed across the samples. In addition, since we are most concerned with the proportion of reads, its not clear how to best account for these confounders.

If we are dealing with typical lab contaminants - the most straightforward approach is to identify them, and just remove the columns associated with that contaminant.

If you think that you are dealing with some sort of bias that is widely distributed across your samples, check out some of the compositional statistics available in skbio and gneiss. Particularly if you have some sort of prior information about your bias.