Just to add further to this discussion: I used decontam before it was incorporated into QIIME 2, and whilst I’m a huge fan of the principles it’s built around, it didn’t work fantastically for my samples. I will caveat that by saying I only attempted the prevalence method, with 2 negative controls (one kit negative and one sequencing negative) for around 20 samples, which is probably too few controls to have any real statistical power; I’m not sure what the ideal number of negative controls for the prevalence-based method would be. When playing around with the threshold parameter, I couldn’t find a happy medium: I was either removing features that appeared to me to be potentially true, or keeping features that were questionable to my eye. Going forward I would be interested to see how the plugin performs when given quantification data, and I will hopefully be testing this in the coming weeks on new data.
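For anyone curious, this is roughly the kind of threshold sweep I mean. To be clear, this is just a simplified sketch and not decontam’s actual implementation (that lives in the R package): it applies a prevalence-style test (one-sided Fisher’s exact on presence/absence in negative controls vs. true samples) and counts how many features each threshold would flag. The file names, the is_control metadata column, and the threshold values are all placeholders.

```python
# Simplified prevalence-style score and threshold sweep -- NOT decontam's
# actual implementation; file names, the "is_control" column, and the
# thresholds below are placeholders for illustration only.
import pandas as pd
from scipy.stats import fisher_exact

# Rows = samples, columns = features (ASVs/OTUs), values = read counts.
table = pd.read_csv("feature_table.tsv", sep="\t", index_col=0)
# Boolean flag per sample marking the negative controls (same sample IDs as the table).
is_control = pd.read_csv("metadata.tsv", sep="\t", index_col=0)["is_control"].astype(bool)

presence = table > 0
controls, samples = presence[is_control], presence[~is_control]

def prevalence_score(feature):
    """Smaller score = more contaminant-like (over-represented in negative controls)."""
    in_ctrl, in_samp = int(controls[feature].sum()), int(samples[feature].sum())
    contingency = [[in_ctrl, len(controls) - in_ctrl],
                   [in_samp, len(samples) - in_samp]]
    # One-sided Fisher's exact test: is presence more likely in the controls?
    _, p = fisher_exact(contingency, alternative="greater")
    return p

scores = pd.Series({f: prevalence_score(f) for f in table.columns})

# Sweep the threshold to see how many features each value would remove.
for threshold in (0.05, 0.1, 0.25, 0.5):
    flagged = (scores < threshold).sum()
    print(f"threshold={threshold}: {flagged} features flagged as contaminants")
```

With only a couple of negative controls, the contingency counts in the first row are tiny, which is exactly why I suspect the method had little power on my data regardless of the threshold chosen.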
In these uncertain times, I do like the suggestion of a sort of ‘lite’ version of contaminant removal, whereby you can show you’ve addressed the issue by removing a small group of clear contaminants, i.e. those described in the Salter paper as arising from extraction kits. However, despite this being the safest option when addressing potential contaminants, it can also feel careless to leave in the taxon with 10 reads in sample A that had 50,000 reads in the negative control and is likely a result of Illumina crosstalk. Here I feel the only option is a common-sense approach, although I note this isn’t reproducible. Perhaps we are nearing the stage where a very large round table is required to produce published consensus guidelines on how to deal with these anomalies.
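As a rough illustration of what that ‘lite’ approach might look like in practice, here is a minimal sketch that strips a short curated list of reagent contaminants and additionally flags the crosstalk-like case above. The file names, the genus names, and the 0.001 ratio are all placeholders; the contaminant list in particular should be taken from the Salter paper itself, not from here.

```python
# Minimal sketch of a 'lite' contaminant filter plus a crude crosstalk check.
# File names, the genus list, and the 0.001 ratio are placeholders only.
import pandas as pd

table = pd.read_csv("feature_table.tsv", sep="\t", index_col=0)          # samples x features, read counts
taxonomy = pd.read_csv("taxonomy.tsv", sep="\t", index_col=0)["Taxon"]   # feature ID -> taxonomy string
is_control = pd.read_csv("metadata.tsv", sep="\t", index_col=0)["is_control"].astype(bool)

# 1. Drop features assigned to genera that are clear reagent contaminants
#    (placeholder names; substitute the published kit-contaminant list).
kit_contaminants = ("Ralstonia", "Bradyrhizobium", "Burkholderia")
clear_contaminants = {f for f in table.columns
                      if any(g in str(taxonomy.get(f, "")) for g in kit_contaminants)}

# 2. Flag likely crosstalk: features whose reads in the real samples are a
#    tiny fraction of their reads in the negative controls (e.g. 10 vs 50,000).
control_reads = table[is_control].sum()
sample_reads = table[~is_control].sum()
ratio = sample_reads / control_reads               # inf/NaN where controls have 0 reads
crosstalk = set(ratio[(control_reads > 0) & (ratio < 0.001)].index)

filtered = table.drop(columns=list(clear_contaminants | crosstalk))
print(f"Removed {table.shape[1] - filtered.shape[1]} of {table.shape[1]} features")
```

Even a simple, explicit rule like this at least documents exactly what was removed and why, which feels more defensible than eyeballing the table, however sensible the eyeballing might be.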
So long as we are actively discussing the issues, working toward reproducible solutions and not butchering our data in the meantime, that’s all we can do! And in what seems like quite a progressive time for the field, I take considerable comfort in that.