What should I do with the fastq files of negative controls?

benjjneb · February 3, 2023, 3:05pm

Should I simply subtract those taxa (I think this is not a good idea according to prior discussions in this forum)?

This is dangerous because cross-contamination can also occur, which causes the most abundant taxa in the real samples to appear in the negative controls as well.
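To make that concrete, here is a minimal Python sketch of what naive subtraction does when a few reads from a dominant real taxon leak into a control. The taxon names, read counts, and the helper `subtract_control_taxa` are all made up for illustration; this is not part of decontam or any other package.

```python
# Hypothetical read counts per taxon; every name and number here is invented.
real_sample = {"Bacteroides": 52000, "Faecalibacterium": 18000, "Ralstonia": 40}
# The control contains a reagent contaminant plus 12 cross-contaminating reads
# from the most abundant taxon in the real sample.
negative_control = {"Ralstonia": 310, "Bacteroides": 12}


def subtract_control_taxa(sample, control):
    """Naive approach: drop every taxon that appears at all in the control."""
    return {taxon: n for taxon, n in sample.items() if taxon not in control}


filtered = subtract_control_taxa(real_sample, negative_control)
print(filtered)  # {'Faecalibacterium': 18000}
```

In this toy case the contaminant is removed, but so is the most abundant genuine taxon, purely because a dozen of its reads showed up in the control.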

The files didn't have a column called "quant_reading", which I think is a key sample variable in the "decontam" R package, especially when a frequency-based decontamination process is planned.

You don't need a specific variable called "quant_reading", you just need your measurement of DNA concentration from each sample. This is described in more detail in the decontam paper.

(1) quantitative DNA concentrations for each sample, often obtained during amplicon or shotgun sequencing library preparation in the form of a standardized fluorescence intensity (e.g., PicoGreen)

But that isn't needed at all if you are using negative controls as part of "prevalence"-based contaminant identification with decontam. However, for the decontam prevalence approach to be useful, you will need to have sequenced more than just 1 or even 2 negative controls.
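A rough way to see why the number of negative controls matters: the sketch below is not decontam's implementation, just a simplified stand-in that compares presence/absence of one taxon in negatives versus real samples with a one-sided Fisher's exact test. The function name `prevalence_p_value` and the counts are assumptions chosen for illustration.

```python
from math import comb


def prevalence_p_value(neg_present, neg_total, sample_present, sample_total):
    """One-sided Fisher's exact test on presence/absence counts.

    Returns the probability of seeing the taxon in at least `neg_present`
    of the negative controls if presence were unrelated to sample type.
    """
    total = neg_total + sample_total
    present = neg_present + sample_present
    upper = min(neg_total, present)
    # Sum the hypergeometric tail from the observed count upwards.
    tail = sum(
        comb(present, k) * comb(total - present, neg_total - k)
        for k in range(neg_present, upper + 1)
    )
    return tail / comb(total, neg_total)


# Hypothetical taxon seen in 5 of 20 real samples and in every negative control.
p_one_control = prevalence_p_value(1, 1, 5, 20)
p_five_controls = prevalence_p_value(5, 5, 5, 20)
print(f"{p_one_control:.3f}")    # 0.286
print(f"{p_five_controls:.3f}")  # 0.005
```

With a single control, even a taxon found in that control cannot be separated convincingly from the real samples; with five controls the same pattern stands out clearly.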

Beyond that, a variety of ad hoc approaches are used in the field, such as removing taxa present at "high" abundances in the negative controls, with the specific definition of "high" varying with the investigator.
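As one example of what such an ad hoc filter might look like, here is a small Python sketch that flags taxa whose mean relative abundance across the negative controls exceeds a cutoff. The function `flag_high_in_controls`, the 1% default, the use of relative rather than absolute abundance, and the counts are all assumptions for illustration, not a recommended standard.

```python
def flag_high_in_controls(control_counts, threshold=0.01):
    """Return taxa whose mean relative abundance in the controls exceeds threshold.

    control_counts: list of dicts (one per negative control), taxon -> read count.
    threshold: investigator-chosen definition of "high" (hypothetical default).
    """
    taxa = set().union(*control_counts)
    flagged = set()
    for taxon in taxa:
        fractions = []
        for counts in control_counts:
            total = sum(counts.values())
            # A control with zero reads contributes a fraction of 0.
            fractions.append(counts.get(taxon, 0) / total if total else 0.0)
        if sum(fractions) / len(fractions) > threshold:
            flagged.add(taxon)
    return flagged


# Hypothetical negative controls; names and counts are invented.
controls = [
    {"Ralstonia": 300, "Bacteroides": 3},
    {"Ralstonia": 150, "Pseudomonas": 50},
]
flagged_default = flag_high_in_controls(controls)
flagged_strict = flag_high_in_controls(controls, threshold=0.2)
print(sorted(flagged_default))  # ['Pseudomonas', 'Ralstonia']
print(sorted(flagged_strict))   # ['Ralstonia']
```

The result changes with the cutoff, which is exactly why the definition of "high" ends up differing from one investigator to the next.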