What should I do with the fastq files of negative controls?

Jee-Woong_Choi · January 31, 2023, 12:28pm

I am using QIIME2 and R packages for microbiome analysis.
I think QIIME2 is the best tool, especially for researchers who are just beginning microbiome analysis.

My question is about the decontamination process.

I have the fastq files of 16S microbiome sequences from skin tissue samples. Because skin samples are low-biomass, sequence files (fastq) of negative controls (swab samples of transport tubes and containers) were also obtained. After analysing the fastq files of the negative controls using QIIME2 (DADA2), I found 17 taxa in the otu_table (phyloseq object in R).

What should I do next? Should I simply subtract those taxa (I think this is not a good idea, according to prior discussions in this forum)? I know a useful R package called "decontam", but I could not use the QIIME2 output files with it, even after converting them with the phyloseq package. The files didn't have a column called "quant_reading", which I think is a key sample variable in the decontam R package, especially when frequency-based decontamination is planned (Introduction to decontam).

I would not be able to analyze all the fastq files using QIIME2 and R packages without the help of the great biomedical researchers here. I really appreciate your help.

Would you please tell me the best way to deal with these sequencing files (fastq) of negative controls in order to obtain reproducible and reliable results? If you find some contaminant taxa by using decontam in R, do you just remove those taxa from the otu_table of the true samples?

Thank you so much!

benjjneb · February 3, 2023, 3:05pm

Should I simply subtract those taxa (I think this is not a good idea, according to prior discussions in this forum)?

This is dangerous because "cross-contamination" is also something that can occur, which will cause the most abundant taxa in the real samples to also appear in negative controls.
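To make the cross-contamination risk concrete, here is a minimal base-R sketch with an invented count table. The taxon names and all counts are made up for illustration and do not come from the data discussed in this thread. The dominant taxon in the real samples leaks a few reads into the negative controls, so a blanket rule that drops every taxon seen in a negative control discards it too.

```r
# Toy count table: rows are samples, columns are taxa (all values invented).
counts <- rbind(
  skin1 = c(dominant_skin_taxon = 900, minor_skin_taxon = 80,  reagent_taxon = 20),
  skin2 = c(dominant_skin_taxon = 850, minor_skin_taxon = 120, reagent_taxon = 30),
  neg1  = c(dominant_skin_taxon = 5,   minor_skin_taxon = 0,   reagent_taxon = 60),
  neg2  = c(dominant_skin_taxon = 3,   minor_skin_taxon = 0,   reagent_taxon = 45)
)
is_neg <- c(FALSE, FALSE, TRUE, TRUE)

# Blanket subtraction: drop every taxon with any reads in any negative control.
seen_in_neg <- colSums(counts[is_neg, , drop = FALSE]) > 0
kept <- counts[!is_neg, !seen_in_neg, drop = FALSE]

# Fraction of reads from the real samples that the blanket rule throws away.
lost_fraction <- 1 - sum(kept) / sum(counts[!is_neg, ])

print(colnames(kept))
print(lost_fraction)
```

In this toy table only the minor taxon survives and 90% of the reads from the real samples are lost, even though only one of the three taxa was simulated as a genuine reagent contaminant.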

The files didn't have a column called "quant_reading", which I think is a key sample variable in the decontam R package, especially when frequency-based decontamination is planned

You don't need a specific variable called "quant_reading", you just need your measurement of DNA concentration from each sample. This is described in more detail in the decontam paper.

(1) quantitative DNA concentrations for each sample, often obtained during amplicon or shotgun sequencing library preparation in the form of a standardized fluorescence intensity (e.g., PicoGreen)

But that isn't needed at all if you are using negative controls as part of "prevalence"-based contaminant identification with decontam. However, for the decontam prevalence approach to be useful, you will need to have sequenced more than just 1 or even 2 negative controls.

Beyond that, a variety of ad hoc approaches are used in the field, such as removing taxa present at "high" abundances in the negative controls, with the specific definition of "high" varying with the investigator.
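As a rough illustration of the point about "quant_reading", the sketch below runs decontam's `isContaminant()` on a small simulated count matrix. The data, the taxon names, and the variable name `dna_ng_ul` are all assumptions made up for this example; the concentration variable can have any name. It assumes the decontam package is installed from Bioconductor.

```r
library(decontam)  # Bioconductor package, assumed to be installed

set.seed(42)
n_true <- 12
n_neg  <- 4
n_all  <- n_true + n_neg
is_neg <- rep(c(FALSE, TRUE), times = c(n_true, n_neg))

# Hypothetical DNA concentrations (e.g. from a fluorescence assay); the name is arbitrary.
dna_ng_ul <- c(runif(n_true, 5, 50), runif(n_neg, 0.1, 1))

# Simulated contaminant: its expected share of reads shrinks as sample DNA increases.
contam_frac <- 1 / (1 + dna_ng_ul)
counts <- cbind(
  real_taxon_1  = ifelse(is_neg, 0, rpois(n_all, 7000)),
  real_taxon_2  = ifelse(is_neg, 0, rpois(n_all, 3000)),
  contaminant_1 = rpois(n_all, 50 * contam_frac)
)
rownames(counts) <- paste0("S", seq_len(n_all))

# Frequency method: needs only a numeric concentration per sample.
freq_res <- isContaminant(counts, conc = dna_ng_ul, method = "frequency")

# Prevalence method: needs only a logical flag marking the negative controls.
prev_res <- isContaminant(counts, neg = is_neg, method = "prevalence")

print(freq_res[, c("p.freq", "contaminant")])
print(prev_res[, c("p.prev", "contaminant")])
```

With a phyloseq object, the same arguments can be given as sample-variable names instead of vectors, for example `conc = "dna_ng_ul"` or `neg = "is_neg"`, where those columns would be ones you add to your own sample data.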

Jee-Woong_Choi · February 4, 2023, 2:30am

Thank you very much for your kind reply.

I did the "prevalence" based approach as described in the protocol, and it worked perfectly.

I have one more question. Which of the two approaches is better?

I read in your paper that frequency-based contaminant identification is not recommended for extremely low-biomass samples. Are there any other conditions that I have to consider before using the "prevalence"-based approach?

Thank you!

benjjneb · February 6, 2023, 11:12pm

I think both are complementary (as seen in the paper) and should be used when the necessary quantitative DNA concentrations or negative controls are available.

You need more than 1 negative control, and preferably several, for the prevalence method to be useful. Contamination (and cross-contamination) is stochastic enough that multiple negative controls are needed. We recommend 5 (see manuscript) per 96-well plate, and more for very low-biomass samples.

The best negative controls are "full-process" negative controls that go through as much of the sampling and measurement process as possible. For example, negative controls that start from the same sampling instrument you are using for the real samples.
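When both DNA concentrations and several negative controls are available, the two kinds of evidence can be used together. The sketch below is one possible way to do that with `isContaminant()` and `method = "combined"`, followed by dropping the flagged taxa from the real samples. The simulated data, the taxon names, and `dna_ng_ul` are assumptions for illustration only, and the default threshold is used without any tuning.

```r
library(decontam)  # Bioconductor package, assumed to be installed

set.seed(7)
n_all  <- 20
is_neg <- rep(c(FALSE, TRUE), times = c(15, 5))  # 5 simulated negative controls

# Hypothetical DNA concentrations; negative controls are given much lower values.
dna_ng_ul <- ifelse(is_neg, runif(n_all, 0.1, 1), runif(n_all, 5, 50))

# Simulated contaminant whose expected read share falls as sample DNA rises.
contam_frac <- 1 / (1 + dna_ng_ul)
counts <- cbind(
  real_taxon    = ifelse(is_neg, 0, rpois(n_all, 5000)),
  contaminant_1 = rpois(n_all, 50 * contam_frac)
)
rownames(counts) <- paste0("S", seq_len(n_all))

# Combine the frequency and prevalence evidence into a single score per taxon.
res <- isContaminant(counts, conc = dna_ng_ul, neg = is_neg, method = "combined")

# Treat only taxa explicitly flagged TRUE as contaminants (NA is kept, not dropped).
flagged <- res$contaminant %in% TRUE

# Drop flagged taxa, then keep only the real samples for downstream analysis.
counts_clean <- counts[!is_neg, !flagged, drop = FALSE]

print(res[, c("p.freq", "p.prev", "p", "contaminant")])
print(dim(counts_clean))
```

For a phyloseq object, the removal step would typically be `prune_taxa(!res$contaminant, ps)`, where `ps` stands for your own object. Whether the combined score or the separate methods suit a given dataset is a judgment call that depends on the biomass and on the number of controls.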

Jee-Woong_Choi · February 7, 2023, 12:26am

Thank you for the reply!

I have sequencing data from three negative controls that went through exactly the same extraction and sequencing process as the tissue samples. I used those data with the decontam package and hope that the results are reliable.

Thank you for making this wonderful R package!