Tutorial for filtering controls

Hi Jia,

Sorry for my late reply due to the Christmas break…

How to handle controls in our NGS projects is still a matter of debate in our lab. However, from my point of view, I would suggest the following (I’ll only tackle data handling in this post and won’t mention wet lab methods to get rid of the “kitome”):

First of all, you should process as many controls as possible. As a minimum, I would include negative controls (no-template controls (NTCs), field blanks, etc.), positive controls (DNA of mock communities), and DNA extraction controls.

Then I would process the data of my biological samples in parallel with these controls. The next step is to check the composition (alpha and beta diversity) of your controls and how they relate to your biological samples.
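As a rough illustration of "how controls relate to biological samples", here is a minimal sketch that computes the Bray-Curtis dissimilarity (a common beta diversity measure) between a control and a sample. The ASV names and counts are made up, and the helper names `relative_abundance` and `bray_curtis` are my own for this example; in a real project you would normally compute this with your pipeline's diversity tools rather than by hand.

```python
def relative_abundance(counts):
    """Convert raw ASV counts (dict: ASV -> count) to fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        return {asv: 0.0 for asv in counts}
    return {asv: n / total for asv, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0.0 means identical composition, 1.0 means no shared ASVs.
    """
    asvs = set(a) | set(b)
    shared = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in asvs)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical counts, for illustration only.
skin_sample = relative_abundance({"ASV_a": 80, "ASV_b": 20})
blank = relative_abundance({"ASV_c": 10})

distance = bray_curtis(skin_sample, blank)  # no shared ASVs in this toy case
```

A value close to 1 suggests the control looks nothing like the sample (the first case below); values closer to 0 suggest overlap that needs a closer look (the second case).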

  • If your controls are very dissimilar from your biological samples, you could use them as a baseline or control group within your overall analysis. You could use tools like LEfSe, MaAsLin, ANCOM, gneiss etc. to investigate the composition of your controls and maybe define a “kitome” for your study. Your positive controls can be processed with q2-quality-control to assess the quality of your sequencing data. Finally, if there is no overlap between your biological samples and your controls, you do not have to filter the control taxa from your data, but you should still describe the controls in your study.

  • If your controls are similar to your biological samples, the hard work starts. We often work in low biomass environments and therefore have to judge whether a certain ASV (amplicon sequence variant) that is present in both sample types (biological samples and controls) makes sense from a microbial ecologist's point of view. At the moment I mainly use two methods to filter controls: first the tool decontam, and then subtraction of normalized ASV tables (for instance, if we work with skin samples and see an ASV assigned to Staphylococcus aureus or any other typical skin bacterium). Then I compare the analysis of the filtered data with that of the original data. Depending on the results, I usually include both analyses in a manuscript. Sometimes it makes sense to show your controls in relation to your biological samples, and sometimes it is better to show data based on a filtered dataset.
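To make the "subtraction of normalized ASV tables" idea more concrete, here is one possible sketch. The exact recipe is not spelled out above, so everything here is an assumption: tables are normalized to relative abundance, the baseline for each ASV is its maximum relative abundance across all controls, and negative values after subtraction are clipped to zero. The function names and the toy counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw ASV counts (dict: ASV -> count) to fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        return {asv: 0.0 for asv in counts}
    return {asv: n / total for asv, n in counts.items()}


def subtract_controls(samples, controls):
    """Subtract a control baseline from each normalized sample.

    samples, controls: dict of sample ID -> dict of ASV -> raw count.
    Assumption: the baseline per ASV is the highest relative abundance
    seen in any control; results below zero are set to zero.
    """
    baseline = {}
    for counts in controls.values():
        for asv, frac in relative_abundance(counts).items():
            baseline[asv] = max(baseline.get(asv, 0.0), frac)

    filtered = {}
    for sample_id, counts in samples.items():
        profile = relative_abundance(counts)
        filtered[sample_id] = {
            asv: max(frac - baseline.get(asv, 0.0), 0.0)
            for asv, frac in profile.items()
        }
    return filtered


# Hypothetical counts, for illustration only.
samples = {"skin1": {"ASV_a": 90, "ASV_b": 10}}
controls = {"ntc1": {"ASV_b": 5, "ASV_c": 5}}

filtered = subtract_controls(samples, controls)
```

With this approach, an ASV that is abundant in a sample but only a minor component of the controls is reduced rather than removed entirely, while an ASV that dominates the controls drops to zero. Note that the filtered values no longer sum to 1, so you may want to renormalize before comparing the filtered and unfiltered analyses.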

It is really important that the readers of your paper understand how you filtered your data and why.

You could also check out the publications below:

https://www.nature.com/articles/s41564-018-0202-y

https://msystems.asm.org/content/3/3/e00218-17?utm_source=TrendMDmSystems&utm_medium=TrendMDmSystems&utm_campaign=trendmdalljournals_0

Hope I could help and I’m looking forward to any comments!

Cheers, Mechah
