Because wrongly removing true biological features can seriously distort the downstream analysis, I generally prefer to screen contaminants manually, following the principles outlined in the DECONTAM paper.
Here’s what I do for my projects:
1. Select the features present in my negative controls. These are the candidate features for contaminant screening.
2. Make prevalence and frequency plots. Prevalence plots show the relative abundance of each feature in the biological and control samples, including negative and positive controls. Frequency plots show the correlation between sample DNA concentration and feature relative abundance: a negative correlation suggests contamination.
3. Screen the candidate features manually, one by one, based on the plots generated in step 2.
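As a rough illustration of step 1 and of the relative abundances plotted in step 2, here is a minimal Python sketch. It is not the code from my repo (that is in R); the count table, sample names, and function names are all made up for the example, and it treats any feature with at least one read in a negative control as a candidate.

```python
# Hypothetical toy count table: feature -> {sample: read count}.
counts = {
    "ASV_pseudomonas": {"bio1": 5, "bio2": 2, "neg1": 80, "neg2": 60},
    "ASV_lactobacillus": {"bio1": 900, "bio2": 950, "neg1": 0, "neg2": 0},
    "ASV_bacteroides": {"bio1": 95, "bio2": 48, "neg1": 20, "neg2": 0},
}
negative_controls = {"neg1", "neg2"}  # assumed sample labels


def relative_abundance(counts):
    """Convert raw counts to per-sample relative abundance."""
    samples = {s for per_sample in counts.values() for s in per_sample}
    totals = {
        s: sum(per_sample.get(s, 0) for per_sample in counts.values())
        for s in samples
    }
    return {
        feature: {
            s: (per_sample.get(s, 0) / totals[s] if totals[s] else 0.0)
            for s in samples
        }
        for feature, per_sample in counts.items()
    }


def candidate_contaminants(counts, negative_controls):
    """Return features with at least one read in any negative control."""
    return sorted(
        feature
        for feature, per_sample in counts.items()
        if any(per_sample.get(s, 0) > 0 for s in negative_controls)
    )


candidates = candidate_contaminants(counts, negative_controls)
rel = relative_abundance(counts)
for feature in candidates:
    print(feature, rel[feature])
```

Only the candidates go on to plotting and manual review; features never seen in a negative control are left alone.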
This process is slow but reassuring. It usually takes me 1-2 days to finish, and I don’t need to revisit it afterwards.
Below is an example showing the prevalence and frequency plots of a contaminating ASV classified as Pseudomonas, a contaminant commonly found in our lab.
Prevalence plot. Note that the relative abundance of this contaminating ASV is much higher in the negative control samples and increases in diluted mock samples (far right; original concentration, 1:16, 1:32).
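Frequency plot. I used raw Cq values as proxies for sample DNA concentrations. Because a higher Cq means less template DNA, the relationship is inverted: for a contaminant you see a positive correlation with Cq rather than the negative correlation you would see with DNA concentration.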
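To see why the sign flips, here is a small Python sketch with invented numbers. It assumes an idealized qPCR in which each two-fold drop in template adds one cycle to the Cq, and a contaminant whose relative abundance doubles with each two-fold dilution. The values and the `pearson` helper are purely illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical samples, ordered from most to least concentrated.
dna_conc = [8.0, 4.0, 2.0, 1.0, 0.5]            # arbitrary units
cq = [20.0, 21.0, 22.0, 23.0, 24.0]             # assumed: +1 cycle per 2x dilution
contaminant_rel_abund = [0.01, 0.02, 0.04, 0.08, 0.16]

log_abund = [math.log10(v) for v in contaminant_rel_abund]
log_conc = [math.log10(v) for v in dna_conc]

r_conc = pearson(log_conc, log_abund)  # negative for a contaminant
r_cq = pearson(cq, log_abund)          # positive for the same contaminant

print(f"vs log10(DNA concentration): r = {r_conc:.2f}")
print(f"vs Cq:                       r = {r_cq:.2f}")
```

Real data are noisier, but the direction holds: a contaminant trends upward against Cq and downward against DNA concentration.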
Besides negative controls, mock communities are also quite useful for contaminant screening because you know what should and should not be there. If you have serially diluted mock samples, it’s even easier.
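A minimal Python sketch of that idea, using a made-up mock composition and made-up dilution profiles: anything observed in the mock but not in its known composition is suspicious, and it is more suspicious still if its relative abundance rises as the mock is diluted. The taxa, values, and function names are hypothetical.

```python
# Hypothetical known composition of the mock community.
expected_taxa = {"Lactobacillus", "Bacillus"}

# Hypothetical relative abundances, ordered from least to most diluted.
mock_dilutions = [
    ("1:1", {"Lactobacillus": 0.60, "Bacillus": 0.399, "Pseudomonas": 0.001}),
    ("1:16", {"Lactobacillus": 0.58, "Bacillus": 0.39, "Pseudomonas": 0.03}),
    ("1:32", {"Lactobacillus": 0.55, "Bacillus": 0.37, "Pseudomonas": 0.08}),
]


def unexpected_taxa(mock_dilutions, expected_taxa):
    """Taxa observed in any mock sample that are not part of the mock."""
    observed = {
        taxon
        for _, profile in mock_dilutions
        for taxon, abundance in profile.items()
        if abundance > 0
    }
    return sorted(observed - expected_taxa)


def rises_with_dilution(taxon, mock_dilutions):
    """True if relative abundance strictly increases along the dilution series."""
    values = [profile.get(taxon, 0.0) for _, profile in mock_dilutions]
    return all(a < b for a, b in zip(values, values[1:]))


for taxon in unexpected_taxa(mock_dilutions, expected_taxa):
    print(taxon, "rises with dilution:", rises_with_dilution(taxon, mock_dilutions))
```

A flag from this kind of check is only a prompt to look at the plots, not a decision to remove the feature.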
If you’re interested, you can look at the code (code/03_filtering.Rmd) on my GitHub repo that generates the plots.