I am performing ITS and 16S amplicon sequencing on a set of samples. Within this set I included two negative process controls (just Butterfield buffer). The 16S amplicons from these samples yielded some output: the demultiplexed, interleaved sample files are 680 and 698 KB according to WinSCP, and the demultiplexing summary reports Yield (mBases) = 2. When I ran them through the data analysis/processing with the rest of the samples, I got results from them and was able to identify organisms.
The ITS amplicons for these samples are smaller: the demultiplexed, interleaved sample files are 50 and 50 KB according to WinSCP, and Yield (mBases) = 0. When I processed my samples in the QIIME pipeline, these two samples did not appear in my output files at all. I lowered my Phred quality filtering threshold to 20 to let through lower-quality reads, thinking that the negative control samples would have low-quality reads. These samples still didn't appear in the output files after I did that.
So I have a few questions:
- Is there a minimum number of reads/output data in a sample file needed in order to successfully process your samples, and does this change between ASV/DADA versus clustering?
- Is there a way to estimate the number of reads from the file size (as seen in WinSCP)?
- More broadly, how do folks use negative controls in their sample analysis? As a lower-limit threshold, or as a way to judge 'background noise' and subtract it from all the other samples (or check whether it's present in other samples)? Or something else?
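On the file size question, here is a rough sketch of what I have in mind. It assumes standard FASTQ with exactly four lines per record; the function names, the 60-character header length, and any read length passed in are my own placeholders, not values from my run. Counting records directly is exact, while the size-based estimate only makes sense for uncompressed files.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Handle both plain and gzip-compressed files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def estimate_reads_from_size(size_bytes, read_length, header_length=60):
    """Rough read-count estimate for an UNCOMPRESSED FASTQ file.

    header_length is an assumed average length of the '@' header line.
    """
    # Per record: header, sequence, '+', quality, each followed by a newline.
    bytes_per_record = (header_length + 1) + (read_length + 1) + 2 + (read_length + 1)
    return size_bytes // bytes_per_record
```

Since my files are interleaved, the number of read pairs would be half of whatever either function returns.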
It makes sense that these negative control samples don't have any fungi present but do have some background bacteria. We would like to set thresholds and understand this kind of thing pre hoc, not post hoc.
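To make the 'background noise' idea concrete, this is a minimal sketch of one possible rule, not an established method: flag any taxon whose highest read count in the negative controls is at least some fraction of its highest count in the real samples. The function name, the nested-dict count table, and the 0.1 ratio are all hypothetical choices for illustration.

```python
def flag_background_taxa(counts, negative_controls, ratio=0.1):
    """Flag taxa that look like background based on negative controls.

    counts: dict mapping sample name -> dict of taxon -> read count.
    negative_controls: set of sample names that are negative controls.
    ratio: assumed cutoff; a taxon is flagged when its max count in the
           controls is >= ratio * its max count in the real samples.
    """
    real_samples = [s for s in counts if s not in negative_controls]
    taxa = {taxon for sample in counts.values() for taxon in sample}
    flagged = set()
    for taxon in taxa:
        neg_max = max(
            (counts[c].get(taxon, 0) for c in negative_controls if c in counts),
            default=0,
        )
        if neg_max == 0:
            continue  # never seen in a control, so not flagged
        sample_max = max((counts[s].get(taxon, 0) for s in real_samples), default=0)
        if neg_max >= ratio * sample_max:
            flagged.add(taxon)
    return flagged
```

This compares raw counts, so it ignores differences in sequencing depth between controls and samples; whether relative abundances would be a better basis is part of what I am asking.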