This is probably another question with no true answer, but it is something I am finding very interesting to think about.
Are there common gold-standard rules for which sequence counts to filter out (e.g. if an ASV has fewer than 25 reads, remove it) and which library sizes to keep (e.g. we don’t want to look at a sample with fewer than 10,000 sequences)?
I imagine the library size threshold likely depends on sample characteristics, and on whether a sample with 5,000 sequences can capture the same level of information as a sample with 50,000 sequences.
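To make the library size question a bit more concrete, here is a minimal sketch of the comparison I have in mind: take one sample, subsample it to a smaller depth, and see how many ASVs are still observed. The counts are invented for illustration and `observed_asvs_at_depth` is just a helper name I made up, not a function from any pipeline.

```python
import random

def observed_asvs_at_depth(counts, depth, seed=0):
    """Subsample `depth` reads without replacement and count distinct ASVs."""
    # Expand the count table into one entry per read.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return len(set(rng.sample(reads, depth)))

# Invented sample: three dominant ASVs plus a long tail of rare ones.
sample = {"ASV_a": 30000, "ASV_b": 15000, "ASV_c": 4000}
sample.update({f"rare_{i}": 10 for i in range(100)})

library_size = sum(sample.values())  # 50,000 reads in total
richness_full = observed_asvs_at_depth(sample, library_size)
richness_5k = observed_asvs_at_depth(sample, 5000)

print(f"library size: {library_size}")
print(f"ASVs observed at full depth: {richness_full}")
print(f"ASVs observed at 5,000 reads: {richness_5k}")
```

In a toy sample like this the dominant ASVs survive at either depth, and what gets lost at 5,000 reads is part of the rare tail. So whether the shallower sample captures "the same information" probably depends on how much the question at hand cares about rare variants.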
But for the individual sequence counts, things get muddier in my opinion, because aren’t we really just making a decision based on a value that we deem appropriate? If you are filtering out everything below 25 reads, are the ASVs with 26 and 27 reads really that much better to keep? And even for singletons: to my understanding, we often remove singletons during denoising but keep variants with a count of 2 (unless you do other sequence count filtering). Does that one extra read really make a variant with a count of 2 that much better than one with a count of 1?
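One way I have been thinking about it is to sweep the cutoff instead of picking a single value, and look at how much actually changes. This is only a rough sketch with made-up totals; `filter_summary` is a hypothetical helper, and the thresholds are arbitrary.

```python
# Toy feature table: ASV id -> total read count across all samples.
# These counts are invented for illustration, not real data.
asv_totals = {
    "ASV1": 5200, "ASV2": 830, "ASV3": 27, "ASV4": 26,
    "ASV5": 24, "ASV6": 10, "ASV7": 2, "ASV8": 2, "ASV9": 1,
}

def filter_summary(totals, min_count):
    """Keep ASVs with at least min_count total reads; report what survives."""
    kept = {asv: n for asv, n in totals.items() if n >= min_count}
    total_reads = sum(totals.values())
    return {
        "min_count": min_count,
        "asvs_kept": len(kept),
        "fraction_reads_kept": sum(kept.values()) / total_reads,
    }

# Compare several cutoffs side by side.
summaries = [filter_summary(asv_totals, t) for t in (1, 2, 10, 25, 50)]
for s in summaries:
    print(s)
```

With numbers like these, the count of ASVs kept moves a lot between cutoffs while the fraction of reads kept barely moves, which is sort of my point: a cutoff of 25 keeps ASV4 (26 reads) and drops ASV5 (24 reads), even though nothing obvious separates them.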
Just interested to hear what other people think!