What are optimal frequency per sample and frequency per feature distributions?
25% of my samples have 0 features. What does that mean? What about a sample with a very large number of features? What is an optimal frequency per feature?
This is for an 18S metabarcoding project.
This is a good question partly because there is no one right answer.
If 25% of your samples have 0 features, it means that no features (i.e., OTUs or ASVs / sequence variants) were detected in those samples under your current processing and filtering criteria. This is probably due to low sequencing depth. Illumina runs often yield very unequal numbers of reads per sample, though there are ways to address this.
It's also possible that some samples have few reads because they have low biomass or a small number of 18S genes in them. What would you expect to see in an empty sample like a technical blank or a sterile swab?
If every sample starts with plenty of reads and some samples still end up with zero features, something else has gone wrong in your pipeline. If THAT'S the issue, let us know!
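If it helps to see what these summaries measure, here is a minimal sketch in plain Python. The toy table, sample IDs, feature IDs, and function names are all made up for illustration; this is not the output or API of any particular pipeline. Frequency per sample is the total read count in a sample, and frequency per feature is the total read count of a feature across all samples.

```python
# Hypothetical toy feature table: sample ID -> {feature ID -> read count}.
# All IDs and counts here are invented for illustration.
table = {
    "S1": {"ASV_a": 120, "ASV_b": 30},
    "S2": {"ASV_a": 5},
    "S3": {},
    "blank": {},
}


def frequency_per_sample(table):
    """Total read count in each sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


def frequency_per_feature(table):
    """Total read count of each feature across all samples."""
    totals = {}
    for counts in table.values():
        for feature, n in counts.items():
            totals[feature] = totals.get(feature, 0) + n
    return totals


def empty_samples(table):
    """Samples in which no feature has a count above zero."""
    return [
        sample
        for sample, counts in table.items()
        if not any(n > 0 for n in counts.values())
    ]


per_sample = frequency_per_sample(table)
per_feature = frequency_per_feature(table)
empties = empty_samples(table)
fraction_empty = len(empties) / len(table)

print("frequency per sample:", per_sample)
print("frequency per feature:", per_feature)
print("empty samples:", empties, "fraction:", fraction_empty)
```

Comparing the empty samples against your blanks and negative controls is a quick way to tell expected zeros from unexpected ones.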
Aha! It's certainly low sequencing depth. I just learned that a 150-cycle kit was used on a 600 bp amplicon.
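The mismatch is easy to check with quick arithmetic. This is only a sketch: it assumes the 150 cycles were split as paired-end 2 x 75 (the thread does not say how the kit was run), and the minimum overlap of 12 bases is an illustrative threshold, not a setting taken from this project.

```python
def merge_overlap(forward_len, reverse_len, amplicon_len):
    """Bases shared by the two reads of a pair; a negative value is a gap."""
    return forward_len + reverse_len - amplicon_len


# Assumption: the 150-cycle kit was run as paired-end 2 x 75.
overlap = merge_overlap(75, 75, 600)

# Assumption: 12 bases is used here only as an example minimum overlap.
can_merge = overlap >= 12

print("overlap:", overlap, "can merge:", can_merge)
```

Even as a single 150-base read, the kit would cover only a quarter of the amplicon, so the forward and reverse reads can never be joined into one full-length sequence.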