I drop samples with fewer than 1000-2500 sequences per sample in most of my sample types, unless there's a clear cutoff in my data (a jump of ~1000 sequences with only a small number of samples below it). Then, I rarefy to this depth and work from there. In my experience/opinion, deeper sequencing doesn't buy you much more than noise in most datasets once you're past 10K reads.
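For reference, that filter-then-rarefy step looks something like the commands below. The file names and the 1000-read depth are placeholders for whatever cutoff you land on, and option names can shift between QIIME 2 versions, so check `--help`:

```
# Drop samples below the chosen minimum read count (placeholder depth: 1000)
qiime feature-table filter-samples \
  --i-table table.qza \
  --p-min-frequency 1000 \
  --o-filtered-table filtered-table.qza

# Rarefy the remaining samples to that same depth
qiime feature-table rarefy \
  --i-table filtered-table.qza \
  --p-sampling-depth 1000 \
  --o-rarefied-table rarefied-table.qza
```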
In terms of keep vs re-run: it depends. I think if they're really special samples and you have to re-run, you should re-run everything to make sure it's comparable. And maybe consider a re-extraction?
But, I also just try to build extra space into my studies. On fecal samples, I like to assume a 5-10% failure rate (10% on smaller studies, because those invariably have more issues); maybe 10-15% in oral, and 30-50% in skin/vaginal, but these are back-of-the-hand "failure" rates.
You may also just find some samples are low biomass.
...All said, I'm sure @colinbrislawn and several others have excellent opinions, and diversity is once again a strength in the microbiome field. (Except in the vaginal microbiome.)
There are times when I have re-run samples and seen a real improvement, e.g., ~3 taxa in the first run and ~30 taxa in the second run.
And there are times when I dropped the sample, because the low-read-count sample was in a group of about 6 samples, so I had other good samples to continue with.
@colinbrislawn, are samples A and B from different runs? If not, why is there such a drop in read counts?
Thank you @jwdebelius, @colinbrislawn, I got my answer for what to do when this happens. But I would still like to pin down the real problem, even if it turns out to be a problem with the platform.
I believe this is about the platform, because when I just re-run without changing anything and without re-extraction, the read count can jump from 2k to 100k. Plus, Phred scores are getting worse and worse with each run.
I'm poking at this so much because I don't own the platform myself. If the platform is the reason, I would think that they are not taking good care of it, and I plan to change where I send the cartridge.
You can see a wide range of read depths on a single sequencing run. Keep in mind that there are multiple stochastic processes in 16S rRNA sequencing that affect read depth: extraction efficiency is one, but you also have PCR efficiency and flow cell adherence efficiency. These can all result in varying read depths for samples in the same run. Re-extraction may save some of the samples, but some just have lower counts.
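If you want to see how wide that spread actually is on a run, a per-sample read-count summary is a quick check. Something along these lines should do it (file names are placeholders, and the metadata file is optional):

```
# Summarize per-sample read depths (and per-feature frequencies) for the run
qiime feature-table summarize \
  --i-table table.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization table-summary.qzv
```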
I also want to mention that, to me, 10K is not "low". I'm less comfortable around 1K (although that's my absolute minimum) and more comfortable around 5K, but in most of the study sizes I'm working with, increasing read depth to 10K or 100K just increases the number of taxa I can't test. It is a little bit weird to me that @colinbrislawn has such big gaps by treatment (unless these are spoofed?), but that may be a secondary sample issue! If you're looking for novelty, deeper sequencing might be a solution for you, but from an "I want to do statistical analysis" perspective, deeper is not always better.
This is fake data I made up as an example! Fully spoofed!
(Is it still called a 'batch effect' if the batches are fake? )
I also agree that 10K reads per sample is very good. For comparison, the Earth Microbiome Project paper published in Nature rarefied to only 5k reads per sample. https://www.nature.com/articles/nature24621#Sec3
You can also use objective methods to determine a minimum acceptable read depth. The alpha and beta rarefaction methods allow you to determine the effects of rarefaction on alpha and beta diversity, which can be used as guides for your analysis. Minimum acceptable depth will depend on the diversity present in a sample, so (compared to using a rule-of-thumb approach) these methods can allow you to select acceptably lower read depths in low diversity samples and prevent you from going too low in high diversity samples.
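As a rough sketch of what that looks like in practice (the file names, depths, and the Bray-Curtis metric below are placeholders, and parameters can vary a bit by QIIME 2 version, so check `qiime diversity alpha-rarefaction --help` and `qiime diversity beta-rarefaction --help`):

```
# Alpha rarefaction curves up to a chosen maximum depth (placeholder: 10000)
qiime diversity alpha-rarefaction \
  --i-table table.qza \
  --p-max-depth 10000 \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization alpha-rarefaction.qzv

# Beta rarefaction: how stable is beta diversity at a candidate depth (placeholder: 1000)?
qiime diversity beta-rarefaction \
  --i-table table.qza \
  --p-metric braycurtis \
  --p-clustering-method nj \
  --m-metadata-file sample-metadata.tsv \
  --p-sampling-depth 1000 \
  --o-visualization beta-rarefaction.qzv
```

Looking at where the alpha-rarefaction curves level off for your sample types gives you a data-driven minimum depth rather than a rule of thumb.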