What sampling depth threshold should I choose from my sequence counts for diversity analysis in QIIME 2?

Dear all researchers,

The sequence counts in my samples vary widely. Each sample has five replicates. The sequence counts are: 100, 200, 600, 1000, 1100, 1200, 2200, 3500, 4000, 5500, 6000, 7000, 8900, 9000, 10000, 10500. What should the sampling depth be for the alpha and beta diversity analysis with core-metrics-phylogenetic? Three replicates (100, 200 and 600) fall below 1000. The replicates with 100 and 200 sequences belong to one sample, and the replicate with 600 belongs to another sample. If I use a sampling depth of 1000, I will of course lose all three: two replicates of one sample and one replicate of another. Would that affect my diversity results? What is your suggestion? I am very confused. If I use 100 or 200 as the threshold, I do not get good rarefaction curves. My samples are from land that is watered by rainfall.
Waiting for your suggestions.


My personal rule of thumb for modern microbiome experiments is to always exclude samples with fewer than 1000 sequences. You lose the samples with lower sequencing depth, but you tend to improve the quality of your microbiome analysis. Diversity measurements at deeper sequencing depths will better characterise the full community, allowing for better characterisation of differences driven by less common organisms.
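
To see what a given threshold costs, the short Python sketch below tallies how many of the sequence counts listed in the question survive a few candidate sampling depths. It is only an illustration: the helper name `split_by_depth` and the candidate depths are hypothetical, not part of QIIME 2, and it assumes a sample is kept when its count is at least the sampling depth.

```python
# Sequence counts per replicate, copied from the question above.
counts = [100, 200, 600, 1000, 1100, 1200, 2200, 3500,
          4000, 5500, 6000, 7000, 8900, 9000, 10000, 10500]


def split_by_depth(counts, depth):
    """Return (kept, dropped) counts for a given sampling depth.

    Assumption: a sample is kept if it has at least `depth` sequences.
    """
    kept = [c for c in counts if c >= depth]
    dropped = [c for c in counts if c < depth]
    return kept, dropped


# Candidate depths chosen for illustration only.
for depth in (100, 600, 1000, 2200):
    kept, dropped = split_by_depth(counts, depth)
    print(f"depth={depth}: keep {len(kept)} samples, drop {dropped}")
```

A higher depth keeps more sequences per sample but drops more samples, which is the trade-off described above.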

I typically expect to lose between 5 and 10% of my samples to sequencing failures, even in high biomass communities. (In lower biomass communities, I often expect a higher failure rate.) As long as the failures are stochastic, rather than clearly systematic, I'm less concerned.
However, the impact of this loss is greater when your sample size is small. My suggestion here is to plan your experiments with about 5-10% more samples than you intend to analyse.
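
As a rough planning aid, the arithmetic can be sketched in Python. The function name `samples_to_collect` is hypothetical, and dividing by (1 - failure rate) is an assumption that is slightly more conservative than simply adding 5-10% on top of the target.

```python
import math


def samples_to_collect(n_needed, expected_failure_rate):
    """Samples to collect so that about `n_needed` survive sequencing.

    Assumption: failures are random and occur at `expected_failure_rate`.
    """
    return math.ceil(n_needed / (1 - expected_failure_rate))


# Target of 50 analysable samples at the two failure rates mentioned above.
for rate in (0.05, 0.10):
    print(f"failure rate {rate:.0%}: collect {samples_to_collect(50, rate)}")
```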


So this means I have to exclude the replicates with 100, 200 and 600 sequences from the diversity analysis. Will that really not affect the diversity at all? There are five replicates. If I discard two, won't the mean of the remaining three be affected?

There are three issues, I think, in your question.

  1. Your diversity is a function of your rarefaction depth. So, if you rarefy to 1000 seqs/sample, you will get a different absolute value of diversity for each sample than you would at 10,000 seqs/sample. (The rarefaction curve illustrates this.)

  2. The second issue is that diversity is calculated on a per-sample or per-pair basis. So, if you have 3 samples and you exclude sample A, it will not affect the alpha diversity of sample B or sample C, nor will it affect the distance between samples B and C.
    However, it may affect the observed PCoA, since PCoA is entirely dependent on the samples included in the sample set. Including or excluding outliers can shift the axes and the positions of the observed data.

  3. Your mean is, of course, a function of the samples you’re using. However, if you have a robust estimate, your mean should be less affected by the exclusion of the samples. (My concern is that your mean will not be robust if you choose too low a rarefaction depth, because your estimates will be less stable.) I think the place you’ll see the larger issue is in your standard deviation and confidence intervals.
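
Points 1 and 2 can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the toy samples are invented (not real data), the helper names are hypothetical, and observed features and Bray-Curtis stand in for whichever metrics you actually use. It does not reproduce the PCoA effect.

```python
import random
from collections import Counter


def rarefy(sample, depth, seed=0):
    """Draw `depth` reads without replacement; None if the sample is too shallow."""
    pool = [feature for feature, n in sample.items() for _ in range(n)]
    if len(pool) < depth:
        return None  # this sample would be dropped at this depth
    return Counter(random.Random(seed).sample(pool, depth))


def observed_features(sample):
    """Alpha diversity as a simple count of features with at least one read."""
    return sum(1 for n in sample.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two Counters of feature counts."""
    shared = sum(min(a[f], b[f]) for f in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


def pairwise(samples):
    """Distances for every pair of samples, keyed by sorted name pairs."""
    names = sorted(samples)
    return {(x, y): bray_curtis(samples[x], samples[y])
            for i, x in enumerate(names) for y in names[i + 1:]}


# Hypothetical toy samples: A is shallow, B and C are deeper with rare features.
sample_a = Counter({"abundant_0": 60, "abundant_1": 40})
sample_b = Counter({f"abundant_{i}": 400 for i in range(5)})
sample_b.update({f"rare_{i}": 2 for i in range(50)})
sample_c = Counter({f"abundant_{i}": 300 for i in range(5)})
sample_c.update({f"rare_{i}": 3 for i in range(20, 60)})

# Point 1: the observed richness of the same sample depends on the depth.
for depth in (10, 100, 1000, 2100):
    print(depth, observed_features(rarefy(sample_b, depth)))

# Point 2: the B-C distance is the same whether or not A is in the sample set.
rarefied = {name: rarefy(s, 100)
            for name, s in {"A": sample_a, "B": sample_b, "C": sample_c}.items()}
with_a = pairwise(rarefied)
without_a = pairwise({k: v for k, v in rarefied.items() if k != "A"})
print(with_a[("B", "C")] == without_a[("B", "C")])
```

The B-C distance depends only on the rarefied tables for B and C, so dropping A leaves it unchanged. Only summaries computed across the whole sample set, such as a PCoA or a group mean, can shift.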

My best recommendation remains to treat the samples as lost, and leave them out of all further analyses.


Thank you for your suggestion. You have solved my problem with the rarefaction depth.


