Filtering and Quality Control Criteria in Gut Mucosa Samples


I have mucosa samples from different intestinal regions and disease states. I inherited this data, so while I have “controls”, they aren’t “healthy controls”. I’m trying to work out the best way to set appropriate sample cutoffs, filter out low-abundance variants, and identify outliers.
Approaches I’m taking at the moment:

  • using a cutoff of 10,000 reads
  • ~80% of the composition needs to consist of Bacteroidetes and Firmicutes
    • the problem with this is while the “controls” should have a composition dominated by Bacteroidetes and Firmicutes, in the disease cases this isn’t going to be the case

I would appreciate any insight on this!


Hi @nmshahir,
Thanks for posting! Much of your question is really specific to your dataset so it is difficult to give good, general answers. I will do my best:

A cutoff of 10,000 reads sounds good. You could probably go lower if you wanted to (see here for some help with that).
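To make the read-depth cutoff concrete, here is a minimal Python sketch of the idea. It assumes the feature table has been exported to a plain dictionary of per-sample ASV counts; the sample names, counts, and the `filter_by_depth` helper are all hypothetical and only illustrate the logic, not how QIIME 2 does it internally.

```python
# Hypothetical feature table: sample ID -> {ASV ID -> read count}.
table = {
    "sample_A": {"asv1": 9000, "asv2": 4000, "asv3": 500},
    "sample_B": {"asv1": 3000, "asv2": 1200},
    "sample_C": {"asv2": 7000, "asv3": 3000},
}


def filter_by_depth(table, min_reads):
    """Keep only samples whose total read count is at least min_reads."""
    return {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_reads
    }


kept = filter_by_depth(table, min_reads=10_000)
print(sorted(kept))
```

Re-running this with a few different `min_reads` values shows how many samples each candidate cutoff would cost you.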

I would advise against the ~80% Bacteroidetes and Firmicutes requirement. It assumes that all “healthy” individuals must conform to the same composition and, as you pointed out, it would exclude many disease cases.

By default, singletons are removed by dada2 and deblur (I am assuming you are using denoised data). You may want to remove other low-abundance variants, particularly before statistical testing. Removing ASVs with fewer than 10 sequences is probably a good rule of thumb, but we do not have good guidelines for this, since it depends so much on the unique characteristics of the data and an investigator’s goals.
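As a rough illustration of the "fewer than 10 sequences" rule of thumb, the sketch below drops ASVs whose total count across all samples is below a threshold. The table layout, the example counts, and the `filter_low_abundance_asvs` helper are assumptions made for this example; the threshold of 10 is only the rule of thumb mentioned above, not a recommended default.

```python
def filter_low_abundance_asvs(table, min_total=10):
    """Drop ASVs whose summed count across all samples is below min_total."""
    # Sum each ASV's reads over every sample.
    totals = {}
    for counts in table.values():
        for asv, n in counts.items():
            totals[asv] = totals.get(asv, 0) + n

    keep = {asv for asv, total in totals.items() if total >= min_total}

    # Rebuild the table with only the retained ASVs.
    return {
        sample: {asv: n for asv, n in counts.items() if asv in keep}
        for sample, counts in table.items()
    }


# Hypothetical counts: "asv_rare" has only 4 reads in total.
example = {
    "sample_A": {"asv1": 120, "asv2": 30, "asv_rare": 3},
    "sample_B": {"asv1": 80, "asv_rare": 1},
}
filtered = filter_low_abundance_asvs(example, min_total=10)
print(filtered)
```

Filtering on the total across samples (rather than per sample) keeps an ASV that is rare in one sample but well supported overall.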

Removing outliers is a trickier question. You could use beta diversity / PCoA to visually identify samples that are obvious outliers, but we do not yet have well-tested methods for quantitatively identifying outliers in QIIME 2. You should probably discuss this with a biostatistician to get support.
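If it helps with the visual check, one simple way to decide which samples to look at first is to rank them by their mean Bray-Curtis dissimilarity to every other sample. The sketch below is only a screening aid under that assumption, not a validated outlier test; the helper names and the toy counts are hypothetical, and it assumes every sample has at least one read.

```python
def to_relative(counts):
    """Convert raw counts to relative abundances (assumes a non-empty sample)."""
    total = sum(counts.values())
    return {asv: n / total for asv, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    asvs = set(a) | set(b)
    num = sum(abs(a.get(x, 0) - b.get(x, 0)) for x in asvs)
    den = sum(a.get(x, 0) + b.get(x, 0) for x in asvs)
    return num / den if den else 0.0


def mean_distance_to_others(table):
    """Mean Bray-Curtis dissimilarity from each sample to all other samples."""
    rel = {s: to_relative(c) for s, c in table.items()}
    scores = {}
    for s in rel:
        dists = [bray_curtis(rel[s], rel[t]) for t in rel if t != s]
        scores[s] = sum(dists) / len(dists)
    return scores


# Hypothetical counts: s4 is dominated by an ASV the others lack.
toy = {
    "s1": {"a": 50, "b": 50},
    "s2": {"a": 60, "b": 40},
    "s3": {"a": 45, "b": 55},
    "s4": {"c": 90, "a": 10},
}
scores = mean_distance_to_others(toy)
for sample, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(sample, round(score, 3))
```

A high score only says a sample is unlike the rest. In a dataset with several disease states that can be real biology, so treat it as a prompt to inspect the sample rather than a reason to drop it.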

I hope that helps!


Thanks for the insight @Nicholas_Bokulich

We’re still figuring out that cutoff because, while Shannon diversity evens out, observed richness appears to depend on the sampling depth. I know this is primarily because the larger your sampling depth is, the more unique ASVs you’re likely to pick up. We may end up trying multiple cutoffs for our alpha diversity testing and using a lower cutoff for the rest of the analyses.
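The depth dependence described here can be sketched by subsampling one sample to several depths and recomputing both metrics. This is a minimal illustration with made-up counts; the helper names are hypothetical, the subsampling is without replacement, and Shannon entropy uses log base 2 as a choice made for this example.

```python
import math
import random


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's counts."""
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    picked = rng.sample(reads, depth)
    out = {}
    for asv in picked:
        out[asv] = out.get(asv, 0) + 1
    return out


def observed_richness(counts):
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon entropy (log base 2) of the count distribution."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log2(n / total) for n in counts.values() if n > 0
    )


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
# Hypothetical sample with 1,000 reads and a long tail of rare ASVs.
sample = {"asv1": 500, "asv2": 300, "asv3": 150, "asv4": 40, "asv5": 8, "asv6": 2}

for depth in (50, 200, 1000):
    sub = subsample(sample, depth, rng)
    print(depth, observed_richness(sub), round(shannon(sub), 3))
```

Rare ASVs are the ones most likely to be missed at shallow depths, which is why richness keeps climbing with depth while Shannon, which is weighted toward the abundant ASVs, levels off earlier.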

Makes sense.

Yes, this is denoised data and I will take that under consideration! We’re more interested in the changes in the microbiota associated with disease state than in the discovery of novel ASVs.


