OTU abundance threshold - 0.005% of what?

Hi,

I'm trying to understand how the secondary filtering of OTUs in QIIME works. The 2013 paper on quality filtering recommends removing OTUs that contain less than 0.005% of reads: “For datasets where a mock community is not included for calibration, we recommend the conservative threshold of (c = 0.005%).”

My question: Is it 0.005% out of all reads in the whole sequencing run, or 0.005% out of all reads in one of the samples in the sequencing run?

Regards,
Ruben Dyrhovden

Hi @Ruben!
A point of process first: I’ve moved your question to General Discussion, as it seems to be a question about best practices discussed in the literature, rather than a software support issue. If I’m misinterpreting your question, please feel free to clarify and/or move back into the correct category.

I’m not a bioinformatics expert, and I’m uncertain which 2013 paper you’re referencing. If you can clarify which paper, the community might be better able to answer your question.

Finally, if this is a software question, we should be clear on whether you’re using QIIME (which is no longer supported), or QIIME 2, which replaced QIIME in early 2018. The two tools work very differently.

Best,
Chris
:sheep:


Hi,

Thanks for your reply. I will try to make my question clearer.

The paper I am referring to is "Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing" by Bokulich et al.: https://www.nature.com/articles/nmeth.2276

My question is about what is defined as “secondary filtering / OTU abundance threshold (c)” in the cited paper.

I understand from reading other posts on the forum (e.g. "Filter Feature Table") that OTU abundance threshold filtering is no longer recommended when using Deblur or DADA2.

Although it is no longer recommended in QIIME 2, I would really appreciate your help in understanding the 0.005% OTU abundance filtering approach, as it is used in a lot of microbiome studies. I have not been able to work out whether it is 0.005% of all reads in the entire sequencing run, or 0.005% of all reads in one sample.

@Ruben, according to Dr. Bokulich himself, it’s 0.005% of all reads in the entire sequencing run.
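
To make the arithmetic concrete, here is a minimal sketch of that run-wide interpretation in plain Python. It is only an illustration, not the QIIME implementation: the function name `filter_low_abundance_otus`, the dict-based table layout, and the toy counts are all made up for this example.

```python
def filter_low_abundance_otus(table, threshold=0.00005):
    """Drop OTUs whose run-wide read count is below `threshold` (a fraction).

    `table` is a hypothetical layout: OTU ID -> list of per-sample read counts.
    The default 0.00005 is the 0.005% threshold written as a fraction.
    """
    # Total reads across every OTU and every sample in the run.
    total_reads = sum(sum(counts) for counts in table.values())
    # Minimum number of reads an OTU needs, summed over all samples.
    min_count = threshold * total_reads
    # Keep an OTU only if its run-wide total reaches the cutoff.
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts) >= min_count
    }


# Toy table with two samples and 100,000 reads in total,
# so the cutoff is about 5 reads per OTU across the whole run.
table = {
    "OTU_1": [60000, 39990],
    "OTU_2": [3, 1],   # 4 reads in total: below the cutoff
    "OTU_3": [0, 6],   # 6 reads in total: above the cutoff
}
filtered = filter_low_abundance_otus(table)
print(sorted(filtered))
```

Under the per-sample reading, the cutoff would instead be computed separately from each sample's own read total, which is not what is described above.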

Have fun out there!
:poodle:
