Exclude seqs to attain only 95-97% confidence

Nicholas_Bokulich · August 26, 2018, 11:37pm

Hi @Fabs,
The % similarity you are describing is the threshold used for clustering sequences into OTUs, e.g., clustering your input sequences at 97% similarity to construct OTUs.

My guess is that you used deblur or dada2 to denoise your sequences, in which case you do not need to perform clustering. Your denoised sequences are effectively 100% OTUs.
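To make the "100% OTUs" point concrete, here is a toy sketch in Python. It is an illustration only, not how VSEARCH, dada2, or deblur actually work. It assumes equal-length, pre-aligned sequences and defines percent identity as the fraction of matching positions. The function names (`percent_identity`, `greedy_cluster`, `mutate`) and the sequences are hypothetical. At 97%, near-identical variants collapse into one OTU. At 100%, every distinct sequence keeps its own cluster, which is effectively what denoised ASVs are. Note that denoising also corrects sequencing errors, and this sketch does not.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences (toy assumption)."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length, pre-aligned sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[list[str]]:
    """Simplified greedy centroid clustering: each sequence joins the first cluster whose
    centroid (first member) it matches at >= threshold % identity, else starts a new cluster."""
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if percent_identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def mutate(seq: str, changes: dict[int, str]) -> str:
    """Return a copy of seq with the given positions substituted (hypothetical helper)."""
    chars = list(seq)
    for pos, base in changes.items():
        chars[pos] = base
    return "".join(chars)


# Toy "sequences", all 100 bp long, so one mismatch costs one percentage point of identity.
base = "ACGT" * 25
reads = [
    base,
    base,                             # exact duplicate of base
    mutate(base, {0: "T"}),           # 99% identical to base
    mutate(base, {0: "T", 1: "A"}),   # 98% identical to base
    "G" * 100,                        # unrelated sequence
]

for threshold in (97.0, 100.0):
    clusters = greedy_cluster(reads, threshold)
    print(f"{threshold}% identity -> {len(clusters)} clusters")
```

With these inputs, 97% yields 2 clusters (all variants of `base` merge), while 100% yields 4, one per distinct sequence.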

No, the clustering of the reference database is entirely separate. The reference sequences here do consist of OTUs, but they have been clustered mostly to dereplicate the database, making it easier, faster, and less memory-intensive to use, while still retaining enough information to, e.g., identify species. You will report something along the lines of:

"Sequences were denoised using the q2-dada2 plugin (citation) with default parameters. ASVs were classified taxonomically using the classify-sklearn method in the q2-feature-classifier plugin (citation) for QIIME 2 (citation), using default parameters; the UNITE database (release number) (citation) clustered at X% similarity was used as reference sequences for taxonomy classification."

So to clarify:

OTU clustering is NOT necessary, since you are denoising your sequences instead.

I hope that clarifies!