Core metrics suggestions

EMI012 · June 11, 2026, 5:59am

Hello everyone,

I am working on 16S rRNA Nanopore sequencing data from gastric biopsies (24 samples from H. pylori-positive patients), with a median read length of ~450 bp.

After quality control and dereplication using q2-vsearch, I obtained ~26,000 unique features from ~26,000 reads. Nearly every read ended up as its own feature, which likely reflects the Nanopore sequencing error rate. I then performed taxonomic classification using SILVA 138.2 and collapsed the data at the genus level for downstream analyses.
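Why does dereplication barely reduce the feature count here? Dereplication only merges reads that are exactly identical, so a read carrying even one sequencing error becomes a separate feature. The sketch below is a simplified illustration of exact-match dereplication (not q2-vsearch's actual code; it ignores details such as reverse complements and length filters), with toy reads that are purely hypothetical:

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique features with abundance counts.

    Simplified exact-match dereplication: two reads merge only if they are
    identical over their full length (case-insensitive).
    """
    counts = Counter(read.upper() for read in reads)
    # Sort by decreasing abundance, then by sequence for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def unique_fraction(reads):
    """Fraction of reads that end up as their own unique feature."""
    return len(dereplicate(reads)) / len(reads)


# Hypothetical toy reads: four error-free copies vs. four copies that each
# carry a different single-base error.
clean = ["ACGTACGT"] * 4
noisy = ["ACGTACGT", "ACGAACGT", "ACGTTCGT", "ACGTACGA"]
print(unique_fraction(clean))  # 0.25
print(unique_fraction(noisy))  # 1.0
```

With random per-read errors, the noisy case is the one that dominates, which is consistent with getting roughly one feature per read.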

For diversity analyses, I used non-phylogenetic metrics (Shannon, Pielou, Bray-Curtis, Jaccard) and differential abundance testing (ANCOM-BC). I initially attempted to build a phylogenetic tree (FastTree), but it became computationally infeasible due to the large number of unique sequences.
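For reference, the four non-phylogenetic metrics mentioned above can be written out directly from their textbook definitions. This is a minimal sketch for a genus-collapsed count vector, not the QIIME 2 or scikit-bio implementation; the log base used for Shannon is an assumption here (different tools use base 2 or base e), but the base only rescales H and cancels out in Pielou's evenness. The sample vectors are hypothetical.

```python
import math


def shannon(counts, base=math.e):
    """Shannon entropy H = -sum(p_i * log(p_i)) over nonzero features."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props) / math.log(base)


def pielou(counts):
    """Pielou's evenness J = H / ln(S), S = number of observed features."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return float("nan")  # undefined for 0 or 1 observed features
    return shannon(counts) / math.log(observed)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def jaccard(x, y):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    present_x = {i for i, c in enumerate(x) if c > 0}
    present_y = {i for i, c in enumerate(y) if c > 0}
    return 1 - len(present_x & present_y) / len(present_x | present_y)


# Hypothetical genus-level counts for two samples (same genus order).
a = [10, 10, 0, 0]
b = [10, 0, 10, 0]
print(bray_curtis(a, b))  # 0.5
print(jaccard(a, b))      # one shared genus out of three observed
print(pielou(a))          # perfectly even sample, J close to 1
```

None of these needs a tree, which is why they remain available when tree building is not feasible.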

Given these constraints, I decided to skip phylogenetic tree construction and rely only on non-phylogenetic metrics at the genus level.

My question is: Is this approach methodologically sound for Nanopore 16S data, and acceptable for microbiome analysis without using phylogenetic-based metrics like UniFrac?

Thank you for your advice!

timanix · June 11, 2026, 6:05am

Hello!

Yes, your pipeline makes sense to me.
Just a small note: for diversity metrics, I would collapse to the species level, since the goal there is not precise taxonomic annotation but grouping unique features into shared clusters. For the barplots and ANCOM-BC2, I would also use genus-level annotations, as you did.
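To make the species-versus-genus point concrete: collapsing sums the counts of all features whose taxonomy agrees up to a chosen rank, so a deeper level keeps more groups. The sketch below is an illustration of that idea, not the q2-taxa code. It assumes a SILVA-style semicolon-separated `d__/p__/c__/o__/f__/g__/s__` string where level 6 is genus and level 7 is species, and pads shallower annotations with a `"__"` placeholder; the feature IDs and taxon names are hypothetical.

```python
from collections import defaultdict


def collapse(table, taxonomy, level):
    """Sum feature counts that share the same taxonomy prefix up to `level`.

    table:    {feature_id: count} for one sample
    taxonomy: {feature_id: "d__...; p__...; ...; g__...; s__..."}
    level:    number of ranks to keep (assumed 6 = genus, 7 = species)
    """
    collapsed = defaultdict(int)
    for feature_id, count in table.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        # Pad shallow annotations with a placeholder so every key has
        # exactly `level` ranks, then truncate to the requested depth.
        ranks = (ranks + ["__"] * level)[:level]
        collapsed["; ".join(ranks)] += count
    return dict(collapsed)


# Hypothetical features: two species of one genus plus a second genus.
lineage = "d__Bacteria; p__P1; c__C1; o__O1; f__F1"
taxonomy = {
    "f1": lineage + "; g__GenusA; s__sp1",
    "f2": lineage + "; g__GenusA; s__sp2",
    "f3": lineage + "; g__GenusB; s__sp3",
}
table = {"f1": 5, "f2": 3, "f3": 2}

print(len(collapse(table, taxonomy, level=6)))  # 2 genus-level groups
print(len(collapse(table, taxonomy, level=7)))  # 3 species-level groups
```

For diversity metrics the extra resolution at level 7 is the point; for barplots and differential abundance, the coarser and more reliably annotated genus level is usually easier to interpret.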

Nicholas_Bokulich · June 11, 2026, 11:57am

Hi @EMI012 ,
Just to add to @timanix's advice: you could also consider clustering sequences into OTUs, e.g., at 99% identity. This would have the same net effect as collapsing by taxonomy: your sequences are grouped into clusters, which smooths over the sequencing errors.
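As a rough picture of what 99% OTU clustering does, here is a simplified, abundance-sorted greedy centroid clustering in plain Python. It is only a sketch of the general technique: in QIIME 2 the real step would go through q2-vsearch, and vsearch computes identity from a proper alignment with its own identity definition, whereas this sketch approximates identity as `1 - edit_distance / longer_length`. The toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # match/substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Approximate identity: 1 - edits / length of the longer sequence."""
    longest = max(len(a), len(b))
    return (longest - edit_distance(a, b)) / longest


def greedy_cluster(features, threshold=0.99):
    """Greedy de novo clustering, most abundant sequences first.

    features: {sequence: abundance}. Returns {centroid: total abundance}.
    Each sequence joins the first existing centroid it matches at
    >= threshold identity; otherwise it becomes a new centroid.
    """
    centroids = {}
    for seq, abundance in sorted(features.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                centroids[centroid] += abundance
                break
        else:
            centroids[seq] = abundance
    return centroids


base = "ACGT" * 25                        # hypothetical 100 bp sequence
one_err = base[:50] + "T" + base[51:]     # 1 substitution: 99% to base
two_err = base[:10] + "T" + one_err[11:]  # 2 substitutions: 98% to base
otus = greedy_cluster({base: 10, one_err: 3, two_err: 1})
print(len(otus))  # 2
```

Note that `two_err` is 99% identical to `one_err` but only 98% to the centroid `base`, so it starts its own OTU: greedy clustering compares against centroids, not against every member.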

This is absolutely sound and a common procedure (regardless of sequencing error issues). To get insight into the genetic similarity of the communities while skipping tree building, you could try the k-mer-based metrics in the q2-boots/q2-kmerizer plugins. These will still be sensitive to sequencing errors (slightly less so than ASV-based diversity analyses, but spurious k-mers could still have an impact), so I suggest OTU clustering before trying these methods.
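One way to see why k-mer metrics are only *slightly* less error-sensitive: a single sequencing error turns an entire ASV into a brand-new feature, but it changes at most k of that sequence's k-mers, and each of those is a spurious k-mer. The sketch below is a conceptual illustration only, not how q2-kmerizer or q2-boots are implemented. It builds abundance-weighted k-mer profiles per sample and compares them with Bray-Curtis; k = 4 and the toy sequences are hypothetical choices for readability.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sample_kmer_profile(features, k):
    """Abundance-weighted k-mer counts for one sample.

    features: {sequence: abundance}
    """
    profile = Counter()
    for seq, abundance in features.items():
        for kmer in kmers(seq, k):
            profile[kmer] += abundance
    return profile


def kmer_bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two k-mer count profiles."""
    keys = set(p) | set(q)
    num = sum(abs(p[key] - q[key]) for key in keys)
    den = sum(p[key] + q[key] for key in keys)
    return num / den


orig = "A" * 10                      # hypothetical toy sequence
mut = orig[:5] + "C" + orig[6:]      # same sequence with one error
spurious = set(kmers(mut, 4)) - set(kmers(orig, 4))
print(len(spurious))  # 4, i.e. k novel k-mers from one error

d = kmer_bray_curtis(sample_kmer_profile({orig: 1}, 4),
                     sample_kmer_profile({mut: 1}, 4))
print(d)  # 8/14, i.e. about 0.57
```

At the feature level these two single-sequence samples share nothing (Bray-Curtis of 1.0), while their k-mer profiles still overlap; the spurious k-mers are what keeps the distance well above zero, which is why clustering first still helps.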