Comparing 16S and shotgun metagenomic data with greengenes2

wasade · January 8, 2026, 4:14pm

Great! Answers below:

Since they’re V4, I would rely on the existing placements, which is what filter-features allows. This is only getting you taxonomy, not phylogenetic position, and Naive Bayes is less informative at higher levels of specificity than the existing placements. I would anticipate the majority of your read mass to be retained, as the existing placements scope ~300k diverse V4 samples. The data model currently assumes you’re operating off of 515F and at defined lengths; the largest number of fragments are from the 90, 100, 125 and 150 nt lengths, if I recall.

If you are losing too much overall read mass, you could place the fragments with SEPP. For taxonomy, though, you would likely need a separate step, such as Naive Bayes.
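
To judge whether too much read mass is being lost, one option is to compare per-sample read totals before and after filtering. The snippet below is a minimal sketch of that check, assuming the tables have already been exported to plain dictionaries (sample ID to feature counts). The function name, the toy counts, and the 0.8 cutoff are all made up for illustration; they are not part of q2-greengenes2 and not a recommended threshold.

```python
def retained_read_mass(before, after):
    """Return the per-sample fraction of reads kept after feature filtering.

    before / after: dict mapping sample ID -> dict of feature ID -> read count.
    Samples missing from `after` are treated as having lost all reads.
    """
    fractions = {}
    for sample, counts in before.items():
        total = sum(counts.values())
        kept = sum(after.get(sample, {}).values())
        fractions[sample] = kept / total if total else 0.0
    return fractions


# Toy counts, invented for illustration only.
before = {
    "s1": {"asv_a": 80, "asv_b": 15, "asv_c": 5},
    "s2": {"asv_a": 10, "asv_d": 30},
}
after = {
    "s1": {"asv_a": 80, "asv_b": 15},
    "s2": {"asv_a": 10},
}

fractions = retained_read_mass(before, after)

# Arbitrary illustrative cutoff: flag samples that kept under 80% of reads.
CUTOFF = 0.8
low_retention = sorted(s for s, f in fractions.items() if f < CUTOFF)

print(fractions)
print(low_retention)
```

If only a handful of samples fall below whatever cutoff you consider acceptable, filtering is probably fine; if most do, that is the case where placing the fragments yourself may be worth the effort.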

A prebuilt WoLr2 database can be found here. Our typical processing is to use the SHOGUN parameter set with bowtie2. Alternatively, you could upload your data to Qiita and do the Woltka processing there. Briefly, the process is: align to the database, then run woltka classify on the resulting alignment data to produce a feature table. @qiyunzhu may be able to comment on q2-woltka; I don’t have any experience with it.
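
The two steps described above (align, then classify) can be sketched as a pair of command lines. The Python below only assembles and prints the commands; it does not run them. The bowtie2 values are the SHOGUN-style settings as listed in the Woltka documentation, reproduced here from memory, so treat them as an assumption and check them against the current docs. All paths and the helper function names are hypothetical, and single-end input (`-U`) is assumed.

```python
import shlex

# SHOGUN-style bowtie2 settings (assumption: reproduced from memory of the
# Woltka documentation; verify before use).
SHOGUN_BOWTIE2_ARGS = [
    "--very-sensitive", "-k", "16",
    "--np", "1", "--mp", "1,1",
    "--rdg", "0,1", "--rfg", "0,1",
    "--score-min", "L,0,-0.05",
    "--no-head", "--no-unal",
]


def bowtie2_cmd(index_prefix, reads_fastq, out_sam, threads=8):
    """Build a bowtie2 command for unpaired reads against a prebuilt index."""
    return [
        "bowtie2", "-p", str(threads),
        "-x", index_prefix,      # prefix of the bowtie2 index files
        "-U", reads_fastq,       # unpaired reads
        *SHOGUN_BOWTIE2_ARGS,
        "-S", out_sam,           # SAM output
    ]


def woltka_classify_cmd(align_dir, out_table):
    """Build a woltka classify command over a directory of alignments.

    With no rank or mapping options given, the output is a per-genome table.
    """
    return ["woltka", "classify", "-i", align_dir, "-o", out_table]


# Hypothetical paths, for illustration only.
align = bowtie2_cmd("db/WoLr2", "reads/sample1.fastq.gz", "align/sample1.sam")
classify = woltka_classify_cmd("align", "table.biom")

print(shlex.join(align))
print(shlex.join(classify))
```

Building the commands as lists keeps quoting unambiguous (note the commas in values such as `L,0,-0.05`); they could be passed to `subprocess.run` once the paths point at real files.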

Yes, please use the 2024.09 version. The backbone is the same as 2022.10, but a larger number of fragments are placed and the taxonomy has been revised.

Best,

Daniel