I have both 16S rRNA V4 and shotgun metagenomic data from the same samples. I want to broadly compare overall patterns in alpha and beta diversity by treatment, and to see whether the 16S and shotgun data recover similar higher-level taxonomic patterns. I know this is feasible, since others have published papers comparing the results between the two sequencing techniques, and I realized that using the same reference database would be the best way to compare the two methods. I came across the Greengenes2 paper, which does exactly this and shows that these biases can be overcome by using GG2. Now I want to apply GG2 to both data sets and am trying to follow the code from this forum post. Is this the correct way to go about my goal? If so, I am confused about a few things in the code:
For the 16S V4 processing part, I used DADA2 and then ran:

```
qiime feature-classifier classify-sklearn \
  --i-classifier 2024.09.backbone.full-length.nb.sklearn-1.4.2.qza \
  --i-reads rep-seqs-dada2-245.qza \
  --o-classification taxonomy-245-gg.qza
```
The linked tutorial says to run qiime greengenes2 filter-features and qiime greengenes2 taxonomy-from-table, but is my command above doing the same thing? I wanted to process my data through DADA2 and am unsure whether this will achieve my end goal of comparing shotgun and 16S. If it is not correct, do I just feed the table.qza that DADA2 created into the qiime greengenes2 filter-features and qiime greengenes2 taxonomy-from-table commands?
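To make sure I understand the alternative, my reading of the tutorial is that the filter-features path would look roughly like this (a sketch with placeholder filenames; table-dada2.qza stands in for my DADA2 feature table, and the 2022.10 reference is the one the tutorial names):

```shell
# Keep only the ASVs that already exist as placed fragments in Greengenes2
qiime greengenes2 filter-features \
  --i-feature-table table-dada2.qza \
  --i-reference 2022.10.taxonomy.asv.nwk.qza \
  --o-filtered-feature-table table-gg2.qza

# Pull taxonomy for the retained features directly from the GG2 tree
qiime greengenes2 taxonomy-from-table \
  --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza \
  --i-table table-gg2.qza \
  --o-classification taxonomy-gg2.qza
```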
Following the linked GG2 tutorial section "If you have shotgun metagenomic data", it says to process the short-read data with the Woltka toolkit against WoLr2. I'm getting confused here because my shotgun metagenomic data is formatted as fastq files, with a forward and reverse fastq per sample (adapter and host removal already done). I'm not sure whether I'm supposed to use the q2-woltka plugin, or to first align the reads to the WoL database with bowtie2 and then import into QIIME. I'm also having a hard time finding the correct WoLr2 reference to use for building the bowtie2 index (bowtie2-build).
Is there a newer version of the 2022.10.taxonomy.asv.nwk.qza that I will use in the qiime greengenes2 filter-features and taxonomy-from-table commands?
Since they’re V4, I would rely on the existing placements, which is what filter-features gives you. Your classify-sklearn command is only getting you taxonomy, not phylogenetic position, and Naive Bayes is less informative at higher levels of specificity than the existing placements. I would anticipate the majority of your read mass being retained, as the existing placements cover ~300k diverse V4 samples. The data model currently assumes you’re operating off of 515F and at defined lengths; the largest numbers of fragments are from the 90, 100, 125, and 150 nt lengths, if I recall.
If you are losing too much overall read mass, you could place the fragments with SEPP. For taxonomy, you would likely then need a separate step, such as Naive Bayes.
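A SEPP placement along those lines might look like the following sketch (assuming the q2-fragment-insertion plugin; sepp-refs.qza is a placeholder for whichever SEPP reference database you use, and the other filenames echo the ones from your post):

```shell
# Insert the DADA2 representative sequences into a reference phylogeny
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-dada2-245.qza \
  --i-reference-database sepp-refs.qza \
  --p-threads 4 \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza
```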
A prebuilt WoLr2 database can be found here. Our typical processing is to use the SHOGUN parameter set with bowtie2. Alternatively, you could upload your data to Qiita and do the Woltka processing there. Briefly, the process is: align to the database, then run woltka classify on the resulting alignments to produce a feature table. @qiyunzhu may be able to comment on q2-woltka; I don’t have any experience with it.
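In outline, that align-then-classify flow might look like this (a sketch: the index prefix WoLr2/WoLr2, sample names, and thread count are placeholders, and the full SHOGUN parameter set is omitted here for brevity):

```shell
# Align one sample's paired reads against the prebuilt WoLr2 Bowtie2 index
bowtie2 -p 8 -x WoLr2/WoLr2 \
  -1 sample1_R1.fastq.gz -2 sample1_R2.fastq.gz \
  --very-sensitive --no-unal \
  -S aln/sample1.sam

# Collapse the per-sample alignments into a per-genome (OGU) feature table
woltka classify -i aln/ -o ogu_table.biom
```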
Yes, please use the 2024.09 version. The backbone is the same as 2022.10 but there are a larger number of fragments placed and a revised taxonomy.
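Concretely, that means swapping the newer release into the same commands, e.g. (a sketch; table-dada2.qza is a placeholder for your own feature table):

```shell
qiime greengenes2 filter-features \
  --i-feature-table table-dada2.qza \
  --i-reference 2024.09.taxonomy.asv.nwk.qza \
  --o-filtered-feature-table table-gg2.qza
```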
Thank you for your detailed response. I have a few more questions now that I am moving forward with your suggestions:
For my 16S V4 data, I attempted the filter-features command using 2024.09.taxonomy.asv.nwk.qza, but the resulting table was empty. This is probably because my sequences are 245 nt fragments. That being said, is my original code above, using classify-sklearn and 2024.09.backbone.full-length.nb.sklearn-1.4.2.qza, doing the same thing? Will the resulting taxonomy file be comparable to my shotgun metagenomic data once I run that through qiime greengenes2 filter-features and qiime greengenes2 taxonomy-from-table?
I posted on the SHOGUN issues page but have gotten no response, so since you use the SHOGUN pipeline I was hoping you could help me with the issue I'm having. I downloaded the WoLr2 bowtie2-indexed database from your link into a /bowtie2 directory and realized that I need a metadata.yaml file in the same directory. I am confused about how to format this file if I am only using the alignment step of SHOGUN. I thought this was correct:
But I am getting this error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/lfs/bark9299.ui/Sweden_shotgun_metagenomics/20240209_DNASeq_PE150/CLEAN_READS/clean_reads/output/bowtie2/rep82.tax'
```

when I run:

```
shogun align -a bowtie2 \
  -i combined_seqs.fna \
  -d /lfs/bark9299.ui/Sweden_shotgun_metagenomics/20240209_DNASeq_PE150/CLEAN_READS/clean_reads/output/bowtie2 \
  -o /lfs/bark9299.ui/Sweden_shotgun_metagenomics/20240209_DNASeq_PE150/CLEAN_READS/clean_reads/output/shogun_output
```
This error is confusing because I thought I was only aligning to the WoLr2 database and not running the taxonomy step, since I was going to import into QIIME 2 and use qiime greengenes2 filter-features and qiime greengenes2 taxonomy-from-table. How do I fix the metadata.yaml file to work with shogun align? And is it correct that I will skip the shogun assign_taxonomy function and instead import the table from shogun align into QIIME 2?
Thanks again for your help and sorry about the confusion,
E
We did not place any fragments of that length off of 515F so none would be present.
That is performing a Naive Bayes classification of the data. filter-features reduces your data to the set of fragments which have previously been placed in the phylogeny. In our NBT paper, we demonstrated higher concordance of the taxonomy of placed records with WGS data than we observed with Naive Bayes.
I’ve only used the SHOGUN parameter set with bowtie2 followed by woltka classify, I’m not familiar with SHOGUN itself. The command string that qp-woltka uses in Qiita, for instance, is here.
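From memory, that invocation is approximately the following; treat the exact flag set as an approximation and check the qp-woltka source for the authoritative command string (the index prefix, input, and thread count are placeholders):

```shell
# Approximate SHOGUN-style bowtie2 parameters as used for Woltka processing
bowtie2 -p 8 -x WoLr2/WoLr2 -q input.fastq.gz \
  --seed 42 --very-sensitive -k 16 --np 1 \
  --mp "1,1" --rdg "0,1" --rfg "0,1" \
  --score-min "L,0,-0.05" --no-head --no-unal \
  -S output.sam
```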
I really do recommend doing the processing in Qiita as it will take care of the compute for you and might simplify this all.