Greengenes 1 vs 2

Dana · December 17, 2023, 10:05am

Hi,
I compared Greengenes1 and Greengenes2 and noticed a significant drop in features (from 1,220 to 420) and total frequency (from ~4M to ~2M) in the Greengenes2 table.
Is that expected?
I used the qiime greengenes2 non-v4-16s command for Greengenes2 and the gg-13-8-99-515-806-nb-classifier.qza classifier for Greengenes1.

Thanks!
Dana.

cherman2 · December 18, 2023, 4:56pm

Hi @Dana,
Could you provide more information?

What commands are you running?
What do you mean by a significant drop in features? Do you mean a significant drop in labeled features?

Is your data non-V4 16S?

wasade · December 18, 2023, 5:17pm

Hi @Dana,

To expand a little more, do you see a similar proportion of total reads recruiting to Greengenes2?
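As a rough illustration of that question, here is a minimal sketch that turns the approximate numbers from the first post into retained fractions. The function name `retained_fraction` is hypothetical, and the inputs are the rounded values quoted above rather than exact table summaries.

```python
def retained_fraction(before: float, after: float) -> float:
    """Return the fraction of `before` that is still present in `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Approximate values quoted in the first post (not exact table summaries).
features_kept = retained_fraction(1220, 420)
frequency_kept = retained_fraction(4_000_000, 2_000_000)

print(f"features retained:  {features_kept:.1%}")
print(f"frequency retained: {frequency_kept:.1%}")
```

With these rounded inputs, roughly a third of the features but about half of the total frequency is retained, which suggests the features that were lost tend to be lower in abundance than the ones that mapped.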

Best,
Daniel

Dana · December 19, 2023, 11:55am

I first visualized the table I got after the dada2 command (qiime dada2 denoise-single with --p-trim-left 13 and --p-trunc-len 230) and got:

Then, for Greengenes2, I ran qiime greengenes2 non-v4-16s, visualized the mapped table, and got:

My data is V4 sequences, but when I try qiime greengenes2 filter-features and then qiime greengenes2 taxonomy-from-table I get a ‘No requested tips found’ error. After reading this post I understood that I can also run the non-v4-16s command.

Thank you for the help!

wasade · December 19, 2023, 4:55pm

Hi @Dana,

Interesting, thank you! If you don't mind, what environment are these data from? Do you recover more total frequency with a relaxed clustering threshold?

Given the trim and truncation, I would not anticipate ASVs to be found by filter-features. We primarily placed 90, 100, and 150nt sequences based on the EMP protocol. These sequences will tend to start with TAC at the 5' end, as that is immediately proximal to the 515F forward primer.
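One way to check a set of ASVs against that description is sketched below. It is only an illustration: the helper names and the toy sequences are hypothetical, and the check covers only the two properties mentioned here (a length of 90, 100, or 150 nt and a TAC start). It says nothing about whether a sequence is actually present in the Greengenes2 tree.

```python
EMP_LENGTHS = {90, 100, 150}  # fragment lengths named in the post above
EXPECTED_PREFIX = "TAC"       # expected 5' start just after the 515F primer


def looks_like_placed_fragment(seq: str) -> bool:
    """True if the ASV has one of the placed lengths and the expected prefix."""
    seq = seq.upper()
    return len(seq) in EMP_LENGTHS and seq.startswith(EXPECTED_PREFIX)


def fraction_matching(asvs: list[str]) -> float:
    """Fraction of ASVs that pass looks_like_placed_fragment."""
    if not asvs:
        return 0.0
    return sum(looks_like_placed_fragment(s) for s in asvs) / len(asvs)


# Hypothetical toy ASVs, not real data.
toy_asvs = [
    "TAC" + "G" * 147,  # 150 nt, starts with TAC
    "tac" + "a" * 97,   # 100 nt, lower case is normalized before checking
    "G" * 217,          # 217 nt, e.g. a 230 nt truncation minus 13 trimmed bases
]

print(fraction_matching(toy_asvs))
```

In practice the list would come from the representative sequences exported from the DADA2 output. An ASV of 217 nt, as in the third toy entry, fails the length check regardless of its content.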

Best,
Daniel

Dana · December 21, 2023, 8:50am

Thank you Daniel,
I didn't completely understand your response. We work with human and mouse samples, and the problem I described is from a mouse dataset - we did not have similar issues with a human vaginal dataset. I am using qiime2 2023.7. In all cases, we trimmed only the barcodes from the front end of the sequences.

Can you explain what you mean by the relaxed clustering? I used the default clustering threshold in DADA2 and did not specify a threshold in the gg2 command.

Can you explain what the gg2 filtering does exactly?
If I would like to retain my taxa table, including unnamed taxa, can I use gg2 as a classifier (in qiime feature-classifier classify-sklearn command) rather than using the filtering function? In the past, I used gg1 only as a classifier.

Thank you again for your help.
Dana

wasade · December 21, 2023, 8:46pm

Hi Dana,

By a relaxed threshold, I mean reducing the level of similarity used with vsearch via non-v4-16s.

GG2 filtering computes a set intersection between the features in your table and the features in the tree, so the features need to match exactly. In practice, particularly for well-studied environments like human and mouse samples, at 90, 100, or 150nt fwd ASVs from V4 relative to the EMP 16S primers, I would expect most ASVs to match and to be representative of most of the sequence data, as we placed ~20M fragments from over 300,000 public and private microbiome samples.
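The set-intersection idea can be sketched in a few lines. This is a toy model, not the plugin's implementation: it assumes feature IDs are the ASV sequences themselves, and the function name, sequences, and counts are all hypothetical.

```python
def intersect_with_tree(table: dict[str, int], tree_tips: set[str]) -> dict[str, int]:
    """Keep only the features whose IDs are exactly present among the tree tips."""
    return {feature: count for feature, count in table.items() if feature in tree_tips}


# Hypothetical tree tips (shortened toy sequences).
tree_tips = {"TACGGAGGAT", "TACGTAGGGT"}

# Hypothetical feature table: feature ID -> total frequency.
table = {
    "TACGGAGGAT": 120,  # exact match to a tip, kept
    "ACGTAGGGTA": 80,   # same region shifted by one base, dropped
    "TACGAAGGGG": 5,    # not present in the tree, dropped
}

kept = intersect_with_tree(table, tree_tips)
print(kept)
print(sum(kept.values()) / sum(table.values()))
```

The second feature overlaps a tip almost entirely but starts one base later, so an exact-match lookup drops it. This is why ASVs produced with a different trim or truncation than the placed fragments would not be expected to survive the filter.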

There are pre-computed Naive Bayes classifiers that are compatible with the feature-classifier classify-sklearn action in the QIIME 2 data resources.

Best,
Daniel
