Greengenes2 + other analyses: recommended workflow

Dear all,
I've just started switching my QIIME 2 workflows to Greengenes2. Because I'm working with the V3-V4 region, I used the non-v4-16s action to map my ASVs to Greengenes2. As also described in the post "ASV table and Taxonomy table have different formats with greengenes2 - #6 by wasade", the number of features decreases and the feature IDs change, and as I've seen in "Introducing Greengenes2 2022.10 - #33 by liang_zhou", I cannot get my original IDs back.

Because my original feature table no longer matches the taxonomy, I'm having trouble combining the Greengenes2 taxonomy with other analyses, for example naming differentially abundant ASVs (found with Songbird) or collapsing my feature tables for other analyses.

I also tried training a V3-V4 Greengenes2 classifier, as suggested in the tutorials and comments, and then assigning taxonomy to my ASVs with sklearn, so that I don't lose my original IDs.

qiime feature-classifier extract-reads \
  --i-sequences 2022.10.backbone.full-length.fna.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer NACTACHVGGGTATCTAATCC \
  --p-read-orientation both \
  --o-reads ref_seqs-classifier.qza \
  --p-n-jobs 8

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref_seqs-classifier.qza \
  --i-reference-taxonomy 2022.10.backbone.tax.qza \
  --o-classifier gg2-v3v4-classifier.qza

However, when I compare the taxonomies generated by these two options with qiime2-rescript, the trained classifier looks worse: it leaves more features unclassified, and relative abundances at low taxonomic levels change substantially. So would it be better to use the first option (non-v4-16s)?

Hence, my question is: what is the best way to combine Greengenes2 non-v4-16s with other analyses that need taxonomy applied to the original IDs (differential abundance, collapsing tables and others)?
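For readers who want to quantify "more unclassified features" outside of qiime2-rescript, here is a minimal Python sketch. It assumes the taxonomy has been exported to lineage strings of the form `d__...; p__...; g__...`; the function name `fraction_unresolved` and the toy feature IDs are hypothetical, and this is not how qiime2-rescript computes its own summaries.

```python
def fraction_unresolved(taxonomy, rank_prefix="g__"):
    """Share of features whose lineage has no non-empty label at the given rank.

    taxonomy: {feature_id: "d__...; p__...; g__..."} (assumed lineage format)
    """
    def resolved(lineage):
        for part in lineage.split(";"):
            part = part.strip()
            # "g__" alone means the rank is present but empty
            if part.startswith(rank_prefix) and len(part) > len(rank_prefix):
                return True
        return False

    if not taxonomy:
        return 0.0
    unresolved = sum(1 for lineage in taxonomy.values() if not resolved(lineage))
    return unresolved / len(taxonomy)


# Hypothetical toy taxonomies for the same two features
tax_mapped = {
    "f1": "d__Bacteria; p__Firmicutes; g__Lactobacillus",
    "f2": "d__Bacteria; p__Bacteroidota; g__Bacteroides",
}
tax_sklearn = {
    "f1": "d__Bacteria; p__Firmicutes; g__Lactobacillus",
    "f2": "d__Bacteria; p__Bacteroidota; g__",
}

frac_mapped = fraction_unresolved(tax_mapped)
frac_sklearn = fraction_unresolved(tax_sklearn)
```

Changing `rank_prefix` (for example to `"s__"`) gives the same comparison at another rank.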

Thanks a lot in advance :slight_smile:

I think the main issue here is that (as I understand it) the GG2 classifier clusters your original ASVs at 99% identity (or at the threshold you provide) against the reference reads and replaces each cluster of sequences with a sequence from the reference, summing the counts. In the output, you get a feature table with new sequence IDs and a taxonomy file with the same IDs as the new table. You can then take the already constructed tree from GG2 and use it for phylogenetic analyses.
For me, the easiest way to keep them matched would be to perform the GG2 annotation right after the denoising and filtering steps and use the resulting files for all downstream analyses (DA, core-metrics and so on), or to use the other approach you already described: training a classifier with the GG2 files.
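To illustrate why the number of features drops and the IDs change, here is a minimal Python sketch of that kind of remapping. It is only an illustration under stated assumptions, not the Greengenes2 implementation: the function `collapse_to_reference`, the ASV-to-reference map and all IDs are hypothetical, and ASVs without a reference hit are simply dropped here.

```python
from collections import defaultdict


def collapse_to_reference(table, asv_to_ref):
    """Sum per-sample counts of ASVs that map to the same reference ID.

    table: {asv_id: {sample_id: count}}
    asv_to_ref: {asv_id: reference_id}; ASVs missing from it are dropped.
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for asv_id, sample_counts in table.items():
        ref_id = asv_to_ref.get(asv_id)
        if ref_id is None:
            continue  # assumption: no hit at the chosen identity threshold
        for sample_id, count in sample_counts.items():
            collapsed[ref_id][sample_id] += count
    return {ref: dict(counts) for ref, counts in collapsed.items()}


# Hypothetical toy data: three ASVs, two of which hit the same reference
table = {
    "asv1": {"S1": 10, "S2": 0},
    "asv2": {"S1": 5, "S2": 7},
    "asv3": {"S1": 2, "S2": 3},
}
asv_to_ref = {"asv1": "ref_A", "asv2": "ref_A", "asv3": "ref_B"}

mapped = collapse_to_reference(table, asv_to_ref)
```

Three input features become two output features, keyed by reference IDs, which is why a taxonomy keyed by the original ASV IDs no longer lines up with the mapped table.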

Best,


Thanks @timanix for your quick response!

Is it methodologically correct (or at least equivalent to the usual approach) to compute core-metrics and DA analyses on the Greengenes2 output table and sequences? I believe the results won't change much, but I'm afraid they won't be the same, since the number of features is reduced and the frequencies per sample change (even the order of samples by frequency).

By the way, I also checked some of the analyses and, for example, Songbird doesn't seem to accept the GG2 output table (error message: Invalid value for '--i-table': 'table.1-gg.qza' is not a QIIME 2 Artifact (.qza)).

I would say that it is not incorrect.
I am sure the results will change, since they would change even if you just reran the core-metrics plugin on the same data (random subsampling). I cannot predict whether the main trends you have already found will change or stay unaffected. I would give it a try, or go with the trained classifier that you already used.
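The effect of random subsampling can be sketched in a few lines of Python. This is a toy rarefaction under stated assumptions (the function `rarefy`, the seeds and the counts are hypothetical), not the code that core-metrics runs.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed):
    """Subsample `depth` reads without replacement from one sample."""
    # Expand the counts into one entry per read, then draw from that pool
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sampling depth exceeds the number of reads")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Hypothetical sample with 100 reads across three features
sample = {"ref_A": 60, "ref_B": 30, "ref_C": 10}

run1 = rarefy(sample, depth=20, seed=1)
run2 = rarefy(sample, depth=20, seed=2)
```

Both runs contain exactly 20 reads, but the per-feature counts will usually differ between seeds, so metrics computed on them shift slightly from run to run even with an unchanged input table.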

Okay, I just reran core-metrics and the relative frequencies from the collapsed tables, and the results don't seem to change much (apart from some numbers).
I don't understand why Songbird doesn't accept the feature table coming from Greengenes2, so I'll follow the trained-classifier approach, even though that classification seems a little worse. If someone has an idea why Songbird does that, it would be very much appreciated.

Thanks @timanix for your kind help


Hi @pau,

Can you confirm that the table.1-gg.qza file used with Songbird is the same file used with the other analyses?

And correct, it is easiest to interpret the results if the downstream methods (e.g., diversity analyses, Songbird, etc.) are applied to a derivative of the same feature table, such as the one mapped against Greengenes2.
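As a minimal Python sketch of why a shared table matters: annotating differential-abundance results is a join on feature IDs, and it only works when the results and the taxonomy derive from the same table. The function `annotate`, the IDs and the values below are hypothetical.

```python
def annotate(differentials, taxonomy):
    """Attach a taxon label to each feature ID and report IDs with no match.

    differentials: {feature_id: value}
    taxonomy: {feature_id: lineage}
    """
    annotated = {}
    missing = []
    for feature_id, value in differentials.items():
        if feature_id in taxonomy:
            annotated[feature_id] = (taxonomy[feature_id], value)
        else:
            missing.append(feature_id)
    return annotated, missing


# Hypothetical taxonomy keyed by the IDs of the mapped table
taxonomy = {
    "ref_A": "d__Bacteria; g__Lactobacillus",
    "ref_B": "d__Bacteria; g__Bacteroides",
}

# Differentials computed on the mapped table: every ID is found
same_table, missing_same = annotate({"ref_A": 1.2, "ref_B": -0.8}, taxonomy)

# Differentials computed on the original ASV table: nothing matches
other_table, missing_other = annotate({"asv1": 1.2, "asv2": -0.8}, taxonomy)
```

A non-empty `missing` list is a quick signal that the two inputs were not derived from the same feature table.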

Best,
Daniel


Hi @wasade, I found out what was going on!
I ran all the Greengenes2 commands in the latest qiime2-2023.9 version. Afterwards, I also ran qiime diversity core-metrics-phylogenetic, as well as other diversity commands, in the same version. However, when I moved on to Songbird I used qiime2-2020.6, as suggested in the q2-songbird installation instructions (GitHub - biocore/songbird: Vanilla regression methods for microbiome differential abundance analysis). The error was due to an incompatibility between 2023.9 and 2020.6 feature tables.

So, in the end, I was able to run Songbird by exporting the 2023.9 table and re-importing it with 2020.6. These are the commands:

# In the qiime2-2023.9 environment
# 1: compute the Greengenes2 table
qiime greengenes2 non-v4-16s \
  --i-table table-1.qza \
  --i-sequences rep-seqs-1.qza \
  --i-backbone 2022.10.backbone.full-length.fna.qza \
  --p-threads 8 \
  --o-mapped-table table.1-gg.qza \
  --o-representatives rep-seqs-gg.qza

qiime greengenes2 taxonomy-from-table \
  --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza \
  --i-table table.1-gg.qza \
  --o-classification taxonomy.qza

# 2: run core-metrics and diversity analyses
qiime diversity core-metrics-phylogenetic \
  --i-phylogeny 2022.10.phylogeny.asv.nwk.qza \
  --i-table table.1-gg.qza \
  --p-sampling-depth $depth \
  --m-metadata-file metadata.tsv \
  --output-dir core-metrics-results \
  --p-n-jobs-or-threads 4

# 3: export the Greengenes2 table
qiime tools export \
  --input-path table.1-gg.qza \
  --output-path songbird/table.1-gg-2023.9

# Switch from the qiime2-2023.9 environment to qiime2-2020.6

# 4: re-import the table
qiime tools import \
  --input-path songbird/table.1-gg-2023.9/feature-table.biom \
  --type 'FeatureTable[Frequency]' \
  --input-format BIOMV210Format \
  --output-path songbird/table.1-gg-2020.6.qza

# 5: run Songbird

Is it correct to export and re-import Greengenes2 tables across different QIIME 2 versions? Or is it not recommended to process Greengenes2 tables with the qiime2-2020.6 version? I don't know whether switching between versions can affect the analyses...

By the way, in the core-metrics command (#2), is 2022.10.phylogeny.asv.nwk.qza the correct phylogenetic tree to use?

Thanks a lot for your help!!


Thank you, @pau. I would recommend installing Songbird in the same environment to avoid switching QIIME 2 versions. The issue is most likely not specific to Greengenes2, but rather due to changes in how QIIME 2 artifacts are represented, as a lot in QIIME 2 has changed in the last three years.

That phylogeny works, but the others would also be fine, as the IDs are relative to the backbone given the use of non-v4-16s.

All the best,
Daniel

Hi again @wasade

I tried Songbird with 2023.9 and it gave me some errors. In fact, the installation instructions say to make sure that a QIIME 2 version between 2019.7 and 2020.6 is installed before installing Songbird.

So I'm not sure how to avoid switching between QIIME 2 versions.

Again, thanks for your kind help! :slight_smile:

Hi, we're not maintaining Songbird beyond 2020.6.

I can't comment on the artifacts, but the BIOM format hasn't changed significantly, so exporting the qza should still be an option.