GreenGenes classifier

CamilloColleluori · February 29, 2024, 12:56pm

Hello everyone,

I ran a taxonomic classification with the gg-13-8-99-515-806-nb-classifier.qza GreenGenes classifier and then imported the resulting taxonomy into R via the qiime2R package.
While inspecting the taxonomy table I noticed that some columns for the lower taxonomic ranks (e.g. Genus) contain entries that belong to other ranks, such as Phylum (line 2) or Kingdom (line 3).

This means that when I plot the taxonomy at the Genus level, as shown below, I can't really compare the different genera.

Do those entries simply mean that the classifier wasn't able to assign a Genus to that sample?
And in that case why would it go all the way back to Kingdom instead of e.g. repeating the closest taxonomic level?
Is there something I'm missing here?

P.S. I found out about the more recent GreenGenes2 classifier, which I'm trying out now, but I'm still curious about the reason behind that output.

Thank you in advance,
Camillo

cherman2 · February 29, 2024, 2:50pm

Hi @CamilloColleluori,
Would you mind sending your taxonomy and the command you ran to get your taxonomy so I can take a peek?

CamilloColleluori · February 29, 2024, 4:21pm

Hi Chloe,

Thanks for your reply. I obtained those results following the "Moving Pictures" tutorial.
Here's the command I used:

qiime feature-classifier classify-sklearn \
    --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
    --i-reads rep-seqs.qza \
    --o-classification taxonomy.qza

CamilloColleluori · February 29, 2024, 4:33pm

In case that's helpful I'll also attach the dada2 command I used to produce the rep-seqs.qza file:

qiime dada2 denoise-ccs \
    --i-demultiplexed-seqs sequences.qza \
    --p-front AGRGTTYGATYMTGGCTCAG \
    --p-adapter RGYTACCTTGTTACGACTT \
    --p-n-threads 36 \
    --o-representative-sequences rep-seqs.qza \
    --o-table table.qza \
    --o-denoising-stats stats.qza

cherman2 · February 29, 2024, 4:53pm

Would you mind sending or DMing me your taxonomy.qza?

CamilloColleluori · March 1, 2024, 2:42pm

Sure thing!

I'll send it to you in DM.

cherman2 · March 1, 2024, 6:04pm

@CamilloColleluori,
I exported your taxonomy and I do not see this pattern in the resulting tsv.
I will send you that tsv in the DMs.

Can you tell me what your process is for transferring this data into R?

CamilloColleluori · March 1, 2024, 6:11pm

@cherman2,

Thank you for your help.
Here's the R command I used to import the qza file:

library(qiime2R)

taxa_gg <- read_qza("gg_taxonomy.qza")
taxonomy_gg <- as.matrix(do.call(rbind, strsplit(as.character(taxa_gg$data$Taxon), "; ")))
colnames(taxonomy_gg) <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species")
rownames(taxonomy_gg) <- taxa_gg$data$Feature.ID

I'm not too familiar with the package so I wouldn't be surprised if I was doing something wrong in this step.

cherman2 · March 1, 2024, 6:16pm

Can you print the variable taxa_gg?

CamilloColleluori · March 3, 2024, 3:08pm

Sure, here it is:

It's classified as a large list of 7 elements.

By the way, after some experimenting, the new GreenGenes2 classifier seems to have solved the original issue.

cherman2 · March 4, 2024, 6:17pm

Hi @CamilloColleluori,
Could you print this as well?

CamilloColleluori · March 4, 2024, 7:49pm

Hi Chloe,

Here's the result of that after renaming the columns with the taxa and the rows with Feature_ID:

It's classified as a matrix.

cherman2 · March 5, 2024, 3:24pm

Hi @CamilloColleluori,
I think that this is the line that is causing issues with your taxonomy.

Could you print taxa_gg$data$Taxon?
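
For context, here is a minimal sketch of how the do.call(rbind, strsplit(...)) line could produce this pattern. It assumes that some Taxon strings stop before Species, which has not been confirmed in this thread. The taxon strings and the pad() helper are made up for illustration and are not taken from the actual taxonomy.qza.

```r
# Toy Taxon strings (hypothetical); the second and third stop before Species.
taxon <- c(
  "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__",
  "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae",
  "k__Bacteria; p__Bacteroidetes"
)
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# One character vector per feature; the lengths here are 7, 5 and 2.
parts <- strsplit(taxon, "; ", fixed = TRUE)

# rbind() recycles shorter vectors up to the longest length (with a warning),
# so a truncated assignment wraps around and starts again at Kingdom, Phylum, ...
recycled <- suppressWarnings(do.call(rbind, parts))
colnames(recycled) <- ranks

# Hypothetical helper: pad each split vector with NA up to n entries,
# so every rank stays in its own column.
pad <- function(x, n) c(x, rep(NA_character_, n - length(x)))
padded <- t(vapply(parts, pad, character(length(ranks)), n = length(ranks)))
colnames(padded) <- ranks

print(recycled[, c("Family", "Genus", "Species")])
print(padded[, c("Family", "Genus", "Species")])
```

In the recycled matrix the truncated rows show Kingdom or Phylum labels in the Genus column, while the padded matrix has NA there. The qiime2R package also provides a parse_taxonomy() function that should handle this splitting when given taxa_gg$data, though I have not checked it against this particular file.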

system · April 5, 2024, 9:25pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.