GreenGenes classifier

Hello everyone,

I did some taxonomic classification using the gg-13-8-99-515-806-nb-classifier.qza GreenGenes classifier and then imported the resulting taxonomy into R via the qiime2R package.
While inspecting the taxonomy table I noticed that the columns for some of the finer taxonomic levels (e.g. Genus) actually contain entries from other levels, such as Phylum (line 2) or Kingdom (line 3).

This implies that when I try to plot the taxonomy at the Genus level as shown below I can't really compare the different genera.

Do those entries simply mean that the classifier wasn't able to assign a Genus to that sample?
And in that case why would it go all the way back to Kingdom instead of e.g. repeating the closest taxonomic level?
Is there something I'm missing here?

P.S. I found out about the more recent GreenGenes2 classifier, which I'm trying out now, but I'm still curious about the reason behind that output.

Thank you in advance,
Camillo

Hi @CamilloColleluori,
Would you mind sending your taxonomy and the command you ran to get your taxonomy so I can take a peek?

Hi Chloe,

Thanks for your reply. I obtained those results following the "Moving Pictures" tutorial.
Here's the command I used:

qiime feature-classifier classify-sklearn \
            --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
            --i-reads rep-seqs.qza \
            --o-classification taxonomy.qza 

In case that's helpful I'll also attach the dada2 command I used to produce the rep-seqs.qza file:

qiime dada2 denoise-ccs \
    --i-demultiplexed-seqs sequences.qza \
    --p-front AGRGTTYGATYMTGGCTCAG \
    --p-adapter RGYTACCTTGTTACGACTT \
    --p-n-threads 36 \
    --o-representative-sequences rep-seqs.qza \
    --o-table table.qza \
    --o-denoising-stats stats.qza

Would you mind sending or DMing me your taxonomy.qza?

Sure thing!

I'll send it to you in DM.

@CamilloColleluori,
I exported your taxonomy and I do not see this pattern in the TSV that you get from exporting the data.
I will send you that TSV in the DMs.

Can you tell me what your process is for transferring this data into R?

@cherman2,

Thank you for your help.
Here's the R command I used to import the qza file:

library(qiime2R)

taxa_gg <- read_qza("gg_taxonomy.qza")
taxonomy_gg <- as.matrix(do.call(rbind, strsplit(as.character(taxa_gg$data$Taxon), "; ")))
colnames(taxonomy_gg) <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species")
rownames(taxonomy_gg) <- taxa_gg$data$Feature.ID

I'm not too familiar with the package so I wouldn't be surprised if I was doing something wrong in this step.
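
One way to sanity-check this step, as a minimal sketch: count how many ranks each taxon string splits into. The `taxon` vector below is made-up example data standing in for `taxa_gg$data$Taxon`, not values from my file. If the counts are not all 7, the seven column names above cannot line up with every row.

```r
# Hypothetical taxon strings standing in for taxa_gg$data$Taxon
taxon <- c(
  "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__",
  "k__Bacteria; p__Proteobacteria",
  "k__Bacteria"
)

# Number of ranks in each string after splitting on "; "
n_ranks <- lengths(strsplit(taxon, "; ", fixed = TRUE))
print(n_ranks)

# How many features were resolved to each depth
print(table(n_ranks))
```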

Can you print the variable taxa_gg?

Sure, here it is:

It's classified as a large list of 7 elements.

By the way, after a bit of testing, the new GreenGenes2 classifier seems to have solved the original issue.

Hi @CamilloColleluori,
Could you print this as well?

Hi Chloe,

Here's the result of that after renaming the columns with the taxa and the rows with Feature_ID:

It's classified as a matrix.

Hi @CamilloColleluori,
I think that this is the line that is causing issues with your taxonomy.

Could you print taxa_gg$data$Taxon?
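
For context, here is a minimal sketch of what I suspect is going on, assuming some of your taxon strings stop before the Species rank (the strings below are made up, not taken from your file). When rbind() is given vectors of different lengths, it recycles the shorter ones to fill the row, so a string that stops at Phylum wraps around and puts Kingdom and Phylum values into the later columns. Padding each row with NA up to seven ranks before binding avoids that.

```r
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Made-up taxon strings of differing depth (an assumption, not real data)
taxon <- c(
  "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__",
  "k__Bacteria; p__Proteobacteria",
  "k__Bacteria"
)
parts <- strsplit(taxon, "; ", fixed = TRUE)

# Original approach: short rows are recycled to fill all seven columns
# (R warns about this; the warning is silenced here only to keep output clean)
recycled <- suppressWarnings(do.call(rbind, parts))
colnames(recycled) <- ranks
print(recycled[, "Genus"])

# Padded approach: fill missing ranks with NA instead of recycling
# (assumes no string has more than seven ranks)
padded <- t(vapply(
  parts,
  function(p) c(p, rep(NA_character_, length(ranks) - length(p))),
  character(length(ranks))
))
colnames(padded) <- ranks
print(padded[, "Genus"])
```

If I remember right, qiime2R also ships a parse_taxonomy() helper that takes taxa_gg$data, which may be worth trying instead of splitting by hand.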
