ASV table and taxonomy table have different feature ID formats with Greengenes2

Dear Qiime2 community,

With the advent of Greengenes2, I thought I'd reanalyse my data with the new database. However, when I exported my data from QIIME 2 to R, I ran into a problem. My ASV table had the usual hash-style feature IDs, such as "4f9070b5d72c4cada7a60e06269fc149". My taxonomy table, however, used completely different IDs, such as "MJ006-1-barcode41-umi149015bins-ubs-3" or "JN873915".

I think this disparity between the two tables is what's causing my phyloseq object error:
Error in validObject(.Object) : invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()

I have attached screenshots of the tables below, please let me know if you have any advice!

Kind regards,

Johann

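phyloseq refuses to build the object when the taxa names in the OTU table and the taxonomy table do not match, so a quick ID comparison before leaving QIIME 2 can confirm whether this is the cause. A minimal Python sketch, assuming the taxonomy was exported with `qiime tools export` to a tab-separated `taxonomy.tsv` whose first row is a header and whose first column is the feature ID; the function names are hypothetical, and the table's feature IDs could come from, for example, the first column of a `biom convert --to-tsv` export:

```python
import csv
import io
from typing import Dict, Iterable, Set, TextIO


def read_taxonomy_ids(handle: TextIO) -> Set[str]:
    """Return the feature IDs (first column) of an exported QIIME 2 taxonomy TSV.

    Assumes the first row is a header such as "Feature ID<TAB>Taxon<TAB>Confidence".
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    # Skip blank rows and any '#'-prefixed directive rows.
    return {row[0] for row in reader if row and not row[0].startswith("#")}


def compare_ids(table_ids: Iterable[str], taxonomy_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Split IDs into those shared by both files and those unique to each."""
    table_set, tax_set = set(table_ids), set(taxonomy_ids)
    return {
        "shared": table_set & tax_set,
        "table_only": table_set - tax_set,
        "taxonomy_only": tax_set - table_set,
    }


if __name__ == "__main__":
    # Toy data: a hash-style ASV ID in the table, a backbone-style ID in the taxonomy.
    taxonomy_tsv = io.StringIO(
        "Feature ID\tTaxon\n"
        "JN873915\td__Bacteria; p__Firmicutes\n"
    )
    report = compare_ids(["4f9070b5d72c4cada7a60e06269fc149"], read_taxonomy_ids(taxonomy_tsv))
    for key, ids in report.items():
        print(key, len(ids))
```

An empty `shared` set, as in this toy case, means the two files use different feature-ID namespaces, which is exactly the situation that triggers the phyloseq error.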

Hi @Johanndb,

How did you process your data against Greengenes2?

Best,
Daniel

Hi @wasade,

I think my problem was that I tried to use both the Greengenes2 package and QIIME 2 in tandem to train my own classifier. When I used only QIIME 2 to classify my sequences, everything went well.

Kind regards,

Johann

Hi @Johanndb,

Okay, glad it's working 🙂. Just to verify, were you able to use Greengenes2 2022.10 specifically, or are you still having an issue you would like help with?

Best,
Daniel

I came across a similar issue. My taxonomy output (greengenes2_taxonomy.qza) is shorter and has different sample IDs, similar to the example above. Am I using the commands correctly?

qiime greengenes2 non-v4-16s \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --i-backbone $DB/2022.10.backbone.full-length.fna.qza \
  --o-mapped-table greengenes_23_biom.qza \
  --o-representatives greengenes_23_fna.qza

qiime greengenes2 taxonomy-from-table \
  --i-reference-taxonomy $DB/2022.10.taxonomy.asv.nwk.qza \
  --i-table greengenes_23_biom.qza \
  --o-classification greengenes2_taxonomy.qza

table.qza: feature table of my data
rep-seqs.qza: representative sequences of my ASVs

Hi @Alexandra_Bastkowska,

Thank you for reaching out. Could you clarify whether you mean that you have fewer samples or fewer features in the FeatureTable after running these commands? These commands should not modify sample IDs; do you have an example of how they're changing?

Best,
Daniel

Hi @wasade,

Thank you for the prompt reply. I have fewer features in my taxonomy file (greengenes2_taxonomy.qza) than in my FeatureTable (table.qza) and my representative sequences (rep-seqs.qza), similar to what Johann described above. The first screenshot is the taxonomy file and the second is the feature table. However, there is no overlap between the row names of the two files. How can I tell which taxonomic classification each of my features has?

Thanks,
Alex


Hi Alex,

The non-v4-16s step performs closed-reference clustering against the backbone, so it isn't unusual for the number of features to decrease, and the features in the mapped table carry backbone record IDs rather than your original ASV IDs.

The underlying plugin, q2-vsearch, does not currently expose the query/subject mapping information. Please see the discussion in the "Introducing Greengenes2 2022.10" thread (post #34 by wasade) for more information.
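To illustrate why closed-reference clustering shrinks the table and renames its features, here is a toy Python sketch. It assumes a query-to-reference assignment is already known; since q2-vsearch does not expose that mapping, the `assignments` dictionary and the function name are hypothetical. Counts from every ASV assigned to the same backbone record are summed under that record's ID, ASVs with no hit are dropped, and sample IDs are left untouched:

```python
from collections import defaultdict
from typing import Dict


def collapse_to_reference(
    table: Dict[str, Dict[str, int]],
    assignments: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Collapse an ASV table onto reference IDs, closed-reference style.

    table: feature ID -> {sample ID -> count}
    assignments: feature ID -> reference ID (hypothetical; ASVs absent here
    did not hit the reference and are discarded).
    """
    collapsed: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for feature_id, sample_counts in table.items():
        ref_id = assignments.get(feature_id)
        if ref_id is None:
            continue  # no reference hit: closed-reference clustering drops it
        for sample_id, count in sample_counts.items():
            collapsed[ref_id][sample_id] += count
    # Convert the nested defaultdicts back to plain dicts.
    return {ref: dict(counts) for ref, counts in collapsed.items()}


if __name__ == "__main__":
    asv_table = {
        "asv_a": {"S1": 10, "S2": 3},
        "asv_b": {"S1": 5},
        "asv_c": {"S2": 7},
    }
    hits = {"asv_a": "JN873915", "asv_b": "JN873915"}  # asv_c has no hit
    print(collapse_to_reference(asv_table, hits))
    # → {'JN873915': {'S1': 15, 'S2': 3}}
```

Three ASVs become one feature named after the backbone record, which is why a taxonomy derived from the mapped table cannot be joined back to the original ASV IDs without the (unexposed) mapping.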

Best,
Daniel

Hi @Alexandra_Bastkowska,

If I may interject: using both the Greengenes2 package and QIIME 2 for taxonomic classification caused the disparity between my two outputs. I subsequently downloaded the Greengenes2 data and followed the QIIME 2 classifier training tutorial. This let me bypass the Greengenes2 package and produce output in the format I was familiar with (i.e. standard QIIME 2 output).
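Because a classifier assigns taxonomy to each representative sequence, its output keeps the original ASV IDs and lines up with table.qza. As a hedged sketch of the next step toward a phyloseq-style taxonomy table, the Python below splits semicolon-delimited QIIME 2 taxon strings (such as `d__Bacteria; p__Firmicutes`) into one column per rank, in the feature table's row order. The rank names, the function names, and the toy taxon string are assumptions for illustration:

```python
from typing import Dict, Iterable, Mapping, Optional, Sequence

# Assumed rank order for 7-level taxonomy strings (d__, p__, c__, o__, f__, g__, s__).
RANKS = ("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def split_taxon(taxon: str, ranks: Sequence[str] = RANKS) -> Dict[str, Optional[str]]:
    """Split 'd__X; p__Y; ...' into {rank: label}, dropping the 'x__' prefixes.

    Assumes ranks appear in order with none skipped; missing or empty
    ranks (e.g. 'c__') become None.
    """
    parts = [p.strip() for p in taxon.split(";") if p.strip()]
    labels = [p.split("__", 1)[1] if "__" in p else p for p in parts]
    row: Dict[str, Optional[str]] = {}
    for i, rank in enumerate(ranks):
        label = labels[i] if i < len(labels) else ""
        row[rank] = label or None
    return row


def build_tax_table(
    feature_ids: Iterable[str], taxonomy: Mapping[str, str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return one rank row per feature ID, in the feature table's order.

    Feature IDs missing from the taxonomy get an all-None row, which makes
    an ID mismatch visible instead of silently dropping features.
    """
    return {fid: split_taxon(taxonomy.get(fid, "")) for fid in feature_ids}


if __name__ == "__main__":
    # Toy data; the taxon string is illustrative, not a real classification.
    taxonomy = {"4f9070b5d72c4cada7a60e06269fc149": "d__Bacteria; p__Firmicutes; c__Bacilli"}
    table_ids = ["4f9070b5d72c4cada7a60e06269fc149", "asv_missing"]
    for fid, row in build_tax_table(table_ids, taxonomy).items():
        print(fid, row)
```

Keying every row on the table's own feature IDs is the property phyloseq checks; an all-None row is a quick sign that a feature came from a different ID namespace.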

I hope this helps and let me know if you have any questions!

Kind regards,

Johann

Thanks, @Johanndb!

To expand, we provide precomputed Naive Bayes classifiers, similar to the earlier Greengenes 13_8 and SILVA 138 classifiers. The Greengenes2 classifiers, as well as the GG 13_8 and SILVA 138 ones, are available on the Data Resources page. This is also noted in the Greengenes2 tutorial.

All the best,
Daniel
