With the advent of Greengenes2, I thought I'd reanalyse my data with the new database. However, when I exported my data from Qiime2 to R, I ran into a predicament. The feature IDs in my ASV table had the usual hash format, e.g. "4f9070b5d72c4cada7a60e06269fc149". The IDs in my taxonomy table, however, were totally different, with formats such as "MJ006-1-barcode41-umi149015bins-ubs-3" or "JN873915".
I think this disparity between the two tables is what's causing my phyloseq object error:
Error in validObject(.Object) : invalid class “phyloseq” object:
Component taxa/OTU names do not match.
Taxa indices are critical to analysis.
Try taxa_names()
I have attached screenshots of the tables below, please let me know if you have any advice!
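For anyone hitting the same error, a quick way to confirm the mismatch is to compare the two sets of feature IDs directly. The snippet below is only a minimal sketch in plain Python, not part of phyloseq or QIIME 2; `compare_ids` is a hypothetical helper, and the example IDs are the ones quoted in this post. In R the same idea would be applied to the row names of the two tables.

```python
def compare_ids(table_ids, taxonomy_ids):
    """Report which feature IDs are shared and which appear in only one table.

    Hypothetical helper for illustration; it only does set arithmetic on IDs.
    """
    table_ids = set(table_ids)
    taxonomy_ids = set(taxonomy_ids)
    return {
        "shared": sorted(table_ids & taxonomy_ids),
        "only_in_table": sorted(table_ids - taxonomy_ids),
        "only_in_taxonomy": sorted(taxonomy_ids - table_ids),
    }


# Example IDs taken from the post: a hash-style ASV ID versus
# reference-style IDs found in the taxonomy table.
asv_table_ids = ["4f9070b5d72c4cada7a60e06269fc149"]
taxonomy_ids = ["MJ006-1-barcode41-umi149015bins-ubs-3", "JN873915"]

report = compare_ids(asv_table_ids, taxonomy_ids)
for key, ids in report.items():
    print(f"{key}: {len(ids)}")
```

If "shared" comes back empty, as it does for these IDs, the two tables are indexed by different identifiers, which is consistent with the "Component taxa/OTU names do not match" error.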
I think my problem was that I tried to use both the Greengenes2 package and qiime2 in tandem to train my own classifier. When I just used Qiime2 to classify my sequences, all went well.
Okay, glad it's working. Just to verify, were you successful in using Greengenes2 2022.10 specifically, or are you still having an issue that you would like help with?
I came across a similar observation. My taxonomy output (greengenes2_taxonomy.qza) is shorter and has different sample ids, similar to the example above. Am I using the commands correctly?
Thank you for reaching out. Could you clarify: do you mean that you have fewer samples or fewer features in the FeatureTable after applying these commands? These commands should not modify sample IDs. Is there an example of how they're changing?
Thank you for the prompt reply. I have fewer features in my taxonomy file (greengenes2_taxonomy.qza) than in my FeatureTable (table.qza) and rep-seqs file (rep-seqs.qza), similar to the outcome Johann described above. The first screenshot is the taxonomy file and the second is the feature table. However, there is no overlap between the row names of the two files. How do I know which taxonomic classification each of my features has?
The non-v4-16s step is performing closed reference clustering against the backbone, so it isn't unusual for the number of features to reduce.
The underlying plugin, q2-vsearch, does not expose the query / subject mapping information at this time. Please see the discussion in "Introducing Greengenes2 2022.10" (post #34 by wasade) for more information.
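To illustrate why the feature count drops and the IDs change, here is a toy sketch of what closed-reference clustering does conceptually. It is not the q2-vsearch implementation: `collapse_closed_reference`, the ASV names, the counts, and the `hits` mapping are all made up for illustration (the real query-to-reference mapping is exactly what is not exposed), and per-sample counts are simplified to a single total per feature.

```python
from collections import defaultdict


def collapse_closed_reference(feature_counts, hits):
    """Collapse query features onto the reference ID each one matched.

    feature_counts: dict of query feature ID -> total count (simplified).
    hits: dict of query feature ID -> matched reference ID (hypothetical).
    Queries without a hit are left out of the collapsed table.
    """
    collapsed = defaultdict(int)
    unmatched = []
    for feature_id, count in feature_counts.items():
        ref_id = hits.get(feature_id)
        if ref_id is None:
            unmatched.append(feature_id)
        else:
            collapsed[ref_id] += count
    return dict(collapsed), unmatched


# Three made-up ASVs; two hit the same reference record, one has no hit.
feature_counts = {"asv_a": 10, "asv_b": 5, "asv_c": 2}
hits = {"asv_a": "JN873915", "asv_b": "JN873915"}  # hypothetical mapping

collapsed, unmatched = collapse_closed_reference(feature_counts, hits)
print(collapsed)
print(unmatched)
```

In this toy case three input features become a single feature keyed by a reference ID, which is why the resulting table and taxonomy no longer share row names with the original ASV table.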
If I may interject. Using both the greengenes2 package and Qiime2 for taxonomic classification caused disparities between the two outputs. I subsequently downloaded the greengenes2 data and followed the Qiime2 classifier tutorial. This allowed me to circumvent the greengenes2 package and produce output I was familiar with (i.e. Qiime2 format).
I hope this helps and let me know if you have any questions!
To expand, we provide precomputed Naive Bayes classifiers similar to the prior Greengenes 13_8 and SILVA 138 classifiers. The Greengenes2 ones, as well as the GG 13_8 and SILVA 138 ones, are available on the Data Resources page. This is also noted in the Greengenes2 tutorial.