Since my feature table no longer matches the taxonomy, I am having trouble combining the Greengenes2 taxonomy with other analyses: for example, naming differentially abundant ASVs (found with Songbird), or collapsing my feature tables for other analyses.
I also tried training a V3-V4 Greengenes2 classifier, as suggested in the tutorials and comments, and then assigning taxonomy to my ASVs with sklearn, thereby not losing my original IDs.
However, when I compare the taxonomies generated by these two options with qiime2-rescript, I see that the latter option may be worse: it has more unclassified features and, moreover, relative abundances at low taxonomic levels change substantially. So, would it be better to use the first option (non-v4-16s)?
Hence, my question is: what would be the best way to combine greengenes2 non-v4-16s with other analyses that require applying taxonomy to our original IDs (differential abundance, collapsing tables, and others)?
I think the main issue here is that (as I understand it) the GG2 classifier clusters your original ASVs at 99% identity (or at a threshold you provide) against the reference reads and replaces each cluster of sequences with a sequence from the reference, summing the counts. As output, you get a feature table with new sequence IDs and a taxonomy file with the same IDs as the new table. You can then obtain the already-constructed tree from GG2 and use it for phylogenetic analyses.
For me, the easiest way to keep them matched would be to perform the GG2 annotation right after the denoising and filtering steps and use the resulting files for all downstream analyses (DA, core-metrics, and so on), or to use the other approach you already described: training a classifier on the GG2 files.
Is it methodologically correct (or at least equivalent to the usual approach) to compute core-metrics and DA analyses on the Greengenes2 output table and sequences? I believe the results won't change much, but I'm afraid I won't get the same results, since the number of features is reduced and the frequencies per sample change (even the ordering of samples by frequency).
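To make the "replace and sum" idea concrete, here is a minimal sketch in plain Python. It is not the GG2 plugin's code: the function name `collapse_to_reference`, the toy IDs, and the choice to drop ASVs without a reference hit (as closed-reference clustering usually does) are all assumptions for illustration.

```python
from collections import defaultdict


def collapse_to_reference(table, asv_to_ref):
    """Sum per-sample counts of ASVs that map to the same reference ID.

    table:      {asv_id: {sample_id: count}}
    asv_to_ref: {asv_id: reference_id}; ASVs missing from this mapping
                are treated as unmatched and dropped (an assumption).
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for asv_id, sample_counts in table.items():
        ref_id = asv_to_ref.get(asv_id)
        if ref_id is None:
            continue  # no reference hit: this ASV is not carried over
        for sample_id, count in sample_counts.items():
            collapsed[ref_id][sample_id] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {ref: dict(samples) for ref, samples in collapsed.items()}


# Hypothetical toy data: asv1 and asv2 hit the same reference, asv3 hits none.
table = {
    "asv1": {"S1": 10, "S2": 0},
    "asv2": {"S1": 5, "S2": 7},
    "asv3": {"S1": 2, "S2": 3},
}
asv_to_ref = {"asv1": "ref_A", "asv2": "ref_A"}

collapsed = collapse_to_reference(table, asv_to_ref)
print(collapsed)
```

In this toy case three ASVs become one feature, and the per-sample totals shrink because the unmatched ASV is gone, which is why the number of features and the sample frequencies differ from the original table.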
By the way, I also checked some of the analyses and, for example, it seems Songbird doesn't accept the GG2 output table (error message: Invalid value for '--i-table': 'table.1-gg.qza' is not a QIIME 2 Artifact (.qza)).
I would say that it is not incorrect.
I am sure the results will change, since they change even if you just rerun the core-metrics plugin on the same data (because of random subsampling). I cannot predict whether the main trends you already discovered will change or stay unaffected. I would give it a try, or go with the trained classifier that you already used.
Okay, I just re-ran core-metrics and the relative frequencies from the collapsed tables, and the results don't seem to change much (apart from some numbers).
I don't understand why Songbird doesn't accept the feature table coming from Greengenes2, and for this reason I'll follow the trained-classifier approach, even though this classification may seem a little worse. If someone has an idea why Songbird does that, it would be very much appreciated.
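As a rough illustration of why reruns differ, here is a small sketch of rarefaction by random subsampling without replacement. It is not the code core-metrics uses; the `rarefy` function, the toy counts, and the seeds are assumptions made up for the example.

```python
import random


def rarefy(counts, depth, seed=None):
    """Draw `depth` reads without replacement from one sample's counts.

    counts: {feature_id: read_count}
    seed:   fixing it makes the draw repeatable; leaving it as None
            gives a different subsample on each call.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then sample from that pool.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sampling depth exceeds the sample's total reads")
    rarefied = {}
    for feature in rng.sample(pool, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Hypothetical sample with 100 reads, rarefied to 40 reads.
sample = {"ref_A": 50, "ref_B": 30, "ref_C": 20}
run1 = rarefy(sample, 40, seed=1)
run2 = rarefy(sample, 40, seed=2)
run1_again = rarefy(sample, 40, seed=1)
print(run1, run2, run1_again)
```

Each run keeps exactly 40 reads, but which reads are kept depends on the random draw, so per-feature counts (and any metric computed from them) can shift slightly between runs unless the seed is fixed.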
Can you confirm that the table.1-gg.qza file used with Songbird is the same file used with the other analyses?
And correct, it is easiest to interpret the results if the downstream methods (e.g., diversity analyses, Songbird, etc.) are applied to a derivative of the same feature table, such as the one mapped against Greengenes2.
Hi @wasade , I saw what was going on!
I ran all the greengenes2 commands in the latest version, qiime2-2023.9. Afterwards, I also ran qiime diversity core-metrics-phylogenetic, as well as other diversity commands, in the same version. However, when I moved on to Songbird I used qiime2-2020.6, as suggested in the q2-songbird installation instructions (GitHub - biocore/songbird: Vanilla regression methods for microbiome differential abundance analysis). The error was due to an incompatibility between 2023.9 and 2020.6 feature tables.
So, in the end I was able to run Songbird by exporting the 2023.9 table and re-importing it with 2020.6. These would be the commands:
Is it correct to export and re-import qiime2 Greengenes2 tables across different versions? Or is it not recommended to process Greengenes2 tables with qiime2-2020.6? I don't know whether switching between versions can affect the analyses...
By the way, in the core-metrics command (#2), is 2022.10.phylogeny.asv.nwk.qza the correct phylogeny tree to use?
Thank you, @pau. I would recommend installing Songbird in the same environment to avoid switching QIIME 2 versions. The issue is most likely not Greengenes2-specific, but rather due to changes in how QIIME 2 artifacts are represented, as a lot in QIIME 2 has changed in the last three years.
That phylogeny works, but the others would also be fine, as the IDs are relative to the backbone given the use of non-v4-16s.
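If it helps with diagnosing this kind of mismatch, here is a small sketch for checking which QIIME 2 release wrote a given artifact. It assumes the usual archive layout, where a .qza is a zip file with a single top-level directory holding a VERSION file with `archive:` and `framework:` lines; the function name `read_qza_version` is just for illustration and is not part of QIIME 2.

```python
import zipfile


def read_qza_version(path):
    """Return the fields of a .qza/.qzv VERSION file as a dict.

    Assumes the layout <top-level-dir>/VERSION inside the zip archive,
    with lines such as 'archive: 5' and 'framework: 2020.6.0'.
    """
    with zipfile.ZipFile(path) as archive:
        # Only consider the VERSION file directly under the top-level
        # directory, not copies nested deeper in the archive.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/VERSION")
        ]
        if not candidates:
            raise ValueError(f"no top-level VERSION file found in {path}")
        text = archive.read(candidates[0]).decode("utf-8")

    fields = {}
    for line in text.splitlines():
        if ":" in line:  # skips the leading 'QIIME 2' header line
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Example call (the file name is taken from the error message above):
# print(read_qza_version("table.1-gg.qza"))
```

Comparing the `framework` value of the table with the QIIME 2 version of the environment running Songbird should show quickly whether the artifact was written by a newer release than the one trying to read it.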
I tried Songbird with 2023.9 and it gave me some errors; in fact, the instructions say: make sure that QIIME 2 (a version between 2019.7 and 2020.6) is installed before installing Songbird.
So I'm not completely sure how to avoid switching between QIIME 2 versions.