Difficulty with Greengenes2 2022.10

Hello,

I have V3/V4 16s data from 2022 that I have previously analyzed using the Greengenes 13_8 classifier with the “fragment-insertion sepp” command with no issues. For reference, I had been following this tutorial: Parkinson’s Mouse Tutorial — QIIME 2 2022.11.1 documentation

I am trying to publish said data, but the reviewers have asked that I update to Greengenes2. I thought this would be simple, but over the last couple of weeks I have been running into various issues. I was wondering if someone could provide some suggestions/assistance. I am using 341f and 806r primers, which have been cut with CutAdapt and denoised with DADA2.

  1. I have to use GG2 version 2022.10 because my allocation will not let me update beyond QIIME 2 2023.2. We are currently working to install an updated version, but I fear it will not be updated before I have to submit the manuscript. It is my understanding that if I can update to a newer version of QIIME 2, I can use sklearn or fragment-insertion directly on my representative sequences to create the new taxonomy. Is this correct, or can I attempt this with 2022.10?
  2. I am confused about which table to use when using taxonomy-from-features or taxonomy-from-table. Am I using the unrarefied DADA2 table, the rarefied DADA2 table or the representative sequence table?
  3. What is the difference between the MD5 hash files and the ASV files? How do I know which type of data I have?

I feel like I have tried everything, and I am not sure if the outputs I do have are correct.

Any and all help will be appreciated. :slight_smile:

Many thanks,

Hailey :turtle:

Hey @hamc,

Welcome to the :qiime2: forum :waving_hand:

I have some clarifying questions, just to make sure I understand your workflow here.

You want to use the new Greengenes2 classifier, correct? Not q2-greengenes2 (the plugin)?

Assuming this is the case, there are a couple of versions of these classifiers that have been released. Our latest supported version is gg2 2024.09, which is compatible with QIIME 2 versions 2024.5 - present (2025.10). You will need to have at least QIIME 2 2024.5 installed in order to use this classifier within your QIIME 2 analysis due to the version of sklearn it was trained on.

You mention you're unable to update your version of QIIME 2 - is this due to an issue installing on your institution's HPC? Or is there a specific installation issue that you're running into?

Once we get your working environment sorted, it will be easier to answer questions about your specific analysis workflow.

Cheers :lizard:


Hi Liz,

Thank you for your response!

Yes, I want to use the classifier so that my existing workflow isn’t disrupted. I have been able to install the plugin, and have tried to work that way because I cannot update QIIME 2. As you can see from my original post, I am a little confused about the proper input and reference files when using the plugin commands.

Yes, the installation issue is due to the HPC we are using. I tried to update QIIME myself, but it would fail because of Ubuntu(?) not being updated. So my installation issues have nothing to do with QIIME itself. I am hoping the people who run our HPC can update it sometime this week.

Does that provide enough clarification?

Hailey

Hi @hamc,

So it sounds like you are attempting to use the q2-greengenes2 plugin for the purpose of utilizing the new classifier? I can't provide much insight into using that plugin, but @wasade may have some insight there as he is one of the developers of q2-greengenes2.

With that being said, if you can get the latest version of QIIME 2 working on your HPC, that would be our recommendation. I'm guessing the install issue is because your HPC is using a much older Linux distribution with an incompatible glibc (likely CentOS or something similar, if that rings any bells). Our upcoming QIIME 2 release will contain Linux environment files that are compatible with older glibc versions (>=2.28). If you happen to know the architecture and Linux distribution on your HPC, I can check whether it will be compatible with the new release.

Cheers :lizard:

Yes, I have been attempting to use the plugin because I am unable to update QIIME 2, but I have been confused about how to use the plugin with my data.

I am not sure how to check the Linux distribution.

Hey @hamc,

Understood, thanks for the clarification! I'll defer to @wasade on assistance with the plugin in that case.

With regard to your HPC's Linux distribution, no worries if you don't know that offhand - really what I'm interested in is the version of glibc that your cluster's distribution is using. The easiest way to determine that is by running:

ldd --version

If that works, you should see something like:

ldd (GNU libc) 2.28
Copyright (C) 2018 Free Software Foundation, Inc.

If that doesn't work, you can also try:

/lib64/libc.so.6

You should then see something like this:

GNU C Library (GNU libc) stable release version 2.28.

This package version is where we've run into some HPC compatibility issues in the past - if your cluster is on a really old version of glibc (something like 2.17 or earlier), QIIME 2 will need to be installed inside a container on the cluster. However, if your cluster is running on 2.28 or greater you should be able to install the latest version of QIIME 2 once it's released (likely at the end of this week).
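If it helps, the check above could be scripted roughly like this. This is only a sketch: the helper name `glibc_at_least` is made up for illustration, and it assumes that `ldd --version` prints the version as the last field of its first line (as in the example output above) and that GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is >= minimum version $2.
# Uses GNU sort -V (version sort), so 2.9 is correctly treated as older than 2.28.
glibc_at_least() {
    local have="$1" need="$2"
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Take the last field of the first line of `ldd --version`,
# e.g. "ldd (GNU libc) 2.28" -> "2.28". Left empty if ldd is unavailable.
detected="$(ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}')"

if [ -z "$detected" ]; then
    echo "Could not detect glibc via ldd; try running /lib64/libc.so.6 instead"
elif glibc_at_least "$detected" 2.28; then
    echo "glibc $detected: 2.28 or greater, a direct install should be possible"
else
    echo "glibc $detected: older than 2.28, a container install is probably needed"
fi
```

Version sort is used instead of a plain numeric comparison because glibc versions are not decimals (2.9 is older than 2.28).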

Cheers :lizard:

Hi @hamc,

Could you clarify how the plugin is being used? Use of the Naive Bayes models does not require q2-greengenes2 as they are compatible with q2-feature-classifier.

Best,

Daniel


Ah, it looks like I do have 2.28. Now running into some permission issues with my HPC. Hoping they solve it soon - thanks for the tip!


Hi Daniel,

I have tried using it both ways. When I try to use the classifier, I get the following error:

(1/1) Invalid value for '--i-classifier':
./2022.10.backbone.v4.nb.sklearn-1.4.2.qza was created by 'QIIME
2024.5.0.dev0+4.gc35a5aa'. The currently installed framework cannot
interpret archive version '6'.

I am not really sure why - but I figure this has to do with the version of QIIME that I am using.

After hitting this error, I tried using the non-v4-16s pipeline instead. Although my data include V4, the primers used in developing the V4 workflow did not seem to be the same as mine. So my code looks something like this:

qiime greengenes2 non-v4-16s \
  --i-table ./DADA2tableFinal2022.qza \
  --i-sequences ./rep-seqsFinal.qza \
  --i-backbone ./2022.10.backbone.full-length.fna.qza \
  --o-mapped-table ./GG2_mapped_table \
  --o-representatives ./GG2_rep_seq

qiime greengenes2 taxonomy-from-table \
  --i-reference-taxonomy ./2022.10.taxonomy.asv.nwk.qza \
  --i-table ./mapped_table.qza \
  --o-classification ./gg2.taxonomy.qza

I then tried to use the taxonomy for my diversity calculations, but my diversity outputs seemed to change considerably when doing this. That is why I wasn’t sure if I was using the plugin correctly.

I hope that that makes sense, but let me know if further clarification is needed.

Hailey

Thanks, @hamc. I’m not sure about the version issue with q2-feature-classifier, but I would recommend using Greengenes2 2024.09. Does that pretrained model work?

If your data are V4 off of 515F, then there are ~20M ASVs already placed in the tree. filter-features supports this; it will be more precise than the closed-reference processing that non-v4-16s performs, and it allows for the use of phylogenetic diversity assessments. The catch is that filter-features requires an exact match, and we placed ASVs at 90, 100, 125 (I think?) and 150nt, and possibly some at 250nt. Our largest sets of ASVs will be for 90, 100, 125 and 150nt. If your data are 515F, you should expect many to most of your ASVs to have TAC on the 5’ end, as those positions are well conserved in 16S. If you don’t see that, it is worth verifying that the primers are not still present.
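For 515F data, a quick way to run that TAC check is to export the representative sequences to FASTA and count how many start with TAC. This is only a sketch: the helper name `tac_fraction` and the output directory are made up for illustration, and it assumes the sequences were exported with `qiime tools export`, which writes `dna-sequences.fasta` for a FeatureData[Sequence] artifact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many FASTA records start with TAC.
# Only the first sequence line after each header is inspected, so
# sequences wrapped over several lines are counted once.
tac_fraction() {
    awk '
        /^>/ { want = 1; next }      # header: the next non-empty line starts the sequence
        want && NF {
            total++
            if (toupper(substr($0, 1, 3)) == "TAC") tac++
            want = 0
        }
        END {
            if (total == 0) { print "no sequences found"; exit 1 }
            printf "%d of %d sequences (%.1f%%) start with TAC\n", tac, total, 100 * tac / total
        }
    ' "$1"
}

# Example usage (paths are placeholders):
#   qiime tools export --input-path rep-seqsFinal.qza --output-path exported-rep-seqs
#   tac_fraction exported-rep-seqs/dna-sequences.fasta
```

A low fraction does not prove anything on its own, but it is a cheap hint that primers may still be attached or that the reads do not start at the 515F position.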

Best,
Daniel


Greengenes2 2024.09 does not currently work for me, because I am using an incompatible version of QIIME (2023.2) and cannot update due to HPC permissions. If/when I am able to update to the newer versions, which backbone file do you recommend using with q2-feature-classifier?

We used 341F primers, which was my main logic behind using non-v4-16s. Would you recommend I proceed with non-v4-16s still?


Okay. Given the primers, I would use the full-length Naive Bayes classifier. You may be able to run the QIIME 2 Docker container via Singularity, which is not uncommon in HPC environments. An alternative to closed-reference processing with non-v4-16s is to place your ASVs with SEPP (Arbitrary ASV fragment placement in Greengenes2).

Best,

Daniel
