Installing and using the q2-greengenes2 plugin

minjoy · December 13, 2023, 9:42am

I'm glad to be using the brand-new Greengenes2 database!
I have a few questions about using it.

I could not run the greengenes2 plugin, even though I ran 'pip install q2-greengenes2' (error: QIIME 2 has no plugin/command named 'greengenes2'). I suspect it might be a version problem with Ubuntu (my version is 20.04.6 LTS).
Therefore, instead of using the greengenes2 plugin, I downloaded '2022.10.backbone.full-length.fna.qza' and '2022.10.backbone.tax.qza' from the index of greengenes_release/2022.10.

1. Is downloading the two files ('2022.10.backbone.full-length.fna.qza' and '2022.10.backbone.tax.qza') and proceeding with the rest of the process equivalent to creating the two files by following 'If there is data other than V4'?
2. Are MD5 checksums available for the two files I downloaded ('2022.10.backbone.full-length.fna.qza' and '2022.10.backbone.tax.qza')?
3. Training the naive Bayes classifier with 'feature-classifier' took only 1-2 hours. Can it be this fast? (Of course, the speed depends on the CPU and GPU I use.)
4. I'm training a classifier on the V3-V4 region using Greengenes2 with the commands below. Are there any problems?

   qiime feature-classifier extract-reads \
     --i-sequences 2022.10.backbone.full-length.fna.qza \
     --p-f-primer CCTAYGGGRBGCASCAG \
     --p-r-primer GGACTACNNGGGTATCTAAT \
     --o-reads Greengenes2_ref_seqs.qza \
     --verbose

   qiime feature-classifier fit-classifier-naive-bayes \
     --i-reference-reads Greengenes2_ref_seqs.qza \
     --i-reference-taxonomy 2022.10.backbone.tax.qza \
     --o-classifier Greengenes2-classifier.qza

5. Is there any sample data that I can use to test the classifier?

Thank you so much for your help! Have a great day!

wasade · December 14, 2023, 4:25pm

Hi @minjoy,

Thank you for the kind words, and I'm sorry for the challenges!

Could you provide the output from pip install q2-greengenes2, and can you describe what QIIME 2 environment this was installed into?
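One possible cause of a "no plugin/command named" error is that pip installed the package into a different Python environment than the one QIIME 2 runs from. The sketch below is one way to gather the information requested above; `bin_dir` is a hypothetical helper written for this example, and the check assumes that `qiime`, `pip`, and `python` should all resolve to the same environment directory.

```shell
# bin_dir CMD: print the directory that CMD resolves to, or "missing"
# if CMD is not on PATH. (Hypothetical helper for this sketch.)
bin_dir() {
    cmd_path="$(command -v "$1" 2>/dev/null)" || { echo "missing"; return 0; }
    dirname "$cmd_path"
}

# If these directories differ, pip probably targeted another environment.
echo "qiime:  $(bin_dir qiime)"
echo "pip:    $(bin_dir pip)"
echo "python: $(bin_dir python)"

if [ "$(bin_dir qiime)" != "missing" ]; then
    # Ask the environment's own Python whether the package is installed.
    python -m pip show q2-greengenes2 || echo "q2-greengenes2 not installed here"
    # Rebuild QIIME 2's plugin cache, then look for the plugin by name.
    qiime dev refresh-cache
    qiime --help | grep -i greengenes2 || echo "plugin still not visible to QIIME 2"
fi
```

Running this inside the activated QIIME 2 environment and posting the output should show whether the install and the `qiime` executable point at the same place.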

To your questions:

1. That should work.

2. Great question! It wouldn't hurt to include them, but we don't currently compute them automatically. I've added an issue.

   @ebolyen, do you know offhand whether an MD5 is necessary for a .qza to assert that a download is reasonable?

   @minjoy, the relevant MD5s are:

   $ md5sum 2022.10.backbone.full-length.fna.qza
   eb2c35a194ca340fba4ce486c08747d3  2022.10.backbone.full-length.fna.qza
   $ md5sum 2022.10.backbone.tax.qza
   010ade1db3c86814cdd0fe61be04dc8a  2022.10.backbone.tax.qza

3. I don't recall offhand what to expect for the performance of feature-classifier. Is that performance okay for your use?

4. These seem reasonable. For reference, the commands used to construct the release classifiers can be found here.

5. There is a lot of public data readily accessible in Qiita, which can be obtained via the command line using redbiom. A tutorial on that can be found here.
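The two MD5s posted above can be compared against local downloads with a few lines of shell. This is a minimal sketch, assuming the files sit in the current directory and that `md5sum` (GNU coreutils) is available; `verify_md5` is a hypothetical helper name, not part of QIIME 2.

```shell
# verify_md5 FILE EXPECTED: succeed only if FILE's MD5 digest equals EXPECTED.
verify_md5() {
    actual="$(md5sum "$1" | awk '{print $1}')" || return 1
    [ "$actual" = "$2" ]
}

# Checksums exactly as posted in this thread: "<md5>  <filename>" per line.
while read -r expected file; do
    if [ ! -f "$file" ]; then
        echo "SKIP  $file (not found in current directory)"
    elif verify_md5 "$file" "$expected"; then
        echo "OK    $file"
    else
        echo "FAIL  $file (re-download recommended)"
    fi
done <<'EOF'
eb2c35a194ca340fba4ce486c08747d3  2022.10.backbone.full-length.fna.qza
010ade1db3c86814cdd0fe61be04dc8a  2022.10.backbone.tax.qza
EOF
```

Saving the two checksum lines to a file and running `md5sum -c` on it does the same job; the loop form is used here only so that missing files are reported instead of treated as failures.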

All the best,
Daniel

minjoy · December 15, 2023, 6:21am

Hi Daniel,
Thank you for your kind answers!!

When I ran 'pip install q2-greengenes2', there were a bunch of 'Requirement already satisfied...' messages, so it looked like the installation was successful. Also, I installed QIIME 2 version 2023.9.

Yes! To me, it felt super fast. I had heard this process takes 20 hours or even more than a day!

I will try it!
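Once some test sequences are in hand, trying the trained classifier might look like the sketch below. It assumes the test reads are already imported as a FeatureData[Sequence] artifact; the filename `rep-seqs.qza` and the helper `have_inputs` are hypothetical, while `Greengenes2-classifier.qza` is the output of the training command earlier in the thread.

```shell
CLASSIFIER="Greengenes2-classifier.qza"   # from fit-classifier-naive-bayes above
READS="rep-seqs.qza"                      # hypothetical test sequences artifact

# have_inputs FILE...: succeed only if every file exists; name the first missing one.
have_inputs() {
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing input: $f"; return 1; }
    done
}

# Only run when QIIME 2 is on PATH and both inputs are present.
if command -v qiime >/dev/null 2>&1 && have_inputs "$CLASSIFIER" "$READS"; then
    # Assign taxonomy to each test sequence with the trained classifier.
    qiime feature-classifier classify-sklearn \
        --i-classifier "$CLASSIFIER" \
        --i-reads "$READS" \
        --o-classification taxonomy.qza

    # Turn the assignments into a viewable table.
    qiime metadata tabulate \
        --m-input-file taxonomy.qza \
        --o-visualization taxonomy.qzv
fi
```

Because the reference reads were extracted with V3-V4 primers, the test sequences should come from the same region for the result to be meaningful.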

Again, thank you!!
Have a happy day!