Trouble Training a Classifier using Greengenes

Parul_Baranwal · March 9, 2020, 2:06am

Hello all,

I am trying to train my classifier.
I downloaded the gg_13_5_otus.tar.gz and gg_13_taxonomy.txt.gz files from the Greengenes website: https://greengenes.secondgenome.com/?prefix=downloads/greengenes_database/gg_13_5/.

I extracted both files, as shown in the following picture.

I am running the following command:

Can someone please help?

Thank you

jwdebelius · March 9, 2020, 10:58am

Hi @Parul_Baranwal,

Check the contents of the gg_13_5_otus taxonomy directory. My guess is that it contains several subdirectories, including rep_set (or something similar). You will need to use one of the files in there as your sequences.
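If it helps, here is a minimal Python sketch for listing what is inside the extracted folder, so you can spot the sequence files. It assumes the archive was extracted to a folder called `gg_13_5_otus` in your working directory; that path and the helper name `list_reference_files` are only illustrative, so adjust them to your setup.

```python
from pathlib import Path


def list_reference_files(root):
    """Map each subdirectory of `root` to the sorted names of the files in it."""
    root = Path(root)
    contents = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        contents[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return contents


if __name__ == "__main__":
    # Assumed extraction location; change this to wherever you unpacked the archive.
    otus_dir = Path("gg_13_5_otus")
    if otus_dir.is_dir():
        for name, files in list_reference_files(otus_dir).items():
            print(name)
            for file_name in files:
                print("   ", file_name)
    else:
        print(f"{otus_dir} not found; adjust the path")
```

The FASTA files under the rep_set-like subdirectory are the candidates to import as reference sequences.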

Best,
Justine

Parul_Baranwal · March 9, 2020, 7:39pm

Thank you so much!
It worked!

Parul_Baranwal · March 9, 2020, 10:36pm

I have a follow-up question:

I am using my paired-end sequences and following the Moving Pictures tutorial.

I performed the demultiplexing step and then denoised with DADA2. At this step I got rep-seqs.qza, which looks like the following in QIIME 2 View:

I am not getting any rep-seqs.qza file when importing the dataset from Greengenes.
I just want to make sure I am on the right track.

Any suggestions please.

jwdebelius · March 10, 2020, 8:58am

Hi @Parul_Baranwal,

"Rep set" is shorthand for "representative subset", basically the sequence that represents an ASV/OTU. But I recognize that it can make the nomenclature confusing!

The "rep set" from greengenes is the sequence that represents that database sequence. The "rep set" from dada2 is the representative sequence for your ASV. You'll use the greengenes sequences to train your classifier and then you'll apply it to your ASV rep seqs.
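To make the order of operations concrete, here is a rough sketch that just builds and prints the QIIME 2 commands rather than running them. The subcommands and flags are the ones I remember from the feature-classifier training tutorial, so please check them against `--help` for your QIIME 2 version. The input paths in the example call (`99_otus.fasta`, `99_otu_taxonomy.txt`) and the output names (`ref-seqs.qza`, `ref-taxonomy.qza`, `classifier.qza`, `taxonomy.qza`) are assumptions for illustration, and `build_commands` is a hypothetical helper.

```python
import shlex


def build_commands(ref_seqs_fasta, ref_taxonomy_txt, asv_rep_seqs_qza):
    """Return the four QIIME 2 commands, in order, as argument lists."""
    # 1. Import the Greengenes reference sequences (the database "rep set").
    import_seqs = [
        "qiime", "tools", "import",
        "--type", "FeatureData[Sequence]",
        "--input-path", ref_seqs_fasta,
        "--output-path", "ref-seqs.qza",
    ]
    # 2. Import the matching Greengenes taxonomy (tab-separated, no header line).
    import_taxonomy = [
        "qiime", "tools", "import",
        "--type", "FeatureData[Taxonomy]",
        "--input-format", "HeaderlessTSVTaxonomyFormat",
        "--input-path", ref_taxonomy_txt,
        "--output-path", "ref-taxonomy.qza",
    ]
    # 3. Train the classifier on the reference data only.
    train = [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", "ref-seqs.qza",
        "--i-reference-taxonomy", "ref-taxonomy.qza",
        "--o-classifier", "classifier.qza",
    ]
    # 4. Apply the trained classifier to your own ASV rep seqs from DADA2.
    classify = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", "classifier.qza",
        "--i-reads", asv_rep_seqs_qza,
        "--o-classification", "taxonomy.qza",
    ]
    return [import_seqs, import_taxonomy, train, classify]


if __name__ == "__main__":
    # Example paths are assumptions; point them at your own files.
    commands = build_commands(
        "gg_13_5_otus/rep_set/99_otus.fasta",
        "gg_13_5_otus/taxonomy/99_otu_taxonomy.txt",
        "rep-seqs.qza",
    )
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Note that your DADA2 rep-seqs.qza only shows up in the last step; the Greengenes import never produces a file by that name.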

Best,
Justine
