Questions on the use of the 16S Greengenes database

Hi everyone. I am Mavis. Sorry to bother you, but I really need some help.

Today I want to generate taxonomy.qza with the Greengenes 16S database. I downloaded the newest Greengenes 16S rRNA database from the data resources page and chose the one named 13_8.

I noticed the note saying that I need to import it into an artifact before I can use it, so I unzipped the gz file. However, my computer does not recognize the file type of the unzipped file.

So my question is: what is the type of this file, and what command should I use to import it into an artifact that can be used in QIIME 2?

I would really appreciate it if someone could tell me how to deal with it.

Best wishes~

Mavis

Hi @Mavis,

It’s no bother to ask questions! It’s the only way we all learn and work through stuff.

When you unzip the database (or, actually tar -xzf if you want to go command line and fancy), you’ll see five folders:

  • otus
  • rep_set
  • rep_set_aligned
  • taxonomy
  • trees
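If you would rather script the unpacking, here is a minimal Python sketch using only the standard library. The function name `unpack_and_list_folders` and the archive name in the usage comment are assumptions for illustration, so point it at whatever file you actually downloaded. It extracts the archive and returns the folders that contain files, which lets you check the layout without relying on your file manager to recognize the file type.

```python
import tarfile
from pathlib import PurePosixPath


def unpack_and_list_folders(archive, dest="."):
    """Extract a tar archive and return the second-level folders holding files.

    Hypothetical helper for illustration; only use it on archives you trust.
    """
    # "r:*" lets tarfile detect the compression (.tar.gz, or a plain .tar
    # left over after running gunzip on the download).
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(dest)
        names = tar.getnames()

    folders = set()
    for name in names:
        parts = PurePosixPath(name).parts
        if len(parts) > 2:  # e.g. <top folder>/<subfolder>/<file>
            folders.add(PurePosixPath(*parts[:2]).as_posix())
    return sorted(folders)


# Usage (archive name is an assumption):
# print(unpack_and_list_folders("gg_13_8_otus.tar.gz"))
```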

And then, inside each folder you’ll see a set of files labeled with numbers, like 99_otus.fasta. From there, I’d recommend looking at the taxonomic classifier tutorial, which will walk you through the steps starting from import. (Seriously, the tutorials are my not-so-secret weapon when it comes to getting data through QIIME 2.)
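As a rough sketch of what the import step looks like, the Python snippet below builds the two `qiime tools import` calls for the reference sequences and the taxonomy. It is a sketch under assumptions, not the tutorial verbatim: it assumes the archive was unpacked into a folder called `gg_13_8_otus`, that you want the 99% OTU files, and that QIIME 2 is installed and on your PATH. The helper `import_commands` and the output file names are made up for illustration. The Greengenes taxonomy file has no header row, which is why the headerless TSV format is named explicitly.

```python
import shlex
from pathlib import Path


def import_commands(gg_dir="gg_13_8_otus", identity=99):
    """Build the two `qiime tools import` calls for an unpacked Greengenes folder.

    gg_dir and the output names are assumptions for illustration.
    """
    gg = Path(gg_dir)
    seqs = gg / "rep_set" / f"{identity}_otus.fasta"
    tax = gg / "taxonomy" / f"{identity}_otu_taxonomy.txt"
    return [
        # Reference sequences become a FeatureData[Sequence] artifact.
        ["qiime", "tools", "import",
         "--type", "FeatureData[Sequence]",
         "--input-path", str(seqs),
         "--output-path", f"{identity}_otus.qza"],
        # The taxonomy table has no header row, so use the headerless format.
        ["qiime", "tools", "import",
         "--type", "FeatureData[Taxonomy]",
         "--input-format", "HeaderlessTSVTaxonomyFormat",
         "--input-path", str(tax),
         "--output-path", f"{identity}_otu_taxonomy.qza"],
    ]


if __name__ == "__main__":
    for cmd in import_commands():
        # Print the equivalent shell command so it can be copied into a terminal.
        print(shlex.join(cmd))
```

Each list can also be handed to `subprocess.run(cmd, check=True)` if you prefer to run the import from Python rather than pasting the printed command into a terminal.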

I’d also check the community contributions and data resources to see if there’s one that works for your hypervariable region. Or, you could just use the full length classifier. The quality of classification is lower, but there’s sometimes something to be said for a quick, easily accessible pre-trained classifier.

Best,
Justine

Hi @jwdebelius, thank you very much for your warm reply! And sorry for the late reply; I have been busy reading some papers my teacher gave me recently. :hear_no_evil:

I have imported the data following your guide. That was very useful to me! Now I will go on with my analysis. However, I have another question related to the classifier. Could you share your opinion with me on this issue? :smiling_face_with_three_hearts:

This means that the full length classifier is not that suitable for my 16S data, right? I wonder whether I can use the full length data resource to classify my 16S data, and then train a feature classifier using the results generated by the full length classifier. Or do I need to use the 13_8 Greengenes classifier and then train my own classifier with data from this pre-trained 16S classifier?

Best wishes!

Mavis

No, the full length classifier is perfectly suitable for your data. It's simply worth noting that its resolution will be slightly worse than that of a classifier you train yourself for your region. In your position, I would probably just use the full length pre-trained classifier if you're not doing V4 and can't find your region in the community resources page.
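For what it's worth, using a pre-trained classifier to get taxonomy.qza comes down to a single `qiime feature-classifier classify-sklearn` call. The Python sketch below only builds that command. The helper name `classify_command` and the file names `classifier.qza` and `rep-seqs.qza` are placeholders, not real downloads, so swap in the classifier you fetched from the data resources page and your own representative sequences.

```python
import shlex


def classify_command(classifier="classifier.qza",
                     reads="rep-seqs.qza",
                     output="taxonomy.qza"):
    """Build a `qiime feature-classifier classify-sklearn` call.

    classifier and reads are placeholder file names for illustration.
    """
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,  # pre-trained classifier artifact
        "--i-reads", reads,            # sequences to classify (FeatureData[Sequence])
        "--o-classification", output,  # resulting taxonomy artifact
    ]


if __name__ == "__main__":
    # Print the equivalent shell command so it can be copied into a terminal.
    print(shlex.join(classify_command()))
```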

Best,
Justine

OK, I got it!

Thank you again for your answer~ :heartpulse:

Best wishes!

Mavis
