Difference between the SILVA and Greengenes Files

(Meha) #1

Hi everybody,

On the page there are two types of SILVA and Greengenes files: some in .qza format and some in .gz format. What is the difference between them? Which one is recommended?

Thanks

(Mehrbod Estaki) #2

Hi @Mehrdad,

The Naive Bayes classifiers are pre-trained, meaning they are ready to be used to assign taxonomy, and they are already in QIIME 2 artifact format (.qza). Among them is a pair pre-trained specifically for the 515F/806R (V4) region, so if these are the primers you used, you're in luck and can just use those. If not, you can either train your own classifier, as that page recommends, or use the full-length classifiers. You may also want to look through the Community Contributions section of the forum, as there are a few other pre-trained classifiers for other regions (e.g. V3-V4, ITS2) ready for download.
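A minimal sketch of using one of the pre-trained classifiers, assuming your representative sequences are already imported as a FeatureData[Sequence] artifact. The function name `classify_with_pretrained` and the file names are hypothetical placeholders; `qiime feature-classifier classify-sklearn` is the feature-classifier command for this step.

```shell
#!/usr/bin/env bash
# Assign taxonomy to representative sequences with a pre-trained
# Naive Bayes classifier (.qza). Hypothetical helper; arguments:
#   $1 = pre-trained classifier artifact
#   $2 = representative sequences artifact (FeatureData[Sequence])
#   $3 = output taxonomy artifact (FeatureData[Taxonomy])
classify_with_pretrained() {
  local classifier="$1" reads="$2" out="$3"
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --o-classification "$out"
}

# Example call (file names are illustrative only):
# classify_with_pretrained gg-13-8-99-515-806-nb-classifier.qza rep-seqs.qza taxonomy.qza
```

This only makes sense if the classifier was trained on the same region (and primers) as your reads.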

The other .gz files below are the raw sequences, available for download directly from the corresponding databases, and are not QIIME 2 artifacts. You would use one of these if you wanted to train your own classifier specific to your region. Look through the linked tutorial for more details.
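Since a .qza artifact is a zip archive (the data plus provenance metadata) while the raw database downloads are gzip-compressed files, you can tell the two apart by their first bytes rather than their extension. The helper `file_kind` below is a hypothetical sketch, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Report whether a file is a zip archive (as QIIME 2 .qza artifacts are)
# or gzip-compressed (as the raw .gz database downloads are), based on
# the file's first two bytes ("magic number"), not its extension.
file_kind() {
  local magic
  # First two bytes as lowercase hex, e.g. "504b" (zip) or "1f8b" (gzip).
  magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
  case "$magic" in
    504b) echo "zip" ;;      # QIIME 2 artifact (.qza)
    1f8b) echo "gzip" ;;     # raw compressed file, not a QIIME 2 artifact
    *)    echo "unknown" ;;
  esac
}
```

Similarly, `unzip -l some-classifier.qza` lists the contents of an artifact, whereas a .gz download is unpacked with `gunzip` before it is imported.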

(Meha) #3

Thanks!
I would like to learn more about training classifiers.

https://docs.qiime2.org/2019.1/tutorials/feature-classifier/

On the training page you linked above, there are two commands. Do I use only one of them, or do I need to run both in sequence? If they serve different purposes, why are their first lines the same? And can I use FASTQ format instead of FASTA?
Finally, why is the type parameter shown in blue? Does that have a special meaning?

I.
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 85_otus.fasta \
  --output-path 85_otus.qza

II.
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 85_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza

Thanks

(Mehrbod Estaki) #4

Hi @Mehrdad,
All the steps in that tutorial are needed and are not ‘optional’ steps.
The two import commands you posted import two different objects: the first imports the FASTA file of sequences from the Greengenes database, and the second imports the taxonomy for those sequences (a .txt file). You need both to train the classifier.
You can see a schematic of this in the QIIME 2 overview tutorial as well.
Also, please read those pages carefully, including the blue boxes in the training-classifier tutorial, which explain why you shouldn't use 85_otus/85_otu_taxonomy: that 85% threshold is for demonstration purposes only.
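Roughly, the training part of that tutorial takes the two imported artifacts, trims the reference sequences to your amplicon region, and then fits the classifier on the trimmed sequences plus their taxonomy. The sketch below wraps those two steps in a hypothetical `train_classifier` function; the intermediate file name is an assumption, and the tutorial's optional length/truncation parameters for `extract-reads` are left out.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: train a region-specific Naive Bayes classifier
# from the two imported artifacts. Both inputs are required.
#   $1 = reference sequences .qza (FeatureData[Sequence])
#   $2 = reference taxonomy .qza  (FeatureData[Taxonomy])
#   $3 = forward primer   $4 = reverse primer
#   $5 = output classifier .qza
train_classifier() {
  local ref_seqs="$1" ref_tax="$2" f_primer="$3" r_primer="$4" out="$5"
  # Assumed name for the trimmed reference reads.
  local region_seqs="${out%.qza}-ref-seqs.qza"

  # 1) Extract the amplified region from the reference sequences.
  qiime feature-classifier extract-reads \
    --i-sequences "$ref_seqs" \
    --p-f-primer "$f_primer" \
    --p-r-primer "$r_primer" \
    --o-reads "$region_seqs" || return 1

  # 2) Fit the classifier on the trimmed reads and their taxonomy.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$region_seqs" \
    --i-reference-taxonomy "$ref_tax" \
    --o-classifier "$out"
}
```

For example, with the 515F/806R primers used in the tutorial you would call `train_classifier 85_otus.qza ref-taxonomy.qza GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT classifier.qza`, but for real data substitute a higher-identity reference set than the 85% demonstration files.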

(system) closed #5

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.