I am teaching QIIME 2 using the Moving Pictures tutorial (https://docs.qiime2.org/2019.10/tutorials/moving-pictures/), and I would like to check a few things about the "Taxonomic analysis" section.
Here is what the tutorial says: "This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We’ll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy."
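To check that I understand the trimming step, here is a rough Python sketch of how I picture it. This is only my mental model, not the code that was actually used to build the tutorial classifier (in QIIME 2 I believe this step is done by `qiime feature-classifier extract-reads`). The function names and the toy sequence are made up for illustration, the primer sequences are the 515F/806R pair as I know them, and I am assuming the primers themselves are excluded from the extracted region.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate bases.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD_515F = "GTGCCAGCMGCCGCGGTAA"   # assumed 515F sequence
REV_806R = "GGACTACHVGGGTWTCTAAT"  # assumed 806R sequence


def reverse_complement(seq):
    """Reverse-complement a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer)


def extract_region(seq, fwd, rev, trunc_len=250):
    """Hypothetical helper: return the region between the primers,
    truncated to trunc_len bases, or None if either primer is missing.
    Exact matching only; a real tool would tolerate mismatches."""
    pattern = re.compile(
        primer_regex(fwd) + "(.+?)" + primer_regex(reverse_complement(rev))
    )
    match = pattern.search(seq)
    if match is None:
        return None
    return match.group(1)[:trunc_len]


# Toy reference: flank + forward primer + insert + reverse primer site + flank.
toy = "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "ACGT" * 5 + "ATTAGATACCCTGGTAGTCC" + "GGGG"
region = extract_region(toy, FWD_515F, REV_806R)
```

If this picture is right, a reference sequence that lacks either primer site simply yields no trimmed read, which is part of why I think a different region would need its own classifier.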
1> 515F/806R refers to the Earth Microbiome Project’s 16S rRNA primers, which target the V4 region. If I use other primers, or the same 16S rRNA marker gene but a different region such as V1, I have to train my own taxonomic classifier. Am I correct?
2> The tutorial mentions that this classifier was trained on the Greengenes 13_8 99% OTUs. I downloaded Greengenes 13_8 here (ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz).
After unpacking the archive, I get several folders. When I used QIIME 1 with closed-reference OTU picking, I used “99_otus.fasta” in the rep_set folder as the reference sequences and “99_otu_taxonomy.txt” in the taxonomy folder for the taxonomic names.
Can you tell me which files you used to train the Greengenes taxonomic classifier?
3> Back in the old days, people used QIIME 1 and OTU picking. Since OTUs were clustered at 97% similarity, some people used the Greengenes 97% dataset. In that case, they would use 97_otus.fasta and 97_otu_taxonomy.txt, correct?
I notice that QIIME 2 recommends the 99% database, and I am wondering how you trained this classifier. Were both the reference sequences and the taxonomy taken from the 99% Greengenes files?
In the future, if I want to train my own classifier, should I always look for a 99% database? If one is not available, should I use the finest clustering level the database offers? As far as I know, some databases do not provide clustered levels like this (for example COI and 28S rRNA).
I just want to know what the standard practice is. Is a finer clustering level always better?
Thanks.