I want to train my own V3-V4 database. I read the tutorial 'Training feature classifiers with q2-feature-classifier', but I am still not sure which files to use to train my own database. I know that I need the reference sequences and the corresponding taxonomic classifications. But which files from this site (Index of /greengenes_release/2022.10) are the reference sequences and the corresponding taxonomic classifications?
I took 2022.10.backbone.full-length.fna.qza and 2022.10.taxonomy.asv.tsv.qza, but after training and testing the database, my sequences were assigned only to Bacteria, without any further classification. I assume I took the wrong files.
Usually this happens when most, or all, of your sequences are oriented differently from your reference data. That is, both your reference sequences and your data need to be oriented in the 5'-3' direction.
One quick sanity-check, to see if this is the case, is to try feature-classifier classify-consensus-vsearch... as this approach does not care about orientation.
That being said, it might not be a good idea to construct a phylogeny as we're unsure if most, or all, of your reads are oriented similarly.
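To make the orientation point above concrete, here is a minimal sketch of how one could guess whether a read matches a reference as given or only as its reverse complement, by counting shared k-mers. The function names (`reverse_complement`, `kmers`, `guess_orientation`), the choice of k = 8, and the toy sequences are all assumptions made for illustration; none of this is part of QIIME 2, and it is not how the classifier itself handles orientation.

```python
# Hypothetical helpers for illustration only; not part of QIIME 2.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(read, reference, k=8):
    """Compare k-mer overlap of the read and of its reverse complement
    against the reference, and report which one matches better."""
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(read, k) & ref_kmers)
    reverse_hits = len(kmers(reverse_complement(read), k) & ref_kmers)
    if forward_hits > reverse_hits:
        return "forward"
    if reverse_hits > forward_hits:
        return "reverse"
    return "unknown"


# Toy data (made up): a short "reference" and a read cut out of it.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCATTAGC"
read = reference[5:30]

print(guess_orientation(read, reference))                      # forward
print(guess_orientation(reverse_complement(read), reference))  # reverse
```

If most reads in a real data set came back as "reverse" against a reference sequence they are known to match, that would support the mixed-orientation explanation.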
I have now checked: yesterday, when I tried to train the GreenGenes database, I used a taxonomy file that has a header, while in the command I wrote --input-format HeaderlessTSVTaxonomyFormat. Maybe this is the cause?
Sorry for my mistake.
Ewelina
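The header mix-up described above can be caught before importing. Below is a minimal sketch that looks at the first row of a taxonomy TSV and suggests which import format name to pass. It assumes that the header row is `Feature ID<tab>Taxon` (my understanding of what TSVTaxonomyFormat expects), and the helper names `has_taxonomy_header` and `suggest_import_format` are hypothetical, not QIIME 2 functions.

```python
import csv

# Assumption: this is the header row that TSVTaxonomyFormat expects.
EXPECTED_HEADER = ["Feature ID", "Taxon"]


def has_taxonomy_header(lines):
    """Return True if the first non-empty row starts with the expected header.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines before the first real row
        return [cell.strip() for cell in row[:2]] == EXPECTED_HEADER
    return False  # empty input


def suggest_import_format(lines):
    """Suggest the format name to use for a taxonomy TSV (hypothetical helper)."""
    if has_taxonomy_header(lines):
        return "TSVTaxonomyFormat"
    return "HeaderlessTSVTaxonomyFormat"


# Toy rows (made up) standing in for the first lines of two files.
with_header = ["Feature ID\tTaxon\n", "seq1\td__Bacteria; p__Firmicutes\n"]
without_header = ["seq1\td__Bacteria; p__Firmicutes\n"]

print(suggest_import_format(with_header))     # TSVTaxonomyFormat
print(suggest_import_format(without_header))  # HeaderlessTSVTaxonomyFormat
```

For a real file, the same check would be `suggest_import_format(open(path, newline=""))`, where `path` is wherever the exported taxonomy TSV lives.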
I have not used the new GreenGenes database much myself. Did you obtain any classification by simply using the full-length GreenGenes 2 database? I wonder if something went wrong at the feature-classifier extract-reads step?
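For anyone unsure what the extract-reads step does conceptually, here is a rough sketch of in-silico primer extraction: find the forward primer, find the reverse complement of the reverse primer downstream of it, and keep what lies in between. This is only an illustration under stated assumptions and not the plugin's actual implementation: it requires exact matches (allowing IUPAC ambiguity codes in the primers), it drops the primers themselves, the function names are hypothetical, and the primers and reference are toy sequences rather than real V3-V4 primers.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(reference, fwd_primer, rev_primer):
    """Return the sequence between the two primer sites, or None if
    either primer is not found in the expected orientation."""
    reference = reference.upper()
    fwd = re.search(primer_regex(fwd_primer), reference)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so on the
    # reference it appears as its reverse complement, downstream of fwd.
    rev_pattern = re.compile(primer_regex(reverse_complement(rev_primer)))
    rev = rev_pattern.search(reference, fwd.end())
    if rev is None:
        return None
    return reference[fwd.end():rev.start()]


# Toy example (made-up sequences, not real 16S primers).
reference = "TTTTACGGACCCCGGGGTTCAGAAAA"
print(extract_region(reference, "ACGNA", "CTGAW"))  # CCCCGGGG
```

Note that a reference stored in the opposite orientation yields no match in this sketch, which is one way an extraction step can quietly come back empty or incomplete.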
Potentially. Just to be clear, are you saying that you got your V3V4 GreenGenes database to work?
And yes, I tried again to train the GreenGenes database with the same files as before, and this time it worked.
So problem solved. Thank you for your assistance!
Here are my results comparing the GreenGenes database I trained myself (V3-V4 region) with the full-length one (the default, ready-to-use trained database downloaded from the QIIME site):
my own trained db (V3-V4)