I read the following documents:
Training feature classifiers with q2-feature-classifier — QIIME 2 2023.9.2 documentation
Data resources — QIIME 2 2023.9.1 documentation
Since my samples are defined bacterial communities of known composition, shouldn't it be possible, as I understand it, to create a taxonomy classifier .qza file that contains only the bacteria in my communities?
The standard I am using is the D6305 product from Zymo Research. They provide a full set of 16S sequence FASTA files and genome sequence FASTA files. But when I opened them I found multiple 16S sequences per species, which is normal but bothers me.
My first question: in the tutorial, which uses the 85_otus.fasta file and the 85_otu_taxonomy file, each classified species seems to correspond to only one feature sequence. So how do I make one species correspond to multiple sequences? In other words, how do I make use of these FASTA files that contain multiple sequences per species?
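To make the question concrete, here is a minimal Python sketch of what I imagine might work: give every 16S copy its own unique sequence ID, and let several IDs share the same taxonomy string in a two-column, tab-separated taxonomy file. Everything specific here is my own assumption, not something from the Zymo files or the QIIME 2 docs: the header pattern (`Species_name_16S_N`), the toy sequences, the `_copyN` naming, and the helper names `read_fasta` and `build_reference` are all hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_reference(handle, taxonomy_by_species):
    """Return (fasta_text, taxonomy_tsv_text) with one unique ID per 16S copy."""
    fasta_lines, tax_lines = [], []
    counts = {}
    for header, seq in read_fasta(handle):
        # Assumption: the species name is the header text before "_16S".
        species = header.split("_16S")[0]
        counts[species] = counts.get(species, 0) + 1
        seq_id = f"{species}_copy{counts[species]}"
        fasta_lines.append(f">{seq_id}\n{seq}")
        # Several sequence IDs may point at the same taxonomy string.
        tax_lines.append(f"{seq_id}\t{taxonomy_by_species[species]}")
    return "\n".join(fasta_lines) + "\n", "\n".join(tax_lines) + "\n"


# Toy input standing in for the real FASTA file (sequences are made up).
toy_fasta = io.StringIO(
    ">Escherichia_coli_16S_1\nACGTACGT\n"
    ">Escherichia_coli_16S_2\nACGTACGA\n"
    ">Bacillus_subtilis_16S_1\nTTGACCGT\n"
)
toy_taxonomy = {
    "Escherichia_coli": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
                        "o__Enterobacteriales; f__Enterobacteriaceae; "
                        "g__Escherichia; s__coli",
    "Bacillus_subtilis": "k__Bacteria; p__Firmicutes; c__Bacilli; "
                         "o__Bacillales; f__Bacillaceae; g__Bacillus; s__subtilis",
}

fasta_text, tax_text = build_reference(toy_fasta, toy_taxonomy)
print(fasta_text)
print(tax_text)
```

If this is the right idea, I assume the two resulting files could then be imported as FeatureData[Sequence] and FeatureData[Taxonomy], the same way the tutorial imports 85_otus.fasta and its taxonomy file. Is that correct, or does the classifier need each taxon to appear only once?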
My second question, which may be a bit out of place here: where do databases like Greengenes and SILVA source their data from? I would normally only use NCBI to look up the information I am interested in. For example, there are tens of thousands of E. coli genome sequences on NCBI; which one do Greengenes and SILVA use as their reference sequence?
One last question: I stumbled upon the fact that the rrnD 16S sequence of E. coli K-12 is vastly different from the 16S sequences of the other six rrn operons! Wouldn't it be more appropriate for each species to correspond to multiple 16S rRNA sequences? Or is everyone in fact already doing this?