I am looking for a database of Cyanobacteria 16S rRNA sequences. Our group generally uses Silva because we work with many different microbes and microbiomes. However, I am running a project that focuses on Cyanobacteria diversity in the microbiome, and I want to omit other microbes from the analysis.
Do any of you have suggestions?
If that is not an option, do you think using Silva and training the classifier on reads extracted with Cyanobacteria-specific primers would result in an accurate classifier?
I'd strongly recommend against constructing a reference database that only contains your organism of interest. That is, you need to make sure you have "outgroup" or "decoy" sequences within your reference database. If you only have Cyanobacteria within your reference database, then you'll likely mis-classify many non-cyanobacterial sequences as cyanobacteria. Also, leaving in other taxa will help you filter / remove any non-cyanobacterial sequences from your data prior to analysis.
Furthermore, do not forget that many plastid sequences (e.g. chloroplast sequences) also fall within Cyanobacteria, so you will need to decide whether to keep or remove them. Any of the following databases should suffice for your needs: SILVA, GTDB, RDP, or Greengenes. You can also use tools like RESCRIPt to add additional cyanobacterial sequences to any of these premade reference databases, or to curate them as you need.
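To make the filtering step concrete, here is a minimal Python sketch of how you might select cyanobacterial features after classifying against a full reference database. It is only an illustration: the function name `keep_cyanobacteria`, the feature IDs, and the SILVA-style lineage strings are all made up for the example, and the exact rank labels in your own taxonomy will depend on the database and version you use.

```python
def keep_cyanobacteria(taxonomy, drop_plastids=True):
    """Return feature IDs whose lineage mentions Cyanobacteria.

    taxonomy: dict mapping feature ID -> semicolon-delimited lineage string.
    drop_plastids: if True, also discard lineages labelled as chloroplast.
    """
    kept = []
    for feature_id, lineage in taxonomy.items():
        # Split into ranks and normalise case/whitespace before matching.
        ranks = [rank.strip().lower() for rank in lineage.split(";")]
        is_cyano = any("cyanobacteri" in rank for rank in ranks)
        is_plastid = any("chloroplast" in rank for rank in ranks)
        if is_cyano and not (drop_plastids and is_plastid):
            kept.append(feature_id)
    return kept


# Hypothetical classifier output (illustrative lineage strings only).
example = {
    "asv1": "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Synechococcales",
    "asv2": "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast",
    "asv3": "d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "asv4": "Unassigned",
}

print(keep_cyanobacteria(example))                       # ['asv1']
print(keep_cyanobacteria(example, drop_plastids=False))  # ['asv1', 'asv2']
```

Because the non-cyanobacterial reference sequences stay in the database, `asv3` gets its own label and can be dropped cleanly instead of being forced into a cyanobacterial lineage. If you work in QIIME 2, taxonomy-based filtering of the feature table should do the same job without custom code.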