Hi @nora,
I wonder whether you could avoid splitting the sequences into two groups. The steps I would use are:
- Classify your Sanger sequences to check whether any similar species is already in Greengenes (GG).
- Add your Sanger sequences (and their taxonomy) to Greengenes, perhaps using artificial species names for them so they are easy to trace in the final taxonomy plots. If GG already contains species similar to yours, you may want to add your Sanger sequences before the GG sequences, so the reference reads 'Sanger seqs + GG' in that order. If GG lacks any sequences similar to yours, the order is less important.
Having your sequences at the beginning will help you at the taxonomy assignment step. If you use `qiime feature-classifier classify-consensus-blast` with `--p-maxaccepts 1`, then when a representative sequence hits one of your Sanger sequences at the beginning, BLAST+ will stop the search and report that Sanger sequence as the best match.
- Visualise the taxonomy to look for your 'artificial species', or use [`qiime taxa filter-table`](https://docs.qiime2.org/2020.11/plugins/available/taxa/filter-table/) to create a new abundance table that includes only your artificial species.
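If it helps, here is a minimal Python sketch of the 'Sanger seqs + GG' step, done on the plain-text files before importing them into QIIME 2. It assumes your Sanger reads are in a FASTA file and their taxonomy is in a headerless two-column TSV (ID, tab, taxonomy string), the same layout as the Greengenes taxonomy file. The function names, file names and `s__SangerIsolate01`-style labels are hypothetical. It also refuses Sanger IDs that already exist in GG, since duplicate IDs would make the hits ambiguous.

```python
from __future__ import annotations

from pathlib import Path


def fasta_ids(fasta_text: str) -> list[str]:
    """Return the record IDs of a FASTA string, in file order."""
    return [line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">")]


def tsv_ids(tsv_text: str) -> list[str]:
    """Return the first column (feature ID) of a headerless taxonomy TSV."""
    return [line.split("\t", 1)[0]
            for line in tsv_text.splitlines() if line.strip()]


def prepend(first: str, second: str) -> str:
    """Put `first` on top of `second`, making sure a newline separates them."""
    if first and not first.endswith("\n"):
        first += "\n"
    return first + second


def merge_reference(sanger_fa: str, sanger_tax: str,
                    gg_fa: str, gg_tax: str) -> tuple[str, str]:
    """Return (fasta, taxonomy) texts with Sanger records before GG records."""
    sanger = fasta_ids(sanger_fa)
    # Duplicate IDs would make it unclear which record a BLAST hit refers to.
    clashes = set(sanger) & set(fasta_ids(gg_fa))
    if clashes:
        raise ValueError(f"IDs already used in Greengenes: {sorted(clashes)}")
    # Every Sanger read needs a (hypothetical, artificial) taxonomy line.
    missing = set(sanger) - set(tsv_ids(sanger_tax))
    if missing:
        raise ValueError(f"Sanger IDs without taxonomy: {sorted(missing)}")
    return prepend(sanger_fa, gg_fa), prepend(sanger_tax, gg_tax)


def merge_reference_files(sanger_fa, sanger_tax, gg_fa, gg_tax,
                          out_fa, out_tax) -> None:
    """File-based wrapper around merge_reference (all paths are placeholders)."""
    texts = [Path(p).read_text() for p in (sanger_fa, sanger_tax, gg_fa, gg_tax)]
    fa, tax = merge_reference(*texts)
    Path(out_fa).write_text(fa)
    Path(out_tax).write_text(tax)


# Hypothetical usage (file names are placeholders):
# merge_reference_files("sanger.fasta", "sanger_taxonomy.tsv",
#                       "gg_seqs.fasta", "gg_taxonomy.txt",
#                       "ref-seqs.fasta", "ref-taxonomy.tsv")
```

You would then import the merged files with `qiime tools import`, as `FeatureData[Sequence]` for the FASTA and as `FeatureData[Taxonomy]` (with `--input-format HeaderlessTSVTaxonomyFormat`) for the taxonomy, and pass them to `classify-consensus-blast` as the reference reads and reference taxonomy.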
Hope it makes sense!