Unpublished sequences in SILVA 138 db; preparing personal database

I am using QIIME 2 to analyze NGS results from a study of apicomplexans in ruminants. I used the SILVA 138 classifier but found that some of the sequences are unpublished (so they can't be used as reference sequences). Is there a way to exclude unpublished sequences from the database? Or should we build our own database of published sequences? If so, could you please provide a guideline or a link to one?

Thanks
Abdul

Hello Abdul,

Welcome to the QIIME 2 forums!

While SILVA 138 might include some Candidatus taxa, all of its entries have been curated and published as part of the SILVA release, and I think the SILVA database itself counts as a publication you can cite. You are good to go!

If you still want to construct your own database, you can see the full process here:
https://docs.qiime2.org/2019.10/tutorials/feature-classifier/

Colin

Thanks a lot, dear Colin.

Abdul

Hi all,

I have spent some time re-curating the SILVA database (for Theileria/Babesia) and found some discrepancies. For example, some Babesia sequences were classified as Bacteria instead of Apicomplexa (e.g. KX881914). So it would be better to curate the database again before using it. I also found several N… bases in the published sequences. Although re-curation is time-consuming, in my experience it is really useful and a must-do, since it might change the taxonomic output of your data.
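
For anyone wanting to automate part of this check, here is a minimal Python sketch of the two tests described above: flagging sequences that contain N bases, and flagging Babesia/Theileria entries whose lineage does not include Apicomplexa. It is only an illustration, not an official QIIME 2 or SILVA tool. It assumes the reference sequences are available as plain FASTA and the taxonomy as a two-column, tab-separated table (ID, lineage); the function names, the `max_n` threshold, and the lineage layout are all hypothetical and may need adjusting to your exported files.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def read_taxonomy(lines: Iterable[str]) -> Dict[str, str]:
    """Parse an assumed two-column, tab-separated table of (ID, lineage)."""
    taxonomy = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip blank or malformed rows and an optional header row.
        if len(parts) < 2 or parts[0] == "Feature ID":
            continue
        taxonomy[parts[0]] = parts[1]
    return taxonomy


def curate(
    records: Iterable[Tuple[str, str]],
    taxonomy: Dict[str, str],
    genera: Tuple[str, ...] = ("Babesia", "Theileria"),
    expected: str = "Apicomplexa",
    max_n: int = 0,  # hypothetical threshold: tolerated number of N bases
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split records into kept sequences and flagged (ID, reason) pairs."""
    kept, flagged = [], []
    for seq_id, seq in records:
        lineage = taxonomy.get(seq_id, "")
        if seq.upper().count("N") > max_n:
            flagged.append((seq_id, "ambiguous bases"))
        elif any(g in lineage for g in genera) and expected not in lineage:
            flagged.append((seq_id, "unexpected lineage"))
        else:
            kept.append((seq_id, seq))
    return kept, flagged


if __name__ == "__main__":
    # Made-up records for illustration only; not real SILVA entries.
    fasta = [">seq1", "ACGTACGT", ">seq2", "ACGNNACG", ">seq3", "ACGTTTGA"]
    tax = [
        "seq1\tEukaryota; Apicomplexa; Babesia",
        "seq2\tEukaryota; Apicomplexa; Theileria",
        "seq3\tBacteria; Babesia",
    ]
    kept, flagged = curate(read_fasta(fasta), read_taxonomy(tax))
    print("kept:", [seq_id for seq_id, _ in kept])
    print("flagged:", flagged)
```

The flagged list is meant for manual review rather than automatic deletion: a lineage mismatch may be a genuine misannotation, but it still needs checking against the original record before the entry is corrected or dropped.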