Unpublished sequences in SILVA 138 db; preparing personal database

I am using QIIME 2 to analyse NGS results from a study of apicomplexans in ruminants. I used the SILVA 138 classifier but found that some of its sequences are unpublished (so they can't be used as reference sequences). Is there a way to exclude unpublished sequences from the database, or should we build our own database of published sequences? If so, could you please provide a guideline or a link to one?

Thanks
Abdul

Hello Abdul,

Welcome to the QIIME 2 forum!

While SILVA 138 might include some Candidatus taxa, all of its entries have been curated and published as part of the SILVA release, and I think the SILVA database itself counts as a publication you can cite. You are good to go!

If you still want to construct your own database, you can see the full process here:
https://docs.qiime2.org/2019.10/tutorials/feature-classifier/

Colin

Thanks a lot dear Colin

Abdul

Hi all,

I have spent some time re-curating the SILVA database (for Theileria/Babesia) and found some discrepancies. For example, some Babesia sequences were classified as Bacteria instead of Apicomplexa (e.g. KX881914), so it is better to curate the database again before using it. I also found several N… bases in the published sequences. Re-curation is time-consuming, but in my experience it is really useful and a must, since it can change the taxonomic output of your data.
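For anyone wanting to automate a first pass of this kind of check, here is a minimal sketch in plain Python. It is not the procedure used above, only an illustration under some assumptions: the reference is available as a FASTA file plus a two-column, tab-separated taxonomy table (sequence ID, lineage string), and a Babesia or Theileria entry is considered suspicious when its lineage does not contain "Apicomplexa". The function names, the `max_ambiguous` threshold, and the toy records are all hypothetical. Flagged records still need to be reviewed by hand.

```python
import re

# Any character other than A, C, G, T/U is counted as ambiguous
# (this includes N, other IUPAC codes, and gap characters).
AMBIGUOUS = re.compile(r"[^ACGTU]")


def parse_fasta(text):
    """Yield (seq_id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks).upper()
            # The ID is the first whitespace-delimited token of the header.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks).upper()


def parse_taxonomy(text):
    """Return {seq_id: lineage} from a tab-separated two-column table."""
    taxonomy = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("Feature ID"):
            continue  # skip blank lines and an optional header row
        seq_id, lineage = line.split("\t")[:2]
        taxonomy[seq_id] = lineage
    return taxonomy


def recurate(fasta_text, taxonomy_text,
             genera=("Babesia", "Theileria"),
             expected="Apicomplexa",
             max_ambiguous=0):
    """Split records into kept sequences and flagged ones (with a reason)."""
    taxonomy = parse_taxonomy(taxonomy_text)
    kept, flagged = {}, {}
    for seq_id, seq in parse_fasta(fasta_text):
        lineage = taxonomy.get(seq_id, "")
        n_ambiguous = len(AMBIGUOUS.findall(seq))
        if n_ambiguous > max_ambiguous:
            flagged[seq_id] = f"{n_ambiguous} ambiguous base(s)"
        elif any(g in lineage for g in genera) and expected not in lineage:
            flagged[seq_id] = f"lineage lacks {expected}: {lineage}"
        else:
            kept[seq_id] = seq
    return kept, flagged


# Toy data only: made-up IDs and simplified lineage strings.
toy_fasta = ">toy1\nACGTACGT\n>toy2\nACGNNT\n>toy3\nACGTAC\n"
toy_taxonomy = (
    "toy1\td__Eukaryota; p__Apicomplexa; g__Babesia\n"
    "toy2\td__Eukaryota; p__Apicomplexa; g__Theileria\n"
    "toy3\td__Bacteria; g__Babesia\n"
)
kept, flagged = recurate(toy_fasta, toy_taxonomy)
for seq_id, reason in flagged.items():
    print(seq_id, "->", reason)
```

The kept IDs could then be used to subset both the sequences and the taxonomy table before re-importing them and training a new classifier as described in the tutorial linked earlier in the thread.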
