Silva database / own taxonomy database

joannakolodz · August 5, 2024, 7:38am

Hi, hope you’re all fine!

I have eukaryotic data (18S, V3-V4) and used the SILVA database (silva-138-99-nb-classifier). When I compared the sequences against NCBI, I realized that the results don’t agree: the classifier and NCBI assign the same sequences to different species. I tried PR2, but it wasn’t much better.

So I decided to work with Silva 138.2 following the procedure written here: Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt.

That failed: the result was much worse, and these are the only assignments I got:

d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;;;;
d__Eukaryota;p__Cnidaria;c__Myxozoa;o__Bivalvulida;f__Bivalvulida;g__Bivalvulida;s__
d__Eukaryota;;;;;;
d__Bacteria;;;;;;
Unassigned;;;;;__;__d

I tried to skip all the filtering steps and move from:

qiime rescript reverse-transcribe \
    --i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
    --o-dna-sequences silva-138.1-ssu-nr99-seqs.qza

directly to:
qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-138.1-ssu-nr99-seqs.qza \
    --i-reference-taxonomy silva-138.1-ssu-nr99-tax.qza \
    --o-classifier silva-138.1-ssu-nr99-classifier.qza

and found that the sequences listed in the annotated_rep-set file don’t match the NCBI results when pasted into the browser, and the seventh level of taxonomy is missing.

I tried to build my own reference database from NCBI: I downloaded sequences for the desired region and groups, concatenated them into one .fna file, and got stuck. Does anyone have experience with that?

Thank you in advance for your help!

Best,
Joanna
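
For anyone at the same point with a concatenated .fna file from NCBI, here is a minimal sketch of one possible next step: importing the sequences and a matching taxonomy file as QIIME 2 artifacts. Everything named here is an assumption for illustration (the file names, the function name, and above all the existence of a headerless two-column taxonomy TSV whose IDs match the FASTA headers, which the post does not mention having).

```shell
# Sketch only: all file names are placeholders.
# Assumes a two-column, tab-separated taxonomy file with no header line:
#   <sequence-id><TAB>d__Eukaryota;p__...;g__...;s__...
# whose IDs match the FASTA headers in the concatenated .fna file.
# Wrapped in a function so nothing runs until you call it.
import_custom_reference() {
    # Reference sequences (DNA FASTA) -> FeatureData[Sequence]
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path ncbi-18s-refs.fna \
        --output-path ncbi-18s-ref-seqs.qza

    # Taxonomy strings -> FeatureData[Taxonomy]
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path ncbi-18s-ref-tax.tsv \
        --output-path ncbi-18s-ref-tax.qza
}
```

The two resulting artifacts would then take the place of the SILVA files in --i-reference-reads and --i-reference-taxonomy when fitting the classifier.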

SoilRotifer · August 5, 2024, 8:03pm

Hi @joannakolodz,

By default this plugin only returns domain, phylum, class, order, family, and genus. You can return other ranks from SILVA by specifying them via the --p-ranks flag. This will not work for species, as SILVA does not curate species taxonomy. We've added an option to append the organism name as the species label, but these labels can be unreliable (as outlined in the tutorial you linked). If you would really like to use them, you'll need to enable the --p-include-species-labels flag when parsing the SILVA taxonomy. Note the warnings we provide in the tutorial!
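
As a rough sketch of what that could look like on the command line, assuming the three SILVA taxonomy artifacts have already been imported (the input file names and the function name below are placeholders, not taken from your run, and the rank list simply spells out the defaults):

```shell
# Sketch only: input artifact names are placeholders; adjust to your own files.
# Wrapped in a function so nothing runs until you call it.
parse_silva_taxonomy_with_species() {
    qiime rescript parse-silva-taxonomy \
        --i-taxonomy-tree taxtree-silva-138.1-nr99.qza \
        --i-taxonomy-map taxmap-silva-138.1-ssu-nr99.qza \
        --i-taxonomy-ranks taxranks-silva-138.1-ssu-nr99.qza \
        --p-ranks domain phylum class order family genus \
        --p-include-species-labels \
        --o-taxonomy silva-138.1-ssu-nr99-tax.qza
}
```

Calling the function produces the taxonomy artifact that is later passed to the classifier-fitting step. Leave out --p-include-species-labels to get the default six-rank taxonomy.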

Also, different databases curate taxonomy differently, so expect some differences.

Can you share the classification output files (QZVs)? It is difficult to help without looking at the data. What taxonomy are you expecting?

joannakolodz · August 6, 2024, 9:00am

Hi Mike, thanks for responding so quickly!
Here's a link to the Dropbox directory where I placed the QZVs for the latest attempt: Dropbox

I'm expecting to find eukaryotic pathogenic organisms (e.g. Entamoeba, Cryptosporidium, Eimeria).

SoilRotifer · August 6, 2024, 4:00pm

Thank you for sharing these. Nothing appears out of the ordinary. It looks like you are detecting everything but Entamoeba. I spot checked a few sequences via BLAST and they appear to match the classifier taxonomy. Can you please provide explicit examples where the taxonomy is different / problematic?
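
A spot check along these lines could look roughly like the sketch below: export the representative sequences, pull out a single record by feature ID, and paste it into NCBI BLAST. The artifact name, output directory, and helper function name are assumptions for illustration only.

```shell
# Sketch only: rep-seqs.qza and the output directory are placeholder names.
# Step 1 (needs QIIME 2; left commented so the helper below can be used alone):
#   qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs
# The export writes exported-rep-seqs/dna-sequences.fasta.

# Step 2: print one FASTA record by feature ID, ready to paste into NCBI BLAST.
# Usage: fasta_record <feature-id> <fasta-file>
fasta_record() {
    awk -v id="$1" '
        /^>/ { keep = (substr($1, 2) == id) }   # header line: does the ID match?
        keep                                    # print header and sequence lines of the match
    ' "$2"
}
```

For example, `fasta_record <feature-id> exported-rep-seqs/dna-sequences.fasta` prints the header and sequence for that feature, which can then be compared against the taxonomy the classifier assigned.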

Have you tried this tutorial?

Also, there are two other reference databases you can try using too: GTDB and RDP.

joannakolodz · August 20, 2024, 7:36am

Thank you so much! It seems to be fine right now :)
