SILVA 138 16S database contains Eukaryota after filtering

Hi everyone!

I am working with the SILVA 138 16S rRNA database and found something that left me dumbstruck. When I extracted the taxonomy file for my samples, the Eukaryota domain was in it. This makes no sense to me because, as far as I know, the 16S rRNA gene encodes the small-subunit rRNA of prokaryotic ribosomes (the Archaea and Bacteria domains). In eukaryotes the equivalent is the 18S rRNA gene.

I was running the sidle database preparation step. My first step was downloading the SILVA 138 database files with the RESCRIPt get-silva-data command. After that, I ran the following filtering steps:

DB filtering steps

remove sequences with 5 or more degenerate nucleotides

qiime rescript cull-seqs \
  --i-sequences silva-138-99-seqs.qza \
  --p-num-degenerates 5 \
  --p-n-jobs 4 \
  --o-clean-sequences silva-138-99-degen-seqs.qza

filtering sequences

qiime taxa filter-seqs \
  --i-sequences silva-138-99-degen-seqs.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "p__;,k__;,d__Archaea,d__Eukaryota;,f__mitochondria;,c__Chloroplast;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-abril22.qza

But the one-step filter didn't work, so I tried the following step-by-step process:

filter taxonomy

echo "remove empty kingdoms"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-degen-seqs.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "k__;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove empty phylum"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "p__;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove eukaryote"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "d__Eukaryota;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove archaea"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "d__Archaea;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove mitochondria "
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "f__mitochondria;,Mitochondria" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

mkdir separated_filters/

echo "remove chloroplasts "
echo "as this is the last filter, the complete filtered .qza is written to the separated_filters/ folder"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "c__Chloroplast;" \
  --p-mode contains \
  --o-filtered-sequences separated_filters/silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza
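
One way to sanity-check exclude terms like the ones above is to count how many taxonomy strings each term actually appears in. The sketch below is only an illustration under stated assumptions: the file is a hypothetical two-column TSV (Feature ID, Taxon), the four rows are made-up examples rather than real SILVA records, and `count_hits` is a helper name introduced here. It uses a plain case-sensitive substring match with `grep -F`, which only approximates `--p-mode contains` and may not match QIIME 2's behavior exactly.

```shell
#!/usr/bin/env bash
# ASSUMPTION: a small hypothetical taxonomy table stands in for an exported taxonomy.tsv.
tax_file="$(mktemp)"
printf 'Feature ID\tTaxon\n' > "$tax_file"
printf 'seq1\td__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Mitochondria\n' >> "$tax_file"
printf 'seq2\td__Eukaryota; p__Arthropoda\n' >> "$tax_file"
printf 'seq3\td__Archaea; p__Crenarchaeota\n' >> "$tax_file"
printf 'seq4\td__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast\n' >> "$tax_file"

# count_hits TERM FILE: number of Taxon strings containing TERM as a literal substring.
count_hits() {
  local term="$1" file="$2"
  # tail skips the header row; cut keeps the Taxon column;
  # grep -F matches a fixed string and -c prints the number of matching lines.
  tail -n +2 "$file" | cut -f2 | grep -c -F -- "$term" || true
}

# Report the hit count for each term from the one-step filter.
for term in "p__;" "k__;" "d__Archaea" "d__Eukaryota;" "f__mitochondria;" "c__Chloroplast;"; do
  printf '%s\t%s\n' "$term" "$(count_hits "$term" "$tax_file")"
done
```

A term that reports zero hits cannot remove anything under this kind of substring match, so it is worth comparing its rank prefix, capitalization, and trailing semicolon against the strings in the real taxonomy file.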

The following steps will prepare my regional database so that I can explore my data correctly, as explained in the database preparation step of the sidle documentation. The final output of the whole pipeline is the extracted taxonomy.

I have also read the filtering tutorial in the QIIME 2 docs, but right now I have run out of ideas :frowning:.

Thank you for taking the time to read this!
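
To see which domains end up in an extracted taxonomy, a tally of the first rank of every taxonomy string can help. This is a minimal sketch, assuming a two-column TSV (Feature ID, Taxon) with ranks separated by semicolons; the sample rows are invented for illustration and `tally_domains` is a hypothetical helper name, not part of QIIME 2 or sidle.

```shell
#!/usr/bin/env bash
# ASSUMPTION: made-up rows stand in for a real exported taxonomy table.
sample_tax="$(mktemp)"
printf 'Feature ID\tTaxon\n' > "$sample_tax"
printf 'a1\td__Bacteria; p__Firmicutes\n' >> "$sample_tax"
printf 'a2\td__Bacteria; p__Proteobacteria\n' >> "$sample_tax"
printf 'a3\td__Eukaryota; p__Arthropoda\n' >> "$sample_tax"
printf 'a4\td__Archaea; p__Crenarchaeota\n' >> "$sample_tax"

# tally_domains FILE: print "<domain><TAB><count>" for each first-rank label.
tally_domains() {
  # Skip the header, keep the Taxon column, keep the text before the first ';',
  # then count identical labels and reorder the columns as label, count.
  tail -n +2 "$1" | cut -f2 | cut -d';' -f1 | sort | uniq -c | awk '{print $2 "\t" $1}'
}

tally_domains "$sample_tax"
```

Running the same tally on the reference taxonomy before and after filtering would show whether a given domain was actually removed at each step.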

Best,

Elsa

Hi @elsamdea,

SILVA has two reference databases: the Small Subunit (SSU; 16S & 18S) and the Large Subunit (LSU; 23S & 28S). The SSU database therefore includes eukaryotic 18S sequences alongside the 16S ones. The reasons are outlined in the following posts:

-Mike

Thank you so much! :slight_smile: I am going to read these two posts!!

And sorry for the inconvenience!

Best,

Elsa