Hi everyone!
I am working with the SILVA 138 16S rRNA database and I have found something that left me dumbstruck. When I extracted the taxonomy file of my samples, I found the Eukaryota domain in it. This makes no sense to me, because as far as I know the 16S rRNA gene encodes the 16S rRNA of the prokaryotic small ribosomal subunit (domains Archaea and Bacteria), while in eukaryotes it is the 18S rRNA gene.
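A quick way to confirm which domains a taxonomy file contains is to tally the first rank of every record. The sketch below makes several assumptions. It expects a file exported with `qiime tools export`, which should produce a `taxonomy.tsv` with a `Feature ID`/`Taxon` header row. It also expects ranks separated by `; ` with the domain as the first rank (`d__...`). The function name `count_domains` and the toy rows are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count taxonomy records per domain (first rank) in an exported
# taxonomy.tsv. Assumes a "Feature ID<TAB>Taxon" header line and
# "; "-separated ranks such as "d__Bacteria; p__Firmicutes; ...".
set -euo pipefail

count_domains() {
  # $1: path to taxonomy.tsv; prints "domain<TAB>count", sorted by domain
  awk -F'\t' 'NR > 1 {
      split($2, ranks, "; ")   # ranks[1] holds the domain label
      n[ranks[1]]++
    }
    END { for (d in n) print d "\t" n[d] }' "$1" | sort
}

# Hypothetical toy rows standing in for the real export.
toy=$(mktemp)
printf 'Feature ID\tTaxon\n' > "$toy"
printf 'seq1\td__Bacteria; p__Firmicutes\n' >> "$toy"
printf 'seq2\td__Eukaryota; p__Ascomycota\n' >> "$toy"
printf 'seq3\td__Bacteria; p__Proteobacteria\n' >> "$toy"
count_domains "$toy"   # prints d__Bacteria 2, then d__Eukaryota 1 (tab-separated)
rm -f "$toy"
```

Pointing `count_domains` at the real exported `taxonomy.tsv` instead of the toy file would give per-domain record counts for the whole reference.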
I was running the sidle database preparation step. My first step was downloading the SILVA 138 database files with the RESCRIPt `get-silva-data` command. After that, I ran the following filtering steps:
```bash
# DB filtering steps
# remove sequences with 5 or more degenerate nucleotides
qiime rescript cull-seqs \
  --i-sequences silva-138-99-seqs.qza \
  --p-num-degenerates 5 \
  --p-n-jobs 4 \
  --o-clean-sequences silva-138-99-degen-seqs.qza
```
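For reference, RESCRIPt documents `--p-num-degenerates` as removing sequences with N *or more* degenerate bases, so a value of 5 also drops sequences with exactly 5. The awk sketch below only illustrates that rule on an exported FASTA. It is not RESCRIPt's implementation. It assumes unwrapped sequences (one sequence line per record) and treats the IUPAC codes R, Y, S, W, K, M, B, D, H, V and N as degenerate. `flag_degenerate` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustration only (not RESCRIPt code): list FASTA records that a
# "N or more degenerate bases" rule would remove.
# Assumes unwrapped FASTA: one header line, then one sequence line.
set -euo pipefail

flag_degenerate() {
  # $1: FASTA path, $2: threshold (e.g. 5); prints "id<TAB>count"
  awk -v max="$2" '
    /^>/ { id = substr($0, 2); next }
    {
      s = toupper($0)
      n = gsub(/[RYSWKMBDHVN]/, "", s)   # count IUPAC degenerate codes
      if (n >= max) print id "\t" n
    }' "$1"
}

# Hypothetical toy records.
toy=$(mktemp)
printf '>ok\nACGTNACGTR\n>bad\nNNNNNACGT\n' > "$toy"
flag_degenerate "$toy" 5   # prints: bad<TAB>5 ("ok" has only 2)
rm -f "$toy"
```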
```bash
# filtering sequences
qiime taxa filter-seqs \
  --i-sequences silva-138-99-degen-seqs.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "p__;,k__;,d__Archaea,d__Eukaryota;,f__mitochondria;,c__Chloroplast;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-abril22.qza
```
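As far as I understand `--p-mode contains`, the `--p-exclude` string is split on commas (the default query delimiter). A sequence is then dropped when any term appears as a substring of its taxonomy string, ignoring case. The sketch below mimics that reading so that individual terms can be checked against a single taxonomy line. `matching_terms` is a hypothetical helper, not part of q2-taxa, and the taxonomy strings in the examples are made up.

```bash
#!/usr/bin/env bash
# Sketch of comma-delimited, case-insensitive substring matching
# (my reading of --p-mode contains; not q2-taxa's actual code).
set -euo pipefail

matching_terms() {
  # $1: comma-delimited exclude string, $2: one taxonomy string
  # prints every term that occurs in the taxonomy string
  awk -v terms="$1" -v taxon="$2" 'BEGIN {
    t = tolower(taxon)
    n = split(terms, parts, ",")
    for (i = 1; i <= n; i++)
      if (index(t, tolower(parts[i])) > 0) print parts[i]   # literal substring test
  }'
}

exclude="p__;,k__;,d__Archaea,d__Eukaryota;,f__mitochondria;,c__Chloroplast;"
matching_terms "$exclude" "d__Eukaryota; p__Ascomycota; c__Sordariomycetes"
# prints: d__Eukaryota;
matching_terms "$exclude" "d__Bacteria; p__Proteobacteria; f__Mitochondria; g__Mitochondria"
# prints: f__mitochondria;
```

Under this kind of matching, a term with a trailing `;` such as `d__Eukaryota;` only matches when another rank follows the domain in the taxonomy string.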
But the one-step filter didn't work, so I tried applying the filters one at a time:
```bash
# filter by taxonomy, one term at a time
echo "remove empty kingdoms"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-degen-seqs.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "k__;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove empty phylum"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "p__;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove eukaryote"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "d__Eukaryota;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove archaea"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "d__Archaea;" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

echo "remove mitochondria"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "f__mitochondria;,Mitochondria" \
  --p-mode contains \
  --o-filtered-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza

mkdir -p separated_filters/
echo "remove chloroplasts"
echo "as this is the last filter, the complete filtered .qza is written to the separated_filters/ folder"
qiime taxa filter-seqs \
  --i-sequences silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza \
  --i-taxonomy silva-138-99-tax.qza \
  --p-exclude "c__Chloroplast;" \
  --p-mode contains \
  --o-filtered-sequences separated_filters/silva-138-99-filtered-bacteria-def-sequences-tax-exclude-mitochondria-chloroplast-metazoa-separated-abril22.qza
```
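`qiime taxa filter-seqs` only writes the filtered sequences. `silva-138-99-tax.qza` itself is not modified, so an export of that file will still list every domain. One way to check whether any Eukaryota records survived in the final sequences is to export both artifacts with `qiime tools export` and cross-reference the IDs. The export should write `dna-sequences.fasta` and `taxonomy.tsv`, respectively. The helper `ids_with_term` and the toy data below are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print IDs from an exported FASTA whose taxonomy (from an exported
# taxonomy.tsv with a header row) contains a term, case-insensitively.
set -euo pipefail

ids_with_term() {
  # $1: taxonomy.tsv, $2: dna-sequences.fasta, $3: term (e.g. d__Eukaryota)
  awk -F'\t' -v term="$3" '
    FNR == NR {                        # first file: taxonomy.tsv
      if (FNR > 1) tax[$1] = tolower($2)
      next
    }
    /^>/ {                             # second file: FASTA headers only
      id = substr($0, 2)
      sub(/[ \t].*/, "", id)           # keep the ID up to the first whitespace
      if ((id in tax) && index(tax[id], tolower(term)) > 0) print id
    }' "$1" "$2"
}

# Hypothetical toy exports: s3 is in the taxonomy but not in the FASTA.
tax=$(mktemp); fa=$(mktemp)
printf 'Feature ID\tTaxon\ns1\td__Bacteria; p__Firmicutes\ns2\td__Eukaryota; p__Ascomycota\ns3\td__Eukaryota; p__Chlorophyta\n' > "$tax"
printf '>s1\nACGT\n>s2\nACGT\n' > "$fa"
ids_with_term "$tax" "$fa" d__Eukaryota   # prints: s2
rm -f "$tax" "$fa"
```

On the real exports, empty output for `d__Eukaryota` would suggest that no retained sequence is annotated as Eukaryota. In that case, the Eukaryota entries seen later would be coming from the unfiltered taxonomy file.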
The following steps are the preparation of my regional database, as explained in the database preparation step of the sidle documentation, so that I can analyze my data correctly. The final output of the whole pipeline is the extracted taxonomy.
I have also read the filtering data tutorial in the QIIME 2 docs, but right now I have run out of ideas.
Thank you for taking the time to read this!
Best,
Elsa