Filtering eukaryotes from SILVA database

kevin_SalOrt · January 8, 2025, 7:07pm

I want to remove Eukaryota from SILVA 138. Following these instructions, I ran:

qiime rescript filter-seqs-length-by-taxon \
    --i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
    --i-taxonomy silva-138.1-ssu-nr99-tax.qza \
    --p-labels Archaea Bacteria \
    --p-min-lens 900 1200 \
    --o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
    --o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza

But Eukaryota is still being assigned to my sequences.

SoilRotifer · January 8, 2025, 7:39pm

Hi @kevin_SalOrt,

I'd highly recommend keeping the eukaryotes in the reference database for your classifier. These eukaryotic sequences act as "outgroups" or "decoys" that help ensure you are not erroneously assigning eukaryotic reads to Bacteria. Remember, a read can be assigned to something it is not simply because that is the closest representative in the database. If some of your reads do hit those references and are identified as eukaryotes, you can remove them afterwards; see below. Also, you do not need to use filter-seqs-length-by-taxon if you do not want to. That tutorial is just an example of what you can do.

Once you have classified your reads, you can follow this approach to remove eukaryotic and organelle sequences before your downstream analysis.
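
For the post-classification route, a minimal sketch with `qiime taxa filter-table` might look like the following. The wrapper function name and all file names are placeholders that do not come from this thread, and the search terms assume SILVA-style labels (`Eukaryota`, `Mitochondria`, `Chloroplast`), so adjust them to whatever your taxonomy actually contains.

```shell
# Hypothetical helper (name is illustrative): drop features classified as
# Eukaryota, mitochondria, or chloroplast from a feature table.
# Args: <feature table .qza> <classification taxonomy .qza> <output .qza>
filter_euk_from_table() {
    qiime taxa filter-table \
        --i-table "$1" \
        --i-taxonomy "$2" \
        --p-exclude Eukaryota,Mitochondria,Chloroplast \
        --o-filtered-table "$3"
}

# Example call with placeholder file names:
# filter_euk_from_table table.qza taxonomy.qza table-no-euk-organelles.qza
```

The same `--p-exclude` terms can be passed to `qiime taxa filter-seqs` if you also want your representative sequences to match the filtered table.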

But if you'd like to remove these sequences prior to making your classifier, you can follow this approach. Again, for purposes of making a classifier I'd strongly suggest you leave the Eukaryote sequences. If your sequences are being identified as eukaryotes, then you likely have contamination... or simply have many eukaryotes within your environment.
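
If you do go the pre-classifier route, one possible sketch is to run `qiime taxa filter-seqs` on the reference sequences themselves. As far as I understand, the `filter-seqs-length-by-taxon` command in the original post only applies length thresholds to the labels it is given (Archaea and Bacteria), and sequences that match no label are kept unless a global length filter is also set, which would explain why Eukaryota is still present. The function name and the output file name below are placeholders; the input names are the ones used earlier in this thread.

```shell
# Hypothetical helper (name is illustrative): remove reference sequences
# whose taxonomy string contains "Eukaryota".
# Args: <reference seqs .qza> <reference taxonomy .qza> <output seqs .qza>
drop_euk_from_reference() {
    qiime taxa filter-seqs \
        --i-sequences "$1" \
        --i-taxonomy "$2" \
        --p-exclude Eukaryota \
        --o-filtered-sequences "$3"
}

# Example call; the output name is a placeholder:
# drop_euk_from_reference \
#     silva-138.1-ssu-nr99-seqs-cleaned.qza \
#     silva-138.1-ssu-nr99-tax.qza \
#     silva-138.1-ssu-nr99-seqs-no-euk.qza
```

Keep in mind that a classifier trained on the result has no eukaryotic references at all, so eukaryotic reads may be forced onto the nearest bacterial or archaeal match.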

kevin_SalOrt · January 17, 2025, 2:41am

Thanks, that was very helpful. So I can remove the contaminant sequences and carry on with the remaining steps?

SoilRotifer · January 17, 2025, 2:11pm

Yes. This is quite a common thing to do, e.g. removing host sequences. See the links I provided above.
