High Proportion of Unclassified Eukaryotic Sequences in 18S rRNA Analysis

Hello QIIME 2 forum,
I am currently working on an 18S rRNA analysis and have run into an issue: 21% of the taxa in my bar plot are assigned only to `d__Eukaryota;__;__;__;__;__;__`. I am concerned that such a large proportion of ASVs is classified only at the domain level (Eukaryota). Is this common? Should I be worried that my classifier or approach isn't resolving more specific taxonomic levels?

Here are the steps I followed in my analysis:

1. **DADA2 Denoising:** I denoised the paired-end reads with DADA2:

```bash
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 209 \
  --p-trunc-len-r 174 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```
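A quick sanity check on the truncation lengths: after truncation, a read pair can only be merged if the forward and reverse reads still overlap. The sketch below assumes q2-dada2's default minimum overlap of 12 nt (`--p-min-overlap`) and computes the longest amplicon, counting any primer sequence still on the reads, that can merge with these settings. If the sequenced 18S region is longer than that, most pairs would fail to merge and be dropped, which would show up as a low merged percentage in the denoising stats. `max_merged_len` is a hypothetical helper name.

```bash
# Longest amplicon (nt) whose read pair can still be merged after truncation:
#   trunc_len_f + trunc_len_r - min_overlap
# min_overlap defaults to 12, assumed here to match q2-dada2's --p-min-overlap.
# max_merged_len is a hypothetical helper, not part of QIIME 2.
max_merged_len() {
  local trunc_f=$1 trunc_r=$2 min_overlap=${3:-12}
  echo $(( trunc_f + trunc_r - min_overlap ))
}

# Truncation lengths from the denoise-paired command above.
max_merged_len 209 174   # prints 371
```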

```bash
qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv
```
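To scan the stats for samples where merging went badly, the denoising stats can be exported (for example with `qiime tools export --input-path denoising-stats.qza --output-path denoising-stats`) and read as a TSV. This sketch assumes the exported file has a `percentage of input merged` column and a `#q2:types` row right after the header, as QIIME 2 metadata exports usually do. `low_merge_samples` and its 50% default cutoff are illustrative choices, not QIIME 2 defaults.

```bash
# Print samples whose "percentage of input merged" is below a threshold.
# Assumes the stats were exported first, e.g.
#   qiime tools export --input-path denoising-stats.qza --output-path denoising-stats
# and that the header row names the column exactly as below.
# low_merge_samples is a hypothetical helper; 50 is an illustrative cutoff.
low_merge_samples() {
  local stats=$1 threshold=${2:-50}
  awk -F '\t' -v thr="$threshold" -v col_name="percentage of input merged" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col_name) col = i
      if (!col) { print "column not found: " col_name | "cat 1>&2"; exit 1 }
      next
    }
    /^#q2:types/ { next }            # type-annotation row that follows the header
    ($col + 0) < thr { printf "%s\t%s\n", $1, $col }
  ' "$stats"
}

# Hypothetical usage:
#   low_merge_samples denoising-stats/stats.tsv 50
```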


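2. **Taxonomy Assignment:** I trained my 18S classifier using **RESCRIPt** and the SILVA 138.1 reference database. Here's how I assigned taxonomy:

```bash
# Taxonomy classification using pre-trained classifier.
# Note: "515f-806r" in the file name is the 16S V4 primer pair; a classifier
# trimmed to that region may not match an 18S amplicon, so it is worth checking
# which primers were used when the RESCRIPt reference reads were extracted.
qiime feature-classifier classify-sklearn \
  --i-classifier silva-138.1-ssu-nr99-515f-806r-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
```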
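To see how deep the classifier actually gets, the taxonomy can be exported (`qiime tools export --input-path taxonomy.qza --output-path taxonomy`) and tallied by the deepest rank that carries a name. This sketch assumes the exported `taxonomy.tsv` has the columns `Feature ID`, `Taxon` and `Confidence`, with ranks separated by `;` and prefixed like `d__` or `p__`. `deepest_rank_counts` is a hypothetical helper, and it counts ASVs, not reads.

```bash
# Count features by the deepest rank that carries a name, e.g. "p__Amorphea".
# An empty placeholder such as "p__" does not count as assigned.
# deepest_rank_counts is a hypothetical helper, not part of QIIME 2.
deepest_rank_counts() {
  awk -F '\t' '
    NR == 1 || /^#/ { next }         # skip header and any comment rows
    {
      n = split($2, ranks, ";")
      deepest = "unassigned"         # e.g. the classifier output "Unassigned"
      for (i = 1; i <= n; i++) {
        r = ranks[i]
        gsub(/^ +| +$/, "", r)       # trim spaces around each rank
        if (r ~ /^[a-z]__./) deepest = substr(r, 1, 3)
      }
      count[deepest]++
      total++
    }
    END {
      for (k in count) printf "%s\t%d\t%.1f%%\n", k, count[k], 100 * count[k] / total
    }
  ' "$1" | sort
}

# Hypothetical usage:
#   deepest_rank_counts taxonomy/taxonomy.tsv
```

Each output line gives a rank prefix, the number of ASVs whose classification stops at that rank, and their percentage; the `d__` line corresponds to the domain-only ASVs.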
3. **Filtering and Visualizing:** I also filtered out rare ASVs (minimum frequency = 2) and retained only eukaryotic sequences:

```bash
qiime feature-table filter-features \
  --i-table table.qza \
  --p-min-frequency 2 \
  --o-filtered-table filtered-table.qza
```
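`--p-min-frequency` is a cutoff on a feature's total count summed over all samples, so a value of 2 removes only features observed exactly once in the whole dataset. The sketch below mimics that rule on a feature table converted to TSV (assumed to come from `qiime tools export` followed by `biom convert --to-tsv`, with header lines starting with `#`). `filter_min_frequency` is a hypothetical helper, not part of QIIME 2.

```bash
# Keep features whose total count across all samples is >= min_freq.
# This is a total-frequency cutoff, not a per-sample one.
# Input: TSV feature table whose header/comment lines start with "#".
# filter_min_frequency is a hypothetical helper, not part of QIIME 2.
filter_min_frequency() {
  local table=$1 min_freq=${2:-2}
  awk -F '\t' -v min="$min_freq" '
    /^#/ { print; next }             # keep header lines unchanged
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i   # sum counts over sample columns
      if (total >= min) print
    }
  ' "$table"
}

# Hypothetical usage:
#   filter_min_frequency feature-table.tsv 2
```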

```bash
# Filter by taxonomy to include only Eukaryota
qiime taxa filter-table \
  --i-table filtered-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-include Eukaryota \
  --o-filtered-table eukaryotic-table.qza
```
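`qiime taxa filter-table --p-include` keeps features whose taxonomy string contains the search term (the default `--p-mode` is `contains`). A feature classified only as `d__Eukaryota` therefore passes the `Eukaryota` filter, so this step does not remove the domain-only ASVs. The rough approximation below, using the hypothetical helper `features_matching`, lists which feature IDs a plain substring search keeps in an exported `taxonomy.tsv`; it does not model how QIIME 2 handles letter case or multiple search terms.

```bash
# List feature IDs whose Taxon string contains a search term: a rough
# approximation of `qiime taxa filter-table --p-include TERM` in "contains" mode.
# Case handling and multiple terms are not modelled.
# features_matching is a hypothetical helper, not part of QIIME 2.
features_matching() {
  local taxonomy=$1 term=$2
  awk -F '\t' -v term="$term" '
    NR == 1 { next }                 # skip the "Feature ID / Taxon / Confidence" header
    index($2, term) > 0 { print $1 }
  ' "$taxonomy"
}

# Hypothetical usage:
#   features_matching taxonomy/taxonomy.tsv Eukaryota
```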


```bash
qiime taxa barplot \
  --i-table eukaryotic-table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization taxa-bar-plots.qzv
```
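The bar plot shows relative read abundance, so a 21% domain-only share in the plot is not the same as 21% of ASVs. The sketch below, using the hypothetical helper `domain_only_read_share`, computes the percentage of all reads (summed over samples) that fall in features named only at the domain rank. It assumes an exported `taxonomy.tsv` and the feature table converted to TSV with `biom convert --to-tsv`.

```bash
# Percentage of all reads (summed over samples) in features named only at the
# domain rank, e.g. "d__Eukaryota" with no named lower ranks.
# $1: taxonomy.tsv (Feature ID, Taxon, Confidence)
# $2: feature table TSV whose header lines start with "#"
# domain_only_read_share is a hypothetical helper, not part of QIIME 2.
domain_only_read_share() {
  awk -F '\t' '
    FNR == NR {                      # first file: taxonomy
      if (FNR == 1) next
      n = split($2, ranks, ";")
      named = 0
      for (i = 1; i <= n; i++) {
        r = ranks[i]
        gsub(/^ +| +$/, "", r)
        if (r ~ /^[a-z]__./) named++
      }
      domain_only[$1] = (named == 1 && ranks[1] ~ /^ *d__./)
      next
    }
    /^#/ { next }                    # second file: skip header lines
    {
      row = 0
      for (i = 2; i <= NF; i++) row += $i
      total += row
      if (domain_only[$1]) hit += row
    }
    END {
      if (total == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * hit / total
    }
  ' "$1" "$2"
}

# Hypothetical usage:
#   domain_only_read_share taxonomy/taxonomy.tsv feature-table.tsv
```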

Despite following these steps, a large share of ASVs remains unclassified beyond the domain level (Eukaryota). Is there something I should adjust, or is this outcome expected when working with environmental samples using 18S rRNA? I've attached my bar plots and other relevant files for reference.

Any insights or suggestions would be highly appreciated!
Best,
Namraj
eukaryotic-table.qza (501.0 KB)
rep-seqs.qza (437.9 KB)
taxa-bar-plots.qzv (1.3 MB)
taxonomy.qza (508.3 KB)


Hi @Namraj_Jaishi,

Looking at your QZV file, this result looks fairly normal, especially since there are not as many eukaryotic reference sequences as there are for bacteria and archaea. Have you run BLAST on any of these sequences?

I spot-checked a few with online BLAST and found that these unclassified eukaryotic sequences might be noise or unknown off-target amplicons: their matches to anything in GenBank had low query coverage (72% to 80%). I used "Highly similar sequences (megablast)" and ticked the option to exclude "Uncultured/environmental sample sequences".
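The same spot-check can be scripted for many ASVs with command-line BLAST. This sketch assumes the representative sequences were exported to FASTA (`qiime tools export --input-path rep-seqs.qza --output-path rep-seqs`, which writes `dna-sequences.fasta`) and searched remotely with tabular output that has `qcovs` (query coverage per subject) in column 4, as in the comment inside the block; blastn's default task is megablast. `low_coverage_queries` and the 90% cutoff are illustrative, and unlike the web search described here, the command shown does not exclude uncultured/environmental sequences.

```bash
# Report queries whose best query coverage (qcovs) is below min_cov percent.
# Assumes tabular BLAST output with qcovs in column 4, e.g. from
#   blastn -query rep-seqs/dna-sequences.fasta -db nt -remote \
#     -outfmt "6 qseqid sseqid pident qcovs evalue" > hits.tsv
# low_coverage_queries is a hypothetical helper; 90 is an illustrative cutoff.
low_coverage_queries() {
  local hits=$1 min_cov=${2:-90}
  awk -F '\t' -v min="$min_cov" '
    NF < 4 { next }                  # ignore blank or malformed lines
    {
      # keep the highest coverage seen for each query
      if (!($1 in best) || $4 + 0 > best[$1]) best[$1] = $4 + 0
    }
    END {
      for (q in best) if (best[q] < min) printf "%s\t%s\n", q, best[q]
    }
  ' "$hits" | sort
}

# Hypothetical usage:
#   low_coverage_queries hits.tsv 90
```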

Depending on what you're after, it is common practice to remove any sequences that are not assigned at least to the phylum level.
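One way to apply that rule is to build a list of feature IDs that have a named phylum and pass it to `qiime feature-table filter-features --m-metadata-file`, which keeps only the listed features. The sketch below assumes the exported `taxonomy.tsv` layout (`Feature ID`, `Taxon`, `Confidence`) and writes a one-column metadata file with a `feature-id` header. `phylum_level_ids` is a hypothetical helper, and a feature whose phylum slot is an empty `p__` is treated as unassigned at that rank.

```bash
# Write a QIIME 2 metadata file listing features with a named phylum
# (e.g. "p__Amorphea"); domain-only and "Unassigned" features are dropped.
# phylum_level_ids is a hypothetical helper, not part of QIIME 2.
phylum_level_ids() {
  awk -F '\t' '
    NR == 1 { print "feature-id"; next }   # metadata ID header
    {
      n = split($2, ranks, ";")
      for (i = 1; i <= n; i++) {
        r = ranks[i]
        gsub(/^ +| +$/, "", r)
        if (r ~ /^p__./) { print $1; break }
      }
    }
  ' "$1"
}

# Hypothetical usage:
#   phylum_level_ids taxonomy/taxonomy.tsv > phylum-ids.tsv
#   qiime feature-table filter-features \
#     --i-table eukaryotic-table.qza \
#     --m-metadata-file phylum-ids.tsv \
#     --o-filtered-table phylum-table.qza
```

When unassigned ranks are simply absent from the taxonomy strings, as with a domain-only call like `d__Eukaryota`, `qiime taxa filter-table --p-include p__` is a shorter route to the same result.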

