large number of unclassified fungal reads from UNITE

Hello, I have shotgun sequences that I mapped to the UNITE database before importing them into QIIME, so that I could remove non-fungal sequences I wasn't planning to use in the analysis. I then ran them through QIIME, but I'm getting a large number of unclassified fungal reads.


I'm not sure what these reads could be, or what to change about my prior steps to actually classify them. Thoughts?

Here are the steps I've taken:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest-pe.tsv \
  --output-path pe-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime vsearch join-pairs --i-demultiplexed-seqs pe-demux.qza --o-joined-sequences demux-joined.qza

qiime vsearch dereplicate-sequences --i-sequences demux-joined.qza --o-dereplicated-table derep-table.qza --o-dereplicated-sequences derep-seq.qza

qiime vsearch cluster-features-de-novo --i-table derep-table.qza --i-sequences derep-seq.qza --p-perc-identity 0.97 --o-clustered-table table-clustered.qza --o-clustered-sequences rep-seqs-clustered.qza

qiime feature-classifier classify-sklearn --i-classifier ../../../../UNITE_database/sh_qiime_release_s_10.05.2021/developer/classifier.qza --i-reads derep-seq.qza --o-classification classify-nofilter.qza --p-confidence .97

Hello Rachel,

Welcome to the forums!

Yes, something is up with that classification...

I appreciate the detail you provided about your methods so far. One part is especially strange to me: after reads have been pre-filtered to match a database, it's odd that they can't be classified against that exact same database!

I wonder if something is going wrong with that pre-filter mapping or maybe something is wrong with the sklearn classifier.

What program did you use to pre-filter?
How did you train the sklearn classifier?

Thanks for your response. I used Geneious to pre-filter. Here is what I did for training:

qiime tools import --type 'FeatureData[Sequence]' --input-path sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta --output-path UNITE_otus.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path sh_taxonomy_qiime_ver8_97_s_10.05.2021_dev.txt --output-path ref-taxonomy.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads UNITE_otus.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza

Thank you for telling me more.

Your classifier code looks okay. I've not used Geneious (Assembly and Mapping | Geneious Prime) for filtering, but I imagine it's working as intended.

The UNITE database is targeted at the eukaryotic ITS region, so I wonder if your shotgun reads, which come from whole genomes, are largely falling outside of this region.

When using Geneious to prefilter, did you use the exact same sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta as the target database or did you use something provided by Geneious? Different database files could cause this issue, which is why I ask.
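One quick way to check whether two reference files really are the same is to compare them byte for byte and count their sequences. This is only a minimal sketch: `fasta_count` and `same_refs` are hypothetical helper names, and `prefilter_refs.fasta` is a placeholder for whatever file the pre-filter was actually pointed at.

```shell
# Count sequences in a FASTA file (one '>' header line per record).
fasta_count() {
  grep -c '^>' "$1"
}

# Print "same" if two files are byte-identical, "different" otherwise.
same_refs() {
  if cmp -s "$1" "$2"; then
    echo same
  else
    echo different
  fi
}

# Hypothetical usage (prefilter_refs.fasta is a placeholder name):
# fasta_count sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta
# same_refs prefilter_refs.fasta sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta
```

If the files differ, the sequence counts give a rough idea of how far apart the two releases are.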

Ah that could be it! I used the updated UNITE database for the pre-filter and last year's version in QIIME. I'll rerun with the updated database and hope that helps.

Using the correct UNITE fasta helped a bit, but a significant portion is still not identified past kingdom Fungi.


I do think you're right that shotgun reads may have an influence on this.
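To put a number on "not identified past kingdom", the classification can be exported and tallied by how many ranks each assignment reaches. This is a rough sketch, assuming the taxonomy artifact has been exported with `qiime tools export` to a `taxonomy.tsv` that has a header row and semicolon-separated taxon strings in the second column; `depth_tally` is a hypothetical helper name.

```shell
# Tally features by the number of ranks in their assigned taxon string.
# Assumes a TSV with a header row and the taxon string in column 2.
# "Unassigned" is counted as depth 0; "k__Fungi" alone is depth 1.
depth_tally() {
  awk -F '\t' '
    NR > 1 {
      if ($2 == "Unassigned") n = 0
      else n = split($2, ranks, ";")
      count[n]++
    }
    END { for (n in count) print n "\t" count[n] }
  ' "$1" | sort -n
}

# Hypothetical usage, after exporting classify-nofilter.qza to exported/:
# depth_tally exported/taxonomy.tsv
```

Each output line is a rank depth followed by the number of features that stopped at that depth, so a large count at depth 1 corresponds to reads classified only as Fungi.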

Hey, that's good progress!

After updating the pre-filter, what percentage of reads are filtered out because they do not hit UNITE?
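The percentage asked about here can be worked out from read counts before and after the pre-filter. The sketch below assumes uncompressed FASTQ files with four lines per record; `fastq_reads` and `percent_removed` are hypothetical helper names, and the file names in the usage comment are placeholders.

```shell
# Count reads in an uncompressed FASTQ file (four lines per record).
fastq_reads() {
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Percentage of reads removed, given read counts before and after filtering.
percent_removed() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == 0) { print "NA"; exit }
    printf "%.1f\n", 100 * (before - after) / before
  }'
}

# Hypothetical usage (raw.fastq and filtered.fastq are placeholder names):
# percent_removed "$(fastq_reads raw.fastq)" "$(fastq_reads filtered.fastq)"
```

For gzipped files, the counts would need to come from the decompressed stream instead.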

Would you like to explore this more? While some plugins focus on amplicons from a specific region, other plugins like q2-shogun are designed for shotgun reads.
