large number of unclassified fungal reads from UNITE

Rachel_Keuler · November 2, 2022, 5:17pm

Hello, I have shotgun sequences that I mapped to the UNITE database before using them in QIIME, so I could remove the non-fungal sequences I wasn't planning to use in the analysis. I then ran them through QIIME, but I'm getting a large number of unclassified fungal reads.

I'm not sure what these reads could be, or what to change about my prior steps to actually classify them. Thoughts?

Rachel_Keuler · November 2, 2022, 6:02pm

Here are the steps I've taken:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest-pe.tsv \
  --output-path pe-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime vsearch join-pairs --i-demultiplexed-seqs pe-demux.qza --o-joined-sequences demux-joined.qza

qiime vsearch dereplicate-sequences --i-sequences demux-joined.qza --o-dereplicated-table derep-table.qza --o-dereplicated-sequences derep-seq.qza

qiime vsearch cluster-features-de-novo --i-table derep-table.qza --i-sequences derep-seq.qza --p-perc-identity 0.97 --o-clustered-table table-clustered.qza --o-clustered-sequences rep-seqs-clustered.qza

qiime feature-classifier classify-sklearn --i-classifier ../../../../UNITE_database/sh_qiime_release_s_10.05.2021/developer/classifier.qza --i-reads derep-seq.qza --o-classification classify-nofilter.qza --p-confidence .97
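The import step reads its file paths from `manifest-pe.tsv`. For the `PairedEndFastqManifestPhred33V2` format this is a tab-separated file with the columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`. Below is a minimal bash sketch for generating it. It assumes (hypothetically) that all reads sit in one directory and are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`; the function name `make_manifest` is illustrative only.

```bash
# Hypothetical helper: write a PairedEndFastqManifestPhred33V2 manifest.
# ASSUMPTION: reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_manifest() {
  local fastq_dir out r1 r2 sample abs_dir
  fastq_dir="$1"
  out="$2"
  # The manifest columns expect absolute paths.
  abs_dir="$(cd "$fastq_dir" && pwd)"
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for r1 in "$abs_dir"/*_R1.fastq.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$r1" ] || continue
    sample="$(basename "$r1" _R1.fastq.gz)"
    r2="$abs_dir/${sample}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no reverse read file for $sample" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$out"
  done
}
```

Usage would be something like `make_manifest raw_reads manifest-pe.tsv`, where `raw_reads` is a placeholder directory name. Samples without a matching R2 file are reported on stderr and left out rather than written as half-empty rows.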

colinbrislawn · November 3, 2022, 1:44am

Hello Rachel,

Welcome to the forums!

Yes, something is up with that classification...

I appreciate the detail you provided about your methods so far. This part is especially strange to me: after the reads have been pre-filtered to match a database, they can't be classified against that exact same database!

I wonder whether something is going wrong with the pre-filter mapping, or with the sklearn classifier.

What program did you use to pre-filter?
How did you train the sklearn classifier?

Rachel_Keuler · November 3, 2022, 2:32am

Thanks for your response. I used Geneious to pre-filter. Here is what I did for training:

qiime tools import --type 'FeatureData[Sequence]' --input-path sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta --output-path UNITE_otus.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path sh_taxonomy_qiime_ver8_97_s_10.05.2021_dev.txt --output-path ref-taxonomy.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads UNITE_otus.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza
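The classifier is trained from two separate files, the reference FASTA and the taxonomy table, which are paired up by sequence ID. As an optional sanity check (not a QIIME step), the hypothetical helper below counts IDs that appear in only one of the two files. It assumes each FASTA ID ends at the first whitespace in its header and that the taxonomy file is headerless with the ID in the first column, as in the import command above.

```bash
# Hypothetical helper: count IDs present in only one of the two training inputs.
check_ref_ids() {
  local fasta="$1" taxonomy="$2" only_fasta only_tax
  # FASTA IDs: first word of each header, without the leading '>'.
  # Taxonomy IDs: first tab-separated column. Both sorted in the C locale for comm.
  only_fasta=$(LC_ALL=C comm -23 \
    <(awk '/^>/ { print substr($1, 2) }' "$fasta" | LC_ALL=C sort -u) \
    <(cut -f1 "$taxonomy" | LC_ALL=C sort -u) | wc -l)
  only_tax=$(LC_ALL=C comm -13 \
    <(awk '/^>/ { print substr($1, 2) }' "$fasta" | LC_ALL=C sort -u) \
    <(cut -f1 "$taxonomy" | LC_ALL=C sort -u) | wc -l)
  # $(( )) strips any padding that some wc implementations add.
  echo "only_in_fasta=$((only_fasta)) only_in_taxonomy=$((only_tax))"
}
```

Running `check_ref_ids sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta sh_taxonomy_qiime_ver8_97_s_10.05.2021_dev.txt` and getting zero for both counts would mean every reference sequence has a taxonomy entry and vice versa.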

colinbrislawn · November 3, 2022, 2:43am

Thank you for telling me more.

Your classifier code looks okay. I've not used Geneious Prime's Assembly and Mapping tools for filtering, but I imagine they're working as intended.

The UNITE database targets the eukaryotic ITS region, so I wonder whether most of your shotgun reads, which come from whole genomes, fall outside this region.

When using Geneious to pre-filter, did you use the exact same sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta as the target database, or did you use something provided by Geneious? Different database files could cause this issue, which is why I ask.
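One way to confirm whether the pre-filter and the classifier used the same reference is to compare the two FASTA files directly. The sketch below assumes both references are available as plain-text FASTA files; the function name `compare_ref_fastas` is hypothetical. It first checks for byte-identical files, and otherwise reports how many sequence IDs the two files share.

```bash
# Hypothetical helper: compare two reference FASTA files.
compare_ref_fastas() {
  local a="$1" b="$2" shared
  # Byte-for-byte identical files need no further comparison.
  if cmp -s "$a" "$b"; then
    echo "identical"
    return 0
  fi
  # Count IDs (first word of each header) present in both files.
  shared=$(LC_ALL=C comm -12 \
    <(awk '/^>/ { print substr($1, 2) }' "$a" | LC_ALL=C sort -u) \
    <(awk '/^>/ { print substr($1, 2) }' "$b" | LC_ALL=C sort -u) | wc -l)
  echo "different: shared_ids=$((shared)) seqs_a=$(grep -c '^>' "$a") seqs_b=$(grep -c '^>' "$b")"
}
```

For example, `compare_ref_fastas prefilter_refs.fasta sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta`, where `prefilter_refs.fasta` stands in for whatever file was given to Geneious. A low `shared_ids` relative to the sequence counts would suggest the two references come from different releases.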

Rachel_Keuler · November 3, 2022, 4:10pm

Ah, that could be it! I used the updated UNITE database for the pre-filter and last year's version in QIIME. I'll rerun with the updated database and hope that helps.

Rachel_Keuler · November 4, 2022, 12:21am

Using the correct UNITE FASTA helped a bit, but a significant portion of the reads is still not identified past kingdom Fungi.
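To put a number on "not identified past kingdom Fungi", the classification artifact can be exported with `qiime tools export --input-path classify-nofilter.qza --output-path exported-taxonomy`, which writes `exported-taxonomy/taxonomy.tsv` (columns `Feature ID`, `Taxon`, `Confidence`), and the assigned ranks can then be counted. classify-sklearn reports each assignment only down to the deepest rank that meets the `--p-confidence` threshold, so depth 1 (`k__Fungi` alone) corresponds to the reads described here. The command earlier in the thread used `.97`, which is considerably stricter than the plugin's default of 0.7 and may be worth keeping in mind. A minimal sketch, with a hypothetical helper name:

```bash
# Hypothetical helper: tally features by the number of assigned ranks.
# Depth 0 = "Unassigned", depth 1 = kingdom only (e.g. "k__Fungi"), and so on.
taxonomy_depth_counts() {
  # Skip the header row; column 2 holds the semicolon-separated Taxon string.
  awk -F'\t' 'NR > 1 {
      depth = ($2 == "Unassigned") ? 0 : split($2, ranks, ";")
      count[depth]++
    }
    END { for (d in count) print d "\t" count[d] }' "$1" | sort -n
}
```

Running `taxonomy_depth_counts exported-taxonomy/taxonomy.tsv` prints one `depth<TAB>features` line per depth. Note that this counts features, not reads; weighting by abundance would also require the feature table.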

I do think you're right that shotgun reads may have an influence on this.

colinbrislawn · November 4, 2022, 2:00pm

Hey, that's good progress!

After updating the pre-filter, what percentage of reads are filtered out because they do not hit UNITE?
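The percentage asked about here is (reads before - reads after) / reads before × 100. The sketch below computes it from two FASTQ files, assuming (hypothetically) that both the raw reads and the Geneious pre-filter output are available as FASTQ, plain or gzipped. For paired-end data it is enough to compare one read direction, e.g. the R1 files. The function names are illustrative.

```bash
# Hypothetical helper: number of reads in a FASTQ file (4 lines per record).
# gzip -cdf decompresses gzipped input and passes plain text through unchanged.
count_fastq_reads() {
  echo $(( $(gzip -cdf "$1" | wc -l) / 4 ))
}

# Hypothetical helper: percentage of reads removed between two FASTQ files.
percent_removed() {
  local before after
  before=$(count_fastq_reads "$1")
  after=$(count_fastq_reads "$2")
  if [ "$before" -eq 0 ]; then
    echo "no reads in $1" >&2
    return 1
  fi
  awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", 100 * (b - a) / b }'
}
```

For example, `percent_removed sample1_R1.fastq.gz sample1_R1.prefiltered.fastq` (both file names are placeholders) prints the share of reads that did not hit UNITE, with one decimal place.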

Would you like to explore this more? While some plugins focus on amplicons from a specific region, other plugins like q2-shogun are designed for shotgun reads.

system · December 5, 2022, 8:01pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.