Hello, I have shotgun sequences that I mapped to the UNITE database before using them in QIIME, so that I could eliminate non-fungal sequences I wasn't planning to analyze. I then ran them through QIIME, but I'm getting a large number of unclassified fungal reads.
I'm not sure what these reads could be, or what to change about my prior steps to actually classify them. Thoughts?
Here are the steps I've taken:
qiime tools import
qiime vsearch join-pairs --i-demultiplexed-seqs pe-demux.qza --o-joined-sequences demux-joined.qza
qiime vsearch dereplicate-sequences --i-sequences pe-demux.qza --o-dereplicated-table derep-table.qza --o-dereplicated-sequences derep-seq.qza
qiime vsearch cluster-features-de-novo --i-table derep-table.qza --i-sequences derep-seq.qza --p-perc-identity 0.97 --o-clustered-table table-clustered.qza --o-clustered-sequences rep-seqs-clustered.qza
qiime feature-classifier classify-sklearn --i-classifier ../../../../UNITE_database/sh_qiime_release_s_10.05.2021/developer/classifier.qza --i-reads derep-seq.qza --o-classification classify-nofilter.qza --p-confidence .97
Welcome to the forums!
Yes, something is up with that classification...
I appreciate the detail you provided about your methods so far. One part is especially strange to me: after reads have been pre-filtered to match a database, they still can't be classified against that exact same database!
I wonder if something is going wrong with that pre-filter mapping or maybe something is wrong with the sklearn classifier.
What program did you use to pre-filter?
How did you train the sklearn classifier?
Thanks for your response. I used Geneious to pre-filter. Here is what I did for training:
qiime tools import --type 'FeatureData[Sequence]' --input-path sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta --output-path UNITE_otus.qza
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path sh_taxonomy_qiime_ver8_97_s_10.05.2021_dev.txt --output-path ref-taxonomy.qza
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads UNITE_otus.qza --i-reference-taxonomy ref-taxonomy.qza --o-classifier classifier.qza
Thank you for telling me more.
Your classifier code looks okay. I've not used Geneious for filtering, but I imagine it's working as intended.
The UNITE database targets the eukaryotic ITS region, so I wonder if your shotgun reads from full genomes are largely falling outside of this region.
When using Geneious to pre-filter, did you use the exact same sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta as the target database, or did you use something provided by Geneious? Different database files could cause this issue, which is why I ask.
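One quick way to rule out a database mismatch is to check that the two FASTA files are byte-identical. This is only a minimal sketch: `same_reference` is an illustrative helper (not part of QIIME or Geneious), and the file name `geneious_prefilter_refs.fasta` is hypothetical, assuming the pre-filter reference can be exported to a file.

```shell
# Hypothetical helper: succeeds (exit 0) only if both files are byte-identical.
# cmp -s compares silently and reports the result through its exit status.
same_reference() {
    cmp -s "$1" "$2"
}

# Example usage (file names are assumptions, adjust to your own paths):
#   if same_reference geneious_prefilter_refs.fasta sh_refs_qiime_ver8_97_s_10.05.2021_dev.fasta; then
#       echo "Same reference file was used for pre-filtering and training"
#   else
#       echo "Reference files differ"
#   fi
```

A byte-level comparison is strict: files with the same sequences but different line wrapping or header formatting will be reported as different, so a mismatch here is a prompt to look closer rather than proof of a problem.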
Ah that could be it! I used the updated UNITE database for the pre-filter and last year's version in QIIME. I'll rerun with the updated database and hope that helps.
Using the correct UNITE FASTA helped a bit, but a significant portion is still not identified past kingdom Fungi.
I do think you're right that shotgun reads may have an influence on this.
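To put a number on "a significant portion", the classification can be exported and the kingdom-only assignments counted. The sketch below assumes the exported `taxonomy.tsv` has a header row followed by tab-separated Feature ID, Taxon, and Confidence columns, and that a kingdom-only assignment appears exactly as `k__Fungi`; `kingdom_only_summary` is an illustrative helper name, and the output directory name is hypothetical.

```shell
# Export first (output directory name is an assumption):
#   qiime tools export --input-path classify-nofilter.qza --output-path exported-taxonomy

# Hypothetical helper: report how many features are classified no deeper than k__Fungi.
kingdom_only_summary() {
    awk -F'\t' '
        NR == 1 { next }                      # skip the header row
        { total++ }                           # count every classified feature
        $2 == "k__Fungi" { kingdom_only++ }   # Taxon column is exactly the kingdom
        END {
            if (total > 0)
                printf "%d of %d features (%.1f%%) stop at k__Fungi\n",
                       kingdom_only, total, 100 * kingdom_only / total
        }' "$1"
}

# Example usage:
#   kingdom_only_summary exported-taxonomy/taxonomy.tsv
```

Lineages such as `k__Fungi;p__unidentified` would not be counted by this exact match, so the pattern may need loosening depending on how the reference taxonomy labels unknown ranks.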
Hey, that's good progress!
After updating the pre-filter, what percentage of reads are filtered out because they do not hit UNITE?
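If Geneious doesn't report that percentage directly, it can be worked out from read counts before and after filtering. A rough sketch, assuming uncompressed FASTQ files with the usual four lines per record; both helper names are illustrative and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming exactly four lines per record.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: percentage of reads removed, given total and kept counts.
percent_removed() {
    awk -v total="$1" -v kept="$2" 'BEGIN {
        if (total <= 0) exit 1                # avoid dividing by zero
        printf "%.1f\n", 100 * (total - kept) / total
    }'
}

# Example usage (file names are assumptions):
#   total=$(count_fastq_reads raw_reads.fastq)
#   kept=$(count_fastq_reads unite_mapped_reads.fastq)
#   percent_removed "$total" "$kept"
```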
Would you like to explore this more? While some plugins focus on amplicons from a specific region, other plugins like q2-shogun are designed for shotgun reads.