Many unassigned fungal sequences

Hello, I analyzed ITS5 data with the UNITE database but got many unclassified taxa. Here are the steps I took.

  1. The sequencing company provided two kinds of data. One has already undergone quality control, although the quality in the middle section of the reads is still very poor.

    The other is the raw data, which I tried to use before. The feature counts were very small after I set '--p-trunc-len-r 105':

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs "$workdir/1.import_data/demux.qza" \
  --p-trunc-len-f 0 --p-trunc-len-r 105 \
  --o-table dada2-table.qza \
  --o-representative-sequences dada2-rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
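When DADA2 returns very few features, the denoising stats usually show which stage loses the reads. Below is a minimal sketch, assuming `denoising-stats.qza` was exported with `qiime tools export --input-path denoising-stats.qza --output-path stats-out` and that the resulting `stats-out/stats.tsv` has header columns named `sample-id`, `input`, `filtered`, `merged` and `non-chimeric` (as paired-end DADA2 stats normally do). The function name `dada2_retention` is hypothetical. If most reads disappear at the `merged` step, that would be consistent with the truncated reverse reads no longer overlapping the forward reads enough to merge.

```shell
# Hypothetical helper: per-sample read retention from exported DADA2 stats.
# Assumes stats.tsv is tab-separated, its first row holds column names
# (including sample-id, input, filtered, merged, non-chimeric), and an
# optional "#q2:types" row follows the header.
# Usage: dada2_retention STATS_TSV
dada2_retention() {
  awk -F'\t' '
    NR == 1 {
      # Map column names to positions so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "sample-id\tfiltered%\tmerged%\tnon-chimeric%"
      next
    }
    /^#q2:types/ { next }  # skip the QIIME 2 column-type row
    {
      input = $(col["input"])
      if (input + 0 == 0) next  # avoid dividing by zero for empty samples
      printf "%s\t%.1f\t%.1f\t%.1f\n", $(col["sample-id"]),
        100 * $(col["filtered"]) / input,
        100 * $(col["merged"]) / input,
        100 * $(col["non-chimeric"]) / input
    }' "$1"
}
```

Usage: `dada2_retention stats-out/stats.tsv`. Columns are looked up by name rather than position, so the extra percentage columns in the file do not get in the way.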

[Screenshot: before]

[Screenshot: after]

2. I am not sure whether the data from my own quality-control steps are reliable, so I used the joined data provided by the company.
(1)
qiime quality-filter q-score \
  --p-min-quality 25 \
  --i-demux demux.qza \
  --o-filtered-sequences demux-filtered-25.qza \
  --o-filter-stats demux-filter-stats-25.qza
(2)
qiime dada2 denoise-single \
  --i-demultiplexed-seqs "$workdir/1.import_data/demux-filtered-25.qza" \
  --p-trim-left 0 --p-trunc-len 0 \
  --o-table dada2-table-25.qza \
  --o-representative-sequences dada2_rep_set-25.qza \
  --o-denoising-stats dada2-stats-25.qza
(3)
qiime feature-classifier classify-sklearn \
  --i-classifier unite-ver9-99-classifier-25.07.2023.qza \
  --i-reads dada2-rep-seqs-30.qza \
  --o-classification taxonomy-30.qza

Then I got many unclassified taxa.

I have also tried VSEARCH, but the results look the same.

Why are so few sequences annotated, and what should I do?
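One way to put a number on "so few annotated results" is to export the taxonomy (`qiime tools export --input-path taxonomy-30.qza --output-path tax-out`, which writes `tax-out/taxonomy.tsv`) and count, for each feature, the deepest rank that carries an informative name. This is a sketch: the function name is hypothetical, and treating names that contain `Incertae_sedis` or `unidentified`, or that end in `_sp`, as uninformative is an assumption based on the UNITE-style labels quoted in this thread.

```shell
# Hypothetical helper: tally features by their deepest informative rank.
# Assumes taxonomy.tsv has a header row followed by
# "Feature ID<TAB>Taxon<TAB>..." rows with UNITE-style "k__...;p__..." strings.
# Usage: deepest_rank_tally TAXONOMY_TSV
deepest_rank_tally() {
  awk -F'\t' '
    NR == 1 { next }  # skip the header row
    $2 == "Unassigned" { tally["Unassigned"]++; next }
    {
      deepest = "none"
      n = split($2, ranks, ";")
      for (i = 1; i <= n; i++) {
        r = ranks[i]
        sub(/^ +/, "", r)  # tolerate a space after each separator
        p = index(r, "__")
        if (p == 0) continue
        prefix = substr(r, 1, p - 1)
        name = substr(r, p + 2)
        # Placeholder names are treated as uninformative (an assumption).
        if (name == "" || name ~ /Incertae_sedis|unidentified|_sp$/) continue
        deepest = prefix  # keep the last informative rank seen
      }
      tally[deepest]++
    }
    END { for (t in tally) print t "\t" tally[t] }' "$1" | sort
}
```

`deepest_rank_tally tax-out/taxonomy.tsv` prints one line per rank prefix (`k`, `p`, `c`, `o`, `f`, `g`, `s`), plus `Unassigned` and `none`, with the number of features whose deepest informative rank is that one. Scanning every rank rather than stopping at the first placeholder means a name like `f__Hypocreales_fam_Incertae_sedis` does not hide a resolved genus below it.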

Hi @Alan,
This could be one of two things:

  1. There could be an issue with your upstream steps.

or

  2. These could be "contamination", i.e. not fungi. The UNITE classifier only knows about fungi, so it classifies them all as k__Fungi or Unassigned.

The easiest way to check this is to find an ASV feature ID that is labeled Unassigned in your taxonomy and BLAST its sequence using the rep-seqs.qzv.

If that sequence BLASTs as fungi, that indicates an issue with your upstream steps.

If that sequence BLASTs as something other than fungi, that indicates contamination, and I would consider filtering out these unassigned sequences.

I would repeat this spot check on a few different features to make sure that whatever you find is a pattern in your data and not an outlier.
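To spot-check several features instead of one, a minimal sketch is to draw a random sample of `Unassigned` feature IDs from the exported `taxonomy.tsv` and pull their sequences from the exported representative sequences (`qiime tools export --input-path dada2-rep-seqs-30.qza --output-path seqs-out` writes `seqs-out/dna-sequences.fasta`), then paste the resulting FASTA into BLAST. The function names are hypothetical, and `shuf` is assumed to be the GNU coreutils tool.

```shell
# Hypothetical helpers for repeated spot checks of Unassigned features.

# Usage: sample_unassigned TAXONOMY_TSV K
# Prints K random feature IDs whose taxon is exactly "Unassigned"
# (prints all of them if there are fewer than K).
sample_unassigned() {
  awk -F'\t' 'NR > 1 && $2 == "Unassigned" { print $1 }' "$1" | shuf -n "$2"
}

# Usage: extract_seqs IDS_FILE FASTA
# Prints the FASTA records whose ID (first word after ">") is listed in
# IDS_FILE, one ID per line. IDS_FILE must not be empty.
extract_seqs() {
  awk 'NR == FNR { want[$1] = 1; next }   # first file: IDs to keep
       /^>/ { id = substr($1, 2); keep = (id in want) }  # new record
       keep' "$1" "$2"
}
```

For example, `sample_unassigned tax-out/taxonomy.tsv 10 > ids.txt` followed by `extract_seqs ids.txt seqs-out/dna-sequences.fasta > spot-check.fasta`. Each run draws a different random subset, which is one way to repeat the check on several features.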

Hope this helps!
:turtle:


Hi, thanks for your reply.

I ran BLAST on 1,401 representative sequences classified as k__Fungi. Without considering query coverage or percent identity, 889 sequences matched "uncultured fungus" or "Fungal sp.", 104 matched fungi with more detailed taxonomic information, and the remaining 408 matched other kinds of organisms.
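Applying coverage and identity cut-offs can change that picture. A minimal sketch, assuming the BLAST+ results were saved in tabular form with `-outfmt "6 qseqid pident qcovs stitle"` and that each query's hits are listed best-first (the usual BLAST+ ordering): it keeps the top hit per query and sorts it into three bins. The function name is hypothetical, the default thresholds of 97% identity and 90% query coverage are placeholders rather than recommendations, and the title-matching rule for uncultured or unnamed fungi is an assumption modelled on the labels mentioned above.

```shell
# Hypothetical helper: bin top BLAST hits using identity/coverage cut-offs.
# Assumes HITS_TSV columns: qseqid, pident, qcovs, stitle (BLAST+ outfmt 6),
# with each query's hits listed best-first.
# Usage: blast_tally HITS_TSV [MIN_PIDENT] [MIN_QCOVS]
blast_tally() {
  awk -F'\t' -v min_id="${2:-97}" -v min_cov="${3:-90}" '
    seen[$1]++ { next }  # keep only the first hit per query
    {
      if ($2 + 0 < min_id + 0 || $3 + 0 < min_cov + 0) bin = "below-threshold"
      else if (tolower($4) ~ /uncultured fung|fungal sp/) bin = "uncultured/unnamed fungus"
      else bin = "other (inspect manually)"
      tally[bin]++
    }
    END { for (b in tally) print b "\t" tally[b] }' "$1" | sort
}
```

Usage: `blast_tally hits.tsv 97 90`. Hits that pass the thresholds but fall in the "other" bin still need a manual look, since a title alone does not say whether the organism is a named fungus or something else.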

Does this mean I need to adjust the upstream steps?

I also find the taxonomy barplot of Z4 strange. Z4 has the most sequences after quality control, but most of its sequences are annotated only to 'k__Fungi;p__Fungi_phy_Incertae_sedis;c__Fungi_cls_Incertae_sedis;o__Fungi_ord_Incertae_sedis;f__Fungi_fam_Incertae_sedis;g__Fungi_gen_Incertae_sedis;s__Fungi_sp;__'.

Do I need any additional steps to deal with this sample?

Someone else was having problems with unite-ver9-99-classifier-25.07.2023.qza. You could try the pretrained UNITE v10 classifier, which just came out.

Hi, thanks for your reply.
I have read your discussion, but there are some points I did not understand. I have been using a QIIME 2 image (version 2022.8) provided by others, which does not directly support your pretrained UNITE v10 classifier (it requires QIIME 2 2024.2). Therefore, I downloaded the latest UNITE release, imported the sequences and taxonomy with 'qiime tools import', and trained a classifier with 'qiime feature-classifier fit-classifier-naive-bayes'. The classification results were slightly richer than before.
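The workflow described here (import the UNITE sequences and taxonomy, then train with `fit-classifier-naive-bayes`) might look roughly like the sketch below. File names and function names are placeholders. The uppercase step is only an optional precaution in case the downloaded UNITE FASTA contains lowercase bases (as UNITE's "developer" files may); the taxonomy file is assumed to be UNITE's headerless two-column `ID<TAB>taxonomy` text, hence `HeaderlessTSVTaxonomyFormat`.

```shell
# Optional precaution: uppercase sequence lines, leave ">" headers untouched.
# Usage: uppercase_fasta IN_FASTA OUT_FASTA
uppercase_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1" > "$2"
}

# Hypothetical wrapper around the import + training steps; needs a QIIME 2
# environment. Assumes a headerless "ID<TAB>taxonomy" file for the taxonomy.
# Usage: train_unite_classifier SEQS_FASTA TAXONOMY_TXT OUT_PREFIX
train_unite_classifier() {
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$1" \
    --output-path "$3-seqs.qza" &&
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$2" \
    --output-path "$3-taxonomy.qza" &&
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$3-seqs.qza" \
    --i-reference-taxonomy "$3-taxonomy.qza" \
    --o-classifier "$3-classifier.qza"
}
```

For example, `uppercase_fasta unite.fasta unite_upper.fasta` and then `train_unite_classifier unite_upper.fasta unite_taxonomy.txt unite`, which should produce `unite-seqs.qza`, `unite-taxonomy.qza` and `unite-classifier.qza`. The `&&` chaining stops at the first failing step instead of training on a missing artifact.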

Does the discussion in this post indicate that the QIIME 2 version does not significantly affect taxonomic classification?


Hello Alan,

Yes, that small test showed no changes between QIIME 2 versions as long as the database was the same. That makes sense, as the scikit-learn version was the same in those QIIME 2 versions. I'm not sure what happens when the scikit-learn version changes.


Let's zoom out!

All of these taxonomy assignment methods, whether in QIIME 2 2022.8 or 2024.2 and whether using scikit-learn or VSEARCH, are trying to do the same thing.

It makes sense that results are similar because that means all their results are similarly good!

When curators add taxa to a database, results can gain more annotations, so this makes sense. But if all the classifiers agree that those reads are unclassified, maybe something else is going on!
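To check whether the classifiers really agree on the unclassified reads, one could export the taxonomy from each run (for example, one from classify-sklearn and one from VSEARCH) and list the features left unclassified by both. A sketch with a hypothetical function name: by default it matches the exact label `Unassigned`, and an optional third argument takes an awk regular expression for other "unclassified" labels.

```shell
# Hypothetical helper: features left unclassified by two classifiers.
# Assumes both files are exported taxonomy.tsv files with a header row,
# feature IDs in column 1 and taxon strings in column 2.
# Usage: unclassified_in_both TAX_A TAX_B [AWK_REGEX]
unclassified_in_both() {
  local pattern='^Unassigned$'
  if [ -n "$3" ]; then pattern=$3; fi
  awk -F'\t' -v pat="$pattern" '
    FNR == 1 { next }                             # skip each file header
    NR == FNR { if ($2 ~ pat) a[$1] = 1; next }   # first file: remember matches
    ($1 in a) && $2 ~ pat { print $1 }            # second file: report overlap
  ' "$1" "$2"
}
```

For example, `unclassified_in_both sklearn/taxonomy.tsv vsearch/taxonomy.tsv '^(Unassigned|k__Fungi)$'` would also catch features assigned only as `k__Fungi`. A large overlap would fit the idea that something other than the classifier is going on.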

Yes!
