About 50% unassigned sequences after taxonomy

Dear QIIME 2 team,
I am having trouble running feature-classifier: a large share of my features come back as 'Unassigned'.

I am running the qiime2-amplicon distribution on Linux, and my `qiime info` output is shown below:
`System versions
Python version: 3.8.15
QIIME 2 release: 2024.2
QIIME 2 version: 2024.2.0
q2cli version: 2024.2.0

Installed plugins
alignment: 2024.2.0
composition: 2024.2.0
cutadapt: 2024.2.0
dada2: 2024.2.0
deblur: 2024.2.0
demux: 2024.2.0
diversity: 2024.2.0
diversity-lib: 2024.2.0
emperor: 2024.2.0
feature-classifier: 2024.2.0
feature-table: 2024.2.2
fragment-insertion: 2024.2.0
longitudinal: 2024.2.0
metadata: 2024.2.0
phylogeny: 2024.2.0
quality-control: 2024.2.0
quality-filter: 2024.2.0
rescript: 2024.2.2
sample-classifier: 2024.2.0
taxa: 2024.2.0
types: 2024.2.0
vsearch: 2024.2.0`

My script is as shown below:

time qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path rawdata.txt \
--output-path single-end-demux.qza \
--input-format SingleEndFastqManifestPhred33V2

time qiime dada2 denoise-single \
    --i-demultiplexed-seqs single-end-demux.qza \
    --p-trim-left 0 \
    --p-trunc-len 0 \
    --p-n-threads 80 \
    --o-representative-sequences dada2-rep-seqs.qza \
    --o-table dada2-table.qza \
    --o-denoising-stats dada2-denoising-stats.qza \
    --verbose

qiime feature-classifier classify-consensus-vsearch \
        --i-query phyloseq/rep-seqs-rename.qza \
        --i-reference-reads /mnt/data/database/vsearch/silva-138-99-seqs.qza \
        --i-reference-taxonomy /mnt/data/database/vsearch/silva-138-99-tax.qza \
        --p-threads 120 \
        --o-classification taxonomy-vsearch.qza

time qiime feature-classifier classify-sklearn \
  --i-reads phyloseq/rep-seqs-rename.qza \
  --i-classifier /mnt/data/database/silva/silva-138-99-nb-classifier.qza \
  --p-n-jobs 120 \
  --o-classification taxonomy_sklearn.qza  

I've searched the forum and tested a few guesses:

  1. Since the remaining 50% of the sequences are well classified, previous answers on the forum suggest that the problem is usually not the classifier itself (I used the pretrained silva-138-99-nb-classifier.qza). I also tried classify-consensus-vsearch, and the result is the same as with classify-sklearn: about 50% unassigned sequences. So I don't think it is an orientation issue.

  2. I BLASTed the unassigned ASVs. Most of them hit bacteria with more than 95% identity, and some with 100% coverage, so maybe it is not a non-target DNA issue either?
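
One way to test the orientation hypothesis independently of any classifier is to reverse-complement a handful of unassigned ASVs and BLAST both strands. The snippet below is only a minimal sketch using `rev` and `tr`; the function name `revcomp` is hypothetical, and it assumes plain A/C/G/T input with no ambiguity codes or FASTA headers.

```bash
# Hypothetical helper: reverse-complement a single DNA sequence.
# Assumes only A/C/G/T (upper or lower case); ambiguity codes pass through unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp "AACGT"   # prints ACGTT
```

If the reverse-complemented ASVs classify well while the originals do not, mixed orientation is the likely cause. classify-sklearn also exposes a `--p-read-orientation` parameter that can be set explicitly instead of relying on auto-detection.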

By the way, my input data are 16S miTags from marine samples. The 18S miTags also have about 50% unassigned sequences after taxonomy classification.
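
To put a number on "about 50% unassigned", the fraction can be computed from an exported taxonomy table. This is a rough sketch: it assumes a tab-separated file with one header row and the taxon string in the second column (the layout of the `taxonomy.tsv` that `qiime tools export` writes for a taxonomy artifact), and the function name `unassigned_fraction` is hypothetical. Note that it counts features (ASVs), not reads.

```bash
# Hypothetical helper: fraction of features labeled "Unassigned" in a
# tab-separated taxonomy table (header row, taxon string in column 2).
unassigned_fraction() {
    awk -F'\t' 'NR > 1 { total++; if ($2 == "Unassigned") n++ }
                END { if (total > 0) printf "%d/%d (%.1f%%)\n", n, total, 100 * n / total }' "$1"
}

# Assumed usage, after exporting the classification artifact:
#   qiime tools export --input-path taxonomy_sklearn.qza --output-path exported-taxonomy
#   unassigned_fraction exported-taxonomy/taxonomy.tsv
```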

My results are shown below:
classify-sklearn results:

blast results:

Any help will be much appreciated.
Thank you in advance,

Alena

Hello Alena,

Welcome to the forums! :qiime2:

Good thinking! Finding 50% unassigned reads usually indicates an orientation issue...

Would you be willing to post your rep-seqs-rename.qza file here so we can take a look?

Dear Colin:

Thank you very much for your quick reply! Attached are the 18S miTags rep-seqs-rename.qza, the sklearn result taxonomy_sklearn.qza, and the vsearch result classification.qza. I hope they are helpful.
rep-seqs-rename.qza (388.1 KB)
taxonomy_sklearn.qza (304.4 KB)
classification.qza (191.7 KB)

Thank you for sharing those files!

What happened after dada2-rep-seqs.qza that led to the file phyloseq/rep-seqs-rename.qza?

Hopefully one of the mods or staff members can take a look at this and offer advice. I'm booked next week but can follow up after that.

Dear Colin:

Thanks for your reply!

Since the ASV names produced by the denoising step are irregular, I manually renamed the ASVs in dada2-rep-seqs without changing the sequences.
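
A quick way to confirm that a manual rename left the sequences untouched is to compare the two FASTA files while ignoring their headers. The sketch below assumes both artifacts have been exported to FASTA (for example with `qiime tools export`); the function name `fasta_seqs` and the file names in the usage comment are hypothetical.

```bash
# Hypothetical helper: print every sequence in a FASTA file on one line each,
# upper-cased and sorted, ignoring record names and line wrapping.
fasta_seqs() {
    awk '/^>/ { if (seq != "") print seq; seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$1" | tr 'a-z' 'A-Z' | sort
}

# Assumed usage (bash, file names are placeholders):
#   diff <(fasta_seqs original/dna-sequences.fasta) \
#        <(fasta_seqs renamed/dna-sequences.fasta) && echo "sequences identical"
```

An empty diff means only the identifiers changed. Any difference points to sequences that were altered, dropped, or duplicated during renaming.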

Okay.

While I'm learning more, what primers did you use?

Sorry for the late reply. I ran some tests on my 18S data and it took some time... (And my V8V9 miTags are extracted using HMM models, so the primers are not the same...)

My tests are as follows: I added some 18S rRNA samples and combined them with the previous DNA samples for denoising. Then I annotated the eukaryotic data with PR2 instead of SILVA.

The vsearch results showed that some samples could be classified well. However, some are still 50% unclassified (as shown in the figure). Since all miTags were extracted with the same script, I suspect that something is wrong with the sequencing results of those samples (although I don't know what would cause so many sequences to go unannotated)...


But I think my QIIME 2 workflow itself is fine, and the problem arises before running QIIME 2.

Can you share the command you used to do this outside of QIIME 2?