About 50% of sequences unassigned after taxonomic classification

Alena · August 24, 2024, 6:39am

Hi dear QIIME 2 team,
I am having trouble with feature-classifier: a large number of features are classified as 'Unassigned'.

I am running the qiime2-amplicon distribution on Linux. Here is my `qiime info` output:
```
System versions
Python version: 3.8.15
QIIME 2 release: 2024.2
QIIME 2 version: 2024.2.0
q2cli version: 2024.2.0

Installed plugins
alignment: 2024.2.0
composition: 2024.2.0
cutadapt: 2024.2.0
dada2: 2024.2.0
deblur: 2024.2.0
demux: 2024.2.0
diversity: 2024.2.0
diversity-lib: 2024.2.0
emperor: 2024.2.0
feature-classifier: 2024.2.0
feature-table: 2024.2.2
fragment-insertion: 2024.2.0
longitudinal: 2024.2.0
metadata: 2024.2.0
phylogeny: 2024.2.0
quality-control: 2024.2.0
quality-filter: 2024.2.0
rescript: 2024.2.2
sample-classifier: 2024.2.0
taxa: 2024.2.0
types: 2024.2.0
vsearch: 2024.2.0
```

My script is below:

```
time qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path rawdata.txt \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

time qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 0 \
  --p-n-threads 80 \
  --o-representative-sequences dada2-rep-seqs.qza \
  --o-table dada2-table.qza \
  --o-denoising-stats dadaa2-denoising-stats.qza \
  --verbose

qiime feature-classifier classify-consensus-vsearch \
  --i-query phyloseq/rep-seqs-rename.qza \
  --i-reference-reads /mnt/data/database/vsearch/silva-138-99-seqs.qza \
  --i-reference-taxonomy /mnt/data/database/vsearch/silva-138-99-tax.qza \
  --p-threads 120 \
  --o-classification taxonomy-vsearch.qza

time qiime feature-classifier classify-sklearn \
  --i-reads phyloseq/rep-seqs-rename.qza \
  --i-classifier /mnt/data/database/silva/silva-138-99-nb-classifier.qza \
  --p-n-jobs 120 \
  --o-classification taxonomy_sklearn.qza
```
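The import step reads a manifest (`rawdata.txt` here). For the `SingleEndFastqManifestPhred33V2` format this is a tab-separated file with a `sample-id` column and an `absolute-filepath` column. A minimal sketch of generating one, assuming a hypothetical layout of one `<sample-id>.fastq.gz` file per sample in a single directory; `make_manifest` is a hypothetical helper name, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Sketch: print a SingleEndFastqManifestPhred33V2 manifest (tab-separated,
# columns "sample-id" and "absolute-filepath") for every *.fastq.gz in a directory.
# Assumption (hypothetical layout): each file is named <sample-id>.fastq.gz.
make_manifest() {
  local dir f id absdir
  dir=$1
  absdir=$(cd "$dir" && pwd) || return 1   # manifest paths must be absolute
  printf 'sample-id\tabsolute-filepath\n'
  for f in "$absdir"/*.fastq.gz; do
    [ -e "$f" ] || continue                # no match: the glob stays literal, so skip it
    id=$(basename "$f" .fastq.gz)          # sample ID = file name without the extension
    printf '%s\t%s\n' "$id" "$f"
  done
}
```

Usage would look like `make_manifest reads > rawdata.txt`, where `reads` is a placeholder directory name.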

I've searched the forum and checked a few possibilities:

Because the remaining 50% of the sequences are classified well, previous answers on the forum suggest this is usually not a problem with the classifier (the pretrained silva-138-99-nb-classifier.qza). I also tried classify-consensus-vsearch (which searches both strands by default), and the result was the same as with classify-sklearn: about 50% unassigned sequences. So it does not seem to be an orientation issue.
I BLASTed the unassigned ASVs: most of them hit bacteria with more than 95% identity, some with 100% coverage. So maybe this is not a non-target DNA issue?
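One way to reproduce the BLAST check is to pull the unassigned ASVs into their own FASTA file. A minimal sketch, assuming the taxonomy and the representative sequences were first exported with `qiime tools export` (giving `taxonomy.tsv` and `dna-sequences.fasta`) and that unassigned features carry exactly the label `Unassigned` in the second (`Taxon`) column; `unassigned_fasta` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write the sequences of features labelled "Unassigned" to stdout as FASTA.
# $1 = exported taxonomy.tsv  (Feature ID <TAB> Taxon <TAB> ...)
# $2 = exported dna-sequences.fasta
unassigned_fasta() {
  awk -F'\t' '
    NR == FNR {                       # first file: the taxonomy table
      if (FNR > 1 && $2 == "Unassigned") keep[$1] = 1
      next
    }
    /^>/ {                            # FASTA header: decide whether to keep this record
      id = substr($0, 2)
      sub(/[ \t].*/, "", id)          # the feature ID is the first word of the header
      printing = (id in keep)
    }
    printing                          # print header and sequence lines of kept records
  ' "$1" "$2"
}
```

The output (for example `unassigned_fasta taxonomy.tsv dna-sequences.fasta > unassigned.fasta`) can then be submitted to BLAST as a batch.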

By the way, my input data are 16S miTags from marine samples, and my 18S miTags also have about 50% unassigned sequences after classification.

My results are shown below:
[Figure: classify-sklearn results]

[Figure: BLAST results]
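To put a number on the unassigned share, the taxonomy artifact can be exported with `qiime tools export --input-path taxonomy_sklearn.qza --output-path exported-taxonomy`, which writes `exported-taxonomy/taxonomy.tsv`. A minimal sketch that counts the features labelled `Unassigned` in that file, assuming exactly that label in the second (`Taxon`) column; `unassigned_summary` is a hypothetical helper name. Note that it counts features (ASVs), not reads.

```shell
#!/usr/bin/env bash
# Sketch: report how many features (ASVs) in an exported taxonomy.tsv are "Unassigned".
# Counts features, not reads; read-weighted numbers need the feature table as well.
unassigned_summary() {
  awk -F'\t' '
    FNR == 1 { next }                 # skip the "Feature ID <TAB> Taxon ..." header
    { total++ }
    $2 == "Unassigned" { un++ }
    END {
      if (total == 0) { print "no features"; exit 1 }
      printf "%d/%d features unassigned (%.1f%%)\n", un, total, 100 * un / total
    }
  ' "$1"
}
```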

Any help will be much appreciated.
Thank you in advance,

Alena

colinbrislawn · August 24, 2024, 2:18pm

Hello Alena,

Welcome to the forums!

Good thinking! Finding 50% unassigned reads usually indicates an orientation issue...
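A quick manual check for an orientation problem is to reverse-complement an unassigned ASV and search it again (for example with BLAST). Within QIIME 2, classify-sklearn has a `--p-read-orientation` parameter and classify-consensus-vsearch a `--p-strand` parameter for this. The sketch below uses only standard `awk` and `tr`, complements only A, C, G and T (other IUPAC ambiguity codes would need extra mappings), and `revcomp` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: reverse-complement one DNA sequence passed as the first argument.
# Only A/C/G/T (either case) are complemented; N and other letters pass through.
revcomp() {
  printf '%s\n' "$1" |
    awk '{
      out = ""
      for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)   # reverse the string
      print out
    }' |
    tr 'ACGTacgt' 'TGCAtgca'          # complement each base
}
```

For example, `revcomp ACGTTN` prints `NAACGT`.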

Would you be willing to post your rep-seqs-rename.qza file here so we can take a look?

Alena · August 24, 2024, 3:04pm

Dear Colin:

Thank you very much for your quick reply! Attached are the 18S miTags rep-seqs-rename.qza, the sklearn result taxonomy_sklearn.qza, and the vsearch result classification.qza. I hope they help.
rep-seqs-rename.qza (388.1 KB)
taxonomy_sklearn.qza (304.4 KB)
classification.qza (191.7 KB)

colinbrislawn · August 24, 2024, 11:08pm

Thank you for sharing those files!

What happened between dada2-rep-seqs.qza and phyloseq/rep-seqs-rename.qza?

Hopefully one of the mods or staff members can take a look at this and offer advice. I'm booked next week but can follow up after that.

Alena · August 26, 2024, 1:15am

Dear Colin:

Thanks for your reply!

Because the ASV names produced by the denoising step are irregular, I manually renamed the ASVs in dada2-rep-seqs without changing the sequences.
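When features are renamed by hand, keeping a mapping table makes it easy to confirm that only the IDs changed. A minimal sketch, assuming the sequences were exported to `dna-sequences.fasta` with `qiime tools export`; the helper name `rename_fasta` and the `ASV1, ASV2, ...` naming scheme are hypothetical. The feature table uses the same feature IDs, so it would need to be renamed with the same mapping or the two artifacts will no longer match.

```shell
#!/usr/bin/env bash
# Sketch: rename FASTA records to ASV1, ASV2, ... in input order.
# $1 = input FASTA, $2 = output mapping file (old_id <TAB> new_id).
# Sequence lines are copied unchanged; the renamed FASTA goes to stdout.
rename_fasta() {
  local fasta_in=$1 map_out=$2
  awk -v map="$map_out" '
    /^>/ {
      n++
      old = substr($0, 2)
      sub(/[ \t].*/, "", old)         # keep only the first word of the header
      new = "ASV" n
      print old "\t" new > map        # record the old -> new ID pair
      print ">" new
      next
    }
    { print }                         # sequence lines pass through untouched
  ' "$fasta_in"
}
```

After `rename_fasta dna-sequences.fasta id-map.tsv > renamed.fasta`, running `diff <(grep -v '^>' dna-sequences.fasta) <(grep -v '^>' renamed.fasta)` in bash should print nothing if the sequences are untouched.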

colinbrislawn · August 26, 2024, 5:27am

Okay.

While I'm learning more, what primers did you use?

Alena · August 29, 2024, 12:22pm

Sorry for the late reply; I ran some tests on my 18S data and they took some time. (Also, my V8-V9 miTags were extracted using HMM models, so the reads do not share a common primer sequence.)

My tests were as follows: I added some 18S rRNA samples and combined them with the previous DNA samples for denoising. Then I classified the eukaryotic data with PR2 instead of SILVA.

The vsearch results showed that some samples are classified well. However, others are still about 50% unclassified (as shown in the figure). Since all the miTags were extracted with the same script, I suspect something is wrong with the sequencing results of those samples, although I don't know what would cause so many sequences to go unassigned.
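To see which samples carry most of the unassigned reads, the per-sample fraction can be computed from the exported feature table and taxonomy. A minimal sketch, assuming the table was exported with `qiime tools export` and converted with `biom convert -i feature-table.biom -o feature-table.tsv --to-tsv` (which writes a `# Constructed from biom file` comment line followed by a `#OTU ID` header row of sample names), and that unassigned features are labelled exactly `Unassigned`; `unassigned_by_sample` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: per-sample percentage of reads belonging to "Unassigned" features.
# $1 = exported taxonomy.tsv
# $2 = feature table converted to TSV with `biom convert --to-tsv`
unassigned_by_sample() {
  awk -F'\t' '
    NR == FNR {                               # taxonomy: remember unassigned feature IDs
      if (FNR > 1 && $2 == "Unassigned") un_id[$1] = 1
      next
    }
    /^# Constructed/ { next }                 # biom comment line
    /^#OTU ID/ {                              # header row: sample names from column 2 on
      ns = NF - 1
      for (j = 2; j <= NF; j++) name[j] = $j
      next
    }
    {                                         # one row per feature: add counts per sample
      for (j = 2; j <= NF; j++) {
        total[j] += $j
        if ($1 in un_id) un[j] += $j
      }
    }
    END {
      for (j = 2; j <= ns + 1; j++)
        printf "%s\t%.1f%%\n", name[j], (total[j] > 0 ? 100 * un[j] / total[j] : 0)
    }
  ' "$1" "$2"
}
```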

But I think my QIIME 2 workflow is fine; the problem arises before the data reach QIIME 2.

colinbrislawn · August 29, 2024, 1:09pm

Can you share the command you used to do this outside of QIIME 2?

system · September 29, 2024, 7:09pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.