Hi QIIME 2 team,
I am having trouble with feature-classifier: a large fraction of my features come out as 'Unassigned'.
I am running qiime2-amplicon on Linux, and my `qiime info` output is shown below:
`System versions
Python version: 3.8.15
QIIME 2 release: 2024.2
QIIME 2 version: 2024.2.0
q2cli version: 2024.2.0
Installed plugins
alignment: 2024.2.0
composition: 2024.2.0
cutadapt: 2024.2.0
dada2: 2024.2.0
deblur: 2024.2.0
demux: 2024.2.0
diversity: 2024.2.0
diversity-lib: 2024.2.0
emperor: 2024.2.0
feature-classifier: 2024.2.0
feature-table: 2024.2.2
fragment-insertion: 2024.2.0
longitudinal: 2024.2.0
metadata: 2024.2.0
phylogeny: 2024.2.0
quality-control: 2024.2.0
quality-filter: 2024.2.0
rescript: 2024.2.2
sample-classifier: 2024.2.0
taxa: 2024.2.0
types: 2024.2.0
vsearch: 2024.2.0`
My script is shown below:
time qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path rawdata.txt \
--output-path single-end-demux.qza \
--input-format SingleEndFastqManifestPhred33V2
time qiime dada2 denoise-single \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trim-left 0 \
--p-trunc-len 0 \
--p-n-threads 80 \
--o-representative-sequences dada2-rep-seqs.qza \
--o-table dada2-table.qza \
--o-denoising-stats dada2-denoising-stats.qza \
--verbose
qiime feature-classifier classify-consensus-vsearch \
--i-query phyloseq/rep-seqs-rename.qza \
--i-reference-reads /mnt/data/database/vsearch/silva-138-99-seqs.qza \
--i-reference-taxonomy /mnt/data/database/vsearch/silva-138-99-tax.qza \
--p-threads 120 \
--o-classification taxonomy-vsearch.qza
time qiime feature-classifier classify-sklearn \
--i-reads phyloseq/rep-seqs-rename.qza \
--i-classifier /mnt/data/database/silva/silva-138-99-nb-classifier.qza \
--p-n-jobs 120 \
--o-classification taxonomy_sklearn.qza
I've searched the forum and tested a few guesses:
- About 50% of the sequences are classified well, and previous answers on the forum suggest that in this situation the problem is usually not the classifier itself (I used the pretrained silva-138-99-nb-classifier.qza). I also tried `classify-consensus-vsearch`, and the result is the same as with `classify-sklearn`: about 50% unassigned sequences. So it does not look like an orientation issue.
- I BLASTed the unassigned ASVs. Most of them hit bacteria with more than 95% identity, some of them with 100% coverage, so maybe it is not a non-target DNA issue either?
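
For anyone who wants to repeat the BLAST check, here is a minimal sketch of how the unassigned ASVs can be pulled out of the exported files. It is not the exact command I ran. It assumes `taxonomy.tsv` and `dna-sequences.fasta` are the files written by `qiime tools export` for the taxonomy and the representative sequences, that the taxon label is in the second column, and that each FASTA header starts with the feature ID. The function name `extract_unassigned` is made up for this example.

```shell
# Print the FASTA records of all features labelled "Unassigned".
# Assumptions: tab-separated taxonomy table with one header row and the
# taxon string in column 2; FASTA headers of the form ">featureID ...".
extract_unassigned() {
  taxonomy_tsv="$1"
  fasta="$2"
  awk -F'\t' '
    # First file: remember the IDs whose taxon is exactly "Unassigned".
    NR == FNR { if (FNR > 1 && $2 == "Unassigned") keep[$1] = 1; next }
    # Second file: at each header, decide whether this record is wanted.
    /^>/ { id = substr($1, 2); sub(/[ \t].*/, "", id); printing = (id in keep) }
    # Print the header and every sequence line of a wanted record.
    printing { print }
  ' "$taxonomy_tsv" "$fasta"
}
```

Usage would look like `extract_unassigned exported-taxonomy/taxonomy.tsv exported-seqs/dna-sequences.fasta > unassigned.fasta`, where both directory names are placeholders.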
By the way, my input data are 16S miTags from marine samples. The 18S miTags also have about 50% unassigned sequences after taxonomy assignment.
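
To make the "about 50%" figure reproducible for both the 16S and the 18S runs, the share of unassigned features can be counted from an exported taxonomy table. This is only a sketch: it assumes a tab-separated `taxonomy.tsv` (as written by `qiime tools export`) with one header row and the taxon label in the second column, and `unassigned_fraction` is a hypothetical helper name.

```shell
# Report "<unassigned> <total> <percent>" for a taxonomy table.
# Assumptions: tab-separated file, one header row, taxon string in column 2,
# unclassified features labelled exactly "Unassigned".
unassigned_fraction() {
  awk -F'\t' '
    # Skip the header; count every feature and those labelled Unassigned.
    NR > 1 && NF >= 2 { total++; if ($2 == "Unassigned") unassigned++ }
    END {
      if (total == 0) { print "0 0 0.0"; exit }
      printf "%d %d %.1f\n", unassigned, total, 100 * unassigned / total
    }
  ' "$1"
}
```

Running it on the tables from `classify-sklearn` and `classify-consensus-vsearch` gives numbers that can be compared directly between the two classifiers.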
My results are shown below:
classify-sklearn results:
blast results:
Any help will be much appreciated.
Thank you in advance,
Alena