Topics: Over 70% of reads are assigned as d__Bacteria;;;;;; by qiime feature-classifier classify-sklearn
Hello QIIME 2 team,
I am trying to analyze sequencing results from several body fluid samples using the 16S V1-V2 region (68F: TAACACATGCAAGTCRACTYGA / 338R: GCTGCCTCCCGTAGGAGT) with qiime2-amplicon-2024.2 on an Ubuntu server. The paired-end reads were filtered and merged using:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_paired-end.qza \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150
The output was then filtered to remove non-16S sequences and sequences shorter than 230 bp.
#remove non-16S sequences
qiime quality-control exclude-seqs \
  --i-query-sequences rep-seqs-dada2.qza \
  --i-reference-sequences silva_otus.qza \
  --p-method blast \
  --p-perc-identity 0.6 \
  --p-perc-query-aligned 0.9 \
  --p-threads 12 \
  --o-sequence-hits Silva-blast-hits-60-90.qza \
  --o-sequence-misses Silva-blast-misses-60-90.qza \
  --verbose

qiime feature-table filter-features \
  --i-table table-dada2.qza \
  --m-metadata-file Silva-blast-misses-60-90.qza \
  --o-filtered-table no-Silva-blast-misses-table-60-90-dada2.qza \
  --p-exclude-ids
cp Silva-blast-hits-60-90.qza rep-seqs.qza
cp no-Silva-blast-misses-table-60-90-dada2.qza table.qza
#filter rep-seqs.qza based on length
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --m-metadata-file rep-seqs.qza \
  --p-where 'length(sequence) > 230' \
  --o-filtered-data rep-seqs-over230.qza

qiime feature-table filter-features \
  --i-table table-dada2.qza \
  --m-metadata-file rep-seqs-less230.qza \
  --o-filtered-table no-Silva-blast-misses-table-60-90-less230-dada2.qza \
  --p-exclude-ids
cp no-Silva-blast-misses-table-60-90-less230-dada2.qza table.qza
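The length cut-off used above can be sanity-checked outside QIIME 2. The following is only a minimal sketch: it assumes the representative sequences have first been exported to a plain FASTA file (for example with `qiime tools export`, which is not part of the commands above), and the helper name `count_longer_than` and the path in the usage comment are hypothetical.

```shell
# Hypothetical helper: report how many FASTA records are longer than a threshold.
# $1 = FASTA file, $2 = length threshold in bp
# Prints: "<records longer than threshold> <total records>"
count_longer_than() {
  awk -v min="$2" '
    /^>/ {
      # A new header closes the previous record, if there was one.
      if (seen) { total++; if (len > min) kept++ }
      len = 0; seen = 1
      next
    }
    { len += length($0) }            # sequence may span several lines
    END {
      if (seen) { total++; if (len > min) kept++ }
      printf "%d %d\n", kept, total
    }
  ' "$1"
}

# Example usage (path is an assumption):
# count_longer_than exported-rep-seqs/dna-sequences.fasta 230
```

If the first number is much smaller than the second, the filtered sequences and the feature table may no longer describe the same set of features.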
#taxonomy assignment
qiime feature-classifier classify-sklearn \
  --i-classifier sliva-138.1-ssu-nr99-V1-V2-classifier.qza \
  --i-reads rep-seqs-over230.qza \
  --p-read-orientation auto \
  --o-classification silva-taxonomy-138-99-V1-V2-over230.qza

qiime metadata tabulate \
  --m-input-file silva-taxonomy-138-99-V1-V2-over230.qza \
  --o-visualization silva-taxonomy-138-99-V1-V2-over230.qzv

qiime feature-table filter-features \
  --i-table table.qza \
  --m-metadata-file silva-taxonomy-138-99-V1-V2-over230.qza \
  --o-filtered-table id-filtered-table.qza

qiime taxa barplot \
  --i-table id-filtered-table.qza \
  --i-taxonomy silva-taxonomy-138-99-V1-V2-over230.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization silva-taxa-bar-plots-138-99-V1-V2-over230.qzv
The result shows that over 70% of the reads could not be assigned beyond level 1 and appear as d__Bacteria;;;;;;
Is there any possible reason for this? Any input would be highly appreciated.
The rep-seqs-over230.qza, table.qza, and sample-metadata.tsv files are attached for evaluation.
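For anyone who wants to quantify the domain-only assignments described above, here is a minimal sketch. It assumes the taxonomy artifact has been exported to a tab-separated file with one header line and the taxon string in the second column (the layout I would expect from `qiime tools export`); the helper name `domain_only_fraction` and the path in the usage comment are hypothetical. Note that it counts features, not reads, so its percentage will not necessarily match the read-weighted bar plot.

```shell
# Hypothetical helper: count features whose taxonomy stops at d__Bacteria.
# $1 = tab-separated taxonomy file (header line, taxon string in column 2)
# Prints: "<domain-only features> <total features> <percent>"
domain_only_fraction() {
  awk -F'\t' '
    NR == 1 { next }                  # skip the header line
    {
      total++
      taxon = $2
      # Drop trailing empty ranks such as ";;;;;;" or ";__;__;__".
      gsub(/[; _]+$/, "", taxon)
      if (taxon == "d__Bacteria") domain_only++
    }
    END {
      if (total == 0) { print "0 0 0.0"; exit }
      printf "%d %d %.1f\n", domain_only, total, 100 * domain_only / total
    }
  ' "$1"
}

# Example usage (path is an assumption):
# domain_only_fraction exported-taxonomy/taxonomy.tsv
```

Comparing this feature-level percentage with the read-level percentage in the bar plot would show whether a few abundant features or many rare ones account for the unresolved reads.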
Best,
Jrhau Lung
table.qza (243.4 KB)
rep-seqs-over230.qza (57.4 KB)
sample-metadata.tsv (325 Bytes)