Not able to find E. coli in my taxonomy file

Hello,

I am analyzing mouse fecal samples. We sequenced the V3-V4 region of the 16S rRNA gene and assigned taxonomy using silva-138-99-tax.qza. The code I used is given below.

1. Importing files into QIIME 2

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path demux-paired-end.qza \
--input-format PairedEndFastqManifestPhred33V2

2. Trimming primer sequences using Cutadapt

qiime cutadapt trim-paired \
--i-demultiplexed-sequences demux-paired-end.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--p-discard-untrimmed \
--o-trimmed-sequences demux-pe-trimmed.qza \
--verbose

3. Denoising using DADA2

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-pe-trimmed.qza \
--p-trunc-len-f 277 \
--p-trunc-len-r 202 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza \
--verbose --p-n-threads 2

4. Assigning Taxonomy

qiime feature-classifier extract-reads \
--i-sequences silva-138-99-seqs.qza \
--p-f-primer CCTACGGGNGGCWGCAG \
--p-r-primer GACTACHVGGGTATCTAATCC \
--o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy silva-138-99-tax.qza \
--o-classifier classifier_naive_bayes.qza

qiime feature-classifier classify-sklearn \
--i-classifier classifier_naive_bayes.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza

I am not sure if the taxonomy has been assigned properly. I am not able to find E. coli in the taxonomy file. taxonomy.qzv (1.5 MB)

Hello, and welcome to the forum!
I am not surprised that you can't find it. E. coli is a species, and even a full-length 16S rRNA gene sequence is often not enough for species-level annotation, let alone the V3-V4 region alone. The 16S rRNA gene of E. coli is very similar to the 16S rRNA genes of many other species, so the classifier cannot tell them apart and most such sequences are annotated only to genus level, as "g__Escherichia-Shigella".
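If it helps, here is a minimal sketch for checking whether that genus-level label is present in your results. It assumes you have already exported taxonomy.qza to a directory I am calling exported-taxonomy (the directory name is my assumption), which should give you a taxonomy.tsv with Feature ID, Taxon and Confidence columns. The find_genus helper is a hypothetical name used only for illustration.

```shell
# Assumption: the taxonomy artifact was exported beforehand, e.g.
#   qiime tools export --input-path taxonomy.qza --output-path exported-taxonomy
# which writes exported-taxonomy/taxonomy.tsv (tab-separated, one header row).

# find_genus (hypothetical helper): print every row of a taxonomy TSV
# whose Taxon column (column 2) contains the given label.
# $1 = path to taxonomy.tsv, $2 = label to look for
find_genus() {
  awk -F'\t' -v g="$2" 'NR > 1 && index($2, g) > 0' "$1"
}

# Look for the genus-level label instead of a species name.
if [ -f exported-taxonomy/taxonomy.tsv ]; then
  find_genus exported-taxonomy/taxonomy.tsv "g__Escherichia-Shigella"
fi
```

Any rows printed are the features assigned to Escherichia-Shigella. No output means no feature received that label.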

Best,

The 16S rRNA sequences of some Escherichia species are more similar to those of Shigella species than to those of other Escherichia species! See PMC5711669, Table 1.

Thank you. This clarifies things!
