How to distinguish between species with stable but very small differences?

The taxa below were obtained with the following QIIME 2 analysis steps: MultiplexedPairedEndBarcodeInSequence + cutadapt + dada2 + gg2_classifier. I followed the EMP library preparation scheme to amplify the V4 region. The results could not distinguish between Salmonella enterica B-4212 and Escherichia coli B-1109 in the Zymo Research standard. The classified output is as follows:

|Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;__;__|59698|
|Bacteria;Bacillota;Bacilli;Bacillales;Bacillaceae;Bacillus;__|56391|
|Bacteria;Bacillota;Bacilli;Lactobacillales;Lactobacillaceae;Limosilactobacillus;Limosilactobacillus fermentum|45390|
|Bacteria;Bacillota;Bacilli;Bacillales;Listeriaceae;Listeria;__|37691|
|Bacteria;Bacillota;Bacilli;Bacillales;Staphylococcaceae;Staphylococcus;__|36847|
|Bacteria;Bacillota;Bacilli;Lactobacillales;Enterococcaceae;Enterococcus;__|24111|
|Bacteria;Pseudomonadota;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;__|11830|

The expected composition of the reference standard is as follows:

|Lactobacillus fermentum |18400|
|Bacillus subtilis |17400|
|Staphylococcus aureus |15500|
|Listeria monocytogenes |14100|
|**Salmonella enterica**|10400|
|**Escherichia coli**|10100|
|Enterococcus faecalis |9900|
|Pseudomonas aeruginosa |4200|

The two species in bold are the ones that could not be distinguished.
When I compared randomly selected V4-region sequences from the two species, I found that they are very similar, but there are still consistent, conserved differences. For example:
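As a rough sanity check, the two tables above can be compared as fractions. The sketch below assumes both columns can be read as proportional abundances (the units of the reference values are not stated here), and the shortened taxon labels are my own. It only tests whether the unresolved Enterobacteriaceae bin is about the size expected for Salmonella enterica and Escherichia coli combined.

```python
# Values copied from the two tables above; labels are shortened for readability.
observed = {
    "Enterobacteriaceae (unresolved)": 59698,
    "Bacillus": 56391,
    "Limosilactobacillus fermentum": 45390,
    "Listeria": 37691,
    "Staphylococcus": 36847,
    "Enterococcus": 24111,
    "Pseudomonas": 11830,
}
expected = {
    "Lactobacillus fermentum": 18400,
    "Bacillus subtilis": 17400,
    "Staphylococcus aureus": 15500,
    "Listeria monocytogenes": 14100,
    "Salmonella enterica": 10400,
    "Escherichia coli": 10100,
    "Enterococcus faecalis": 9900,
    "Pseudomonas aeruginosa": 4200,
}


def fractions(counts):
    """Convert raw values to fractions of the column total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


obs = fractions(observed)
exp = fractions(expected)

# Assumption: the unresolved family-level bin holds both species.
merged_expected = exp["Salmonella enterica"] + exp["Escherichia coli"]
merged_observed = obs["Enterobacteriaceae (unresolved)"]

print(f"Observed Enterobacteriaceae fraction:    {merged_observed:.3f}")
print(f"Expected Salmonella + E. coli fraction:  {merged_expected:.3f}")
```

If the two fractions are close, that is consistent with both species being present but collapsed at the family level, rather than one of them being lost.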

587bp-597bp
Se: GGTTTGTTAAG
Ec: GGTCTGTCAAG
634bp-645bp
Se: CATCTGATACTG
Ec: CATTCGAAACTG
717bp-722bp
Se: GGACGAAG
Ec: GGACAAAG
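To make these differences explicit, here is a minimal sketch that lists the mismatching positions inside each window. The window labels are copied from above, the offsets are 0-based within each window, and the function name is only illustrative.

```python
# Windows copied from the comparison above: (Se, Ec) pairs keyed by the position label.
windows = {
    "587bp-597bp": ("GGTTTGTTAAG", "GGTCTGTCAAG"),
    "634bp-645bp": ("CATCTGATACTG", "CATTCGAAACTG"),
    "717bp-722bp": ("GGACGAAG", "GGACAAAG"),
}


def mismatches(a, b):
    """Return (offset, base_in_a, base_in_b) for every differing position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = {label: mismatches(se, ec) for label, (se, ec) in windows.items()}

for label, sites in diffs.items():
    described = ", ".join(f"offset {i}: Se={x} Ec={y}" for i, x, y in sites)
    print(f"{label}: {len(sites)} mismatch(es) ({described})")

total = sum(len(sites) for sites in diffs.values())
print(f"Total diagnostic positions in these windows: {total}")
```

This only counts substitutions in the excerpts shown; it says nothing about whether the same sites hold in every strain.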

So my question is: can they be distinguished on the basis of these differences, and if so, how? Thank you to the experts on this forum for your patience!

I would also like to add that these differences are highly conserved: I compared thousands of sequences from both species.

Good morning @KonradV,

Distinguishing closely related microbes can be a big challenge. I'm glad you are using positive controls with known composition to validate your pipeline.

I suspect the grouping/misclassification happened in the DADA2 or taxonomy classification step.

First, check the seqs coming out of DADA2. Can you find separate rep-set sequences for these two bugs?
(DADA2 can resolve single base-pair differences, in theory.)
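One way to do that check is to scan the rep-set FASTA for the diagnostic motifs posted above. This is a minimal sketch, not a validated classifier: it assumes the representative sequences have been exported to a FASTA file (for example with `qiime tools export`, which writes `dna-sequences.fasta`), that they are in the same orientation as the motifs, and the function names and toy records are made up for illustration. The 8 bp motif is short enough to match by chance, so treat single hits with caution.

```python
from io import StringIO

# Diagnostic motifs copied from the question above.
SE_MOTIFS = ("GGTTTGTTAAG", "CATCTGATACTG", "GGACGAAG")
EC_MOTIFS = ("GGTCTGTCAAG", "CATTCGAAACTG", "GGACAAAG")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def motif_hits(seq):
    """Count how many Se and Ec motifs occur anywhere in the sequence."""
    se = sum(motif in seq for motif in SE_MOTIFS)
    ec = sum(motif in seq for motif in EC_MOTIFS)
    return se, ec


def label(seq):
    """Label a sequence by whichever motif set has more hits."""
    se, ec = motif_hits(seq)
    if se > ec:
        return "Salmonella-like"
    if ec > se:
        return "Escherichia-like"
    return "undetermined"


# Toy records built from the motifs (not real ASVs), just to show the call pattern.
# For real data, replace StringIO(...) with open("dna-sequences.fasta").
toy_fasta = StringIO(
    ">toy_se\n" + "NNNN".join(SE_MOTIFS) + "\n"
    ">toy_ec\n" + "NNNN".join(EC_MOTIFS) + "\n"
    ">toy_other\nACGTACGTACGT\n"
)

labels = {name: label(seq) for name, seq in read_fasta(toy_fasta)}
for name, call in labels.items():
    print(name, call)
```

If the real rep-set contains both Salmonella-like and Escherichia-like sequences, DADA2 kept them apart and the merge is happening at the taxonomy step. If only one kind shows up, look at the denoising and trimming settings first.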

Second, try another classification method and see if that helps or hurts your results. There are many choices:
https://docs.qiime2.org/2023.9/plugins/available/feature-classifier/