Suggestion for accurate species identification


I have a few mixed DNA samples that I need to process to identify the species present as accurately as possible. Therefore, I decided not to do V3/V4 16S amplification and to go with full-length 16S gene sequencing instead. I wanted to ask: would it be better to proceed with full-length 16S amplification and sequencing, or to simply send the DNA samples for PacBio whole-genome sequencing and, from there, see if I can find different full-length 16S gene copies that may be associated with different bacteria?


Hi @rosave,

I hope I understood your question; if I didn't, please let me know :slight_smile:

The answer to your question really depends on your proposed method, aim, and available resources. What is your proposed method for full-length 16S amplification/sequencing? Targeted amplicon sequencing and metagenomic sequencing are both useful, but each has its own strengths and weaknesses. Amplicon sequencing is still cheaper than metagenomic sequencing, and processing that data is a lot more standardized (imo, anyway). Metagenomic sequencing, on the other hand, can give you higher-resolution classification, and of course you also get information about organisms beyond bacteria (fungi, some viruses, archaea, etc.) and, depending on your final depth, some functional gene information as well. If you do go with metagenomic sequencing, you may not want to focus only on the 16S gene for classification: you would be discarding a lot of other useful information and still not get resolution as high as with the full genome. You can try alignment-based methods or build some MAGs (metagenome-assembled genomes) instead.


Hi @rosave,
do you have any insight into the number of species in your samples?
I would consider PacBio sequencing for samples with very low to low diversity, because otherwise you may need high coverage, which comes with very high associated costs. For PacBio, you probably need to ask for the newer HiFi sequencing type, which includes a built-in error-correction step (circular consensus sequencing). Otherwise, you may end up in a situation where the sequencing error rate is higher than the differences between very closely related species, and both the taxonomy assignment and the assembly step may run into trouble. This can still happen a bit, but the new kit is the best option we have to limit it!
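To make the error-rate point concrete, here is a rough back-of-the-envelope sketch in Python. It compares the expected number of sequencing errors in a single full-length 16S read (assumed here to be ~1,500 bp) with the number of positions separating two hypothetical close species that share 99.5% 16S identity. The read length, the identity value, and the Phred quality levels are illustrative assumptions, and the comparison deliberately ignores that random errors vary between reads while true differences are consistent, which is what denoising exploits.

```python
"""Compare expected sequencing errors per full-length 16S read with the
number of positions that separate two closely related species.

All numbers below are illustrative assumptions, not measured values.
"""

AMPLICON_LENGTH_BP = 1500  # assumed approximate length of a full-length 16S amplicon


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(length_bp: int, q: float) -> float:
    """Expected number of erroneous bases in one read of the given length."""
    return length_bp * phred_to_error_rate(q)


def differing_positions(length_bp: int, percent_identity: float) -> float:
    """Number of positions that differ between two sequences at a given identity."""
    return length_bp * (1 - percent_identity / 100)


if __name__ == "__main__":
    # Hypothetical pair of close species sharing 99.5% 16S identity.
    diffs = differing_positions(AMPLICON_LENGTH_BP, 99.5)
    for q in (10, 20, 30):  # roughly 90%, 99%, and 99.9% per-base accuracy
        errs = expected_errors(AMPLICON_LENGTH_BP, q)
        # Crude per-read comparison: are there at least as many errors as true differences?
        verdict = "errors swamp the signal" if errs >= diffs else "signal exceeds noise"
        print(f"Q{q}: ~{errs:.1f} errors/read vs ~{diffs:.1f} true differences -> {verdict}")
```

With these assumed values, a Q20 read carries about twice as many errors as there are true differences between the two species, while a Q30 read carries only a fraction of them, which is why higher per-read accuracy matters so much for close species.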

Are you in touch with a sequencing facility? Do they have any experience processing bacterial community samples, either full-length 16S or genomes? What can they offer you?

To give you an idea, I work in a facility and handle many analyses of this type. We have Illumina and PacBio. However, we tend not to suggest PacBio in the first instance, because in most cases it would not be cost-effective. So I do not have personal experience with the limits of taxonomic resolution in these cases. I think "as accurate as possible" may come with a huge associated cost, and that will depend on how strict you need to be about it, given your experimental question.

As a reference for what you may expect, attached is a poster from PacBio showcasing HiFi sequences:

Of course, all of this comes after weighing the pros and cons that @Mehrbod_Estaki explained very well!



Hi @rosave,

I'd like to add to @Mehrbod_Estaki's and @llenzi's awesome advice. The answer to this question may also depend on what environment you're working in and whether you're looking for something specific. For example, if you're working with vaginal or oral samples, there are specific curated databases that provide relatively reliable species-level annotation. A lot of more general databases don't provide this level of resolution or reliability, so your bioinformatics may not be able to resolve species even if your sequencing could.

Additionally, some organisms are hard to resolve with 16S. Shigella and Escherichia are famously similar genera that are super hard to distinguish based on 16S alone.



Hi @Mehrbod_Estaki
Thanks for your reply! Really useful to think about. As of right now, this is what we have decided to do:
I'm looking at using the Qiagen PowerLyzer kit to isolate DNA from these catheter samples, then using PacBio-recommended primers to amplify the full-length 16S gene, and then sending the amplicons for PacBio sequencing on SMRT Cells at 10K depth if possible. I think a next step would be metagenomic sequencing, but for this initial testing my professors are only interested in amplicon sequencing.
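When planning the run, a quick calculation shows how many samples can share a SMRT Cell at a given depth. The Python sketch below assumes that "10K depth" means ~10,000 reads per sample; the per-cell read yield and the usable fraction after demultiplexing and filtering are hypothetical placeholders, not PacBio specifications, so the real numbers should come from the sequencing facility.

```python
"""Rough plan for how many samples fit on one SMRT Cell at a target depth.

The per-cell yield and the usable fraction are hypothetical placeholders;
ask the sequencing facility for real numbers for their instrument and kit.
"""

import math


def samples_per_cell(reads_per_cell: int, target_reads_per_sample: int,
                     usable_fraction: float = 1.0) -> int:
    """Maximum number of samples that can each reach the target depth on one cell."""
    usable_reads = reads_per_cell * usable_fraction
    return math.floor(usable_reads / target_reads_per_sample)


def cells_needed(n_samples: int, reads_per_cell: int,
                 target_reads_per_sample: int, usable_fraction: float = 1.0) -> int:
    """Number of cells required to sequence all samples at the target depth."""
    per_cell = samples_per_cell(reads_per_cell, target_reads_per_sample, usable_fraction)
    if per_cell == 0:
        raise ValueError("target depth exceeds the usable yield of a single cell")
    return math.ceil(n_samples / per_cell)


if __name__ == "__main__":
    HYPOTHETICAL_READS_PER_CELL = 1_000_000  # placeholder, not a PacBio spec
    HYPOTHETICAL_USABLE_FRACTION = 0.8       # placeholder for demultiplexing/filtering losses
    TARGET_DEPTH = 10_000                    # assumed meaning of "10K depth": reads per sample

    print(samples_per_cell(HYPOTHETICAL_READS_PER_CELL, TARGET_DEPTH,
                           HYPOTHETICAL_USABLE_FRACTION))
    print(cells_needed(24, HYPOTHETICAL_READS_PER_CELL, TARGET_DEPTH,
                       HYPOTHETICAL_USABLE_FRACTION))
```

With these placeholder values, one cell would hold 80 samples at 10K reads each; swapping in the facility's real yield changes the answer directly.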


Hi @llenzi
Thanks for your reply! We really don't have an estimate of the number of species present. I'll definitely check on the HiFi sequencing type; I'm currently talking with a contact at PacBio. That's a great suggestion, thank you!

Hi @jwdebelius
That's a great idea. I'll probably look for curated databases for the human microbiome and pathogens; do you have any suggestions? That's good to keep in mind. Would it be reasonable to say that with full-length 16S PacBio sequencing at around 10K depth, I may be able to see about 60% of the species diversity at best, give or take?
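On the 60% question: depth alone doesn't translate into a fixed fraction of species recovered, because detection depends on how abundant each species is. A simple way to see this is the probability of sampling at least one read from a taxon with relative abundance p among N reads, 1 - (1 - p)^N. The Python sketch below assumes reads are independent random draws from the community, which ignores PCR bias, 16S copy-number variation, and sequencing errors.

```python
"""Probability of observing at least one read from a taxon at a given depth.

Assumes reads are independent draws from the community (a simplification
that ignores PCR bias, 16S copy-number variation, and sequencing errors).
"""


def detection_probability(relative_abundance: float, depth: int) -> float:
    """P(at least one read) = 1 - (1 - p)^N for abundance p and N reads."""
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be between 0 and 1")
    return 1.0 - (1.0 - relative_abundance) ** depth


if __name__ == "__main__":
    DEPTH = 10_000  # the ~10K reads per sample discussed above
    for p in (1e-2, 1e-3, 1e-4, 1e-5):
        print(f"abundance {p:.0e}: P(detected) = {detection_probability(p, DEPTH):.3f}")
```

Under that assumption, at 10,000 reads a taxon at 1% or 0.1% abundance is almost certain to be seen, one at 0.01% has roughly a 63% chance, and one at 0.001% only about 10%. So the fraction of species you detect depends on the community's abundance distribution, which isn't known in advance here.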