Best sequence database for classifying 16s v4 sequences of human stool samples

I agree with @jwdebelius: it depends on your questions and on the resolution at which you analyze your data. I also prefer to analyze via ASVs, as they will not change... but the taxonomy assigned to them might change at a later date, given the ever-changing world of microbial taxonomy. The taxonomy will also vary between databases, depending on the taxonomic schema and nomenclatural rules each one decided to follow.

Some other thoughts ...

  1. I became aware that RDP is no longer funded, and is not likely to be maintained much longer... at least not regularly. I am sure someone else closer to the matter can comment on this.
  2. GTDB also follows the philosophy of providing a phylogenetically consistent taxonomy.
    • Fundamentally, I really like the approach that Greengenes2 and GTDB are taking, even if it means that taxonomic labels will be in flux for a while.
  3. Some databases might require further curation.
  4. I do not necessarily trust species-level classifications for such short SSU reads. Many are likely mis- or over-classifications. But your mileage may vary.
  5. Keep in mind that choosing the proper primer pair / variable region can be just as important for obtaining accurate taxonomy. There are many papers out there that discuss which primer pairs are ideal for disambiguating taxa that are common in a given sample type.
  6. You can run several classifier / reference database combinations on the same data, and use tools like RESCRIPt to compare the results.
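
If you want a quick look at how much two databases agree before reaching for a dedicated tool, here is a minimal sketch in plain Python. It is not RESCRIPt and does not use any QIIME 2 API. It assumes you have already exported each classifier's output to a dict of ASV ID to a semicolon-delimited, seven-rank lineage with prefixes like `g__`; the function names, ASV IDs, and lineages below are all made up for illustration.

```python
# Seven-rank layout assumed for both databases (an assumption, adjust as needed).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def split_lineage(lineage):
    """Split a semicolon-delimited lineage into one label per rank.

    Rank prefixes such as "g__" are stripped, and empty labels
    (e.g. "s__") are returned as None, meaning "unassigned".
    """
    labels = []
    for part in lineage.split(";"):
        label = part.strip()
        if "__" in label:
            label = label.split("__", 1)[1]
        labels.append(label or None)
    # Pad short lineages so every ASV has an entry for every rank.
    labels += [None] * (len(RANKS) - len(labels))
    return labels[: len(RANKS)]


def agreement_by_rank(tax_a, tax_b):
    """Return {rank: (n_same, n_both_assigned)} over ASVs present in both inputs."""
    shared = sorted(set(tax_a) & set(tax_b))
    result = {}
    for i, rank in enumerate(RANKS):
        same = 0
        both = 0
        for asv in shared:
            a = split_lineage(tax_a[asv])[i]
            b = split_lineage(tax_b[asv])[i]
            # Only count ASVs that both databases classified at this rank.
            if a is not None and b is not None:
                both += 1
                same += a == b
        result[rank] = (same, both)
    return result


# Toy input, NOT real classifier output.
db_a = {
    "asv1": "d__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; "
            "f__Lachnospiraceae; g__Blautia; s__",
    "asv2": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
            "f__Bacteroidaceae; g__Bacteroides; s__Bacteroides fragilis",
}
db_b = {
    "asv1": "d__Bacteria; p__Bacillota; c__Clostridia; o__Lachnospirales; "
            "f__Lachnospiraceae; g__Blautia",
    "asv2": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
            "f__Bacteroidaceae; g__Bacteroides; s__Bacteroides uniformis",
}

for rank, (same, both) in agreement_by_rank(db_a, db_b).items():
    print(f"{rank}: {same}/{both} agree")
```

Note that this is plain string matching, so a pure renaming between databases (the phylum for `asv1` in the toy data) is counted as a disagreement even if both labels refer to the same group. That is exactly the kind of difference worth inspecting by hand, along with the species-level calls.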