Species-level classification

gregcaporaso · August 3, 2022, 4:49pm

Hi @Soyoung_Yeo,
With 16S sequencing it's not always possible to get species-level resolution. 16S amplicon sequencing is typically considered reliable at the family or genus level, and species-level assignment is only sometimes obtainable. This limitation is inherent in the approach: the short amplicon doesn't always vary between species (e.g., all members of a genus often have the same 16S sequence across the short fragment that we sequence).

That said, one thing you can do to try to improve the resolution of your classification is to use environment-weighted taxonomy classifiers. These are discussed in this paper, and you can find a tutorial here. Note that the readytowear project provides weights that can be used to train classifiers for different environments. That is the approach that I would recommend in response to your Q2. It doesn't get around the limited information in the sequences, but it adds externally derived information about which organisms are most likely to be found in the environment that you're working in.
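To make the idea of weighting concrete, here is a toy sketch in Python. It is not how the QIIME 2 classifier is implemented; it only shows Bayes' rule applied to two hypothetical species whose amplicon region is identical. The species names, likelihoods, and weights are all made-up values for illustration.

```python
def posterior(likelihoods, priors):
    """Combine per-taxon likelihoods with prior weights using Bayes' rule."""
    unnormalized = {taxon: likelihoods[taxon] * priors[taxon] for taxon in likelihoods}
    total = sum(unnormalized.values())
    return {taxon: value / total for taxon, value in unnormalized.items()}


# Hypothetical: the amplicon is identical in both species, so the sequence
# itself supports each of them equally well.
likelihoods = {"species_A": 0.8, "species_B": 0.8}

# Uniform weights: every species is assumed equally likely beforehand.
uniform_weights = {"species_A": 0.5, "species_B": 0.5}

# Hypothetical environment-specific weights: species_A is assumed to be far
# more common in this environment than species_B.
environment_weights = {"species_A": 0.9, "species_B": 0.1}

uniform_post = posterior(likelihoods, uniform_weights)
weighted_post = posterior(likelihoods, environment_weights)

print(uniform_post)   # the two species stay tied
print(weighted_post)  # the tie is broken in favor of species_A
```

With uniform weights the two species remain indistinguishable, because the sequence carries no information to separate them. With environment-specific weights the tie is broken, but only by the outside information, so the result is only as good as the weights.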

In response to your Q1, the species-level assignments that you're getting from BLAST against NCBI are not reliable species-level classifications. They show the closest matches in the NCBI database that you're using, but since that search isn't designed for assigning taxonomy to amplicon sequences, it isn't going to give you partial assignments with associated confidence scores at different taxonomic levels. In the BLAST results that you shared, this is illustrated by the fact that two matches of nearly identical quality (the first and the third) are associated with different Blautia species. The way to interpret that is that you can have confidence in the Blautia (genus) assignment, but the sequence is ambiguous at the species level.
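If it helps, that interpretation can be sketched as a small Python function that keeps only the taxonomic levels on which all near-tied hits agree. This is a simplified illustration, not what BLAST or any QIIME 2 classifier does internally. The hit list, the identity values, the 0.005 tolerance, and the placeholder species names are all hypothetical and are not taken from your results.

```python
def consensus_taxonomy(hits, tolerance=0.005):
    """Return the deepest lineage prefix shared by all near-best hits.

    hits: list of (identity, lineage) pairs, where lineage is a tuple of
    names ordered from the broadest rank to the most specific one.
    tolerance: how far below the best identity a hit may fall and still
    count as a near tie (an arbitrary choice for this sketch).
    """
    best = max(identity for identity, _ in hits)
    near_ties = [lineage for identity, lineage in hits if best - identity <= tolerance]

    consensus = []
    # Walk down the ranks in parallel and stop at the first disagreement.
    for names in zip(*near_ties):
        if len(set(names)) != 1:
            break
        consensus.append(names[0])
    return consensus


# Hypothetical hits: the first and third are near ties with different species.
hits = [
    (0.996, ("Lachnospiraceae", "Blautia", "Blautia sp. A")),
    (0.980, ("Lachnospiraceae", "Blautia", "Blautia sp. A")),
    (0.995, ("Lachnospiraceae", "Blautia", "Blautia sp. B")),
]

result = consensus_taxonomy(hits)
print(result)
```

Here the first and third hits are within the tolerance of each other and disagree at the species level, so the function stops at the genus and returns the family and genus only.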

I hope this helps!