How Species Level Classification Works

Hello

I am using QIIME 2 (initially version 2023.7, later 2023.9) to analyze a set of paired-end MiSeq V3-V4 samples in which one species (Lactobacillus crispatus) is expected to be present. I used Greengenes2 (via a Naive Bayes classifier) and NCBI (following the RESCRIPt tutorial) to classify down to species level, and then visualized the results with taxa barplot. For both databases, when I set the Taxonomic Level to Level 7, I do not find L. crispatus. However, the barplot (still at Level 7) for both databases shows a large portion of reads classified as k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__.

In the taxonomy file, there are many Feature IDs associated with k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__. I picked a few of these Feature IDs at random, looked up their sequences in the rep-seqs file, and ran them through NCBI BLAST. The hits were different strains of L. crispatus.

I wonder:

  1. why L. crispatus is not shown in the classification results
  2. if there is any way I can infer from the taxa barplot which particular Feature ID contributes to the large portion of k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__ in the barplot (or is this portion an accumulation of all the Feature IDs associated with k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__?)

Please let me know if any other information is needed. Thank you!!

Best Regards
Stephanie

(PS: I can't think of a more accurate but short title for my queries.)

Hello!
First of all, I don't think there is anything wrong with your analyses.
The issue is that short 16S rRNA amplicons often lack the resolution to distinguish species. When the targeted region is nearly identical across several species, a feature can only be annotated to the genus level. Moreover, many databases are curated only to the genus level.
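To make the point above concrete, here is a rough sketch of how a confidence threshold can stop a classification at genus level. It is a simplified illustration, not the actual q2-feature-classifier code: the function name `truncate_by_confidence` is hypothetical, the probabilities and the two non-crispatus species are made up, and the 0.7 threshold is assumed to mirror the default confidence setting of classify-sklearn.

```python
# Simplified illustration only; not the real q2-feature-classifier logic.
# All probabilities below are invented for the example.

def truncate_by_confidence(posteriors, threshold=0.7):
    """Return the deepest lineage prefix whose summed probability meets threshold.

    posteriors maps a full lineage (tuple of rank labels) to a probability.
    """
    best = ()
    max_depth = max(len(lineage) for lineage in posteriors)
    for depth in range(1, max_depth + 1):
        totals = {}
        for lineage, prob in posteriors.items():
            # Only follow lineages that extend the prefix accepted so far.
            if len(lineage) < depth or lineage[:depth - 1] != best:
                continue
            prefix = lineage[:depth]
            totals[prefix] = totals.get(prefix, 0.0) + prob
        if not totals:
            break
        prefix, confidence = max(totals.items(), key=lambda kv: kv[1])
        if confidence < threshold:
            break  # not confident enough at this rank, so stop here
        best = prefix
    return best


genus = (
    "k__Bacteria", "p__Bacillota", "c__Bacilli", "o__Lactobacillales",
    "f__Lactobacillaceae", "g__Lactobacillus",
)
# Hypothetical posteriors for one ASV whose amplicon is nearly identical
# across three species of the same genus.
posteriors = {
    genus + ("s__crispatus",): 0.40,
    genus + ("s__acidophilus",): 0.35,
    genus + ("s__gallinarum",): 0.25,
}

result = truncate_by_confidence(posteriors)
print(";".join(result))  # the lineage stops at g__Lactobacillus
```

The genus as a whole is well supported, but no single species reaches the threshold, so the label is cut off after g__Lactobacillus. That is consistent with BLAST still returning L. crispatus as a top hit for the same sequence.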

Please feel free to ask more questions if my answer is incomplete or does not cover the topic well enough.

Best,

Hi Timanix

Thank you for your reply!

As we are looking specifically at L. crispatus in our research, is there any way I can find out what the bulk of k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__ in the taxa barplot actually consists of (whether it is one dominant Feature ID or an accumulation of Feature IDs that share the same taxon k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__ in taxonomy.qza)?

Best Regards
Stephanie

Hello again,

Since you are specifically targeting one species, the only thing I can think of is to BLAST the sequences of interest against NCBI and add those taxonomies to your taxonomy.qza file. I think you should search this forum for how to do that with the RESCRIPt plugin.

But be aware that even the V3-V4 region is not a reliable source for species-level annotations.

All mods/active users, please feel free to join and add your opinion on the topic.

Best,

Hello Stephanie,

As we are looking specifically at L.crispatus in our research, is there any way I can find out what the bulk of k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__ in the taxa barplot actually consists of (whether it is one dominant Feature ID or an accumulation of Feature IDs that share the similar taxon k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;__ in taxonomy.qza)?

You could export the taxonomy and features and then merge them back together.

That should give you a list of all ASVs and the taxonomy of each one. So you can see if that g__Lactobacillus is coming from a single ASV or a whole bunch of them, just like you described!
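As a minimal sketch of that merge, assuming the taxonomy was exported to a TSV with "Feature ID", "Taxon", and "Confidence" columns and the feature table was converted to a TSV with an "#OTU ID" column: the data below is invented (feature IDs abc, def, ghi, jkl and samples X, Y, Z are placeholders), and the helper names `read_rows` and `merge` are hypothetical. Real files would be read from disk instead of from inline strings.

```python
import csv

LACTO = ("k__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;"
         "f__Lactobacillaceae;g__Lactobacillus;__")

# Invented stand-ins for the exported taxonomy and feature table files.
TAXONOMY_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    f"abc\t{LACTO}\t0.98\n"
    f"def\t{LACTO}\t0.95\n"
    f"ghi\t{LACTO}\t0.91\n"
    "jkl\tk__Bacteria;p__Bacillota;c__Bacilli\t0.99\n"
)
TABLE_TSV = (
    "# Constructed from biom file\n"
    "#OTU ID\tX\tY\tZ\n"
    "abc\t100.0\t150.0\t0.0\n"
    "def\t150.0\t200.0\t0.0\n"
    "ghi\t0.0\t150.0\t750.0\n"
    "jkl\t750.0\t500.0\t250.0\n"
)


def read_rows(text):
    """Parse TSV text into dict rows, skipping '# ' comment lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("# ")]
    return list(csv.DictReader(lines, delimiter="\t"))


def merge(taxonomy_rows, table_rows):
    """Attach each feature's taxon to its per-sample read counts."""
    taxon_of = {row["Feature ID"]: row["Taxon"] for row in taxonomy_rows}
    merged = []
    for row in table_rows:
        feature = row["#OTU ID"]
        counts = {s: int(float(v)) for s, v in row.items() if s != "#OTU ID"}
        merged.append({
            "feature": feature,
            "taxon": taxon_of.get(feature, "Unassigned"),
            "counts": counts,
        })
    return merged


merged = merge(read_rows(TAXONOMY_TSV), read_rows(TABLE_TSV))
lacto = [m for m in merged if m["taxon"] == LACTO]
for m in lacto:
    print(m["feature"], m["counts"], "total:", sum(m["counts"].values()))
```

In this toy data three separate ASVs share the g__Lactobacillus label, so the barplot segment would be an accumulation of all three rather than one dominant feature.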

Hi @Stephanie ,

Actually, you do not need to export any data to accomplish this. qiime metadata tabulate will do what you need (e.g., to visualize the taxonomy, or to merge and visualize taxonomy, sequences, and/or feature table):
https://docs.qiime2.org/2024.2/tutorials/metadata/#exploring-feature-metadata

Hi all

Thank you for your reply!

Using metadata tabulate on taxonomy.qza and rep-seqs.qza, it is clearer now that the Feature IDs classified as g__Lactobacillus match different strains of Lactobacillus crispatus in NCBI BLAST.

However, I do not have information on each Feature ID's frequency (whether it is found in one sample or in multiple samples, and how many reads it has in each sample). I apologize for not articulating my problem clearly, so let me try with a scenario. When I see g__Lactobacillus in the taxa barplot at 25%, 50%, and 75% in samples X, Y, and Z, respectively, is there any way I can find out that for X, Y, and Z, respectively:

  1. Feature ID abc constitutes 10%, 15%, and 0%
  2. Feature ID def constitutes 15%, 20%, and 0%
  3. Feature ID ghi constitutes 0%, 15%, and 75%
    (Feature IDs abc, def, and ghi are classified as g__Lactobacillus in the taxonomy file.)

Thank you!

Best Regards
Stephanie

Hi @Stephanie ,
You can use that same action to visualize a feature table (with or without merging with other metadata-transformable artifacts, like the taxonomy and reference sequences). You can run qiime feature-table transpose on your feature table, and then pass it as an additional metadata file input to the qiime metadata tabulate action. This will then show you the abundance (as read counts) for each ASV per sample.

Based on your description, you might also want an overall % instead of counts per sample. For this you should:

  1. use qiime feature-table group to sum counts across groups of samples (e.g., across all samples, or within each treatment group).
  2. use qiime feature-table relative-frequency to convert counts to relative frequencies.
  3. use qiime feature-table transpose ...
  4. qiime metadata tabulate ...

You can use the help documentation to check out the options for each of those actions to see which option(s) make sense for your use case.
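To show what the group and relative-frequency steps compute, here is a small sketch in plain Python. It does not call QIIME 2: the counts, the sample groups, and the function names `group_samples` and `relative_frequency` are all made up for illustration, using the abc/def/ghi scenario from earlier in the thread plus one invented non-Lactobacillus feature, jkl.

```python
from collections import defaultdict

# Hypothetical counts: feature ID -> {sample ID: read count}.
counts = {
    "abc": {"X": 100, "Y": 150, "Z": 0},
    "def": {"X": 150, "Y": 200, "Z": 0},
    "ghi": {"X": 0, "Y": 150, "Z": 750},
    "jkl": {"X": 750, "Y": 500, "Z": 250},
}
# Hypothetical metadata column assigning each sample to a group.
sample_group = {"X": "control", "Y": "control", "Z": "treated"}


def group_samples(counts, sample_group):
    """Sum each feature's counts over the samples in each group."""
    grouped = {}
    for feature, per_sample in counts.items():
        sums = defaultdict(int)
        for sample, n in per_sample.items():
            sums[sample_group[sample]] += n
        grouped[feature] = dict(sums)
    return grouped


def relative_frequency(counts):
    """Divide each count by the total count of its sample (or group)."""
    totals = defaultdict(int)
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] += n
    return {
        feature: {s: (n / totals[s] if totals[s] else 0.0)
                  for s, n in per_sample.items()}
        for feature, per_sample in counts.items()
    }


per_sample = relative_frequency(counts)
per_group = relative_frequency(group_samples(counts, sample_group))

for feature in ("abc", "def", "ghi"):
    print(feature, per_sample[feature], per_group[feature])
```

With these invented numbers, abc, def, and ghi contribute 10%, 15%, and 0% of sample X, which add up to the 25% g__Lactobacillus segment in the scenario. Skipping the grouping step gives the per-sample breakdown, and grouping first gives one percentage per treatment group.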

Good luck!
