Multiple entries for the same taxon, and how do I retrieve the corresponding FASTA sequence for each taxon?

Dear All,
I have two questions.

  1. I downloaded the CSV file from the taxonomy.qzv visualization. It contains the feature ID, taxon, and confidence. Here I have a question: when I check the taxonomy information, I find that the same taxon appears more than once, with different feature IDs and different confidence values. Why is that?

  2. I downloaded the CSV file from taxa-bar-plots.qzv. How do I get the FASTA sequences of the corresponding taxa?

You are classifying ASVs. Just because two different ASVs (or even OTUs) have different sequences does not mean that they belong to different taxa. Different strains of the same species can have slightly different 16S sequences; even the same strain can carry multiple 16S copies with very slightly different sequences. In your example, you are showing genus-level classifications. These different ASVs are all classified to the same genus: they could be different species in that genus, different strains of the same species, or different 16S copies from the same organism! So overlapping taxonomic classifications are never a surprise.
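To see this pattern in your own export, it can help to group the rows of the taxonomy CSV by their taxon string. The minimal Python sketch below assumes rows of (feature ID, taxon, confidence), like the CSV downloaded from taxonomy.qzv; the feature IDs and confidence values are hypothetical placeholders, not real data.

```python
from collections import defaultdict

# Hypothetical rows shaped like the CSV exported from taxonomy.qzv:
# (feature ID, taxon, confidence). IDs and values are placeholders.
PREVOTELLA = ("D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;"
              "D_3__Bacteroidales;D_4__Prevotellaceae;D_5__Prevotella 9")
rows = [
    ("asv_1", PREVOTELLA, 0.98),
    ("asv_2", PREVOTELLA, 0.85),
    ("asv_3", PREVOTELLA, 0.93),
    ("asv_4", "D_0__Bacteria;D_1__Firmicutes", 0.99),
]

def features_per_taxon(rows):
    """Map each taxon string to the distinct features (ASVs) assigned to it."""
    groups = defaultdict(list)
    for feature_id, taxon, confidence in rows:
        groups[taxon].append((feature_id, confidence))
    return dict(groups)

groups = features_per_taxon(rows)
for taxon, members in groups.items():
    # Several distinct sequences can legitimately share one label.
    print(f"{len(members)} feature(s): {taxon}")
```

Any taxon with more than one member is exactly the situation described above: several distinct sequences, each with its own confidence, that all resolve to the same label.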

Check out this tutorial. Better yet, try this command:

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --m-input-file rep-seqs.qza \
  --o-visualization taxonomy.qzv

That will paste together your taxonomy and representative sequences files into one table, matched by feature ID, for easy comparison.
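If you would rather script this than read it off the visualization, the same join by feature ID can be done in a few lines. This is a plain Python sketch, not a QIIME 2 API: it assumes you already have the taxonomy and the representative sequences as dictionaries keyed by feature ID (for example, read from files exported from taxonomy.qza and rep-seqs.qza). The helper name `taxon_to_fasta`, the shortened lineages, and the toy sequences are all hypothetical.

```python
def taxon_to_fasta(taxonomy, sequences, query):
    """Return FASTA text for every feature whose taxon string contains `query`.

    taxonomy:  {feature_id: taxon string}
    sequences: {feature_id: DNA sequence}
    """
    records = []
    for feature_id, taxon in sorted(taxonomy.items()):
        if query in taxon and feature_id in sequences:
            # Keep both the feature ID and its taxonomy in the header for BLAST.
            records.append(f">{feature_id} {taxon}\n{sequences[feature_id]}")
    return "\n".join(records) + ("\n" if records else "")

# Hypothetical, shortened lineages and toy sequences.
taxonomy = {
    "asv_1": "D_0__Bacteria;D_4__Prevotellaceae;D_5__Prevotella 9",
    "asv_2": "D_0__Bacteria;D_4__Prevotellaceae;D_5__Prevotella 9",
    "asv_3": "D_0__Bacteria;D_5__Streptococcus",
}
sequences = {"asv_1": "ACGTACGT", "asv_2": "ACGTACGA", "asv_3": "TTGACCGT"}

print(taxon_to_fasta(taxonomy, sequences, "D_5__Prevotella 9"), end="")
```

The match is a plain substring test, so a broader query such as "D_5__Prevotella" would also pick up other Prevotella groups. QIIME 2's own `qiime taxa filter-seqs` action performs a comparable filter natively; check its help text for the options available in your version.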

Good luck!


Dear Nicholas,
Thank you for your explanation.

I used the "Silva 132 99% OTUs full-length sequences" classifier for the taxonomic analysis, but I still only got genus-level information. How can I get species-level information?

Thank you, I got the corresponding FASTA sequences. I have a question about the next step. I viewed taxa-bar-plots.qzv and downloaded the CSV file. Among those taxa, I got one hit for Prevotella. When I went back to taxonomy.qzv to get the corresponding FASTA sequence (to BLAST), there were many entries labeled D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;D_4__Prevotellaceae;D_5__Prevotella 9;.

How do I know which entry is the appropriate one?

I hope I have presented my question more clearly.

You do have species-level classifications in there. For example, look at row 10. Anything with a “D_6__” label on it is species level (including things like “D_6__uncultured bacterium” that are the species labels in SILVA).

Many of these classifications are shallower than species level, however (anywhere the species position is empty instead of carrying a “D_6__” label).
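A quick way to tell which rows reached species level is to find the deepest non-empty D_n__ label. The Python sketch below assumes the seven-level SILVA 132 prefixes seen in this thread, D_0__ (domain) through D_6__ (species); `deepest_rank` is a hypothetical helper, not part of QIIME 2.

```python
# Seven-level SILVA 132 ranks, indexed by the number in the D_n__ prefix.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

def deepest_rank(taxon):
    """Return (rank, label) for the deepest non-empty level, or None."""
    best_level, best_label = -1, None
    for field in taxon.split(";"):
        prefix, sep, label = field.strip().partition("__")
        # Skip empty fields (e.g. a trailing ';') and bare prefixes like 'D_6__'.
        if not sep or not prefix.startswith("D_") or not label.strip():
            continue
        level = int(prefix[2:])
        if level > best_level:
            best_level, best_label = level, label.strip()
    if best_label is None:
        return None
    return RANKS[best_level], best_label

for taxon in [
    "D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Bacteroidales;"
    "D_4__Prevotellaceae;D_5__Prevotella 9;",
    "D_0__Bacteria;D_1__Firmicutes;D_2__Bacilli;D_3__Lactobacillales;"
    "D_4__Streptococcaceae;D_5__Streptococcus;D_6__Streptococcus sinensis",
]:
    print(deepest_rank(taxon))
```

Note that a row ending in “D_6__uncultured bacterium” counts as species level here, as in the explanation above, even though that label says little about which organism it is.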

That is intended behavior. When species-level classifications are not achieved, it means that the query sequence could not be reliably classified to species level, e.g., because it is equidistant to two or more different species (or genera, families, etc.). So no species classification is given; this is preferable to, e.g., giving a species-level classification that has a 50% or greater chance of being wrong. You can change the confidence parameter to alter this behavior, and read this for more details and parameter recommendations.
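The truncation rule can be illustrated in a few lines. This is a simplified Python sketch of the idea, not q2-feature-classifier's code: it assumes you already have one confidence value per rank for a query sequence (the numbers below are hypothetical) and keeps the deepest rank that clears the threshold. The 0.7 default mirrors the default of classify-sklearn's `--p-confidence` parameter; check the documentation for your QIIME 2 version.

```python
def truncate_by_confidence(levels, confidences, threshold=0.7):
    """Keep the deepest prefix of `levels` whose confidence meets `threshold`.

    levels:      labels from domain downward, e.g. ["D_0__Bacteria", ...]
    confidences: confidence of the lineage truncated at each level; this
                 never increases with depth, because a deeper call cannot be
                 more certain than the shallower call it extends.
    Returns (taxon string, confidence at the reported level), or ("", None).
    """
    kept = []
    for label, conf in zip(levels, confidences):
        if conf < threshold:
            break  # first level that is not reliable enough: stop here
        kept.append(label)
    if not kept:
        return "", None
    return ";".join(kept), confidences[len(kept) - 1]

# Hypothetical per-level confidences for one ASV.
levels = ["D_0__Bacteria", "D_1__Firmicutes", "D_2__Bacilli",
          "D_3__Lactobacillales", "D_4__Streptococcaceae",
          "D_5__Streptococcus", "D_6__Streptococcus sinensis"]
confidences = [1.0, 1.0, 0.99, 0.99, 0.97, 0.95, 0.48]

print(truncate_by_confidence(levels, confidences))       # stops at genus
print(truncate_by_confidence(levels, confidences, 0.4))  # reaches species
```

With a species-level confidence of 0.48 the species call is roughly a coin flip, so at the default threshold the sketch reports only the genus; lowering the threshold buys depth at the cost of reliability.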

This is also normal. You are effectively looking at two different types of data.

When you build the barplots, all features get collapsed by taxonomy labels at the taxonomic level that you are viewing. So all Prevotella get lumped together as one bar when you are looking at genus level.

These are still distinct ASVs, though, so they appear as separate entries in the FASTA file, in the taxonomy classifications, in the feature table, etc. The sequences are distinct (perhaps by as little as 1 nt), but they all receive the same genus classification. So there is no single “appropriate” entry: the Prevotella bar is the sum of all of them.
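The collapsing step can be pictured with a toy feature table. The Python sketch below is a simplified stand-in for what the bar plot (or `qiime taxa collapse`) does, not QIIME 2's implementation; the sample name, feature IDs, and counts are hypothetical, and padding missing ranks with "__" is an assumption of the sketch.

```python
from collections import defaultdict

def collapse(table, taxonomy, level):
    """Sum feature counts per taxon truncated to `level` ranks (1 = domain).

    table:    {sample_id: {feature_id: count}}
    taxonomy: {feature_id: "D_0__...;D_1__...;..."}
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample, counts in table.items():
        for feature_id, count in counts.items():
            ranks = [r.strip() for r in taxonomy[feature_id].split(";") if r.strip()]
            ranks = ranks[:level]
            # Pad shallower classifications so they still get a bar at this level.
            ranks += ["__"] * (level - len(ranks))
            collapsed[sample][";".join(ranks)] += count
    return {s: dict(c) for s, c in collapsed.items()}

# Hypothetical data: three distinct Prevotella 9 ASVs in one sample.
G = ("D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;"
     "D_3__Bacteroidales;D_4__Prevotellaceae;D_5__Prevotella 9")
taxonomy = {"asv_1": G, "asv_2": G + ";", "asv_3": G + ";D_6__uncultured bacterium"}
table = {"sample_A": {"asv_1": 10, "asv_2": 5, "asv_3": 2}}

print(collapse(table, taxonomy, 6))  # genus level: one bar
print(collapse(table, taxonomy, 7))  # species level: two bars
```

At genus level the three ASVs merge into one Prevotella 9 bar of 17 reads, while the feature table, taxonomy, and FASTA still hold three separate entries. At species level, the ASVs without a species name end up in an entry with an empty last rank.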

Does that make sense?

I hope that helps!

Thank you for the explanation. My last question:
For example, in the genus Bifidobacterium, I got 10 different species with high confidence (ranging from 0.7 to 0.9).
If I want to do wet-lab confirmation at the species level, can I proceed with any one taxon that has species-level information (e.g., the one with the highest confidence)?

How confidently can I proceed?

Hi @steffi,
Could you please elaborate? I am not 100% sure what you are trying to do.

Do you want to confirm that the species is present, or confirm its relative abundance? If the former, I expect this will be driven by your biological questions (e.g., confirm/quantify the most important species). If the latter, qPCR or similar would be better for quantifying an important species anyway.

What are the species?

Really sorry for the late reply.

I want to confirm that the species is present by designing species-specific primers followed by PCR.

For example, in Streptococcus, I am getting the following species:
(1) D_6__Streptococcus sp. DP34
(2) D_6__Streptococcus thermophilus TH1435
(3) D_6__Streptococcus sp. DP34
(4) D_6__Streptococcus sinensis
(5) D_6__Streptococcus anginosus subsp. anginosus

But in the taxa bar plot, I only get genus-level information (Streptococcus). How can I refer back to the taxonomy and reach a conclusion at species level?

The barplot should present the same information if you are viewing it at species level. The barplot will show whichever taxonomic level you choose; if you see only a genus displayed when you select species level, it is because it is an “unknown species” (or, more correctly, the species name is not annotated in the reference database, as described above).

So you want to know how reliable these species-level classifications are, and how likely it is that qPCR primers designed for one of these species will work?

There is not really a good answer to that question, because the confidence depends heavily on the individual taxon: how well it is represented and annotated in the reference database, how easy it is to differentiate that taxon from its near neighbors, the classification method that you used, etc.

Species-level classifications of short DNA sequences can be difficult to achieve reliably.

Standard PCR primers would be a lot less expensive than species-specific qPCR probes. You could just order standard primers for all of those species to test for the presence/absence of each one, then use qPCR primers if you need actual cell numbers.

I hope that helps!


Dear Nicholas,

After a detailed discussion, our research team has decided to go for genus-level validation. Thank you for your time and explanation.

