How are these feature ids different?

Hello. I need some help.
I want to use an OTU table for LEfSe analysis.

While I was parsing the OTU table, I found the following feature IDs.

How are these feature ids different?

What do ";;;" and ";p; ~" mean here?

Good morning,

What database was used to assign taxonomy? I ask because these probably have the same meaning (unknown taxonomy), but this depends on the specific database.



Hey colinbrislawn,

I’m not OP, but I have a similar question.

I have a dataset of around 260 samples from plant tissues. I sequenced the V3-V4 region, and I’m trying to make sure I understand my taxonomic output. I used the following commands to train a classifier and get my taxonomy.

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads demux-rep-seqs.qza \
  --o-classification custom-taxonomy.qza

When looking at the output from core features, I have some ASVs that are identified as follows:

and then others that will look something like this:

Do you know if there is a difference between the ASVs with just blanks at different levels and the ones with f/g/ etc.? I know that the letters stand for taxonomic levels, but I’m trying to determine what leads to unassigned taxonomy. Is QIIME matching to a sequence in the database that is only known to a certain level, or is QIIME using the sequence and phylogenetic information to say, “This ASV belongs to Anaerolineae, but I can’t say where”?

Thanks for any help!

Hi @bpscherer,

See here for details: ASV collaps question

The “__” indicates unclassified ranks, but “g__” etc. indicates that the rank was left unannotated in the reference database… so basically q2-feature-classifier can confidently deliver a genus-level classification in that case, but the reference database is missing the genus name! (Usually because the reference sequence came from an unknown or uncertain source.)

A bare “__”, on the other hand, means q2-feature-classifier cannot confidently distinguish between multiple members of that clade… so:

that is what yields a classification like so:

That is what leads to a classification like this:

So in other words, "k__Bacteria;__;__;__" is total rubbish… q2-feature-classifier says “it looks vaguely bacterial”… usually it is non-target DNA (e.g., crossover? host DNA?) and the classifier was not trained on sufficient information to place it in another clade.
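
If it helps while parsing a table, the distinction above can be sketched in a few lines of Python. This is only an illustration, not part of QIIME 2: the function name `describe_ranks` and the status labels are made up here, and it assumes semicolon-delimited taxonomy strings where a bare "__" (or an empty field) means the classifier stopped, and a prefix with nothing after it (like "g__") means the reference database had no name at that rank. The second example string is invented for demonstration.

```python
def describe_ranks(taxonomy):
    """Label each rank in a semicolon-delimited taxonomy string.

    Hypothetical helper for illustration only (not a QIIME 2 API).
    Returns a list of (rank_text, status) tuples.
    """
    results = []
    for raw in taxonomy.split(";"):
        rank = raw.strip()
        prefix, sep, name = rank.partition("__")
        if rank in ("", "__"):
            # No rank prefix at all: the classifier could not place it here.
            status = "unclassified"
        elif sep and not name:
            # Prefix present but no name: rank is empty in the reference.
            status = "unannotated in reference"
        else:
            status = "named"
        results.append((rank, status))
    return results


if __name__ == "__main__":
    examples = [
        "k__Bacteria;__;__;__",                # string quoted in this thread
        "k__Bacteria;c__Anaerolineae;o__;g__", # invented example
    ]
    for taxonomy in examples:
        print(taxonomy)
        for rank, status in describe_ranks(taxonomy):
            print(f"  {rank!r}: {status}")
```

The exact prefixes depend on the reference database you trained on, so check a few of your own taxonomy strings before relying on these rules.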


Thank you so much, this clears it up for me!