How are these feature IDs different?

Hello. I need some help.
I want to use an OTU table for LEfSe analysis.

While I was parsing the OTU table, I found the following feature IDs.

[image: screenshot of the feature IDs]
How are these feature IDs different?

What do ";;;" and ";p; ~" mean here?

Good morning,

What database was used to assign taxonomy? I ask because these probably have the same meaning (unknown taxonomy), but this depends on the specific database.

Colin

Hey colinbrislawn,

I’m not OP, but I have a similar question.

I have a dataset of around 260 samples from plant tissues. I sequenced the V3-V4 region, and I’m trying to make sure I understand my taxonomic output. I used the following commands to train a classifier and get my taxonomy.

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-99-seqs.qza \
  --i-reference-taxonomy silva-138-99-tax.qza \
  --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads demux-rep-seqs.qza \
  --o-classification custom-taxonomy.qza

When looking at the output from core features, I see some ASVs identified as follows:
k__Bacteria;;;;;__
k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacteriales;f__Enterobacteriaceae;__

and others that look something like this:
k__Bacteria;p__Chloroflexi;c__Anaerolineae;o__GCA004;f__;g__

Do you know whether there is a difference between ASVs with bare blanks at some levels and those with f__, g__, etc.? I know that the letters stand for taxonomic levels, but I’m trying to determine what leads to unassigned taxonomy. Is QIIME matching to a database sequence that is only annotated to a certain level, or is it using the sequence and phylogenetic information to say, “This ASV belongs to Anaerolineae, but I can’t say where”?

Thanks for any help!

Hi @bpscherer,

See here for details: ASV collaps question

A bare "__" indicates an unclassified rank, whereas a rank prefix with no name after it (e.g., "g__") indicates that the rank was left unannotated in the reference database. In that case q2-feature-classifier can confidently deliver a genus-level classification, but the reference database is missing the genus name! (Usually this is because the reference sequence came from an unknown or uncertain source.)

When q2-feature-classifier cannot confidently distinguish between multiple members of a clade, it stops at the last rank it can assign with confidence. That is what leads to a classification ending in "__", like your Enterobacteriaceae example, where the genus is "__".
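A minimal bash sketch of the distinction above, assuming Greengenes-style rank prefixes (k__, p__, ..., g__, s__) separated by ";" as in the examples in this thread. The function name `describe_ranks` is hypothetical and not part of QIIME 2. It labels each field as assigned, as present but unnamed in the reference (e.g. "g__"), or as unclassified (a bare "__"). Empty fields such as those in ";;;" are reported separately, because their meaning depends on the reference database.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a QIIME 2 command): label each rank of a
# ";"-separated taxonomy string. Assumes prefixes like k__, p__, ..., g__.
describe_ranks() {
  local taxonomy="$1"
  local field
  local -a fields
  # Split on ";" only; consecutive ";;" produce empty fields.
  IFS=';' read -r -a fields <<< "$taxonomy"
  for field in "${fields[@]}"; do
    # Strip leading whitespace (some databases use "; " as the separator).
    field="${field#"${field%%[![:space:]]*}"}"
    if [[ -z "$field" ]]; then
      echo "(empty): empty field, meaning depends on the reference database"
    elif [[ "$field" == "__" ]]; then
      echo "__: unclassified rank"
    elif [[ "$field" =~ ^[a-z]__$ ]]; then
      echo "$field: rank present but unnamed in the reference database"
    else
      echo "$field: assigned"
    fi
  done
}
```

For example, `describe_ranks "k__Bacteria;p__Chloroflexi;c__Anaerolineae;o__GCA004;f__;g__"` reports the first four ranks as assigned and f__ and g__ as unnamed in the reference database.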

So in other words, "k__Bacteria;__;__;__" is total rubbish: q2-feature-classifier is only saying "it looks vaguely bacterial." Usually it is non-target DNA (e.g., crossover? host DNA?), and the classifier was not trained on enough information to place it in another clade.
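As a rough way to list such kingdom-only features, the hypothetical function `kingdom_only_features` below prints the IDs of features whose first rank is named and whose remaining ranks are all blank, "__", or unnamed. It assumes a tab-separated file with a header row, feature IDs in column 1, and the ";"-separated taxonomy in column 2; a taxonomy table exported from QIIME 2 is assumed to have this layout, so check your own file first. Flagged features are only candidates for non-target DNA, not confirmed contaminants.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print feature IDs whose taxonomy names nothing below
# the first (kingdom/domain) rank, e.g. "k__Bacteria;__;__;__".
# Assumes: header row, feature ID in column 1, taxonomy string in column 2.
kingdom_only_features() {
  awk -F'\t' 'NR > 1 {
    n = split($2, ranks, ";")
    first = 0
    deeper = 0
    for (i = 1; i <= n; i++) {
      r = ranks[i]
      gsub(/^[ \t]+|[ \t]+$/, "", r)   # trim surrounding whitespace
      # A rank counts as named if text follows its "x__" prefix.
      named = (r ~ /^[a-z]__./)
      if (i == 1) first = named
      else if (named) deeper++
    }
    if (first && deeper == 0) print $1
  }' "$1"
}
```

With the three taxonomy strings from this thread, only the "k__Bacteria;;;;;__"-style entry would be flagged; the Chloroflexi and Enterobacteriaceae entries name deeper ranks and are skipped.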

Thank you so much, this clears it up for me!