qiime rescript dereplicate --p-mode super leads to "Taxonomic label depth is uneven" error with qiime rescript evaluate-fit-classifier

Hi @David_Bradshaw,

The dereplicate action was initially set up to handle taxonomies with only the standard 7 ranks (i.e. dpcofgs). That is, if a taxonomy was truncated at a higher rank, we'd backfill the missing lower ranks with the corresponding empty prefixes, e.g. f__; g__; s__.

For example, this:
KJ763795.1.1805 d__Eukaryota; k__Alveolata; p__Dinoflagellata; c__Dinophyceae; o__Gymnodiniphycidae

would become this:
KJ763795.1.1805 d__Eukaryota; k__Alveolata; p__Dinoflagellata; c__Dinophyceae; o__Gymnodiniphycidae; f__; g__; s__
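To make the backfilling idea concrete, here is a minimal Python sketch. It is not RESCRIPt's actual code: the `backfill` function and the `STANDARD_RANKS` list are hypothetical names made up for illustration, and it simply assumes that every rank prefix after the last one present gets appended as an empty label.

```python
# Hypothetical sketch of rank-prefix backfilling; NOT RESCRIPt's implementation.
# Assumption: the fixed list of standard rank prefixes (dpcofgs).
STANDARD_RANKS = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def backfill(taxonomy, ranks=STANDARD_RANKS):
    """Append the empty prefixes for every rank below the last one present."""
    labels = [label.strip() for label in taxonomy.split(";") if label.strip()]
    if not labels:
        raise ValueError("empty taxonomy string")
    # The rank prefix is assumed to be the first three characters, e.g. "o__".
    last_prefix = labels[-1][:3]
    if last_prefix not in ranks:
        raise ValueError(f"unrecognized rank prefix: {last_prefix!r}")
    missing = ranks[ranks.index(last_prefix) + 1:]
    return "; ".join(labels + missing)


truncated = ("d__Eukaryota; k__Alveolata; p__Dinoflagellata; "
             "c__Dinophyceae; o__Gymnodiniphycidae")
print(backfill(truncated))
```

In this sketch, a taxonomy that ends at a rank outside the fixed list raises an error, since there is no way to tell which prefixes should follow it.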

It appears you are leveraging all of the available SILVA taxonomy ranks, in which case the backfilling of the rank prefixes will not work. We should probably update the LCA functionality so that it can backfill using any number / combination of taxonomic ranks. :grimacing:
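If you want to check your own taxonomy for this, a rough way is to count the labels per taxonomy string and see whether the counts differ. The sketch below is only an illustration under that assumption: `label_depths` and `is_uneven` are hypothetical helpers, not part of RESCRIPt or QIIME 2, and the second record (`EXAMPLE_ID`) is made-up example data.

```python
from collections import Counter


def label_depths(taxonomy_by_id):
    """Count how many taxonomy strings occur at each label depth."""
    return Counter(
        len([label for label in taxonomy.split(";") if label.strip()])
        for taxonomy in taxonomy_by_id.values()
    )


def is_uneven(taxonomy_by_id):
    """Return True when the taxonomy strings do not all share one depth."""
    return len(label_depths(taxonomy_by_id)) > 1


taxonomy = {
    # 8 labels: includes a k__ rank in addition to the standard seven.
    "KJ763795.1.1805": ("d__Eukaryota; k__Alveolata; p__Dinoflagellata; "
                        "c__Dinophyceae; o__Gymnodiniphycidae; f__; g__; s__"),
    # 7 labels: hypothetical record using only the standard ranks.
    "EXAMPLE_ID": ("d__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
                   "o__Enterobacterales; f__Enterobacteriaceae; "
                   "g__Escherichia; s__Escherichia_coli"),
}

print(label_depths(taxonomy))
print(is_uneven(taxonomy))
```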

I'd suggest you stick with the uniq mode for now (it keeps identical sequences when their taxonomy differs), and let the classifier handle working out the taxonomic assignment. The classifier will, in effect, perform an LCA when it is unable to disambiguate very similar / identical sequences with differing taxonomy.
