Fix required: Tips in a tree detected as repeated

Hi
I have been using QIIME 1 for a while now and I’m finally switching to QIIME 2 (v2019.1). I was trying to compare Q1 and Q2 OTUs with ASVs, but I’m having some problems when running SEPP for the phylogenies constructed from OTUs. I used QIIME 2 from scratch to create the rep_set qza with the open-reference pipeline. The thing is, SEPP carries out a checking step in which it decides I have repeated tips and aborts the phylogeny construction.

I’ve looked through the forum, and it seemed to me the cause could be that some of my rep seqs have the same name (a number, actually) as sequences in the reference I used (Greengenes 2013_08). So I altered the name of each of them (added a set of characters as a prefix) and SEPP ran without errors (sorry, I did not save the original error it printed).
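
For anyone hitting the same thing, the prefix rename described above can be sketched roughly as follows. This is only an illustration: the function name `prefix_fasta_ids`, the prefix `q2otu_` and the toy sequences are made up, and it assumes the rep seqs have already been exported to a plain FASTA file (for example with `qiime tools export`). The same prefix would also have to be applied to the feature IDs in the table so that the two keep matching.

```python
def prefix_fasta_ids(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence ID."""
    for line in lines:
        if line.startswith(">"):
            # Header line: ">ID optional description"; keep everything after ">".
            yield ">" + prefix + line[1:]
        else:
            # Sequence line: pass through unchanged.
            yield line


# Toy input (hypothetical IDs): one numeric ID that could collide with a
# reference ID, and one de novo ID.
example = [">4479944\n", "ACGT\n", ">denovo1\n", "GGCC\n"]
renamed = list(prefix_fasta_ids(example, "q2otu_"))
print("".join(renamed), end="")
```

On a real file, the same generator can be fed an open file handle and its output written to a new FASTA file.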

Now, when working with the core-metrics pipeline, I tried importing a tree from Q1 and hit a similar problem. It prints the following error traceback:
  File "/home/rodrigo/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/diversity/_util.py", line 93, in _validate_otu_ids_and_tree
    raise DuplicateNodeError("All tip names must be unique.")
skbio.tree._exception.DuplicateNodeError: All tip names must be unique.
I checked my files and the names are in fact unique, both in the table and in the trees. I then changed the names in my OTU table to match those in my SEPP Newick tree and it worked again.
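
The kind of check described above (are the tip names unique, and does every table ID appear in the tree?) can be approximated with a short script. This is a rough sketch and not scikit-bio's actual validation code: the helper names `tip_names` and `check_ids` and the toy tree are invented for illustration, and the regular expression only handles unquoted Newick labels. A tree with quoted labels would need a proper parser such as scikit-bio's `TreeNode`.

```python
import re
from collections import Counter


def tip_names(newick):
    """Return tip labels from a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal labels follow ")".
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_ids(newick, table_ids):
    """Return (duplicated tip names, table IDs missing from the tree)."""
    tips = tip_names(newick)
    duplicated = sorted(name for name, n in Counter(tips).items() if n > 1)
    missing = sorted(set(table_ids) - set(tips))
    return duplicated, missing


# Toy tree (hypothetical): tip "a" appears twice, "x" is an internal node.
toy_tree = "((a:0.1,b:0.2)x:0.3,c:0.4,a:0.5);"
duplicated, missing = check_ids(toy_tree, ["a", "b", "d"])
print("duplicated tips:", duplicated)
print("table IDs not in tree:", missing)
```

An empty list for both results would suggest the names themselves are fine and the problem lies elsewhere.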

There seems to be some odd behaviour in the _validate_otu_ids_and_tree function that stops it from continuing. I’ve already figured out how to get around this, but could you please look into it so it can be fixed in later versions?

Hey @rodrigogarlop!

Thanks so much for the report. A name collision between rep-seqs and reference seems like a plausible cause to me, as that would certainly be a rare thing to see. The errors that followed also make sense given the changes you made. (To clarify: you were able to get things working eventually, after a few cycles of renames, right? Otherwise, we can help!)

Would you by chance be able to provide a little bit of data that replicates this? Only a few reads should suffice, I think (i.e. just the ones you had to rename). The md5sum of your reference data would also be good, so I can be sure I’m testing against the same one. Which OTUs did you use from Greengenes, 97% or 99%?
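
If the `md5sum` command is not available on your system, the same checksum can be computed with Python's standard `hashlib` module. A minimal sketch follows; the function name `md5_of_file` is just illustrative, and the demo runs on a throwaway temporary file, so you would point it at your actual reference file instead.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large reference files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file; replace tmp.name with the reference file path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b">1111\nACGT\n")
demo_digest = md5_of_file(tmp.name)
os.remove(tmp.name)
print(demo_digest)
```

The hex string printed should be the same digest that `md5sum` reports for the same file.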

Thanks again!