Error in core metrics ( tree must have a branch lenghth)

I have used XX_otus.tree downloaded from greengenes database and I have imported it according to the importing instruction, but I had this error

Plugin error from diversity:

All non-root nodes in tree must have a branch length.

Debug info has been saved to /var/folders/nk/qyjfz4t11vn5vnx07bkphqrw0000gn/T/qiime2-q2cli-err-u3wm8n8f.log.

can someone help me to solve this error?

Thanks in advance,

Rubina

1 Like

Thanks for reporting, @rparadiso!

First, have you tried the fix reported here?

Alternatively, this is a fix that you can do by just opening the file in a text document, as opposed to trying to work with python:

Please describe what you've tried or give those a try, and we can go from there!

Thanks!

1 Like

I understood what was my problem!!!

FOR ALL PEOPLE WHO LIKE ME HAVE NO EXPERIENCE WITH PYTHON:

the ready to use script is (see attached) script.py (284 Bytes) .
I used the phylogenetic tree 99, if you use a different tree you must open the script and replace the number in the text.
To use it in the shell just write:
python script.py xx_otus.tree
the output of this command will automatically write the file in the format you need xx_otus.qza

I have read in the web how to use python and it was very simple using the script example of the link suggested by Nicholas.

I hope it help!

3 Likes

Hi @rparadiso,
Bravo! I hope this is a first step in a long and fruitful journey with python :smile:

Note, you can use python directly by opening a python interpreter. Just type "python" into your terminal and you are in — I should have clarified before that the example shown at that first link is showing how to fix this directly in the python interpreter, not in bash.

But if you prefer to make a script so you don't need to use the python interpreter, that's fine too! I made some little changes to your script so that you do not need to open the file and rename the input file, you can just write the input filepath on the command line.
fix_tree.py (286 Bytes)

Use like this:
python /path/to/fix_tree.py /path/to/99_otus.tree

(change to include the true filepaths to the script and tree on your computer)

This will read in the newick-format tree file and output a Phylogeny[Rooted] artifact with the same file name as the input file, but with the .qza extension.

Good luck!

4 Likes

Hi all,

Thanks again for the above fix_tree.py script, @rparadiso and @Nicholas_Bokulich! We were able to complete our analysis with out new proposed pipeline thanks to this.

Now, @Lauren and I are trying to complete the same pipeline again (DADA2 --> closed ref OTU clustering --> core metrics analysis --> taxonomy using VSEARCH), but this time using SILVA as our reference database instead of Greengenes. I was able to download the folder called "SILVA_132_QIIME_release" from the SILVA website (Archive), which has all of the equivalent files to the Greengenes 13.8 release folder we used before.

We were able to complete the closed reference OTU clustering, and we successfully ran the fix_tree.py script on the trees provided in the SILVA folder without error. However, when we used the resulting XX_otus.qza file made from the tree for the core metrics analysis, we received the following error:

It says the tree must be rooted, and therefore I guess the python script didn't work? Is the script designed specifically for a Greengenes tree, and if so, is there a way we could modify it?

Thanks as always,
Carli

Hi @cjone228, yes that script was specifically for greengenes trees, which appear to be missing the root branch length, so does not need to be run on the SILVA reference tree.

I am not totally sure but the SILVA trees might be unrooted, in which case you should import as unrooted and root it with qiime phylogeny midpoint-root.

Note that after using closed-ref OTU clustering there is no need to assign taxonomy. This is because closed-ref is aligning your sequences against a reference sequence, and the closest hit is taken as the representative sequence. Thus, you can just adopt the reference tree and taxonomy instead of building/classifying your own. There is no need to reclassify these sequences, since they consist of the full-length reference sequences at that point and you would just align them back against themselves.

This is a weakness with closed-reference clustering, in my opinion. It means that you must adopt the taxonomy (including any species information) of the closest match, but does not give you any confidence that that is the "correct" taxonomy (e.g., there could be an equally good match in the database that belongs to a different species but closed-ref just takes one top hit so would give you the false impression that the species information is correct).

In your case, the strengths of closed-ref (for coalescing multiple amplicons into a single full-length sequence and tree) outweigh these weaknesses (taxonomy and losing sequences that do not align to the reference)... so all I am recommending is that you continue with your proposed pipeline but (a) don't bother with taxonomy classification and (b) take species IDs with a grain of salt.

You could classify your ASVs taxonomically, collapse at species level, and then compare those results to the results from closed-ref to get a sense of how accurate the closed-ref species-level information is.

2 Likes

Hi @Nicholas_Bokulich ,

Thank you so much for the reply. The qiime phylogeny midpoint-root method fixed the issue! :partying_face: I was wondering, how come we didn't apply this method for the Greengenes tree earlier instead of using the fix_tree.py script?

Also, @Lauren and I were wondering if you could clarify what you mean by adopting the reference taxonomy? Below is an example what we did for our mock community with the Greengenes files:

qiime feature-classifier classify-consensus-vsearch \
  -–i-query ~/work/ATCC/rep_seqs/ATCC_cr_85_seqs.qza \ 
  -–i-reference-reads ~/work/gg-13-8/rep_set/85_otus.qza \
  -–i-reference-taxonomy ~/work/gg-13-8/taxonomy/85_otu_taxonomy.qza \ 
  --p-maxhits 1 \
  --p-perc-identity 0.85 \
  --p-strand both \
  --p-threads 5 \
  -–o-classification ~/work/ATCC/taxonomy/85_vsearch_taxonomy_85perc.qza \
  --verbose

Since we used the corresponding reference taxonomy to the tree and rep-set files we used for earlier steps, this is valid, correct?

Thanks again,
Carli :qiime2:

:partying_face:

Because the greengenes tree is already rooted, but the root is missing the length annotation, so it's a different problem entirely.

When you do closed-reference OTU clustering, the output representative sequences are actually the reference sequences that are top hits to your query seqs, and the feature IDs are the reference sequence IDs that correspond to the reference taxonomy for those seqs. So if these are your original query seqs:

id sequence
s1 AAAAAA
s2 CCCCCC

and these are your ref seqs:

id sequence
REF1 GTGTACGATGCTGACAAAAAAGTGCTGTAGTCGTGC
REF2 ATGGCTGTCTGCTGCCCCCCCCCGTGCTGTAGTGTC
REF3 TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGTGG
REF4 ACACACACACACACACACACACACACACACACACAC

then the output of closed-ref clustering will yield representative seqs that look like this:

id sequence
REF1 GTGTACGATGCTGACAAAAAAGTGCTGTAGTCGTGC
REF2 ATGGCTGTCTGCTGCCCCCCCCCGTGCTGTAGTGTC

So these are now the full-length reference sequences, and you can just look up the taxonomy associated with those reference sequences, instead of attempting to reclassify them.

So following your example, if this file is the output of closed-ref clustering:

Then there is no reason to classify with classify-consensus-vsearch, just lookup the taxonomy in this file:

and any QIIME 2 action to which you would pass this artifact (e.g., to make barplots):

can instead just accept the reference taxonomy as input to map the taxonomy associated with your closed-ref OTU representative sequences:

I hope that helps clarify!

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.