q2-fragment-insertion tree size and other questions

Hi everyone,

I am using the fragment-insertion plugin in QIIME 2 and I am not quite clear about insertion-tree.qza. Could you please help me understand it better? I also read the paper but couldn’t find an answer to my question.

My question is as follows.

I have 7000 unique ASVs and I ran the following command to build the tree with the fragment-insertion plugin.

qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs.qza \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza

After running the command, I uploaded insertion-tree.qza to iTOL (Interactive Tree of Life) and noticed only 193 branches.

My first question: why are there only 193 branches for 7000 unique ASVs, rather than 7000?
My second question: I took the branch IDs from insertion-tree.qza and looked up the same IDs in taxonomy_gg99.qza (https://raw.githubusercontent.com/biocore/q2-fragment-insertion/master/taxonomy_gg99.qza). Most of the branches came back annotated as Archaea. Why would that be?

Could you please help me understand what is going on? Am I missing something?

Thank you so much.

pinging @Stefan

Hi @microbiolearner,
would you mind sharing your tree and your input (or a subset of it) with us to enable better debugging?

Is it possible that you are confusing the term “branch” (a single “line” in a tree connecting two nodes) with “tip”, which is a leaf, i.e. a terminal node, of the tree?
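
To make the distinction concrete, here is a minimal Python sketch that counts tips and branches directly from a Newick string. It assumes unquoted labels (no commas or parentheses inside names), and `count_tips_and_branches` is a hypothetical helper written for this illustration, not part of QIIME 2 or SEPP.

```python
from typing import Tuple


def count_tips_and_branches(newick: str) -> Tuple[int, int]:
    """Count tips (leaves) and branches (edges) in a Newick string.

    Assumption: labels are unquoted, so every '(' opens an internal
    node and every ',' separates two sibling subtrees.
    """
    text = newick.strip()
    internal_nodes = text.count("(")
    tips = text.count(",") + 1
    # Every node except the root hangs from exactly one branch.
    branches = tips + internal_nodes - 1
    return tips, branches


tips, branches = count_tips_and_branches("((A:0.1,B:0.2):0.3,(C:0.1,D:0.4):0.2);")
print(tips, branches)  # 4 tips, 6 branches
```

In a fully resolved tree there are always more branches than tips, so a count reported by a viewer only makes sense once it is clear which of the two it refers to.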

SEPP rejects sequences if they are too distant from the reference set, which by default consists of ~200,000 full-length 16S bacterial sequences. Thus, if your sequences come from Archaea, the majority of them might get rejected. What kind of environment are you investigating, and with what sequencing method?
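
If you want to check for rejected sequences yourself, one rough approach is to compare your ASV IDs against the tip names of the output tree. The sketch below assumes that a rejected sequence simply does not appear as a tip, and that tip labels are unquoted. The helper names and the `asv_*` IDs are made up for illustration.

```python
import re
from typing import Iterable, List, Set

# A tip label directly follows '(' or ','. Internal node labels follow ')'
# and branch lengths follow ':', so neither is captured.
_TIP_LABEL = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)")


def tip_names(newick: str) -> Set[str]:
    """Return the set of tip labels in an unquoted Newick string."""
    return set(_TIP_LABEL.findall(newick))


def missing_from_tree(query_ids: Iterable[str], newick: str) -> List[str]:
    """Return the query IDs that do not occur as tips in the tree."""
    return sorted(set(query_ids) - tip_names(newick))


# Toy tree: two reference tips (numeric IDs) plus two inserted fragments.
newick = "((4339880:0.1,asv_1:0.2):0.3,(343305:0.1,asv_2:0.4):0.2);"
print(missing_from_tree(["asv_1", "asv_2", "asv_3"], newick))  # ['asv_3']
```

A long list of missing IDs would point towards rejection, while an empty list would suggest the problem lies elsewhere.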

Best
Stefan

Hi Stefan,

Thank you for replying. For confidentiality reasons I can’t share the data, but I will try to explain my confusion in more detail.

Background:

When I upload the insertion-tree.qza available at this link (https://github.com/qiime2/q2-fragment-insertion) to iTOL (Interactive Tree of Life), the tree has 193 leaves, each with a numerical ID, for example 4339880, 343305, 317164, 4437874 and so on. I looked up these IDs in taxonomy_gg99.qza (https://raw.githubusercontent.com/biocore/q2-fragment-insertion/master/taxonomy_gg99.qza) to get taxonomic information, and most of the leaf IDs from insertion-tree.qza came back as Archaea.

For example, the taxonomic information for some of the leaf IDs from insertion-tree.qza is as follows:

4339880 k__Archaea; p__[Parvarchaeota]; c__[Parvarchaea]; o__YLA114; f__; g__; s__

343305 k__Archaea; p__[Parvarchaeota]; c__[Parvarchaea]; o__YLA114; f__; g__; s__

317164 k__Archaea; p__[Parvarchaeota]; c__[Parvarchaea]; o__YLA114; f__; g__; s__

4437874 k__Archaea; p__[Parvarchaeota]; c__[Parvarchaea]; o__YLA114; f__; g__; s__

My question is: why do most of the leaf IDs belong to the kingdom Archaea? Or am I missing something?

Thank you.

Hi @microbiolearner,
the reference tree file you found on the GitHub page contains ~203,000 leaves, even before anything is inserted into it via SEPP. I bet the sheer size overburdens iTOL’s web visualization, and it randomly picks 0.1% of the tips to at least show something.
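
One way to check the real tip count without any viewer is to read the Newick file straight out of the artifact. This is only a sketch: it assumes the .qza is a zip archive that stores the tree as `<uuid>/data/tree.nwk`, and that tip labels are unquoted. Both function names are hypothetical helpers, not QIIME 2 API.

```python
import zipfile


def read_newick_from_qza(qza_path: str) -> str:
    """Return the Newick text stored inside a phylogeny artifact."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            # Assumed layout: <uuid>/data/tree.nwk
            if name.endswith("data/tree.nwk"):
                return archive.read(name).decode("utf-8")
    raise FileNotFoundError("no data/tree.nwk found in " + qza_path)


def count_tips(newick: str) -> int:
    """Count tips, assuming unquoted labels (tips = commas + 1)."""
    return newick.count(",") + 1
```

Calling `count_tips(read_newick_from_qza("insertion-tree.qza"))` should then report a number close to the size of the reference tree plus your inserted ASVs, rather than 193, if the viewer is indeed subsampling.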

I have used FigTree (http://tree.bio.ed.ac.uk/software/figtree/) to manually inspect trees of this size, but visualizations this large are generally really hard to deal with.

Another possible explanation could be an incompatibility in how QIIME 2 and iTOL handle the very loosely defined Newick format.

In any case, I don’t think this is an issue with the fragment-insertion algorithm but rather a general visualization problem.

Best,
Stefan