I am trying to construct a ghost-tree for my ITS2 amplicon data using q2-ghost-tree.

I read your tutorials. They mention that we cannot use the DADA2 feature table and representative sequences directly, but have to re-cluster them against the UNITE database with the command qiime vsearch cluster-features-closed-reference. I have two questions about this procedure:

  1. My ITS2 ASVs have not had the flanking regions (5.8S and 28S) trimmed with itsxpress. Does that mean I should use the developer version of the UNITE database, rather than the regular version that is pre-trimmed by ITSx, for the --i-reference-sequences parameter of qiime vsearch cluster-features-closed-reference, as the QIIME 2 team recommends?

  2. Which identity threshold should I choose for the --p-perc-identity parameter of qiime vsearch cluster-features-closed-reference? 0.97 as usual? :thinking:


Hi @sixvable, apologies for the delay. It’s hard to stay on top of things during a move.

I’m honestly not familiar with itsxpress, unfortunately. I was performing ITS analysis before that tool was developed, and I’m not currently working on fungal analysis, so I am not fully up to date. Were you able to figure out anything more regarding your question #1?

For #2, if you’re building your own ghost-tree, you can choose 0.97 or go lower (I tried 0.90 and even lower). When you cluster your sequences with vsearch, you are choosing to give up the benefit of ASVs in exchange for slightly more flexible groupings, so that you don’t discard ‘unclassified’ sequences when a cluster of seqs has no consensus taxonomic match to the foundation tree. To create a ghost tree, there needs to be a match between ‘genus’ in the foundation tree and ‘genus’ in your extension sequence group. So a threshold like 0.90 sounds terrible at first, but it prevents many seqs from being discarded because they would otherwise be classified as “unknown”. The trade-off is that it decreases the quality of your phylogenetic tree. In the tests I did, it was better to have an “acceptable” tree than no tree at all (which would force you into non-phylogenetic diversity analysis) or to discard a ton of your sequences because of missing nomenclature. At the time I worked on ghost-tree, so many sequences in the UNITE database were “unclassified”… I hope that gets better over time.
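
In case it helps, here is a rough sketch of how the clustering step could be run at two thresholds so you can compare them. All of the .qza file names are placeholders I made up (including unite-ref-seqs.qza, which stands for whichever UNITE release you import), and by default the script only prints the commands rather than running QIIME 2.

```shell
#!/bin/sh
# Sketch only: closed-reference clustering at two identity thresholds.
# Every .qza file name here is a placeholder, not a file from this thread.
set -eu

# Commands are only printed by default; set DRY_RUN=0 to actually run QIIME 2.
DRY_RUN="${DRY_RUN:-1}"

run_clustering() {
  identity="$1"
  # Build the command as positional parameters so each argument stays one word.
  set -- qiime vsearch cluster-features-closed-reference \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --i-reference-sequences unite-ref-seqs.qza \
    --p-perc-identity "$identity" \
    --o-clustered-table "table-cr-$identity.qza" \
    --o-clustered-sequences "rep-seqs-cr-$identity.qza" \
    --o-unmatched-sequences "unmatched-$identity.qza"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Try the conventional 0.97 and a more permissive 0.90.
for perc in 0.97 0.90; do
  run_clustering "$perc"
done
```

Comparing how many features land in the unmatched output at each threshold should give a rough sense of how much a lower identity actually recovers for your data.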

Let me know if this answers your questions or if you have any follow-up questions, especially considering the delay in my response. :slightly_smiling_face:


