How to make an overly complex phylogenetic tree more readable?

Sparkle · December 3, 2019, 3:23pm

Hello,
I'm trying to create a phylogenetic tree from my representative sequences resulting from a previously performed dada2 analysis, as follows (I'm following this tutorial).

qiime alignment mafft --i-sequences rep-seqs.qza --o-alignment aligned-rep-seqs.qza

qiime alignment mask --i-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza

qiime phylogeny fasttree --i-alignment masked-aligned-rep-seqs.qza --o-tree fasttree-tree.qza

I'm trying to view the resulting tree on iTOL, as suggested in the tutorial. However, its structure is far too complicated, and I'd like to filter something out, such as all the assignments showing low abundances in the samples... (I did that with qiime feature-table filter-features in a previous step, to show only certain data in heatmaps).
I don't know how to do this here, though, since the commands only ask for rep-seqs.qza, and I can't act on the FeatureTable directly...

How do I perform something similar in this case?

Any suggestions about how to make a phylogenetic tree more readable in general or any alternative approach? Thanks in advance.

Nicholas_Bokulich · December 3, 2019, 4:51pm

Two steps:

Filter the feature table using filter-features to remove low-abundance features.
Use this filtered table to filter out sequences that are not found in that table. Something like:

qiime feature-table filter-seqs \
  --i-data seqs.qza \
  --i-table filtered-table.qza \
  --o-filtered-data filtered-seqs.qza

Give that a try and let us know if it helps.
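
For intuition, here is a minimal Python sketch of what those two steps do, using toy dictionaries rather than real QIIME 2 artifacts. The feature IDs, counts, sequences, and both helper functions are made up for illustration; this is not how QIIME 2 implements filter-features or filter-seqs, and it assumes the minimum-frequency threshold applies to a feature's total count across all samples.

```python
# Toy stand-ins for a FeatureTable[Frequency] and the rep-seqs (hypothetical data).
table = {
    "asv1": {"sampleA": 150, "sampleB": 120},
    "asv2": {"sampleA": 3, "sampleB": 0},
    "asv3": {"sampleA": 90, "sampleB": 140},
}
seqs = {"asv1": "ACGTAC", "asv2": "ACGTTT", "asv3": "ACGGAC"}


def filter_features(table, min_frequency):
    """Keep features whose total count across all samples is >= min_frequency."""
    return {
        fid: counts
        for fid, counts in table.items()
        if sum(counts.values()) >= min_frequency
    }


def filter_seqs(seqs, table):
    """Keep only the sequences whose feature ID is still present in the table."""
    return {fid: seq for fid, seq in seqs.items() if fid in table}


filtered_table = filter_features(table, min_frequency=200)
filtered_seqs = filter_seqs(seqs, filtered_table)
print(sorted(filtered_seqs))  # ['asv1', 'asv3']
```

Only the sequences that survive the table filter would then go into the alignment and tree-building steps, so the tree has fewer tips.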

colinbrislawn · December 4, 2019, 12:11am

Hello @Sparkle

I have the same experience with making graphs of trees. They should look good, but often start out as a jumbled mess.

Nick has a great suggestion:

filter out features to reduce complexity

Here are two other methods I have used to make clear, elegant trees:

Merge features at a higher taxonomy level. So instead of showing all ASVs, just show all families, or maybe even classes of microbes. 1000s of ASVs are often represented by 100s of classes.
Make a tree showing one taxonomy of interest. So if changes in Methanobacteria were observed in the study, you could make a tree showing just ASVs classified as Methanobacteria. (This is a lot like option 1, but instead of filtering for abundant microbes, you are filtering for interesting microbes.)

I'm not sure how best to do this sort of merging and filtering using the Qiime 2 plugins, but you can definitely do this using the Qiime 2 API, or an R package like Phyloseq.
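
As a rough illustration of the two ideas, here is a small Python sketch on toy data. It does not use the Qiime 2 API or Phyloseq; the ASV IDs, counts, taxonomy strings, and the collapse and select_taxon helpers are all hypothetical, and it only shows the bookkeeping (summing counts per taxon, picking IDs for one taxon), not how to collapse the tree itself.

```python
from collections import defaultdict

# Hypothetical per-ASV total counts and Greengenes-style taxonomy strings.
counts = {"asv1": 270, "asv2": 3, "asv3": 230, "asv4": 40}
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "asv2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv3": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "asv4": "k__Archaea; p__Euryarchaeota; c__Methanobacteria",
}


def collapse(counts, taxonomy, level):
    """Sum the counts of all ASVs that share the same first `level` ranks."""
    merged = defaultdict(int)
    for fid, n in counts.items():
        ranks = [rank.strip() for rank in taxonomy[fid].split(";")]
        merged["; ".join(ranks[:level])] += n
    return dict(merged)


def select_taxon(taxonomy, name):
    """Return the IDs of ASVs whose taxonomy string mentions `name`."""
    return sorted(fid for fid, tax in taxonomy.items() if name in tax)


by_class = collapse(counts, taxonomy, level=3)  # 4 ASVs become 3 classes
methano = select_taxon(taxonomy, "Methanobacteria")
print(len(by_class))  # 3
print(methano)  # ['asv4']
```

The list of IDs from select_taxon is the kind of thing you could use to subset the rep-seqs before building a tree for just that group.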

Good luck!

Colin

Sparkle · December 4, 2019, 9:23am

Thank you so much, that's exactly what I was looking for!

Here is what I did, in more detail:

Step 1. Removing any chloroplast/mitochondria contaminants
Step 2. Filtering out any feature with a frequency below a threshold (I tried several thresholds, and chose 200)
Step 3. Filtering the corresponding rep-seqs as you suggested

qiime taxa filter-table --i-table table26.qza --i-taxonomy taxonomy26.qza --p-exclude mitochondria,chloroplast --o-filtered-table table-no-clo-mit.qza

qiime feature-table filter-features --i-table table-no-clo-mit.qza --p-min-frequency 200 --o-filtered-table filtered_table_for_tree_200.qza

qiime feature-table filter-seqs --i-data rep-seqs26.qza --i-table filtered_table_for_tree_200.qza --o-filtered-data filtered-seqs-for-tree-200.qza

And then I ran the three commands I posted in my first message, using filtered-seqs-for-tree-200.qza instead of rep-seqs.qza.

Sparkle · December 4, 2019, 9:32am

Hello @colinbrislawn ,
Yes, indeed, filtering input data is important to show a neater tree structure and get a better overview of the overall diversity of the samples!

At the beginning I was thinking of cutting out all 'unclassified' features, i.e. the ones that could not be assigned at the deepest taxonomy levels (g__, f__, etc.), but then I realised I would have ended up ignoring relevant data (as another user pointed out here). They aren't, indeed, nice to read, but readability shouldn't come at the cost of losing information.

Nick has a great suggestion:

filter out features to reduce complexity

Yes, exactly, that was my intuition too, and I'm glad he helped with the code necessary for this.

Merge features at a higher taxonomy level. So instead of showing all ASVs, just show all families, or maybe even classes of microbes. 1000s of ASVs are often represented by 100s of classes.

That's another possibility, indeed!

How do you perform this exactly? I had used qiime taxa collapse before but I don't know what to do after this, in this case, to obtain only the rep-seqs I need.

Make a tree showing one taxonomy of interest

Indeed, this may definitely be used to deepen the analysis about certain relevant taxa which were particularly abundant, focusing only on some branches of interest.

I have never tried Phyloseq, but I definitely will. Thanks for your suggestions!

Sparkle · December 4, 2019, 10:58am

I have another question that came to mind after going ahead with alpha and beta-diversity analysis.

I was going to use the tree I created together with a FeatureTable obtained, as suggested in the tutorial, by keeping only the features present in the tree and discarding all the others.

qiime phylogeny filter-table --i-table filtered_table_for_tree_200.qza --i-tree mafft-fasttree-output-filtered-200/rooted_tree.qza --o-filtered-table table_filtered_from200tree.qza

Apparently, according to this tutorial, some type of filtering is performed on the FeatureTable before performing alpha and beta-diversity analysis.

We’ll first apply the core-metrics-phylogenetic method, which rarefies a FeatureTable[Frequency] to a user-specified depth

So, my question is...

Is it the right strategy to use a pre-filtered FeatureTable to perform alpha and beta-diversity analysis?
Which one should I use?

The original one, as it came from qiime dada2 denoise-paired, completely unfiltered in terms of frequencies (table.qza)
The one I filtered with qiime feature-table filter-features, removing anything showing a frequency lower than a certain threshold (in my case, 200) (filtered_table_for_tree_200.qza)
The one containing only the reads aligning to the phylogenetic tree, and obtained, as I said in this post, with qiime phylogeny filter-table? (table_filtered_from200tree.qza)

I'm afraid that frequency filtering will affect the alpha and beta-diversity analysis and lead to underestimates, because in the end it retains only a handful of relevant taxa and leaves out all the ones with lower abundances.

colinbrislawn · December 4, 2019, 1:54pm

Good morning!

I think the second question is easy:

I use the original table for alpha diversity. Sometimes I perform normalization on that table, but I don't filter it at all. Unless someone proposes otherwise or you have a specific biological question, I would use this option.
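
To make the concern about underestimating diversity concrete, here is a toy Python sketch comparing two common alpha-diversity metrics on one sample before and after dropping low-count features. The counts and the threshold of 100 are invented, the metrics are hand-rolled rather than taken from QIIME 2, and filtering is applied per sample here purely for simplicity (filter-features works on totals across samples).

```python
import math


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon entropy (base 2) of the non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


# Hypothetical counts for a single sample: a few abundant ASVs and a long tail.
sample = [500, 300, 150, 20, 10, 5, 5, 5, 3, 2]
kept = [n for n in sample if n >= 100]  # drop the low-abundance tail

print(observed_features(sample), observed_features(kept))  # 10 3
print(shannon(sample) > shannon(kept))  # True
```

In this toy case richness drops from 10 to 3 and Shannon diversity also goes down, which is the kind of underestimate you were worried about.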

Now for the harder question about building filtered trees:

Like I said, I'm not sure how to do this using Qiime 2 plugins, as the qiime taxa collapse command gives you the table you need, but does not collapse the seqs...

I do this in R using the Phyloseq package. The tax_glom() command returns the collapsed table just like Qiime 2, but it also returns the collapsed tree. Very helpful!

I highly recommend Phyloseq because it has great tutorials just like Qiime 2! If you are more comfortable in Python, you could try the Qiime 2 API.

Colin

Sparkle · December 4, 2019, 2:09pm

Hello, and, once more, thanks for your answer!

Sometimes I perform normalization on that table, but I don’t filter it at all.

I guessed this... after all, it would make no sense to perform a diversity analysis on something already filtered... thanks for confirming it!

Filtering in my case only made sense for the sake of a cleaner tree visualization, but I guess a complete 'complex' tree, created using the original rep-seq (not filtered in any way) is still required for performing further alpha and beta-diversity analysis.

In terms of excluding sequences, all I did before creating this tree was exclude mitochondrial and chloroplast reads, which don't contribute to diversity and are only contaminants resulting from 16S amplification...

gives you the table you need, but does not collapse the seqs…

I do this in R using the Phyloseq package. The tax_glom() command returns the collapsed table just like Qiime 2, but it also returns the collapsed tree. Very helpful!

Exactly, that's why I asked.

I'll try this, thanks again!

colinbrislawn · December 5, 2019, 3:22pm

5 posts were split to a new topic: What plugins perform normalization automatically and why?

colinbrislawn · December 4, 2019, 3:01pm

Some diversity methods like Faith's PD and UniFrac require a tree, but there is no problem using the complex full tree without filtering. 'Full uncut' trees are fine for diversity analysis as you never see them. I only make trees less complex right before I graph them.
The rest of the time, I use the full tree.
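
To show why tree complexity is not an issue for these metrics, here is a toy Python sketch of a Faith's PD style calculation. The tree, branch lengths, and the faith_pd helper are hypothetical; this simplified version always counts the path back to the root, and it is not the implementation used by QIIME 2.

```python
# Toy rooted tree stored as child -> (parent, branch length). Names are made up.
tree = {
    "asv1": ("n1", 0.1),
    "asv2": ("n1", 0.2),
    "n1": ("root", 0.3),
    "asv3": ("root", 0.5),
}


def faith_pd(tree, observed_tips):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root, counting each branch only once.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


print(round(faith_pd(tree, ["asv1", "asv2"]), 2))  # 0.6
print(round(faith_pd(tree, ["asv1", "asv2", "asv3"]), 2))  # 1.1
```

The calculation just walks the branches for the features present in a sample, so it works the same however crowded the full tree would look when drawn.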

Normalization is very contentious. I'd say stick with the methods in the tutorials and see what reviewers suggest you do.

Colin
