Hello @colinbrislawn,
Yes, indeed, filtering input data is important to show a neater tree structure and get a better overview of the overall diversity of the samples!
At first I was thinking of removing all 'unclassified' features, i.e. the ones that were not resolved down to the deepest taxonomic level (g__, f__, etc.), but then I realised I would have ended up ignoring relevant data (as another user pointed out here). They aren't nice to read, indeed, but readability shouldn't come at the cost of losing information.
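To make the concern concrete, here is a minimal Python sketch of what such a filter would do. The feature IDs, the taxonomy strings and the helper `is_fully_classified` are all made up for illustration (they are not from my data), and I'm assuming Greengenes-style labels where an unassigned rank shows up as a bare prefix such as `g__`.

```python
# Hypothetical taxonomy assignments (illustrative only, not real data).
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Blautia",
    "asv2": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__",
    "asv3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__; g__",
}


def is_fully_classified(taxon: str) -> bool:
    """Return True if no rank is a bare prefix such as 'g__' or 'f__'."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    return all(not rank.endswith("__") for rank in ranks)


# Split the features into those a strict filter would keep and those it would drop.
kept = [fid for fid, taxon in taxonomy.items() if is_fully_classified(taxon)]
dropped = [fid for fid in taxonomy if fid not in kept]

print("kept:", kept)
print("dropped:", dropped)
```

In this toy example two of the three features would be thrown away, even though one is still classified down to family and the other down to order, which is exactly the kind of information I'd rather keep.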
Nick has a great suggestion:
- filter out features to reduce complexity
Yes, exactly, that was my intuition too, and I'm glad he helped with the code necessary for this.
Merge features at a higher taxonomy level. So instead of showing all ASVs, just show all families, or maybe even classes of microbes. 1000s of ASVs are often represented by 100s of classes.
That's another possibility, indeed!
How exactly do you do this? I have used qiime taxa collapse before, but in this case I don't know what to do afterwards to obtain only the rep-seqs I need.
Make a tree showing one taxonomy of interest
Indeed, this could definitely be used to look more closely at relevant taxa that were particularly abundant, focusing only on the branches of interest.
I have never tried PhyloSeq, but I definitely will. Thanks for your suggestions!