Can I take a subsample of representative sequences for a phylogenetic tree?

Lamm-a · April 4, 2023, 10:45am

I have a very large merged dataset, and it is too large to run typical phylogenetic inference on with align-to-tree-mafft-iqtree. The dataset is not 16S based, so I cannot use the fragment-insertion alternative. Is there a way to subsample my data after consulting alpha rarefaction curves and the summary visualisation of the feature table/seqs?

Ideally I would then like to use the subsampled dataset for all analysis going forward.

colinbrislawn · April 4, 2023, 1:46pm

I'm not sure this will work.

First, 'subsampling' happens to feature counts, not the features themselves.

Example raw table:

Feature	Sample1	Sample2	Sample3	Sum of this ASV
ASV1	100	100	80	280
ASV2	100	50	50	200
ASV3	50	20	1	71
Sum of this Sample	250	170	131

Example table after subsampling:
(Note how counts per sample are all the same, and all the features are still there.)

Feature	Sample1	Sample2	Sample3	Sum of this ASV
ASV1	42	70	76	188
ASV2	44	35	44	123
ASV3	34	15	0	49
Sum of this Sample	120	120	120

"Ideally I would then like to use the subsampled dataset for all analysis going forward."

Because subsampling changes the counts and not the features, the tree would still be large.
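To make the distinction concrete, here is a minimal sketch of subsampling to an even depth, using the raw counts from the example table. It is plain Python rather than QIIME 2 code, and the function names (`rarefy_sample`, `rarefy_table`), the depth of 120, and the seed are assumptions chosen for illustration. Because the draw is random, the exact counts will differ from my example table, but every sample ends up with the same total and every feature ID is still a row in the table.

```python
import random

# Raw counts from the example table: feature -> counts in Sample1..Sample3.
raw = {
    "ASV1": [100, 100, 80],
    "ASV2": [100, 50, 50],
    "ASV3": [50, 20, 1],
}


def rarefy_sample(counts, depth, rng):
    """Subsample one sample's counts to `depth` reads without replacement."""
    # Expand the counts into a pool of individual reads labelled by feature.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    drawn = rng.sample(pool, depth)
    return {feature: drawn.count(feature) for feature in counts}


def rarefy_table(table, depth, seed=0):
    """Apply rarefy_sample to every sample (column) of the table."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n_samples = len(next(iter(table.values())))
    result = {feature: [] for feature in table}
    for j in range(n_samples):
        column = {feature: row[j] for feature, row in table.items()}
        subsampled = rarefy_sample(column, depth, rng)
        for feature in table:
            result[feature].append(subsampled[feature])
    return result


rarefied = rarefy_table(raw, depth=120)

# Each sample now sums to 120, and all three features are still rows.
for j in range(3):
    print("Sample", j + 1, "total:", sum(row[j] for row in rarefied.values()))
print("Features kept as rows:", sorted(rarefied))
```

A feature only disappears in practice if its count drops to zero in every sample, which is a separate filtering step.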

Lamm-a · April 4, 2023, 1:57pm

Maybe I do not mean subsampling then. When I look at the sampling depth chosen from rarefaction in the feature table summary visualisation, it shows that x% of features would be lost. Would that not result in a smaller tree that still conveys similar information, as per the theory of rarefaction?

colinbrislawn · April 4, 2023, 2:52pm

Ah! Thank you for clarifying. In that case, yes, those features would be lost at that subsampling depth and would be dropped from the tree. (You may have to drop them using an additional command, but it's possible.)

In my example, ASV3 in Sample3 had a count of zero after subsampling. If it had a count of zero in all samples, the feature could be dropped from the table and the tree. This sounds like what you want.
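As a rough sketch of that dropping step (again plain Python, not QIIME 2 code): the table below is my subsampled example plus a hypothetical `ASV4` that has zero reads in every sample, and the tree is a hypothetical nested-tuple topology, since the thread does not include a real one. The helper names `drop_empty_features` and `prune_tree` are likewise made up for illustration.

```python
# Subsampled counts from the example table, plus a hypothetical ASV4
# (not in the original example) that has zero reads in every sample.
rarefied = {
    "ASV1": [42, 70, 76],
    "ASV2": [44, 35, 44],
    "ASV3": [34, 15, 0],
    "ASV4": [0, 0, 0],
}

# Hypothetical tree as nested tuples; leaves are feature IDs.
tree = (("ASV1", "ASV2"), ("ASV3", "ASV4"))


def drop_empty_features(table):
    """Keep only features that have at least one read in some sample."""
    return {feature: counts for feature, counts in table.items() if sum(counts) > 0}


def prune_tree(node, keep):
    """Remove leaves not in `keep` and collapse nodes left with one child."""
    if isinstance(node, str):
        return node if node in keep else None
    children = [prune_tree(child, keep) for child in node]
    children = [child for child in children if child is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


filtered = drop_empty_features(rarefied)
pruned = prune_tree(tree, set(filtered))

print(sorted(filtered))  # ASV3 stays: it is zero in Sample3 only
print(pruned)            # ASV4 is gone from the tree
```

For a tree that is too large to build in the first place, the same feature list would be used to filter the representative sequences before alignment, so the smaller tree is built directly rather than pruned afterwards.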