QIIME2R -> trouble with phyloseq and rooted tree

Hi all,

I'm seeing strange behavior that didn't happen when I used QIIME2R previously. I have a merged file with 5 different runs; I merged these runs and filtered the file.

I then created a tree with MAFFT and rooted it, so I have a "rooted-tree.qza" file. I then used this command from QIIME2R:

phy<-qza_to_phyloseq("filtered.merge.table.qza", "rooted-tree.qza", "taxonomy.qza","MSQ.mouse.master.map.txt")

This object looks like this:

phyloseq-class experiment-level object
otu_table() OTU Table: [ 12722 taxa and 903 samples ]
sample_data() Sample Data: [ 903 samples by 34 sample variables ]
tax_table() Taxonomy Table: [ 12722 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 12722 tips and 12548 internal nodes ]

I then name the taxonomic rank columns:

colnames(tax_table(phy))=c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "OTU")

This object can be used to create a PCoA based on the rooted tree:

I then subset this object by sample type (e.g., Lung):

Lung.phy.table = subset_samples(phy.relative.table, Sample %in% c('Lung'))

When I then try to recreate the PCoA with the subset, I get this warning:

Warning message:
In matrix(tree$edge[order(tree$edge[, 1]), ][, 2], byrow = TRUE, :
data length [25269] is not a sub-multiple or multiple of the number of rows [12635]

This sounds like it's an issue with the tree. I've googled this and it's because the tree doesn't match the input file.

Interestingly, it also has issues with Bray-Curtis distance, which shouldn't be based on the tree. I'm wondering if it has something to do with how the object is created from the QIIME2 import. Has anyone else had this issue? Ben

edit: Just to be clear, I think I know what the problem is: when I subset the data, the root of the tree is in an ASV that gets filtered out when subsetting to Lung samples only. So, somehow, I need to re-root the tree within this subsetted phyloseq object.

This behavior is slightly troublesome, because I did this before with these objects and never had a problem with QIIME2 and R, so I'm wondering if something changed and there's now a compatibility issue.

edit edit: I am an idiot. I referred to another phyloseq object.

Hmm, could you post phy_tree(phy) and otu_table(phy) both before and after subsetting? I wouldn't think this is a function of the import, but maybe in pruning the tree to reflect only the features found in the subset samples.
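For reference, here is a minimal sketch of that before/after comparison on a toy object. Everything in it (the four-tip tree, the count matrix, the "Lung"/"Gut" labels, the variable names) is made up for illustration and is not the poster's data. It relies on the fact that subset_samples() only drops samples: features that end up with zero counts stay in both the table and the tree until they are pruned explicitly.

```r
library(phyloseq)
library(ape)

# Hypothetical toy data: 4 features (A-D) across 4 samples.
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
counts <- matrix(
  c(5, 2, 0, 1,
    3, 0, 4, 0,
    0, 6, 1, 0,
    0, 0, 7, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2", "s3", "s4"))
)
meta <- data.frame(
  Sample = c("Lung", "Lung", "Gut", "Gut"),
  row.names = c("s1", "s2", "s3", "s4")
)

ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  sample_data(meta),
  phy_tree(tree)
)

# Subsetting samples keeps every feature, including D, which is absent from Lung.
lung <- subset_samples(ps, Sample %in% c("Lung"))

# Drop features with no counts left; the tree is pruned to match.
lung_pruned <- prune_taxa(taxa_sums(lung) > 0, lung)

# Compare feature counts and tree tips before and after pruning.
print(c(before = ntaxa(lung), after = ntaxa(lung_pruned)))
print(phy_tree(lung_pruned)$tip.label)
```

If the tip count of the tree and the number of taxa in the table agree at each step, the pruning itself is probably not the source of the mismatch.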

Sorry, I noticed that the tree was having problems in phy too. I think it has something to do with the tree object when it is imported.

phy_tree(phy)
Phylogenetic tree with 12722 tips and 12548 internal nodes.
Tip labels:
3ea2daa1de72aad93dc07c126b4b48e0, d5c3468dc10229eb5ff8ca83c8380894, af42247fdba1e701315071dfe550fdfa, 98cd99d1c9418a0416dac9d5972afb14, 166781e9335f9d9c4d636da6b160648f, 4f407fe8db166c82928d1797b6093a4e, ...
Node labels:
root, 0.935, 0.980, 0.769, 0.947, 0.284, ...
Rooted; includes branch lengths.

otu_table(phy.relative.table)
OTU Table: [12722 taxa and 903 samples]
taxa are rows

Then it spits out a bunch of relative abundances.

phy_tree(Lung.phy.table)
Phylogenetic tree with 12722 tips and 12548 internal nodes.
Tip labels:
3ea2daa1de72aad93dc07c126b4b48e0, d5c3468dc10229eb5ff8ca83c8380894, af42247fdba1e701315071dfe550fdfa, 98cd99d1c9418a0416dac9d5972afb14, 166781e9335f9d9c4d636da6b160648f, 4f407fe8db166c82928d1797b6093a4e, ...
Node labels:
root, 0.935, 0.980, 0.769, 0.947, 0.284, ...
Rooted; includes branch lengths.

otu_table(Lung.phy.table)
OTU Table: [12722 taxa and 228 samples]
taxa are rows

When I run distance calculations I get:

wUnif.dist = distance(phy.relative.table, method = "wunifrac")
Warning message:
In matrix(tree$edge[order(tree$edge[, 1]), ][, 2], byrow = TRUE, :
data length [25269] is not a sub-multiple or multiple of the number of rows [12635]

wUnif.dist = distance(Lung.phy.table, method = "wUniFrac")
Warning message:
In matrix(tree$edge[order(tree$edge[, 1]), ][, 2], byrow = TRUE, :
data length [25269] is not a sub-multiple or multiple of the number of rows [12635]

I want to let you know that the distance matrix is still generated even though there's a warning. Ben

Hmm, based on the GitHub issue, perhaps try re-rooting the tree. Something like this should do it:

phy_tree(Lung.phy.table)<-phangorn::midpoint(phy_tree(Lung.phy.table))

Thanks, that's what I figured. I'm not sure if it really impacted anything downstream, but I've used QIIME2R before and I can't recall having any issue like this. Ben

phy_tree(phy.relative.table)<-phangorn::midpoint(phy_tree(phy.relative.table))
wUnif.dist = phyloseq::distance(phy.relative.table, method = "wunifrac")
Warning message:
In matrix(tree$edge[order(tree$edge[, 1]), ][, 2], byrow = TRUE, :
data length [25269] is not a sub-multiple or multiple of the number of rows [12635]

# estimate number of axes

wUniF.pco = dudi.pco(cailliez(wUnif.dist), scannf = FALSE, nf = 3)

Weird, I re-rooted the tree, but I'm still having the issue. Ben
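One possible reading of the numbers, offered as a guess rather than a diagnosis: a rooted, strictly bifurcating tree with 12722 tips would have 12721 internal nodes, but the tree printed above has 12548, which points to polytomies (nodes with more than two children). The edge count is tips + internal nodes - 1 = 12722 + 12548 - 1 = 25269, the "data length" in the warning, and 12635 is 25269 / 2 rounded up. That fits a step that reshapes the edge list into two columns, which only works cleanly when every internal node has exactly two children. Re-rooting would not change that. The sketch below reproduces the same arithmetic on a hypothetical toy tree using only ape; resolving polytomies with multi2di() is an assumption about what might help here, not something confirmed in this thread.

```r
library(ape)

# Hypothetical toy tree: rooted, with one polytomy (a node with three children).
tree <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")

# A rooted, strictly bifurcating tree with n tips has n - 1 internal nodes
# and 2 * (n - 1) edges, so its edge count is always even.
expected_nodes <- Ntip(tree) - 1
missing_nodes <- expected_nodes - Nnode(tree)  # > 0 means polytomies are present

is_binary_before <- is.binary(tree)

# multi2di() resolves polytomies by inserting zero-length branches.
resolved <- multi2di(tree)
is_binary_after <- is.binary(resolved)

print(c(edges_before = Nedge(tree), edges_after = Nedge(resolved)))
print(c(binary_before = is_binary_before, binary_after = is_binary_after))
```

On the phyloseq object, the equivalent (untested on this data) would be something like phy_tree(phy.relative.table) <- ape::multi2di(phy_tree(phy.relative.table)), after first checking ape::is.binary(phy_tree(phy.relative.table)).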

Just a follow up, I re-ran portions of my pipeline (to exclude a set of samples) and the problem went away with the new tree.

So, I'm not sure, maybe I was using the wrong tree at the time. So sorry for the trouble. Ben

I don't think anything in Phyloseq needs rooted trees, just FYI.

Hm, weird - I guess I was using the wrong tree. Ben