Thanks again! I check the data again and using dim() to show the data, there was no sample loss in data import(441 samples):
mer_physeq <- qza_to_phyloseq(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 536 tips and 535 internal nodes ]
dim(mer_physeq@otu_table@.Data)
[1] 536 441
dim(mer_physeq@tax_table@.Data)
[1] 536 8
fea_physeq <- qza_to_phyloseq(features = "feature-table.qza")
dim(fea_physeq@.Data)
[1] 1942 441
taxo_physeq <- qza_to_phyloseq(taxonomy = "taxonomy.qza")
dim(taxo_physeq@.Data)
[1] 1942 8
notree_physeq <- qza_to_phyloseq(features = "feature-table.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
notree_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
dim(notree_physeq@otu_table@.Data)
[1] 536 441
dim(notree_physeq@tax_table@.Data)
[1] 536 8
It's too weird for me. When I try to go through the code of phyloseq:phyloseq(), I found the rownames difference between feature-table and taxonomy may contribute to this error. So I check the rownames of both, I found the key to the problem, the rownames are not exactly the same, there are some blank space in the rownames of taxonomy, like 10139.
rownames(fea_physeq@.Data)[1:10]
[1] "1001762" "1006042" "1009722" "1013134" "10139" "1015143" "1016138" "1017181" "1018892" "1020931"
rownames(taxo_physeq@.Data)[1:10]
[1] "1001762" "1006042" "1009722" "1013134" " 10139" "1015143" "1016138" "1017181" "1018892" "1020931"
table(rownames(fea_physeq@.Data)%in%rownames(taxo_physeq@.Data))
FALSE TRUE
1406 536
Thus, only 536 OTUs were extracted from the data. I go back to check the code of qiime2R:qza_to_phyloseq(), and I found the parse_taxonomy() function, which generates a taxonomy table, may be involved in it. I go through the code, and it showed that when the taxonomy$Feature.ID is num, it generates the rownames with blank space. I defined the taxonomy$Feature.ID as the character before using it as rownames. Finally, I got the correct result with 1942 OTUs output.
parse_taxonomynew <- function(taxonomy, tax_sep, trim_extra){
- if(missing(taxonomy)){stop("Taxonomy Table not supplied.")}
- if(missing(trim_extra)){trim_extra=TRUE}
- if(missing(tax_sep)){tax_sep="; |;"}
- if(sum(colnames(taxonomy) %in% c("Feature.ID","Taxon","Confidence", "Consensus"))!=3){stop("Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))")}
- taxonomy$Confidence<-NULL
- taxonomy$Consensus<-NULL # new add
- taxonomy$Feature.ID<-as.character(taxonomy$Feature.ID) # new add
- if(trim_extra){
-
taxonomy$Taxon<-gsub("[kpcofgs]__","", taxonomy$Taxon) #remove leading characters from GG
-
taxonomy$Taxon<-gsub("D_\\d__","", taxonomy$Taxon) #remove leading characters from SILVA
- }
- taxonomy<-suppressWarnings(taxonomy %>% separate(Taxon, c("Kingdom","Phylum","Class","Order","Family","Genus","Species"), sep=tax_sep))
- taxonomy<-apply(taxonomy, 2, function(x) if_else(x=="", NA_character_, x))
- taxonomy<-as.data.frame(taxonomy)
- rownames(taxonomy)<-taxonomy$Feature.ID
- taxonomy$Feature.ID<-NULL
- return(taxonomy)
- }
qza_to_phyloseqnew<-function(features,tree,taxonomy,metadata, tmp){
- if(missing(features) & missing(tree) & missing(taxonomy) & missing(metadata)){
-
stop("At least one required artifact is needed (features/tree/taxonomy/) or the metadata.")
- }
- if(missing(tmp)){tmp <- tempdir()}
- argstring<-""
- if(!missing(features)){
-
features<-read_qza(features, tmp=tmp)$data
-
argstring<-paste(argstring, "otu_table(features, taxa_are_rows=T),")
- }
- if(!missing(taxonomy)){
-
taxonomy<-read_qza(taxonomy, tmp=tmp)$data
-
taxonomy<-parse_taxonomynew(taxonomy)
-
taxonomy<-as.matrix(taxonomy)
-
argstring<-paste(argstring, "tax_table(taxonomy),")
- }
- if(!missing(tree)){
-
tree<-read_qza(tree, tmp=tmp)$data
-
argstring<-paste(argstring, "phy_tree(tree),")
- }
- if(!missing(metadata)){
-
if(is_q2metadata(metadata)){
-
metadata<-read_q2metadata(metadata)
-
rownames(metadata)<-metadata$SampleID
-
metadata$SampleID<-NULL
-
} else{
-
metadata<-read.table(metadata, row.names=1, sep='\t', quote="", header=TRUE)
-
}
-
argstring<-paste(argstring, "sample_data(metadata),")
- }
- argstring<-gsub(",$","", argstring) #remove trailing ","
- physeq<-eval(parse(text=paste0("phyloseq(",argstring,")")))
- return(physeq)
- }
mer_physeq_new <- qza_to_phyloseqnew(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq_new
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1942 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 1942 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 1942 tips and 1940 internal nodes ]
dim(mer_physeq_new@otu_table@.Data)
[1] 1942 441
dim(mer_physeq_new@tax_table@.Data)
[1] 1942 7
Thanks again for your help. I hope it could be improved in the updated qiime2R version
@ Jordan Bisanz