OTU number loss using qza_to_phyloseq (qiime2R) import .qza data to R

HeinzG · September 22, 2020, 2:53pm

Hi,
I have used the qza_to_phyloseq (qiime2R) to import the .qza data to the R successfully. However, when I check the feature-table and taxonomy data, I found the number of OTU was different between QIIME2 and R. There were 1942 OTU in the feature-table.tsv file of QIIME2, while only 536 OTUs left after being imported into R, about 3/4 OTUs loss. I can not figure out what happend. Did anyone else meet the same issue?

Qiime2:
wc -l feature-table.tsv
1943 feature-table.tsv

R

nrow(mer_physeq@otu_table@.Data)
[1] 536

jbisanz · September 22, 2020, 4:26pm

Could you please email me the OTU table artifact and I will look into this? jordan.bisanz@ucsf.edu.

HeinzG · September 23, 2020, 6:12am

Thank you very much for your reply and help. The OTU table artifacts have been sent to your email.

jbisanz · September 23, 2020, 4:57pm

I'm not sure what is going on there as they appear to be getting imported correctly:

tsv<-read.table("~/Desktop/test/feature-table.tsv", skip=1, sep="\t", row.names=1, comment="", header=TRUE)
q2<-read_qza("~/Desktop/test/feature-table.qza")$data
phy<-qza_to_phyloseq(features="~/Desktop/test/feature-table.qza")

sum(tsv)
[1] 11867121
sum(q2)
[1] 11867121
sum(phy)
[1] 11867121

dim(tsv)
[1] 1942 441
dim(q2)
[1] 1942 441
dim(phy)
[1] 1942 441

This could be some strange behavior of phyloseq but to know what is happening I would need to see the code you are using and all the inputs to qza_to_phyloseq.

HeinzG · September 24, 2020, 5:29am

Thanks for your help. I import data again according to your method and I found a strange thing. When I import the feature-table and taxonomy separately with qza_to_phyloseq, it seems to work correctly and get 1942 OTU. However, when I import them together with the metadata, the strange thing happens, and only 536 OTU output.

mer_physeq <- qza_to_phyloseq(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
nrow(mer_physeq@otu_table@.Data)
[1] 536
nrow(mer_physeq@tax_table@.Data)
[1] 536
fea_physeq <- qza_to_phyloseq(features = "feature-table.qza")
nrow(fea_physeq@.Data)
[1] 1942
taxo_physeq <- qza_to_phyloseq(taxonomy = "taxonomy.qza")
nrow(taxo_physeq@.Data)
[1] 1942
notree_physeq <- qza_to_phyloseq(features = "feature-table.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
nrow(notree_physeq@otu_table@.Data)
[1] 536
nrow(notree_physeq@tax_table@.Data)
[1] 536

Does it mean I need to use qza_to_phyloseq to import data separately instead of importing all the data together?

jbisanz · September 24, 2020, 5:13pm

There is qza_to_phyloseq() function is just a wrapper for a number of individual read_qza() functions which then uses phyloseq:phyloseq() to build the object. If the individual items are imported correctly, it would suggest the issue is coming from phyloseq. I wonder if there is an issue with sample names which is causing you to lose samples, and then all 0-features are being stripped on import?

Perhaps you could try manually making the object and see if the issue persists. Maybe instead of nrow you could use dim to make sure the right number of columns are also there?

HeinzG · September 25, 2020, 6:30pm

Thanks again! I check the data again and using dim() to show the data, there was no sample loss in data import(441 samples):

mer_physeq <- qza_to_phyloseq(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 536 tips and 535 internal nodes ]
dim(mer_physeq@otu_table@.Data)
[1] 536 441
dim(mer_physeq@tax_table@.Data)
[1] 536 8
fea_physeq <- qza_to_phyloseq(features = "feature-table.qza")
dim(fea_physeq@.Data)
[1] 1942 441
taxo_physeq <- qza_to_phyloseq(taxonomy = "taxonomy.qza")
dim(taxo_physeq@.Data)
[1] 1942 8
notree_physeq <- qza_to_phyloseq(features = "feature-table.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
notree_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
dim(notree_physeq@otu_table@.Data)
[1] 536 441
dim(notree_physeq@tax_table@.Data)
[1] 536 8

It's too weird for me. When I try to go through the code of phyloseq:phyloseq(), I found the rownames difference between feature-table and taxonomy may contribute to this error. So I check the rownames of both, I found the key to the problem, the rownames are not exactly the same, there are some blank space in the rownames of taxonomy, like 10139.

rownames(fea_physeq@.Data)[1:10]
[1] "1001762" "1006042" "1009722" "1013134" "10139" "1015143" "1016138" "1017181" "1018892" "1020931"
rownames(taxo_physeq@.Data)[1:10]
[1] "1001762" "1006042" "1009722" "1013134" " 10139" "1015143" "1016138" "1017181" "1018892" "1020931"
table(rownames(fea_physeq@.Data)%in%rownames(taxo_physeq@.Data))
FALSE TRUE
1406 536

Thus, only 536 OTUs were extracted from the data. I go back to check the code of qiime2R:qza_to_phyloseq(), and I found the parse_taxonomy() function, which generates a taxonomy table, may be involved in it. I go through the code, and it showed that when the taxonomy$Feature.ID is num, it generates the rownames with blank space. I defined the taxonomy$Feature.ID as the character before using it as rownames. Finally, I got the correct result with 1942 OTUs output.

parse_taxonomynew <- function(taxonomy, tax_sep, trim_extra){

if(missing(taxonomy)){stop("Taxonomy Table not supplied.")}
if(missing(trim_extra)){trim_extra=TRUE}
if(missing(tax_sep)){tax_sep="; |;"}
if(sum(colnames(taxonomy) %in% c("Feature.ID","Taxon","Confidence", "Consensus"))!=3){stop("Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))")}
taxonomy$Confidence<-NULL
taxonomy$Consensus<-NULL # new add
taxonomy$Feature.ID<-as.character(taxonomy$Feature.ID) # new add
if(trim_extra){

taxonomy$Taxon<-gsub("[kpcofgs]__","", taxonomy$Taxon) #remove leading characters from GG

taxonomy$Taxon<-gsub("D_\\d__","", taxonomy$Taxon) #remove leading characters from SILVA

}
taxonomy<-suppressWarnings(taxonomy %>% separate(Taxon, c("Kingdom","Phylum","Class","Order","Family","Genus","Species"), sep=tax_sep))
taxonomy<-apply(taxonomy, 2, function(x) if_else(x=="", NA_character_, x))
taxonomy<-as.data.frame(taxonomy)
rownames(taxonomy)<-taxonomy$Feature.ID
taxonomy$Feature.ID<-NULL
return(taxonomy)
}

qza_to_phyloseqnew<-function(features,tree,taxonomy,metadata, tmp){

if(missing(features) & missing(tree) & missing(taxonomy) & missing(metadata)){

stop("At least one required artifact is needed (features/tree/taxonomy/) or the metadata.")

}
if(missing(tmp)){tmp <- tempdir()}
argstring<-""
if(!missing(features)){

features<-read_qza(features, tmp=tmp)$data

argstring<-paste(argstring, "otu_table(features, taxa_are_rows=T),")

}
if(!missing(taxonomy)){

taxonomy<-read_qza(taxonomy, tmp=tmp)$data

```
taxonomy<-parse_taxonomynew(taxonomy)
```
```
taxonomy<-as.matrix(taxonomy)
```

argstring<-paste(argstring, "tax_table(taxonomy),")

}
if(!missing(tree)){
```
tree<-read_qza(tree, tmp=tmp)$data
```

argstring<-paste(argstring, "phy_tree(tree),")

}
if(!missing(metadata)){
```
if(is_q2metadata(metadata)){
```
```
  metadata<-read_q2metadata(metadata)
```

  rownames(metadata)<-metadata$SampleID

```
  metadata$SampleID<-NULL
```
```
} else{
```

  metadata<-read.table(metadata, row.names=1, sep='\t', quote="", header=TRUE)

```
}
```

argstring<-paste(argstring, "sample_data(metadata),")

}
argstring<-gsub(",$","", argstring) #remove trailing ","
physeq<-eval(parse(text=paste0("phyloseq(",argstring,")")))
return(physeq)
}

mer_physeq_new <- qza_to_phyloseqnew(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq_new
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1942 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 1942 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 1942 tips and 1940 internal nodes ]
dim(mer_physeq_new@otu_table@.Data)
[1] 1942 441
dim(mer_physeq_new@tax_table@.Data)
[1] 1942 7

Thanks again for your help. I hope it could be improved in the updated qiime2R version @ Jordan Bisanz

system · October 27, 2020, 12:30am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.