OTU number loss using qza_to_phyloseq (qiime2R) import .qza data to R

Hi,
I have used the qza_to_phyloseq (qiime2R) to import the .qza data to the R successfully. However, when I check the feature-table and taxonomy data, I found the number of OTU was different between QIIME2 and R. There were 1942 OTU in the feature-table.tsv file of QIIME2, while only 536 OTUs left after being imported into R, about 3/4 OTUs loss. I can not figure out what happend. Did anyone else meet the same issue?

Qiime2:
wc -l feature-table.tsv
1943 feature-table.tsv

R

nrow(mer_physeq@[email protected])
[1] 536

Could you please email me the OTU table artifact and I will look into this? [email protected].

Thank you very much for your reply and help. The OTU table artifacts have been sent to your email.

I'm not sure what is going on there as they appear to be getting imported correctly:

tsv<-read.table("~/Desktop/test/feature-table.tsv", skip=1, sep="\t", row.names=1, comment="", header=TRUE)
q2<-read_qza("~/Desktop/test/feature-table.qza")$data
phy<-qza_to_phyloseq(features="~/Desktop/test/feature-table.qza")

sum(tsv)
[1] 11867121
sum(q2)
[1] 11867121
sum(phy)
[1] 11867121

dim(tsv)
[1] 1942 441
dim(q2)
[1] 1942 441
dim(phy)
[1] 1942 441

This could be some strange behavior of phyloseq but to know what is happening I would need to see the code you are using and all the inputs to qza_to_phyloseq.

Thanks for your help. I import data again according to your method and I found a strange thing. When I import the feature-table and taxonomy separately with qza_to_phyloseq, it seems to work correctly and get 1942 OTU. However, when I import them together with the metadata, the strange thing happens, and only 536 OTU output.

mer_physeq <- qza_to_phyloseq(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
nrow(mer_physeq@[email protected])
[1] 536
nrow(mer_physeq@[email protected])
[1] 536
fea_physeq <- qza_to_phyloseq(features = "feature-table.qza")
nrow([email protected])
[1] 1942
taxo_physeq <- qza_to_phyloseq(taxonomy = "taxonomy.qza")
nrow([email protected])
[1] 1942
notree_physeq <- qza_to_phyloseq(features = "feature-table.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
nrow(notree_physeq@[email protected])
[1] 536
nrow(notree_physeq@[email protected])
[1] 536

Does it mean I need to use qza_to_phyloseq to import data separately instead of importing all the data together?

There is qza_to_phyloseq() function is just a wrapper for a number of individual read_qza() functions which then uses phyloseq:phyloseq() to build the object. If the individual items are imported correctly, it would suggest the issue is coming from phyloseq. I wonder if there is an issue with sample names which is causing you to lose samples, and then all 0-features are being stripped on import?

Perhaps you could try manually making the object and see if the issue persists. Maybe instead of nrow you could use dim to make sure the right number of columns are also there?

Thanks again! I check the data again and using dim() to show the data, there was no sample loss in data import(441 samples):

mer_physeq <- qza_to_phyloseq(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 536 tips and 535 internal nodes ]
dim(mer_physeq@[email protected])
[1] 536 441
dim(mer_physeq@[email protected])
[1] 536 8
fea_physeq <- qza_to_phyloseq(features = "feature-table.qza")
dim([email protected])
[1] 1942 441
taxo_physeq <- qza_to_phyloseq(taxonomy = "taxonomy.qza")
dim([email protected])
[1] 1942 8
notree_physeq <- qza_to_phyloseq(features = "feature-table.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
notree_physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 536 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 536 taxa by 8 taxonomic ranks ]
dim(notree_physeq@[email protected])
[1] 536 441
dim(notree_physeq@[email protected])
[1] 536 8

It's too weird for me. When I try to go through the code of phyloseq:phyloseq(), I found the rownames difference between feature-table and taxonomy may contribute to this error. So I check the rownames of both, I found the key to the problem, the rownames are not exactly the same, there are some blank space in the rownames of taxonomy, like 10139.

rownames([email protected])[1:10]
[1] "1001762" "1006042" "1009722" "1013134" "10139" "1015143" "1016138" "1017181" "1018892" "1020931"
rownames([email protected])[1:10]
[1] "1001762" "1006042" "1009722" "1013134" " 10139" "1015143" "1016138" "1017181" "1018892" "1020931"
table(rownames([email protected])%in%rownames([email protected]))
FALSE TRUE
1406 536

Thus, only 536 OTUs were extracted from the data. I go back to check the code of qiime2R:qza_to_phyloseq(), and I found the parse_taxonomy() function, which generates a taxonomy table, may be involved in it. I go through the code, and it showed that when the taxonomy$Feature.ID is num, it generates the rownames with blank space. I defined the taxonomy$Feature.ID as the character before using it as rownames. Finally, I got the correct result with 1942 OTUs output.

parse_taxonomynew <- function(taxonomy, tax_sep, trim_extra){

  • if(missing(taxonomy)){stop("Taxonomy Table not supplied.")}
  • if(missing(trim_extra)){trim_extra=TRUE}
  • if(missing(tax_sep)){tax_sep="; |;"}
  • if(sum(colnames(taxonomy) %in% c("Feature.ID","Taxon","Confidence", "Consensus"))!=3){stop("Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))")}
  • taxonomy$Confidence<-NULL
  • taxonomy$Consensus<-NULL # new add
  • taxonomy$Feature.ID<-as.character(taxonomy$Feature.ID) # new add
  • if(trim_extra){
  • taxonomy$Taxon<-gsub("[kpcofgs]__","", taxonomy$Taxon) #remove leading characters from GG
    
  • taxonomy$Taxon<-gsub("D_\\d__","", taxonomy$Taxon) #remove leading characters from SILVA
    
  • }
  • taxonomy<-suppressWarnings(taxonomy %>% separate(Taxon, c("Kingdom","Phylum","Class","Order","Family","Genus","Species"), sep=tax_sep))
  • taxonomy<-apply(taxonomy, 2, function(x) if_else(x=="", NA_character_, x))
  • taxonomy<-as.data.frame(taxonomy)
  • rownames(taxonomy)<-taxonomy$Feature.ID
  • taxonomy$Feature.ID<-NULL
  • return(taxonomy)
  • }

qza_to_phyloseqnew<-function(features,tree,taxonomy,metadata, tmp){

  • if(missing(features) & missing(tree) & missing(taxonomy) & missing(metadata)){
  • stop("At least one required artifact is needed (features/tree/taxonomy/) or the metadata.")
    
  • }
  • if(missing(tmp)){tmp <- tempdir()}
  • argstring<-""
  • if(!missing(features)){
  • features<-read_qza(features, tmp=tmp)$data
    
  • argstring<-paste(argstring, "otu_table(features, taxa_are_rows=T),")
    
  • }
  • if(!missing(taxonomy)){
  • taxonomy<-read_qza(taxonomy, tmp=tmp)$data
    
  • taxonomy<-parse_taxonomynew(taxonomy)
    
  • taxonomy<-as.matrix(taxonomy)
    
  • argstring<-paste(argstring, "tax_table(taxonomy),")
    
  • }
  • if(!missing(tree)){
  • tree<-read_qza(tree, tmp=tmp)$data
    
  • argstring<-paste(argstring, "phy_tree(tree),")
    
  • }
  • if(!missing(metadata)){
  • if(is_q2metadata(metadata)){
    
  •   metadata<-read_q2metadata(metadata)
    
  •   rownames(metadata)<-metadata$SampleID
    
  •   metadata$SampleID<-NULL
    
  • } else{
    
  •   metadata<-read.table(metadata, row.names=1, sep='\t', quote="", header=TRUE)
    
  • }
    
  • argstring<-paste(argstring, "sample_data(metadata),")
    
  • }
  • argstring<-gsub(",$","", argstring) #remove trailing ","
  • physeq<-eval(parse(text=paste0("phyloseq(",argstring,")")))
  • return(physeq)
  • }

mer_physeq_new <- qza_to_phyloseqnew(features = "feature-table.qza",tree = "tree.qza",taxonomy = "taxonomy.qza",metadata = "merge_metadata.tsv")
mer_physeq_new
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1942 taxa and 441 samples ]
sample_data() Sample Data: [ 441 samples by 9 sample variables ]
tax_table() Taxonomy Table: [ 1942 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 1942 tips and 1940 internal nodes ]
dim(mer_physeq_new@[email protected])
[1] 1942 441
dim(mer_physeq_new@[email protected])
[1] 1942 7

Thanks again for your help. I hope it could be improved in the updated qiime2R version :pray: @ Jordan Bisanz

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.