qiime2R file read issue

Hi everyone,
I’m wondering whether anyone has run into problems importing feature tables from .qza format into R with the qiime2R package (thanks @jbisanz!). There doesn’t appear to be a problem when working within the QIIME 2 environment:

I can export it with QIIME’s tools:

qiime tools export --input-path my.qza --output-path tmp
biom head -i ./tmp/feature-table.biom

… and get this output:

# Constructed from biom file
#OTU ID	6212017EGA1	6212017EGA2	6212017EGA3	6212017EGA4	6212017EGB1
65d5fa6d7e70a2048699fd898caf1fca	12307.0	243.0	125.0	4266.0	23113.0

And by checking the format type:

file ./tmp/feature-table.biom

… and it looks like what we’d expect:

feature-table.biom: Hierarchical Data Format (version 5) data

The problem comes when I try to import the .qza into R with qiime2R. After loading the library, I get this error message when reading the file:

Error in read_biom(paste0(tmp, "/", artifact$uui, "/data/feature-table.biom")) : 
  Both attempts to read input file:
/tmp//ffae49ca-489b-492e-a38f-de408cf88625/data/feature-table.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.

It’s strange to me that the file isn’t recognized as BIOM-v2, since BIOM-v2 files are HDF5 and HDF5 is exactly what the file utility reports.
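One likely reason the two disagree: file identifies HDF5 from the 8-byte signature at the start of the file, so a positive result only says the container is HDF5, not that the BIOM table stored inside will parse. Below is a minimal sketch of that same signature check. It assumes the HDF5 superblock sits at offset 0 (the usual case, although the format also allows offsets 512, 1024, 2048, ...). The helper name is_hdf5 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file "$1" begins with the HDF5 signature
# \211 H D F \r \n \032 \n (hex 89 48 44 46 0d 0a 1a 0a).
# Assumption: the superblock is at offset 0; HDF5 also permits 512, 1024, ...
is_hdf5() {
  # Read the first 8 bytes, dump them as hex, and strip spaces/newlines.
  sig=$(head -c 8 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "894844460d0a1a0a" ]
}
```

For example, is_hdf5 ./tmp/feature-table.biom && echo "HDF5 container". A file can pass this check and still fail to load in qiime2R, which is what happened here.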

Anyone ever run into this issue? Thanks for any suggestions!

Here’s the R session info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.4.3  qiime2R_0.99.1  forcats_0.4.0   stringr_1.4.0   dplyr_0.8.0.1   purrr_0.3.0     readr_1.3.1     tidyr_0.8.2     tibble_2.1.1   
[10] ggplot2_3.1.0   tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] nlme_3.1-137        phyloseq_1.26.1     lubridate_1.7.4     RColorBrewer_1.1-2  httr_1.4.0          tools_3.5.3         backports_1.1.3    
 [8] utf8_1.1.4          R6_2.4.0            vegan_2.5-4         rpart_4.1-13        Hmisc_4.2-0         lazyeval_0.2.2      BiocGenerics_0.28.0
[15] mgcv_1.8-27         colorspace_1.4-1    permute_0.9-4       ade4_1.7-13         nnet_7.3-12         withr_2.1.2         tidyselect_0.2.5   
[22] gridExtra_2.3       compiler_3.5.3      cli_1.1.0           rvest_0.3.2         Biobase_2.42.0      htmlTable_1.13.1    xml2_1.2.0         
[29] scales_1.0.0        checkmate_1.9.1     digest_0.6.18       foreign_0.8-71      XVector_0.22.0      base64enc_0.1-3     pkgconfig_2.0.2    
[36] htmltools_0.3.6     htmlwidgets_1.3     rlang_0.3.1         readxl_1.3.0        rstudioapi_0.9.0    generics_0.0.2      jsonlite_1.6       
[43] acepack_1.4.1       magrittr_1.5        Formula_1.2-3       biomformat_1.10.1   Matrix_1.2-15       Rcpp_1.0.1          munsell_0.5.0      
[50] S4Vectors_0.20.1    Rhdf5lib_1.4.2      fansi_0.4.0         ape_5.2             stringi_1.4.3       yaml_2.2.0          MASS_7.3-51.1      
[57] zlibbioc_1.28.0     rhdf5_2.26.2        plyr_1.8.4          grid_3.5.3          parallel_3.5.3      crayon_1.3.4        lattice_0.20-38    
[64] Biostrings_2.50.2   haven_2.1.0         splines_3.5.3       multtest_2.38.0     hms_0.4.2           knitr_1.21          pillar_1.3.1       
[71] igraph_1.2.4        codetools_0.2-16    stats4_3.5.3        glue_1.3.1          latticeExtra_0.6-28 data.table_1.12.0   modelr_0.1.4       
[78] foreach_1.4.4       cellranger_1.1.0    gtable_0.2.0        assertthat_0.2.0    xfun_0.5            broom_0.5.1         survival_2.43-3    
[85] iterators_1.0.10    IRanges_2.16.0      cluster_2.0.7-1

I ran into this issue myself last week. The cause was a sample with 0 reads, which prevented the biomformat package (which qiime2R uses) from reading the BIOM file properly. Can you check whether this is the case with your BIOM file? You can strip the zero-read samples with something like:

# Keep only samples that contain at least one feature (drops zero-read samples)
qiime feature-table filter-samples \
  --i-table $PWD/Output/SVtable_nofilt.qza \
  --p-min-features 1 \
  --o-filtered-table $PWD/Output/SVtable.qza
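To check for zero-read samples before filtering, one option is to convert the exported BIOM file to TSV (for example with biom convert -i ./tmp/feature-table.biom -o ./tmp/feature-table.tsv --to-tsv) and then sum each sample column. Below is a rough awk sketch. It assumes the TSV layout shown earlier in this thread: an optional "# Constructed from biom file" banner followed by a "#OTU ID" header row. The helper name sample_totals is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<sample><TAB><total reads>" for every sample
# column of a BIOM-style TSV (as written by: biom convert ... --to-tsv).
# Assumes a "#OTU ID" header row; any other "#" line is treated as a comment.
sample_totals() {
  awk -F '\t' '
    /^#OTU ID/ {                                 # header: remember sample IDs
      for (i = 2; i <= NF; i++) id[i] = $i
      n = NF
      next
    }
    /^#/ { next }                                # skip banner/comment lines
    { for (i = 2; i <= NF; i++) sum[i] += $i }   # add counts per column
    END { for (i = 2; i <= n; i++) printf "%s\t%s\n", id[i], sum[i] + 0 }
  ' "$1"
}
```

For example, sample_totals ./tmp/feature-table.tsv | awk -F '\t' '$2 == 0' prints only the zero-read samples. These are the samples that the filter-samples command above would remove.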

Good call @jbisanz! After applying that --p-min-features requirement, the resulting filtered .qza file imported without issue. I compared this object against the original unfiltered table, which I had converted from .qza to .biom to .tsv, and everything looks identical.

So happy this was a simple fix.

Random question/request to the QIIME pros (@thermokarst - I'm looking at you :slight_smile: ): is there a function that checks whether a feature table has empty rows or columns? I don't see what value there is in retaining features or samples that contain nothing but 0s (though I'm sure there is some reason).

What I have in mind is a command that takes a feature table and reports whether any rows or columns are empty:

qiime tools emptycheck --input-path feature-table

... where the output would flag something like:

## Warning -- the following Samples contained no Feature data:
  sample_173
  sample_289

... or:

## Warning -- the following Features contained no Sample data:
  106866132fd52b9d5c96c38bde87c51f
  4486a2c60a1eeb1e9e651102734f0aa4
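The proposed report can be approximated outside QIIME 2 on a TSV export of the table. Below is a sketch that assumes the same biom convert --to-tsv layout and non-negative counts. The name empty_check is hypothetical, and the warning text simply mirrors the proposal above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the proposed "emptycheck" report on a
# BIOM-style TSV: lists samples (columns) and features (rows) whose counts
# are all zero. Assumes a "#OTU ID" header, other "#" lines are comments,
# and counts are non-negative (so a zero sum means every entry is zero).
empty_check() {
  awk -F '\t' '
    /^#OTU ID/ { for (i = 2; i <= NF; i++) id[i] = $i; n = NF; next }
    /^#/ { next }
    {
      row = 0
      for (i = 2; i <= NF; i++) { col[i] += $i; row += $i }
      if (row == 0) empty_feat[++nf] = $1        # feature with no sample data
    }
    END {
      for (i = 2; i <= n; i++) if (col[i] == 0) empty_samp[++ns] = id[i]
      if (ns) {
        print "## Warning -- the following Samples contained no Feature data:"
        for (k = 1; k <= ns; k++) print "  " empty_samp[k]
      }
      if (nf) {
        print "## Warning -- the following Features contained no Sample data:"
        for (k = 1; k <= nf; k++) print "  " empty_feat[k]
      }
    }
  ' "$1"
}
```

If the table has no empty rows or columns, empty_check prints nothing.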

Thanks again @jbisanz for the fix (and the program!)

Use qiime feature-table summarize. Its per-sample and per-feature frequencies will reveal any empty ones. Beyond that, there is no stand-alone checking command that works as you propose (i.e., one that prints a warning to stdout).

Agreed: there is probably no use in keeping these. That is why the QIIME 2 filtering methods automatically drop any features or samples that are left empty after you filter along the other axis of a feature table.
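The following awk sketch illustrates that behavior on a TSV export. It is not QIIME 2's actual implementation. It first keeps only samples whose total count reaches a minimum, loosely analogous to filter-samples --p-min-frequency. It then drops any feature left with all zeros in the remaining samples. The function filter_samples_tsv and its threshold argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: drop samples whose total count is below MIN ($2,
# default 1), then drop features that are all zero across the samples that
# remain. Reads the TSV twice: pass 1 sums columns, pass 2 writes output.
filter_samples_tsv() {
  awk -F '\t' -v OFS='\t' -v min="${2:-1}" '
    NR == FNR {                                  # pass 1: column totals
      if ($0 ~ /^#/) next
      for (i = 2; i <= NF; i++) tot[i] += $i
      next
    }
    /^#OTU ID/ {                                 # pass 2: rebuild header
      line = $1
      for (i = 2; i <= NF; i++) if (tot[i] >= min) line = line OFS $i
      print line
      next
    }
    /^#/ { next }                                # drop other comment lines
    {
      line = $1; row = 0
      for (i = 2; i <= NF; i++)
        if (tot[i] >= min) { line = line OFS $i; row += $i }
      if (row > 0) print line                    # skip newly empty features
    }
  ' "$1" "$1"
}
```

For instance, filter_samples_tsv feature-table.tsv 1000 > filtered.tsv drops samples with fewer than 1000 reads. It also drops any features that occurred only in those samples, mirroring the automatic cleanup described above.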
