Hi,
I ran QIIME2 for my raw reads via jupyter using %%bash
in the shared Mac for heavy analysis (also a little Python for data viz). After that, we copied several outputs to build a phyloseq
object in our personal computer (Win).
Previously, I also did that but found an interesting topic here to mix run Python and R in one Jupyter, so I tried it. Because I am not comfortable with jupyter yet, I only stopped at building phyloseq (also for others to use the shared Mac).
# in bash
# install brew first, check brew.sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# then install R and libgit2, either from jupyter or shell
brew install r
brew install libgit2 # this is for the R devtools package
# in Jupyter
%load_ext rpy2.ipython
%%R
if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools", repos = "https://cloud.r-project.org")}
devtools::install_github("jbisanz/qiime2R", quiet = TRUE, upgrade = FALSE)
%%R
library(qiime2R)
%%R
physeq <- qza_to_phyloseq(
features="table.qza",
tree="rooted-tree.qza",
taxonomy="taxonomy.qza",
metadata = "metadata.tsv"
)
%%R
saveRDS(physeq, file = "ps_mother.rds")
It runs without any error notice, but no output. Is this normal?
So it went well and we continued the analysis with RStudio by ps0 <- readRDS("ps_mother.rds")
to do some filtering. After finishing, we export it back to make both new taxonomy2.(qza|qzv)
s.
Only to realize that there is a small difference in our taxonomy files. We are missing the taxonomy level prefixes.
Filenames | Different taxa names |
---|---|
taxonomy.qzv | d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Rikenellaceae; g__Alistipes; s__uncultured_bacterium |
taxonomy2.qzv | d__Bacteria;Bacteroidota;Bacteroidia;Bacteroidales;Rikenellaceae;Alistipes;uncultured_bacterium |
We then tried to investigate at what step does this change occurred by open ps_mother.rds
first time in R.
ps0 <- readRDS("ps_mother.rds")
ps0 %>% tax_table() %>% data.frame() %>% filter(Species != "<NA>") %>% head(1)
Kingdom Phylum Class Order
f876bfbfd29d664d606243d9f95edf84 d__Bacteria Proteobacteria Alphaproteobacteria Rickettsiales
Family Genus Species
f876bfbfd29d664d606243d9f95edf84 Mitochondria Mitochondria uncultured_bacterium
We found that it is already like this from the first phyloseq object. Do you perhaps know what is wrong here? We worried that other invisible changes might have also occurred.
Our current solution here is to just go back to our regular steps (i.e. build our phyloseq inside R), but we are curious about what happened here.
Any thoughts are appreciated.
Thank you very much.
R `sessionInfo()`
R version 4.1.3 (2022-03-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rlang_1.0.2 ggrepel_0.9.1 biomformat_1.22.0
[4] RColorBrewer_1.1-3 ggstatsplot_0.9.3 cowplot_1.1.1
[7] decontam_1.12.0 speedyseq_0.5.3.9018 phyloseq_1.38.0
[10] qiime2R_0.99.6 forcats_0.5.1 stringr_1.4.0
[13] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[16] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[19] tidyverse_1.3.1 here_1.0.1
loaded via a namespace (and not attached):
[1] readxl_1.4.0 backports_1.4.1 Hmisc_4.7-0
[4] plyr_1.8.7 igraph_1.3.1 splines_4.1.3
[7] TH.data_1.1-1 GenomeInfoDb_1.30.1 digest_0.6.29
[10] foreach_1.5.2 htmltools_0.5.2 fansi_1.0.3
[13] magrittr_2.0.3 checkmate_2.1.0 paletteer_1.4.0
[16] cluster_2.1.3 tzdb_0.3.0 Biostrings_2.62.0
[19] modelr_0.1.8 vroom_1.5.7 sandwich_3.0-1
[22] jpeg_0.1-9 colorspace_2.0-3 rvest_1.0.2
[25] haven_2.5.0 xfun_0.31 crayon_1.5.1
[28] RCurl_1.98-1.6 jsonlite_1.8.0 zeallot_0.1.0
[31] survival_3.3-1 zoo_1.8-10 iterators_1.0.14
[34] ape_5.6-2 glue_1.6.2 gtable_0.3.0
[37] zlibbioc_1.40.0 emmeans_1.7.4-1 XVector_0.34.0
[40] statsExpressions_1.3.2 Rhdf5lib_1.16.0 BiocGenerics_0.40.0
[43] scales_1.2.0 mvtnorm_1.1-3 DBI_1.1.2
[46] Rcpp_1.0.8.3 performance_0.9.0 xtable_1.8-4
[49] htmlTable_2.4.0 bit_4.0.4 foreign_0.8-82
[52] Formula_1.2-4 stats4_4.1.3 DT_0.23
[55] truncnorm_1.0-8 datawizard_0.4.1 htmlwidgets_1.5.4
[58] httr_1.4.3 ellipsis_0.3.2 farver_2.1.0
[61] pkgconfig_2.0.3 NADA_1.6-1.1 nnet_7.3-17
[64] dbplyr_2.1.1 utf8_1.2.2 labeling_0.4.2
[67] tidyselect_1.1.2 reshape2_1.4.4 munsell_0.5.0
[70] cellranger_1.1.0 tools_4.1.3 cli_3.3.0
[73] generics_0.1.2 ade4_1.7-19 broom_0.8.0
[76] evaluate_0.15 fastmap_1.1.0 yaml_2.3.5
[79] rematch2_2.1.2 bit64_4.0.5 knitr_1.39
[82] fs_1.5.2 nlme_3.1-157 xml2_1.3.3
[85] correlation_0.8.1 compiler_4.1.3 rstudioapi_0.13
[88] png_0.1-7 zCompositions_1.4.0-1 reprex_2.0.1
[91] stringi_1.7.6 parameters_0.18.1 lattice_0.20-45
[94] Matrix_1.4-1 vegan_2.6-2 permute_0.9-7
[97] multtest_2.50.0 vctrs_0.4.1 pillar_1.7.0
[100] lifecycle_1.0.1 rhdf5filters_1.6.0 estimability_1.3
[103] data.table_1.14.2 bitops_1.0-7 insight_0.17.1
[106] patchwork_1.1.1 R6_2.5.1 latticeExtra_0.6-29
[109] gridExtra_2.3 IRanges_2.28.0 codetools_0.2-18
[112] MASS_7.3-57 assertthat_0.2.1 rhdf5_2.38.1
[115] rprojroot_2.0.3 withr_2.5.0 multcomp_1.4-19
[118] S4Vectors_0.32.4 GenomeInfoDbData_1.2.7 mgcv_1.8-40
[121] bayestestR_0.12.1 parallel_4.1.3 hms_1.1.1
[124] grid_4.1.3 rpart_4.1.16 coda_0.19-4
[127] rmarkdown_2.14 Biobase_2.54.0 lubridate_1.8.0
[130] base64enc_0.1-3