QIIME2R, silva, and phyloseq

Hey again!
I have been trying out the new Silva 138 (SILVA 138 Classifiers). I have have used both methods to create phyloseq object for both Silva 132 and 138. When compare both methods using all.equal (), both outcomes for Silva 132 are OK. However, for 138, both outcomes differed in the number of NAs in tax_table. After checking NAs in the Silva138 tax table, I could notice Phyloseq () was correct. Something is happening with qiimeR2. I am happy with my objects created with phyloseq () but thought it was good to let you guys know about this. :slight_smile:

1 Like

Interesting, could you possibly send me the files or share lines from the files where the import differs?

1 Like

Hi @jbisanz, this is what I did Supportcode.txt (4.8 KB). I will also send you the .qza in a message.

1 Like

@Cotissima: I'm sorry maybe I'm out off topic a little bit.

I've been trying to create phyloseq object woth the new Silva 138 with qiime2R, but I'm stucked.

This is the error message:
df must be a data frame without row names in column_to_rownames().
This is the codes which involved the taxonomy:
error_making_phyloseq.txt (3.3 KB)

I' m already try this methods:

[Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R]
But still stucked

Can I ask your workflow codes to import basic qiime2 files to successfully make phyloseq object?

Thank you and sorry for OOT.

1 Like

hi @didietkeren,
I actually had the same problem as you. What I did is trying the tax_table() command line by line to see what was wrong. I realised that when you use as.data.frame()at the beginning. The structure of your data changes to: Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’ rather than just data.frame. I think that is what generates the issue.
It worked by not including the as.data.frame bit

                phy_tree(tree$data),
                tax_table( taxtable138 %>%
                             select(-Confidence) %>%
                             column_to_rownames(var = "Feature.ID") %>%
                             as.matrix()),
                sample_data(metadata %>% 
                             remove_rownames() %>%
                             column_to_rownames(var = "SampleID"))
                )

hope this works!

3 Likes

I’ve looked into this and both methods appear to give the same import when I adjust the string splitting on line 9 below.

library(tidyverse)
library(qiime2R)
library(phyloseq)

setwd("~/Desktop/silvacheck/")
tax_manual<-
  read_qza("taxonomy_silva138.qza")$data %>% 
  as_tibble() %>% 
  separate (Taxon, sep="; ", c("Kingdom","Phylum","Class","Order","Family","Genus","Species")) %>% # need to change sep to "; " from ";" to remove leading space in SILVA taxonomy strings
  as.data.frame() %>%
  column_to_rownames("Feature.ID") %>%
  select(-Confidence)

tax_phy<-qza_to_phyloseq(features="merged-table.qza",tree = "rooted-tree.qza", taxonomy="taxonomy_silva138.qza", metadata = "JFFS_metadata_qiime2.tsv") %>% tax_table() %>% as.data.frame()

merger<-
tax_phy %>% as.data.frame() %>% rownames_to_column("SV") %>% gather(-SV, key="Level", value="Phy_Assignment") %>%
  full_join(
    tax_phy %>% as.data.frame() %>% rownames_to_column("SV") %>% gather(-SV, key="Level", value="Manual_Assignment")
  )
  
merger %>% filter(Phy_Assignment!=Manual_Assignment) # nothing
merger %>% filter(is.na(Phy_Assignment) | is.na(Manual_Assignment)) #nothing with discordant NAs

I think the issue you are having is coming from lines like this:

036a90f816e0e135614f344bdf517e5d d__Bacteria; p__Firmicutes; c__Clostridia; o__Oscillospirales; f__Oscillospiraceae; g__NK4A214_group;Ambiguous_taxa

This is something I have not seen before in SILVA and needs to be account for. Could you please look into the taxonomic assignment of this feature in your 132 dataset and/or send me the artifact and I will take a closer look and do some thinking on the best way to catch and remove?

1 Like

@jbisanz that make sense. If I understand correctly, the problem is generated due to the lack of space after the “;” when moving into species level? I have sent you the .qza taxonomy file for SILVA 132.
I am a bit confused now. If the problem is in the taxonomy file itself, why do I only see it in one of the import methods?

Thanks so much for looking into this

The method used to parse the taxonomy strings between the methods is slightly different. I am going to write an additional function for parsing taxonomy strings to help streamline this for the future.
Jordan

3 Likes

I have not used qiime2r yet, but I have used phyloseq with silva data. The function that I wrote to parse silva128 taxonomy strings can be found in the MicrobiomeR package vignette. The taxonomy strings may have changed in between 128 and 138, but this will give you an idea of how to proceed. You might also look at phyloseq’s parsing functions for inspiration, which is what I based my function on.

If you end up writing a new function please make a gist, or create a PR/issue in MicrobiomeR or phyloseq.

I actually did end up making this function at some point in the past as below. It would allow for "; " or “;” to be the separator and attempts to parse off the gg and silva prefixes (ex p__ D_1__. The only issue with this function is that it has come to my attention that some lesser used databases such as MIDORI can skip levels in which case a strategy that identifies the taxonomic level based on the prefix would be preferable.


parse_taxonomy {qiime2R} R Documentation

Parse Q2 taxonomy

Description

Parse Q2 taxonomy

Usage

parse_taxonomy(taxonomy, tax_sep, trim_extra)

Arguments

taxonomy the imported taxonomy table (the result of read_qza(“yourfile.qza”)$data containing columns Feature.ID, Taxon, and Confidence)
tax_sep The separator between taxonomic levels. Defaults to one compatible with both GreenGenes and SILVA ("; " OR “;”)
trim_extra Remove leading characters from taxonomic levels: ex: k__ or D_0__. TRUE/FALSE. default=TRUE

Value

a data.frame with feature IDs as row names and the columns: Kingdom, Phylum, Class, Order, Family, Genus, Species

Examples

Not run: taxonomy<-parse_taxonomy(taxonomy_artifact)

[Package qiime2R version 0.99.34 Index]


function(taxonomy, tax_sep, trim_extra){
if(missing(taxonomy)){stop(“Taxonomy Table not supplied.”)}
if(missing(trim_extra)){trim_extra=TRUE}
if(missing(tax_sep)){tax_sep="; |;"}
if(sum(colnames(taxonomy) %in% c(“Feature.ID”,“Taxon”,“Confidence”, “Consensus”))!=3){stop(“Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))”)}

taxonomy$Confidence<-NULL
if(trim_extra){
taxonomy$Taxon<-gsub("[kpcofgs]","", taxonomy$Taxon) #remove leading characters from GG
taxonomy$Taxon<-gsub("D_\d
","", taxonomy$Taxon) #remove leading characters from SILVA
}
taxonomy<-suppressWarnings(taxonomy %>% separate(Taxon, c(“Kingdom”,“Phylum”,“Class”,“Order”,“Family”,“Genus”,“Species”), sep=tax_sep))
taxonomy<-apply(taxonomy, 2, function(x) if_else(x=="", NA_character_, x))
taxonomy<-as.data.frame(taxonomy)
rownames(taxonomy)<-taxonomy$Feature.ID
taxonomy$Feature.ID<-NULL
return(taxonomy)
}

Nice, @jbisanz!

I did some more digging and found that you can export table.qza to BIOM and rooted-tree.qza to Newick format. I’m sure you can do the same with the silva138.qza. The metadata is already in an appropriate format, so after conversion, you could import your data with phyloseq and use their parsing functions. I am going to try this myself, so I will report back here with the results.
https://docs.qiime2.org/2020.8/tutorials/exporting/