Qiime2R error: taxa names do not match

Hi,

I’m trying to create a phyloseq object from some QIIME 2 artifacts. I have a problem similar to later posts in this question.

I have imported the data from QIIME 2 as follows:

> #Importing metadata
> HGS_metadata<-read_tsv("HGS_mapping_file.txt")
> HGS_metadata
> #Looks good
> 
> #Importing feature table
> HGS_features<-read_qza("HGS_merged_table_filtered3.qza")
> HGS_features$data[1:5,1:5]
> 
> #Importing taxonomy
> HGS_taxonomy<-read_qza("silva_HGS_taxonomy.qza")
> #Converting to taxtable
> HGS_taxtable<-HGS_taxonomy$data %>% as.tibble() %>% separate(Taxon, sep = ";", c("Kingdom","Phylum","Class","Order","Family","Genus","Species"))
> HGS_taxtable
> #Looks good I think
> 
> HGS_tree<-read_qza("insertion-tree.qza")
> HGS_tree$data

This was the command I used to try to create a phyloseq object:

HGS_phy<-phyloseq(otu_table(HGS_features$data, taxa_are_rows=T), phy_tree(HGS_tree$data), tax_table(as.data.frame(HGS_taxtable%>% select(-Confidence) %>% column_to_rownames("Feature.ID") %>% as.matrix()), sample_data(HGS_metadata %>% as.data.frame() %>% column_to_rownames("SampleID"))))

This is the error I got:

Error in validObject(.Object) : invalid class “phyloseq” object: 
 Component taxa/OTU names do not match.
 Taxa indices are critical to analysis.
 Try taxa_names()
In addition: Warning message:
In .local(object) : Coercing from data.frame class to character matrix 
prior to building taxonomyTable. 
This could introduce artifacts. 
Check your taxonomyTable, or coerce to matrix manually.

I tried making a Venn diagram of IDs shared between the feature table and taxonomy table with gplots as recommended in the linked post:
> gplots::venn(list(taxonomy=rownames(HGS_taxtable), featuretable=colnames(HGS_features)))
[Screenshot: Venn diagram output]

When I view the HGS_features data and HGS_taxtable, I can see that feature IDs are definitely shared between the two even though the program doesn’t seem to be picking them up. I’m wondering if this is a problem with how I’m importing the feature table (which has no header names, just all the sample names listed across the top and feature IDs down the side) … but not sure how to fix this.

Thank you for any help!

Could you post the previews of your objects so I can take a look and try to figure out the issue?

Hi Jordan,

Thanks for agreeing to take a look. I’m not sure exactly what you’re asking for – is a preview the output of a specific command, or are you looking for a screenshot or copy-paste of the objects … ?

I’ve pasted the first few lines of the taxtable and feature table in the meantime in case that’s what you wanted?

> head(HGS_taxtable)
# A tibble: 6 x 9
  Feature.ID      Kingdom   Phylum    Class     Order     Family     Genus   Species  Confidence
  <fct>           <chr>     <chr>     <chr>     <chr>     <chr>      <chr>   <chr>         <dbl>
1 000071683f5ef8… D_0__Bac… D_1__Pro… D_2__Alp… D_3__Cau… D_4__Caul… NA      NA            1.000
2 000641f2935341… D_0__Bac… D_1__Pro… D_2__Del… D_3__Oli… D_4__0319… NA      NA            1.000
3 0008fceb4dfefc… D_0__Bac… D_1__Arm… D_2__unc… D_3__met… D_4__meta… D_5__m… D_6__me…      1.000
4 00093c7b060f8f… D_0__Bac… D_1__Pla… D_2__Pla… D_3__Gem… D_4__Gemm… D_5__u… NA            0.999
5 000b2eeefd90bd… D_0__Bac… D_1__Aci… D_2__The… D_3__The… D_4__Ther… D_5__S… NA            1.000
6 000c1c5797d21f… D_0__Bac… D_1__Bac… D_2__Bac… D_3__Cyt… D_4__Micr… D_5__u… D_6__me…      0.987
> HGS_features$data[1:5, 1:5]
                                 PB11A1.PlayfordHGS.CAS.2016 PB11A2.PlayfordHGS.CAS.2016
0008fceb4dfefccceae9bc9980f70df8                           0                           0
000b2eeefd90bdf6052dc1287b0615cb                           0                           0
000c1c5797d21feba7165b842eef553a                           0                           0
000c729b6daaebda387c5ba21ca7b560                           0                           0
000cac2874f76f8eaf4d1fa39bddfaff                           0                           0
                                 PB11AC.PlayfordHGS.CAS.2016 PB11AC2.PlayfordHGS.CAS.2016
0008fceb4dfefccceae9bc9980f70df8                           0                            0
000b2eeefd90bdf6052dc1287b0615cb                           0                            0
000c1c5797d21feba7165b842eef553a                           0                            0
000c729b6daaebda387c5ba21ca7b560                           0                            0
000cac2874f76f8eaf4d1fa39bddfaff                           0                            0
                                 PB11N1.PlayfordHGS.CAS.2016
0008fceb4dfefccceae9bc9980f70df8                           0
000b2eeefd90bdf6052dc1287b0615cb                           0
000c1c5797d21feba7165b842eef553a                           0
000c729b6daaebda387c5ba21ca7b560                           0
000cac2874f76f8eaf4d1fa39bddfaff                           0

Thanks,
Matilda

Hi @Matilda_H-D! This is a bit of a “just passing through, don’t mind me” type of comment, but your Venn diagram looks pretty weird to me - why are there zero column names in the feature table? Also, why are you using the column names for that comparison, since your feature table has features as rows, not columns? Just some food for thought.
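For what it's worth, a minimal base R sketch of that comparison using row names might look like the following. The objects `features` and `taxtable` and their IDs are made-up stand-ins, not your data; the only assumption is that the feature table has feature IDs as row names and the taxonomy table keeps them in a `Feature.ID` column.

```r
# Toy stand-ins (hypothetical data): a feature table with feature IDs as
# row names, and a taxonomy table with the IDs in a Feature.ID column.
features <- matrix(0, nrow = 3, ncol = 2,
                   dimnames = list(c("id_a", "id_b", "id_c"), c("S1", "S2")))
taxtable <- data.frame(Feature.ID = c("id_a", "id_b", "id_d"),
                       Kingdom    = "D_0__Bacteria",
                       stringsAsFactors = FALSE)

# Features are rows, so compare rownames(), not colnames().
feature_ids  <- rownames(features)
# The taxonomy IDs live in a column; as.character() guards against factors.
taxonomy_ids <- as.character(taxtable$Feature.ID)

shared        <- intersect(feature_ids, taxonomy_ids)  # IDs in both tables
only_features <- setdiff(feature_ids, taxonomy_ids)    # missing from taxonomy
only_taxonomy <- setdiff(taxonomy_ids, feature_ids)    # missing from features
```

With your objects, the equivalents would presumably be rownames(HGS_features$data) and as.character(HGS_taxtable$Feature.ID). Note that rownames() on a tibble only returns "1", "2", ... and that HGS_features itself is the list returned by read_qza, so colnames(HGS_features) is NULL, which would explain the empty circle.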

Sorry for not seeing this earlier, Mathew is correct re the issue with the venn plot. In your code:
as.data.frame(HGS_taxtable%>% select(-Confidence) %>% column_to_rownames("Feature.ID") %>% as.matrix()
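I have a sneaking suspicion that you are losing your row names in the conversion when you make your phyloseq object: the outer as.data.frame() hands tax_table() a data frame rather than a matrix, which is what the coercion warning is about. Perhaps this would fix your issue:
HGS_taxtable %>% select(-Confidence) %>% as.data.frame() %>% column_to_rownames("Feature.ID") %>% as.matrix()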
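In case it helps, here is a dependency-free base R sketch of keeping the feature IDs as row names while building a character matrix. The toy `taxtable` and its values are invented for illustration; a character matrix with feature IDs as row names is the shape I would expect tax_table() to take without the data.frame coercion warning.

```r
# Toy taxonomy table (hypothetical values) shaped like HGS_taxtable.
taxtable <- data.frame(Feature.ID = c("id_a", "id_b"),
                       Kingdom    = c("D_0__Bacteria", "D_0__Bacteria"),
                       Phylum     = c("D_1__Proteobacteria", NA),
                       Confidence = c(1.000, 0.987),
                       stringsAsFactors = FALSE)

# Keep only the taxonomic rank columns (drop the ID and confidence columns).
rank_cols <- setdiff(names(taxtable), c("Feature.ID", "Confidence"))

# Convert to a character matrix, then attach the feature IDs as row names.
tax_mat <- as.matrix(taxtable[, rank_cols, drop = FALSE])
rownames(tax_mat) <- as.character(taxtable$Feature.ID)
```

Before building the phyloseq object, it may be worth confirming that the row names of this matrix actually overlap with rownames(HGS_features$data).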
