general discussion of qiime2r tutorial

Hi @jbisanz,

I have some trouble with the following script as it gives me an error which I searched online but could not rectify it. Do you have some time to look into it?
Thank you,

metadata <- read_q2metadata(“qiime2/metadata.tsv”)
SVs <- read_qza(“table-with-no-mt-no-chlp.qza”)$data
taxonomy <- read_qza(“qiime2/taxonomy-single-end.qza”)
SVs <- apply(SVs, 2, function(x) x/sum(x)*100)
SVsToPlot <- data.frame(MeanAbundance=rowMeans(SVs)) %>%

  • rownames_to_column(“Feature.ID”) %>%
  • arrange(desc(MeanAbundance)) %>%
  • top_n(30, MeanAbundance) %>%
  • pull(Feature.ID)

SVs %>%

  • as.data.frame()%>%
  • rownames_to_column(“Feature.ID”) %>%
  • gather(-Feature.ID, key=“SampleID”, value = “Abundance”) %>%
  • mutate(Feature.ID=if_else(Feature.ID %in% SVsToPlot, Feature.ID, “Remainder”)) %>%
  • group_by(SampleID, Feature.ID) %>%
  • summarize(Abundance=sum(Abundance)) %>%
  • left_join(metadata) %>%
  • mutate(NormAbundance=log10(Abundance+0.01)) %>%
  • left_join(taxonomy) %>%
  • mutate(Feature=paste(Feature.ID, Taxon)) %>%
  • mutate(Feature=gsub("[kpcofgs__", “”, Feature)) %>%
  • ggplot(aes(x=SampleID, y=Feature, fill= NormAbundance))+
  • geom_tile()+
  • facet_grid(~category, scales = “free-x”) +
  • theme_q2r() +
  • theme(axis.text.x = element_text(angle = 45, hjust=1)) +
  • scale_fill_viridis_c(name= “log10(% Abundance)”)
    summarise() regrouping output by ‘SampleID’ (override with .groups argument)
    Joining, by = “SampleID”
    Error: x and y must share the same src, set copy = TRUE (may be slow).
    Run rlang::last_error() to see where the error occurred.

rlang::last_error()
<error/rlang_error>
x and y must share the same src, set copy = TRUE (may be slow).
Backtrace:

  1. base::as.data.frame(.)
  2. tibble::rownames_to_column(., “Feature.ID”)
  3. tidyr::gather(., -Feature.ID, key = “SampleID”, value = “Abundance”)
  4. dplyr::mutate(…)
  5. dplyr::group_by(., SampleID, Feature.ID)
  6. dplyr::summarize(., Abundance = sum(Abundance))
  7. dplyr::left_join(., metadata)
  8. dplyr::mutate(., NormAbundance = log10(Abundance + 0.01))
  9. dplyr::left_join(., taxonomy)
  10. dplyr::auto_copy(x, y, copy = copy)
  11. dplyr:::glubort(…)
    Run rlang::last_trace() to see the full context.

rlang::last_trace()
<error/rlang_error>
x and y must share the same src, set copy = TRUE (may be slow).
Backtrace:

  1. └─%>%(…)
  2. ├─base::withVisible(eval(quote(_fseq(_lhs)), env, env))
  3. └─base::eval(quote(_fseq(_lhs)), env, env)
  4. └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
    
  5.   └─`_fseq`(`_lhs`)
    
  6.     └─magrittr::freduce(value, `_function_list`)
    
  7.       └─function_list[[i]](value)
    
  8.         ├─dplyr::left_join(., taxonomy)
    
  9.         └─dplyr:::left_join.data.frame(., taxonomy)
    
  10.           └─dplyr::auto_copy(x, y, copy = copy)
    
  11.             └─dplyr:::glubort(...)

That is an interesting one… I have not seen this before. I would start by updating dplyr to the latest version and starting a fresh session. By any chance are you working on a remote server?

Thank you @jbisanz for getting back so quickly - I really appreciate that. The RStudio is installed on my computer (standalone), however, the data I am analyzing is coming from the institute server (supercomputer).

Do you want me to start a fresh session and get dplyr updated? The same script was working two days ago on the same table and then suddenly it start behaving differently.

Could you please:
1- transfer the relevant artifacts to your personal computer
2- make sure you have the working directory set to one on your computer
3- Rerun the code and see if you can replicate this error?

@jbisanz Thank you for your insight. Starting a new session worked for me.

Second, is there a way to show features on a heatmap at the higher taxonomical level (Class, Order)?

Thanks,
Irshad

Hi @jbisanz,

I get an error with the following while making a taxa barplot. The metadata file has two columns i.e. first coulmn = Sample ID, second column = category (Ff, Pb, and NA). Can I get rid of the following error? If I remove the “category” then the script works but I want the bar plots to be separated (facet grids) based on the above categories.

metadata<-read_q2metadata(“qiime2/metadata.tsv”)
SVs<-read_qza(“table-with-no-mt-no-chlp.qza”)$data
taxonomy<-read_qza(“qiime2/taxonomy-single-end.qza”)$data %>% parse_taxonomy()
taxasums<-summarize_taxa(SVs, taxonomy)$Genus
taxa_barplot(taxasums, metadata, “category”)
Error in get(category) : invalid first argument

Ha, this is a very funny error which appears to be caused by the fact that the variable inside the function is named category. Please rename the column to anything but “category” and it should work. I will patch this in the future.

Hello,
Your tutorial has been very helpful. We were able to make volcano plots and CLR bar plots for our differential taxa using your script. However, I was inputting the commands for the differential abundance volcano plot again, but this time trying to take out the leading higher taxonomy levels (K, P, O) to only label the lowest unambiguous taxonomic names (usually family and genus, but sometimes class) and it seems that our taxonomy file is not formatted specifically for your script adjustment mutate(Feature=paste(Phylum, Genus)). Similarly, when we try to parse our taxonomy file: it gives the error:

taxonomy<-parse_taxonomy(taxonomy)
Error in parse_taxonomy(taxonomy) :
Table does not match expected format (colnames(obj) are Feature.ID, Taxon, (Confidence OR Consensus))

#so maybe the datatype in the second column (Taxon) is incorrect, or maybe it doesn’t like the spaces in between the taxon names.

head(taxonomy)

A tibble: 6 x 2

Feature.ID Taxon

1 TACGAAGGGGGCTAGCGTTGTTCGGAATCACTGGGCGTAAAG… k_ Bacteria; p Proteobacteria; c__Alph…
2 TACGTAGGGGGCAAGCGTTGTCCGGAATCATTGGGCGTAAAG… k
Bacteria; p Actinobacteria; c__Ther…
3 TCCTGTTTGCTCCCCACGCTTTCGCGCCTCAGCGTCAGTAAT… k
Bacteria; p Proteobacteria; c__Delt…
4 TACGTAGGGGGCGAGCGTTGTCCGGAATGACTGGGCGTAAAG… k
Bacteria; p Firmicutes; c__Clostrid…
5 TACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAG… k
Bacteria; p Proteobacteria; c__Alph…
6 TCCTGTTTGCTACCCACACTTTCGTGCCTGAGCGTCAGTTAC… k
Bacteria; p _Bacteroidetes; c__Ignav…

There isn’t a third column data (confidence/consensus) because to the original taxonomy file was exported out of R by a third party service and then I imported the taxonomy file into Qiime2 to create .qza format used here. Below is the script we used. Could this error be circumvented with some creative R script using the current format? If the third column is the problem, is there a way to disregard the empty third column or would be have to create a bogus third column to use this script as is? Or is the issue with the first or second column?

Thanks,
Christina

metadata<-readr::read_tsv(“Modified-Macaque-skinmicrobiome-metadata-LB-July2020-R.tsv”)
SVs<-read_qza(“asv_feature-table_2013_2015TS_75.qza”)$data
results<-read_qza(“ALDEx2_TS_75animals/differentials_75TS.qza”)$data
taxonomy<-read_qza(“taxonomy_for_qiime.qza”)$data
tree<-read_qza(“rooted-tree_17May2020.qza”)$data
#Find significantly different taxa (less than .01 p-value) and volcano plot
results %>%
left_join(taxonomy) %>%
mutate(Significant=if_else(we.eBH<0.01, TRUE, FALSE)) %>%
mutate(Taxon=as.character(Taxon)) %>%
mutate(Taxon=gsub("[[kpco]__*;]", “”, Taxon)) %>%
mutate(TaxonToPrint=if_else(we.eBH<0.01, Taxon, “”)) %>% #only provide a label to signifcant results
ggplot(aes(x=diff.btw, y=-log10(we.ep), color=Significant, label=TaxonToPrint)) +
geom_text_repel(size=3, nudge_y=0.1) +
geom_point(alpha=0.6, shape=16) +
theme_q2r() +
xlab(“log2(fold change); Differences Between”) +
ylab("-log10(P-value)") +
theme(legend.position=“none”) +
scale_color_manual(values=c(“black”,“red”))
ggsave(“volcano_75TS_0.01_shorter_taxa_d.pdf”, height=9, width=9, device=“pdf”)

Hi Christina,

Easiest fix would be to just add a dummy column with something like:

taxonomy<-read_qza(“taxonomy_for_qiime.qza”)$data %>% mutate(Confidence=1)

Thanks @jbisanz , a quick question: in taxa_barplot function the example command given is:

taxa_barplot(taxasums, metadata, "body-site")

is there an easy way to sort the order of the samples by another metadata column? For example if we have a variable of time and wanted to show the samples faceted by body-site and ordered by increasing time within the facets?

If you sort your table by time and then convert sample ID to be a factor with that sorted order it should print out as you are looking for!

Hi! right now i’m using this Package(qiime2R) and following one of your tutorials, but the function taxa_barplot it’s not available? does any other function that i can use for build up my taxonomy bar plot?
Thank you!!!

Hello @LiiBotero,

Welcome to the QIIME 2 forum! If you don’t mind a Python solution, you can take a look at the taxa_abundance_bar_plot method I wrote.

1 Like

Thank you so much for such a useful Tutorial. I was able to generate all the graphs, but I am more interested in having A graph base on Family level (group the family base on their names), not feature ID. Is there a way to do that?

Is it also possible to get a picture like this (please see figure 4 end of this page) (https://github.com/hollylutz/BatMP/blob/master/Figure4_Scatterplots.ipynb) with QiimeR?

Yes, just use the summarize_taxa() command first and pass the Family-summarized abundance table to the heatmap function.

1 Like

It should be relatively straight forward to reproduce the plot using ggplot2 with data imported using qiime2R.

1 Like

Hello,

I have gotten to the stage where I have a heat map!! But, it has no colour on it? I presume I have done something wrong in the code at some point?

No colour eh? Could you maybe post what you get and the code as you wrote it? I imagine it could be scaling issue or somehow NAs are getting introduced.


Hello, I have attached an image of the code.