Integrating Q2 and R - Problem plotting PCoA


Hi everyone,
I'm following the QIIME 2 integration tutorial (Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R) and the forum thread on PCoA plots (PCoA plots with confidence ellipsoids - #2 by Nicholas_Bokulich) to plot my unweighted UniFrac results, and I'm getting this error:

```
Error in `left_join()`:
! `by` must be supplied when `x` and `y` have no common variables.
ℹ Use `cross_join()` to perform a cross-join.
Run `rlang::last_trace()` to see where the error occurred.
```

Here is the code I ran:

```r
library(tidyverse)
library(qiime2R)

metadata <- readr::read_tsv("sample-metadata_Melipona11_PCoA.tsv")
uwunifrac <- read_qza("unweighted_unifrac_pcoa_results.qza")
shannon <- read_qza("shannon_vector.qza")

shannon <- shannon$data %>% rownames_to_column("SampleID")

uwunifrac$data$Vectors %>%
  select(SampleID, PC1, PC2) %>%
  left_join(metadata) %>%
  left_join(shannon) %>%
  ggplot(aes(x = PC1, y = PC2, color = TIME, shape = SAMPLES, size = shannon_entropy)) +
  geom_point(alpha = 0.5) +
  theme_q2r() +
  scale_shape_manual(values = c(16, 1), name = "SAMPLES") +
  scale_size_continuous(name = "Shannon Diversity") +
  scale_color_discrete(name = "TIME") +
  ggtitle("Unweighted UniFrac") +
  stat_ellipse()
```

So I tried adding the `by` argument and supplying the `x` and `y` arguments, but then I got `Error: object 'x' not found`.

Is there a simple way to fix this without building a data frame by hand for every column, since I have a lot of data? Does anyone know what I'm doing wrong with the code I copied from the tutorial? PS: I'm not an R expert.

Thank you a lot for the help.


Hello @Patricia_Azevedo,

The `left_join` function merges two data frames column-wise, meaning the columns from both data frames are combined into a new data frame. To know how to line up the rows when doing so, the function needs one or more key columns that hold matching values in both data frames. By default it uses every column whose name appears in both data frames. In your case there were no such columns, so it tells you that you need the `by` parameter to say which column(s) match.

With this in mind, I would work backwards to see where the tutorial's assumptions are being violated, i.e. where and why your data frames don't have the column names it expects. There are two calls to `left_join` in your code, and the error could come from either one. You can use `traceback()` to find out, or simply inspect the columns of each data frame with `colnames()` to get a sense of what's going on.
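A minimal sketch of that diagnosis on toy data. The values are made up, and the mismatched ID column name (`sample-id` in the metadata vs `SampleID` in the PCoA vectors) is an assumption chosen to reproduce the error; your real files may differ. It shows how `intersect(colnames(...))` reveals that no key is shared, and how `by` maps a column in `x` to a differently named column in `y`:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for uwunifrac$data$Vectors (toy values)
vectors <- tibble(SampleID = c("S-1", "S-2", "S-3"),
                  PC1 = c(0.10, -0.20, 0.05),
                  PC2 = c(0.30, 0.00, -0.10))

# Hypothetical metadata whose ID column has a different name
metadata <- tibble(`sample-id` = c("S-1", "S-2", "S-3"),
                   TIME = c("T1", "T2", "T1"))

# Columns shared by both tables; empty means left_join() has no default key
shared <- intersect(colnames(vectors), colnames(metadata))
print(shared)  # character(0)

# Without `by`, this reproduces the error from the question
err <- tryCatch(left_join(vectors, metadata), error = function(e) e)
print(conditionMessage(err))

# With `by`, map SampleID in x to sample-id in y
joined <- left_join(vectors, metadata, by = c("SampleID" = "sample-id"))
print(joined)
```

Renaming the metadata column to `SampleID` (for example with `rename()`) would work just as well and lets later joins use the default key.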


Dear @colinvwood, thank you for your answer.
I went back over the QIIME 2 integration tutorial and saw that I had skipped the step `metadata <- metadata %>% left_join(shannon)`. That's why my code didn't work.
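For reference, a minimal sketch of what that tutorial step does, on toy data. The values and the `SAMPLES`/`TIME` levels are made up; the only assumption about the real object is the one the tutorial code above already relies on, namely that `shannon$data` has sample IDs as row names and a `shannon_entropy` column:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for shannon$data: sample IDs stored as row names
shannon_raw <- data.frame(shannon_entropy = c(3.2, 4.1, 2.7),
                          row.names = c("S-1", "S-2", "S-3"))

# Move the row names into a SampleID column so it can serve as a join key
shannon <- shannon_raw %>% rownames_to_column("SampleID")

# Hypothetical metadata that already uses SampleID as its ID column
metadata <- tibble(SampleID = c("S-1", "S-2", "S-3"),
                   TIME = c("T1", "T2", "T1"),
                   SAMPLES = c("A", "B", "A"))

# The tutorial step: attach the Shannon values to the metadata
metadata <- metadata %>% left_join(shannon, by = "SampleID")
print(metadata)
```

Passing `by = "SampleID"` explicitly is optional here, but it makes the join fail loudly if the key column is ever missing.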

Now I'm having a problem with the PCoA plot. I'm trying to run this code:

```r
uwunifrac$data$Vectors %>%
  select(SampleID, PC1, PC2) %>%
  left_join(metadata) %>%
  left_join(shannon) %>%
  ggplot(aes(x = PC1, y = PC2, color = 'SAMPLES', shape = 'TIME')) +
  geom_point(alpha = 0.5) +
  theme_q2r() +
  scale_shape_manual(values = c(16, 1), name = "TIME") + #see
  scale_color_discrete(name = "SAMPLES") +
  ggtitle("Unweighted UniFrac") +
  stat_ellipse()
```

But every point ends up with the same color and the same shape: the colors don't change by SAMPLES, the shapes don't change by TIME, and the Shannon size doesn't vary either.

I saw on the forum that sample IDs need to be alphanumeric. Could my IDs, which contain the `-` symbol, be the problem? I'm asking because I'm not sure, and I would need to redo the analysis if that were the case.


Thank you so much for helping me to solve my problem!

Hi Patricia,

Other Colin here 👋

Try removing the single quotes around the column names:

```r
ggplot(aes(x = PC1, y = PC2, color = SAMPLES, shape = TIME))
```
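Why the quotes matter, as a small self-contained sketch on made-up data: inside `aes()`, a quoted `'SAMPLES'` is a single constant string, so every point is mapped to the same color, while the bare name `SAMPLES` is looked up as a column of the data. `layer_data()` from ggplot2 returns the aesthetics actually computed for each point, which makes the difference visible:

```r
library(ggplot2)

# Toy data (hypothetical values) using the column names from this thread
df <- data.frame(PC1 = c(0.10, -0.20, 0.05, 0.30),
                 PC2 = c(0.30, 0.00, -0.10, 0.20),
                 SAMPLES = c("A", "B", "A", "B"),
                 TIME = c("T1", "T1", "T2", "T2"))

# Quoted: each aesthetic is mapped to one constant string
quoted <- ggplot(df, aes(x = PC1, y = PC2, color = 'SAMPLES', shape = 'TIME')) +
  geom_point()

# Unquoted: SAMPLES and TIME refer to columns of df
unquoted <- ggplot(df, aes(x = PC1, y = PC2, color = SAMPLES, shape = TIME)) +
  geom_point()

# Count the distinct colours assigned to the points of the first layer
print(length(unique(layer_data(quoted)$colour)))    # 1
print(length(unique(layer_data(unquoted)$colour)))  # 2
```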


As for your IDs: not for ggplot! Some programs have trouble with certain IDs, but I don't worry about it unless a program actually changes or rejects them.

Numbers, letters, .periods. and -dashes- should all be fine, see
https://docs.qiime2.org/2024.2/tutorials/metadata/#recommendations-for-identifiers
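If you want to check a whole metadata column at once, here is a rough sketch. `check_ids()` is a hypothetical helper (not part of QIIME 2 or qiime2R), and the rules it encodes are my reading of the linked recommendations: at most 36 characters, using only ASCII letters, digits, periods, and dashes. The example IDs are made up:

```r
# Hypothetical helper: return the IDs that fall outside the assumed rules
check_ids <- function(ids) {
  # Only letters, digits, periods and dashes, start to end
  ok_chars <- grepl("^[A-Za-z0-9.-]+$", ids)
  # At most 36 characters
  ok_length <- nchar(ids) <= 36
  ids[!(ok_chars & ok_length)]
}

ids <- c("M11-T1-01", "M11.T2.02", "M11_T3_03", "sample 4")
print(check_ids(ids))  # "M11_T3_03" "sample 4"
```

An empty result means every ID follows these rules; dashed IDs like `M11-T1-01` pass.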


Thank you @colinbrislawn, now it works!
