qiime2R missing sample

Hello,

I still have a problem with the missing sample. If I leave the # sign before SampleID, one sample is missing from the output, as shown in the first part below. If I remove the # sign before SampleID, I get an error, as shown in the second part below. I have also attached my metadata file, met.tsv.

Output with #SampleID in first line (missing sample 1)

physeq<-qza_to_phyloseq("table.qza","rooted-tree.qza","taxonomy.qza", "met.tsv")
sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

Output with just SampleID without # sign

physeq<-qza_to_phyloseq("table.qza","rooted-tree.qza","taxonomy.qza", "met.tsv")
Error in validObject(.Object) : invalid class “phyloseq” object:
Component sample names do not match.
Try sample_names()

sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

For some reason, the first line is not being read properly. Thanks again for your help!

Ben

That is not a valid QIIME 2 identifier. Please see the list @Nicholas_Bokulich posted above.

I don’t think your problem actually has to do with the header. I think the issue may be that your sample names are only numbers. In my opinion, nothing good can come from using numbers as sample IDs; I would recommend a more descriptive alphanumeric unique identifier. Based on my playing around, it appears that phyloseq does not take numbers as sample names, so this is where your issue is. I would recommend reading your objects in separately and prepending something to the sample names, as below:

#not actually tested, may need to fix a typo or two
table<-read_qza("table.qza")$data
meta<-read_q2metadata("met.tsv")
meta$SampleID<-paste0("Sample_", meta$SampleID)
colnames(table)<-paste0("Sample_", colnames(table))

#Then build your phyloseq object manually

See more info here:
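
As a rough illustration of the renaming step in the snippet above, here is a minimal sketch using toy stand-ins rather than real artifacts. The objects `toy_table` and `toy_meta` and their contents are hypothetical; the point is only that the same prefix has to go onto both the table's column names and the metadata IDs, and that `setdiff()` is a quick way to confirm nothing got out of sync before building the phyloseq object.

```r
# Toy stand-ins (hypothetical data) for the feature table and the metadata.
toy_table <- matrix(
  c(10, 0, 3,
     5, 2, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV_1", "ASV_2"), c("1", "2", "3"))
)
toy_meta <- data.frame(
  SampleID = c("1", "2", "3"),
  season = c("spring", "summer", "autumn"),
  stringsAsFactors = FALSE
)

# Prepend the same prefix on both sides so the IDs still correspond.
colnames(toy_table) <- paste0("Sample_", colnames(toy_table))
toy_meta$SampleID <- paste0("Sample_", toy_meta$SampleID)

# IDs present in one object but not the other; both should be empty.
missing_from_meta <- setdiff(colnames(toy_table), toy_meta$SampleID)
missing_from_table <- setdiff(toy_meta$SampleID, colnames(toy_table))
print(missing_from_meta)
print(missing_from_table)
```

If either `setdiff()` result is non-empty, those are the sample names to look at first.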

Thank you for your help Jordan! When I try to build the phyloseq object manually, I get the following error after typing the command:
colnames(tax_table)<-c("Kingdom","Phylum","Class","Order","Family","Genus","Species")

Error in dimnames(x) <- dn :
length of ‘dimnames’ [2] not equal to array extent

I suspect this is because some of the entries in my tax_table go to only the genus level and some are unassigned. Anything I can do to make the above command work?

Thank you!

Ben

That sounds about right. You could double-check with length(colnames(tax_table)) or dim(tax_table). In this case, just remove the ,"Species" from your command and the problem should be solved.
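
A minimal sketch of that check, assuming the taxonomy has already been split into a character matrix with one column per rank. The matrix `toy_tax` is hypothetical; the idea is to take only as many rank names as there are columns, so the assignment cannot fail with a length mismatch.

```r
# Hypothetical taxonomy matrix that was only resolved to genus (6 columns).
toy_tax <- matrix(
  c("Bacteria", "Firmicutes", "Bacilli",
    "Lactobacillales", "Lactobacillaceae", "Lactobacillus",
    "Bacteria", "Proteobacteria", "Gammaproteobacteria",
    "Enterobacterales", "Enterobacteriaceae", NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV_1", "ASV_2"), NULL)
)

ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Inspect the dimensions first; the second number is how many names are needed.
print(dim(toy_tax))

# Use only as many rank names as the matrix has columns.
stopifnot(ncol(toy_tax) <= length(ranks))
colnames(toy_tax) <- ranks[seq_len(ncol(toy_tax))]
print(colnames(toy_tax))
```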

I have a similar error when I tried to read in my alpha diversity artifacts.
Could you please let me know how you resolved it?

shannon <- read_qza("shannon_vector.qza")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 387 did not have 2 elements

Thanks for your help.
Abby

Abby, I think your problem is quite different. My best guess is that it has something to do with odd characters in your sample names, or with your sample names being numbers. Can you create a copy of your artifact, rename it to shannon_vector.zip, unzip it, and then show me the top few lines of yourartifact/data/alpha-diversity.tsv?

Thanks for getting back to me Jordan. I really appreciate it. Here are the top few lines of the alpha-diversity.tsv file:

shannon

05092.FW1 5.822203397010234
A02144.FW1 6.268439793200891
A99062.FW1 7.146380238682139
F96066.FW1 6.333436458114405
K98109.FW1 6.36621378061856
L02394.FW2 6.4210331964211935
T02319.FW1 6.081276327206204
Afrika.O 4.282472402860403
Afrika.R 6.610457875024227
Afrika.V 4.12223902710633
Asega.E 4.509307942478237
Asega.O 3.9782264647941186
Asega.P 3.899171432711717
Asega.R 5.472436097985946

Thanks again for your help

Hmm, nothing looks too strange. Looking closer at the error, the issue is on line 387. Is this the end of your alpha-diversity.tsv file? What are the sample names around this line?

I think I may have found the issue.

This is the error message from another data set:

shannon <- read_qza("shannon_vector.qza")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 8 did not have 2 elements

I think the sample name Bob II.R, which contains a space, might be the issue.

shannon

Alberto.R 3.7326397121177632
Amanda.R 4.1073638010037214
Anthony.R 4.492168436839438
Ardis.R 5.0241187150844775
Arthur.R 5.167374184517388
Bebe.R 5.6840622966030905
Berkie.R 5.304247539185696
Bob II.R 3.1230155160493176
Brett.R 5.281071328044178
Buboo.R 3.4288150622431113
Buffy.R 6.048936657468467
Butters.R 3.126678900844961
Callie.R 3.799108522814666
Carl.R 5.941865116376522

Yes! Editing the Bob II.R fixed it.
Thanks so much for all your help. I really appreciate it.
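
For anyone hitting the same "did not have 2 elements" error, here is a hedged sketch of how the offending row could be located in an extracted alpha-diversity.tsv. It assumes the file is tab-separated and that the failure comes from a reader splitting on any whitespace, which the error suggests but which I have not confirmed against the qiime2R source. The toy file below is hypothetical and uses only base R.

```r
# Hypothetical extract of alpha-diversity.tsv: tab-separated, one ID has a space.
toy_lines <- c(
  "\tshannon",
  "Alberto.R\t3.73",
  "Amanda.R\t4.11",
  "Bob II.R\t3.12",
  "Brett.R\t5.28"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(toy_lines, tmp)

# Count fields per data row when any whitespace counts as a separator
# (skip = 1 drops the header line, so row numbers refer to data rows).
n_fields <- count.fields(tmp, sep = "", skip = 1)
bad_rows <- which(n_fields != 2)
print(bad_rows)

# Reading with an explicit tab separator keeps "Bob II.R" as a single ID.
shannon <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)
print(rownames(shannon))
```

Reading the extracted file with an explicit `sep = "\t"` is one way to load the values without touching the artifact; renaming the sample upstream avoids the problem for other tools as well.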

Hi,
I am facing a similar problem when importing my metadata. I’ve tried all the suggestions but still didn’t find a solution for the problem.

metadata<-read_q2metadata("metadata4b.tsv")
Error in read_q2metadata("metadata4b.tsv") :
Metadata does not define types (ie second line does not start with #q2:types)

I have a metadata table with 74 samples. The first row contains the column names.
When I import it using this command: metadata<-read.table("metadata4b.txt"), it works, but it says that I have 75 observations instead of 74.

Then, when I go on to generate some graphics, I always get error messages, which I believe is because of this.

For example, when I try to generate barplots using this command:

metadata %>%
  filter(!is.na(shannon)) %>%
  ggplot(aes(x=season, y=shannon, fill=season)) +
  stat_summary(geom="bar", fun.data=mean_se, color="black") + #here black is the outline for the bars
  geom_jitter(shape=21, width=0.2, height=0) +
  coord_cartesian(ylim=c(2,7)) + # adjust y-axis
  facet_grid(~species) + # create a panel for each body site
  xlab("Antibiotic Usage") +
  ylab("Shannon Diversity") +
  theme_q2r() +
  scale_fill_manual(values=c("cornflowerblue","indianred","chocolate")) + #specify custom colors
  theme(legend.position="none") #remove the legend as it isn't needed
ggsave("…/…/…/images/Shannon_by_abx.pdf", height=3, width=4, device="pdf") # save a PDF 3 inches by 4 inches

I get this error:

Error: Problem with filter() input ..1.
x Input ..1 must be of size 75 or 1, not size 74.
i Input ..1 is !is.na(shannon).
Run rlang::last_error() to see where the error occurred.

Thank you,
Danilo

Hi,
I’m having the same problem too. I wanted to generate a heatmap.

Please help. Thank you!

Suet Li

Hey @suetli19, you can solve this in one of two ways:

  1. Manually: add a new row in Excel stating whether each column is categorical or numerical, then import the file; or
  2. Use metadata<-readr::read_tsv("path of the file").

For reference, check this link. Hope this helps.

Best,
Sreevatshan

Sreevatshan is on the money.

To be more clear on the conceptual side, read_q2metadata() is specifically designed for the q2 metadata format, in which the second line contains the data types (and starts with #q2:types). The error message you were both receiving results from the second line of your metadata not containing that definition line, so the file is not appropriate for loading with read_q2metadata(). As such, you can use any function to read your metadata (of which read_tsv is my favourite!).

If you try to read a q2 metadata file using read_tsv(), you will see that it imports the definition line as its own sample with the ID #q2:types, which is probably not desirable in many cases.
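
To make the point about the definition line concrete, here is a small sketch with a hypothetical metadata file. It uses base R's `read.delim()` rather than `read_tsv()` so it runs without extra packages; the column names and values are made up. A generic reader treats the `#q2:types` line as an ordinary row, so it has to be dropped and the column types fixed afterwards.

```r
# Hypothetical q2-style metadata: the second line declares the column types.
toy_lines <- c(
  "SampleID\tseason\tph",
  "#q2:types\tcategorical\tnumeric",
  "S1\tspring\t6.5",
  "S2\tsummer\t7.1"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(toy_lines, tmp)

# A generic reader keeps the definition line as if it were a sample.
raw <- read.delim(tmp, stringsAsFactors = FALSE)
print(raw$SampleID)

# Drop the definition row, then restore the numeric column, which was read
# as character because of the word "numeric" in that row.
meta <- raw[raw$SampleID != "#q2:types", , drop = FALSE]
meta$ph <- as.numeric(meta$ph)
print(nrow(meta))
```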

Note @Danilo_Reis, in your example of using read.table() as opposed to read_tsv(), you need to indicate that your columns have names, as below. Otherwise the header row is read as data, which is why you see 75 observations instead of 74. I would also specify that the file is tab (\t) separated, just to be safe:

read.table("metadata4b.txt", header=TRUE, sep="\t")
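
The effect of `header=TRUE` can be seen with a small hypothetical file (three samples instead of 74; the names and values are made up):

```r
# Hypothetical tab-separated metadata with a header row and three samples.
toy_lines <- c(
  "SampleID\tseason",
  "S1\tspring",
  "S2\tsummer",
  "S3\tautumn"
)
tmp <- tempfile(fileext = ".txt")
writeLines(toy_lines, tmp)

# Without header = TRUE the header line is counted as an observation.
no_header <- read.table(tmp, sep = "\t")
print(nrow(no_header))

# With header = TRUE the first line supplies the column names instead.
with_header <- read.table(tmp, header = TRUE, sep = "\t")
print(nrow(with_header))
print(colnames(with_header))
```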

Hi @Sreevatshan,
I added a new row using the manual method, and now it works!
Thank you so much!

Best regards,
Suet Li

Hi @jbisanz ,
Thanks for your reply! It finally worked! :slight_smile:
But I still get some error messages when I try to generate the graph you show in your tutorial.
I am using this command to generate bar plots (Shannon diversity in three seasons: spring, summer and autumn, grouped by plant species A, B, C and D).

Do you have any idea what the reason could be?

metadata %>%
  filter(!is.na(shannon)) %>%
  ggplot(aes(x=season, y=shannon, fill=season)) +
  stat_summary(geom="bar", fun.data=mean_se, color="black") + #here black is the outline for the bars
  geom_jitter(shape=21, width=0.2, height=0) +
  coord_cartesian(ylim=c(2,7)) + # adjust y-axis
  facet_grid(~species) + # create a panel for each body site
  xlab("Seasons") +
  ylab("Shannon Diversity") +
  theme_q2r() +
  scale_fill_manual(values=c("cornflowerblue","indianred","chocolate")) + #specify custom colors
  theme(legend.position="none") #remove the legend as it isn't needed
ggsave("…/…/…/images/Shannon_by_abx.pdf", height=3, width=4, device="pdf") # save a PDF 3 inches by 4 inches

Don’t know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (74): y

Is shannon a column in your metadata table, or is it a separate table? It needs to be a column which might require joining metadata and shannon.
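
A minimal sketch of such a join, using base R's `merge()` rather than a dplyr join so that it runs without extra packages. The data are hypothetical, and it assumes the diversity values arrive as a data frame with sample IDs as row names; check your own object with `head()` before relying on that. It is also worth printing the column names afterwards, since the diversity column may not be called `shannon`.

```r
# Hypothetical metadata and diversity values for three samples.
toy_meta <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  season = c("spring", "summer", "autumn"),
  stringsAsFactors = FALSE
)
toy_shannon <- data.frame(
  shannon_entropy = c(5.8, 6.3, 4.1),
  row.names = c("S1", "S2", "S3")
)

# Move the row names into a column so there is a shared key to join on.
toy_shannon$SampleID <- rownames(toy_shannon)

# Left join: keep every metadata row and attach the diversity value.
joined <- merge(toy_meta, toy_shannon, by = "SampleID", all.x = TRUE)

# The plotting code must refer to whatever the diversity column is called here.
print(colnames(joined))
```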

Hi @jbisanz ,
Now it worked! :slight_smile: I don’t know why, but it generated the “shannon” column in the metadata under a different name (shannon_entropy). That’s the reason why the last command was not working.

Thanks a lot!

How can you fix the error in shannon_vector.qza? I found my problem, but I don't know how to change it in the original file.