Qiime2r missing sample

Hello @jbisanz,
Thank you very much for this tutorial.
I am having an issue when using qza_to_phyloseq. My 'phy' data is missing 1 sample; it contains 33 samples and there should be 34. This is the code I used:

phy<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza","sample-metadata.tsv", tmp="C:/tmp")

I have no problem using the second option you provide with phyloseq though.

I am not sure what I am doing wrong with the qza_to_phyloseq command. Do you have any suggestions?

Thank you! :blush:

Hi @helenaax2r,
I cannot speak to qiime2r specifically and will let @jbisanz reply but one possibility occurs to me. What does the header line of your sample metadata file look like? Does it begin with #SampleID? If so, try replacing with sample-id and see what happens... the former causes issues with R because the first line is interpreted as a comment line (but QIIME 2 can support multiple different formats — the #SampleID header line is a throwback from QIIME 1 that we support for backwards compatibility).

qiime2r may already check for and correct that issue, but please give that a try. If it still does not work, let us know and we will see what @jbisanz has to say!
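For anyone who wants to see the effect in isolation, here is a minimal base-R sketch. It assumes a made-up two-sample metadata file and calls read.table() directly. It is not what qiime2R does internally, only an illustration of how a comment-aware reader can swallow a #SampleID header and silently lose one sample.

```r
# Hypothetical two-sample metadata file in the QIIME 1 style (made-up data).
lines <- c("#SampleID\tBodySite", "S1\tgut", "S2\ttongue")
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

# Default comment.char = "#": the header line is skipped as a comment,
# so the first sample row is consumed as the header instead.
with_comment <- read.table(path, header = TRUE, sep = "\t")

# Disabling comment handling keeps the real header and both samples.
no_comment <- read.table(path, header = TRUE, sep = "\t",
                         comment.char = "", check.names = FALSE)

nrow(with_comment)
nrow(no_comment)
```

With the default comment character the first sample ends up as the header, so one row goes missing, which matches the symptom described above.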

I too suspect that is the case regarding the second line defining the variable type. At some point I will try to bang out a new function to read qiime2 formatted metadata and preserve the data-types in a data.frame or tibble.

Can you confirm how many samples are in your table.qza artifact? Are you sure it was supposed to be 34? I am also aware there is an issue with the biomformat package in R that can cause issues if a sample has 0 counts.
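A rough way to run those checks, sketched with toy data. The `counts` matrix and `meta_ids` vector below are made up; in practice `counts` might come from read_qza("table.qza")$data and the IDs from your metadata file.

```r
# Toy feature table: taxa in rows, samples in columns (made-up counts).
counts <- matrix(c(5, 0, 3, 2, 0, 0), nrow = 2,
                 dimnames = list(c("ASV1", "ASV2"), c("S1", "S2", "S3")))
# Toy sample IDs as they would appear in the metadata file.
meta_ids <- c("S1", "S2", "S3", "S4")

# How many samples does the table actually contain?
n_samples <- ncol(counts)

# Samples whose counts sum to zero.
zero_count <- colnames(counts)[colSums(counts) == 0]

# IDs present on one side but not the other.
only_in_meta <- setdiff(meta_ids, colnames(counts))
only_in_table <- setdiff(colnames(counts), meta_ids)
```

If `only_in_meta` or `only_in_table` is non-empty, the table and the metadata disagree about which samples exist, independent of any parsing problem.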

Please let me know what you find

I have pushed an update to qiime2R with a function called read_q2metadata() which will be used in qza_to_phyloseq() when the column specifications are found on the 2nd line of the file. Let me know if it works for you.
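To illustrate the idea of honoring a types line, here is a base-R sketch on a made-up file. This is not the code inside read_q2metadata(); the file contents, the variable names, and the choice to map "categorical" to a factor are all assumptions for the example.

```r
# Made-up QIIME 2 style metadata with column types on the second line.
lines <- c(
  "sample-id\tBodySite\tDay",
  "#q2:types\tcategorical\tnumeric",
  "S1\tgut\t0",
  "S2\ttongue\t7"
)
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

raw <- readLines(path)
has_types <- length(raw) >= 2 && startsWith(raw[2], "#q2:types")

# Declared types for the non-ID columns (drop the leading "#q2:types" cell).
types <- if (has_types) strsplit(raw[2], "\t", fixed = TRUE)[[1]][-1] else character(0)
# Parse everything except the types line.
body <- if (has_types) raw[-2] else raw

meta <- read.table(text = body, header = TRUE, sep = "\t", quote = "",
                   comment.char = "", check.names = FALSE,
                   stringsAsFactors = FALSE)

# Coerce each non-ID column according to its declared type.
for (i in seq_along(types)) {
  col <- i + 1
  meta[[col]] <- if (types[i] == "numeric") as.numeric(meta[[col]]) else factor(meta[[col]])
}
str(meta)
```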

Thank you so much @Nicholas_Bokulich and @jbisanz for your fast responses!
I am sorry I didn't mention in my original post that I was using the table.qza and all files from the “Moving Pictures” tutorial, as suggested by the qiime2R tutorial.
I updated qiime2R and reran it, and it works! I am now getting the same number of samples (34) with either qza_to_phyloseq or the phyloseq method.
Thank you so much! :star_struck:

Hi,

I'm having the same problem. I have 8 samples and the qza_to_phyloseq method reads only 7 samples. The first two lines of my metadata file are below:

#id BarcodeSequence LinkerPrimerSequence BarcodeName ProjectName Description
1 CGTAACCA AGRGTTTGATCMTGGCTCAG Ill.27F.bar1 BacProj 1

Thanks for any help.

Ben

Remove the # from the header line.

Oops, I did not allow #id as a valid identifier for a sample. It will be supported now, or, as Nicholas said, remove the #.

Thanks for the replies. I removed the # and I get the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 7 elements

Thanks again for your help.

Ben

#id is not a valid identifier in QIIME 2, so there is no need to add support for it now. See here for a list of valid metadata identifiers.
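As a quick self-check, something like the sketch below can flag a bad ID header before anything is imported. The two lists reflect my reading of the QIIME 2 metadata documentation, so please verify them against the linked list; `is_valid_id_header()` is a hypothetical helper, not part of qiime2R or QIIME 2.

```r
# ID column names accepted regardless of case (per my reading of the docs).
case_insensitive <- c("id", "sampleid", "sample id", "sample-id",
                      "featureid", "feature id", "feature-id")
# Legacy names that must match exactly.
case_sensitive <- c("#SampleID", "#Sample ID", "#OTUID", "#OTU ID", "sample_name")

# Hypothetical helper: TRUE when the first header cell is an accepted ID name.
is_valid_id_header <- function(x) {
  tolower(x) %in% case_insensitive | x %in% case_sensitive
}

checked <- is_valid_id_header(c("#id", "sample-id", "#SampleID"))
checked
```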

Hello,

I still have a problem with the missing sample. If I leave in the # sign before SampleID, I get the following output with one sample missing as shown in the first part below. If I leave out the # sign before SampleID, I get an error as shown in the second part below. I have also attached my metadata file: met.tsv (570 Bytes)

Output with #SampleID in first line (missing sample 1)

physeq<-qza_to_phyloseq("table.qza","rooted-tree.qza","taxonomy.qza", "met.tsv")
sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

Output with just SampleID without # sign

physeq<-qza_to_phyloseq("table.qza","rooted-tree.qza","taxonomy.qza", "met.tsv")
Error in validObject(.Object) : invalid class “phyloseq” object:
Component sample names do not match.
Try sample_names()

sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

For some reason, the first line is not being read properly. Thanks again for your help!

Ben

That is not a valid QIIME 2 identifier. Please see the list @Nicholas_Bokulich posted above.

I don't think your problem actually has to do with the header. I think the issue may lie in the fact that your sample names are only numbers. In my opinion nothing good can come from using numbers as sample IDs; I would recommend a more descriptive alphanumeric unique identifier. Based on my playing around, it appears that phyloseq does not take numbers as sample names, so this is where your issue is. I would recommend reading your objects in separately and prepending something to the sample names, like below:

#not actually tested, may need to fix a typo or two
table<-read_qza("table.qza")$data
meta<-read_q2metadata("met.tsv")
meta$SampleID<-paste0("Sample_", meta$SampleID)
colnames(table)<-paste0("Sample_", colnames(table))

#Then build your phyloseq object manually

See more info here:
https://github.com/joey711/phyloseq/issues/395
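For the "build your phyloseq object manually" step, here is a hedged sketch on toy data. The `counts` matrix and `meta` data frame are made-up stand-ins for what read_qza() and read_q2metadata() would return, and the phyloseq part only runs if the package is installed.

```r
# Toy stand-ins with purely numeric sample IDs (made-up data).
counts <- matrix(c(10L, 0L, 3L, 7L), nrow = 2,
                 dimnames = list(c("ASV1", "ASV2"), c("1", "2")))
meta <- data.frame(SampleID = c("1", "2"),
                   BodySite = c("gut", "tongue"),
                   stringsAsFactors = FALSE)

# Prefix the IDs on both sides so they are no longer bare numbers.
colnames(counts) <- paste0("Sample_", colnames(counts))
meta$SampleID <- paste0("Sample_", meta$SampleID)
rownames(meta) <- meta$SampleID

# The table and the metadata must agree on sample names before combining.
stopifnot(setequal(colnames(counts), rownames(meta)))

if (requireNamespace("phyloseq", quietly = TRUE)) {
  physeq <- phyloseq::phyloseq(
    phyloseq::otu_table(counts, taxa_are_rows = TRUE),
    phyloseq::sample_data(meta)
  )
  print(phyloseq::sample_names(physeq))
}
```

A tree and a taxonomy table can be passed to phyloseq() in the same way once their tip and row names match the feature IDs.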

Thank you for your help Jordan! When I try to build the phyloseq object manually, I get the following error after typing the command:
colnames(tax_table)<-c("Kingdom","Phylum","Class","Order","Family","Genus","Species")

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

I suspect this is because some of the entries in my tax_table go to only the genus level and some are unassigned. Anything I can do to make the above command work?

Thank you!

Ben

That sounds about correct, you could double check with length(colnames(tax_table)) or dim(tax_table). In this case, just remove the ,"Species" from your first line and the problem should be solved.
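If you would rather not hard-code the number of ranks, a small sketch like this assigns only as many names as there are columns. The matrix `tax` is a made-up six-column stand-in for your taxonomy table.

```r
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Toy taxonomy matrix that only goes down to genus (six columns).
tax <- matrix(NA_character_, nrow = 2, ncol = 6)

# Use the first ncol(tax) rank names so the lengths always match.
colnames(tax) <- ranks[seq_len(ncol(tax))]
colnames(tax)
```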

I have a similar error when I tried to read in my alpha diversity artifacts.
Could you please let me know how you resolved it?

shannon <- read_qza("shannon_vector.qza")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 387 did not have 2 elements

Thanks for your help.
Abby

Abby, I think your problem is quite different. My best guess is that it has something to do with odd characters in your sample names, or with your sample names being numbers. Can you create a copy of your artifact, change the name to shannon_vector.zip, unzip it, and then show me the top few lines of yourartifact/data/alpha-diversity.tsv?

Thanks for getting back to me Jordan. I really appreciate it. Here are the top few lines of the alpha-diversity.tsv file:

shannon

05092.FW1 5.822203397010234
A02144.FW1 6.268439793200891
A99062.FW1 7.146380238682139
F96066.FW1 6.333436458114405
K98109.FW1 6.36621378061856
L02394.FW2 6.4210331964211935
T02319.FW1 6.081276327206204
Afrika.O 4.282472402860403
Afrika.R 6.610457875024227
Afrika.V 4.12223902710633
Asega.E 4.509307942478237
Asega.O 3.9782264647941186
Asega.P 3.899171432711717
Asega.R 5.472436097985946

Thanks again for your help

Hmm, nothing looks too strange. Looking closer at the error, the issue is on line 387. Is this the end of your alpha-diversity.tsv file? What are the sample names around this line?

I think I may have found the issue.

This is the error message from another data set:

shannon <- read_qza("shannon_vector.qza")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 8 did not have 2 elements

I think the Bob II.R might be the issue.

shannon

Alberto.R 3.7326397121177632
Amanda.R 4.1073638010037214
Anthony.R 4.492168436839438
Ardis.R 5.0241187150844775
Arthur.R 5.167374184517388
Bebe.R 5.6840622966030905
Berkie.R 5.304247539185696
Bob II.R 3.1230155160493176
Brett.R 5.281071328044178
Buboo.R 3.4288150622431113
Buffy.R 6.048936657468467
Butters.R 3.126678900844961
Callie.R 3.799108522814666
Carl.R 5.941865116376522
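Bob II.R is the eighth data row in the listing above, which lines up with "line 8" in the error. Here is a base-R sketch of why a space in a sample name can produce that kind of failure. It assumes the file is tab-separated with an empty first header cell, and it uses read.table() directly; I do not know exactly how read_qza() parses this file, so treat it as an illustration only. Values are shortened from the listing.

```r
# Made-up copy of the first eight rows, tab-separated, values shortened.
lines <- c(
  "\tshannon",
  "Alberto.R\t3.73", "Amanda.R\t4.11", "Anthony.R\t4.49", "Ardis.R\t5.02",
  "Arthur.R\t5.17", "Bebe.R\t5.68", "Berkie.R\t5.30", "Bob II.R\t3.12"
)
path <- tempfile(fileext = ".tsv")
writeLines(lines, path)

# Default sep = "" splits on any whitespace, so "Bob II.R" becomes two
# fields and that row no longer has the expected two elements.
by_whitespace <- tryCatch(read.table(path, header = TRUE),
                          error = function(e) e)

# Splitting on tabs only keeps the sample name intact.
by_tab <- read.table(path, header = TRUE, sep = "\t", row.names = 1)

inherits(by_whitespace, "error")
rownames(by_tab)
```

If this is indeed the cause, renaming the sample to something without spaces (for example Bob_II.R) before running the diversity step should sidestep the problem.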