Qiime2r missing sample


(Helena) #1

Hello @jbisanz,
Thank you very much for this tutorial.
I am having an issue when using qza_to_phyloseq. My ‘phy’ data is missing 1 sample; it contains 33 samples and there should be 34. This is the code I used:

phy<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza", "sample-metadata.tsv", tmp="C:/tmp")

I have no problem using the second option you provide with phyloseq though.

I am not sure what I am doing wrong with the qza_to_phyloseq command. Do you have any suggestions?

Thank you! :blush:


Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R
(Nicholas Bokulich) #2

Hi @helenaax2r,
I cannot speak to qiime2r specifically and will let @jbisanz reply but one possibility occurs to me. What does the header line of your sample metadata file look like? Does it begin with #SampleID? If so, try replacing with sample-id and see what happens… the former causes issues with R because the first line is interpreted as a comment line (but QIIME 2 can support multiple different formats — the #SampleID header line is a throwback from QIIME 1 that we support for backwards compatibility).

qiime2r may already check for and correct that issue — but pls give that a try. If it still does not work, let us know and we will see what @jbisanz has to say!
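
For anyone wanting to see the comment-line behavior described above, here is a minimal base R sketch. The two-sample metadata table is made up for illustration, and it demonstrates base R's read.table() only; it is not a claim about how qiime2R parses metadata internally.

```r
# Hypothetical two-sample metadata file, held in a string for illustration
tsv <- "#SampleID\tBodySite\nS1\tgut\nS2\ttongue\n"

# Default comment.char = "#": the header line is skipped as a comment,
# so the first sample row is consumed as the header instead
with_default <- read.table(text = tsv, header = TRUE, sep = "\t")

# Disabling comment handling keeps the real header and both samples
no_comment <- read.table(text = tsv, header = TRUE, sep = "\t",
                         comment.char = "", check.names = FALSE)

nrow(with_default)  # rows read with the default settings
nrow(no_comment)    # rows read with comment handling disabled
```

Comparing the two row counts on your own file is a quick way to tell whether the header line is what is costing you a sample.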


(Jordan Bisanz) #3

I too suspect that is the case regarding the second line defining the variable type. At some point I will try to bang out a new function to read qiime2 formatted metadata and preserve the data-types in a data.frame or tibble.

Can you confirm how many samples are in your table.qza artifact? Are you sure it was supposed to be 34? I am also aware there is an issue with the biomformat package in R that can cause issues if a sample has 0 counts.

Please let me know what you find
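
The two checks asked about above (how many samples are in the table, and whether any sample has 0 counts) could be sketched roughly as follows. The toy matrix is a made-up stand-in for read_qza("table.qza")$data, assuming features in rows and samples in columns.

```r
# Toy feature table standing in for read_qza("table.qza")$data
# (features in rows, samples in columns); values are made up
counts <- matrix(c(5, 0, 2,
                   3, 0, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("featA", "featB"), c("S1", "S2", "S3")))

n_samples <- ncol(counts)                  # number of samples in the table
depth <- colSums(counts)                   # total counts per sample
empty_samples <- names(depth)[depth == 0]  # samples with zero counts

n_samples
empty_samples
```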


(Jordan Bisanz) #4

I have pushed an update to qiime2R with a function called read_q2metadata() which will be used in qza_to_phyloseq() when the column specifications are found on the 2nd line of the file. Let me know if it works for you.


(Helena) #5

Thank you so much @Nicholas_Bokulich and @jbisanz for your fast responses!
I am sorry I didn’t mention in my original post that I was using the table.qza and all files from the “Moving Pictures” tutorial, as suggested by the qiime2R tutorial.
I updated qiime2R and reran it, and it works! I am now getting the same number of samples (34) with either qza_to_phyloseq or the phyloseq method.
Thank you so much! :star_struck:


(Ben) #6

Hi,

I’m having the same problem. I have 8 samples and the qza_to_phyloseq method reads only 7 samples. The first two lines of my metadata file are below:

#id BarcodeSequence LinkerPrimerSequence BarcodeName ProjectName Description
1 CGTAACCA AGRGTTTGATCMTGGCTCAG Ill.27F.bar1 BacProj 1

Thanks for any help.

Ben


(Nicholas Bokulich) #7

Remove the # from the header line.


(Jordan Bisanz) #8

Oops, I did not allow #id as a valid identifier for a sample. It will be supported now, or, as Nicholas said, you can remove the #.


(Ben) #9

Thanks for the replies. I removed the # and I get the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 7 elements

Thanks again for your help.

Ben


(Nicholas Bokulich) #10

#id is not a valid identifier in QIIME 2, so no need to add now. See here for a list of valid metadata identifiers.


(Ben) #11

Hello,

I still have a problem with the missing sample. If I leave in the # sign before SampleID, I get the output shown in the first part below, with one sample missing. If I leave out the # sign before SampleID, I get the error shown in the second part below. I have also attached my metadata file: met.tsv (570 Bytes)

Output with #SampleID in first line (missing sample 1)

physeq<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza", "met.tsv")
sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

Output with just SampleID without # sign

physeq<-qza_to_phyloseq("table.qza", "rooted-tree.qza", "taxonomy.qza", "met.tsv")
Error in validObject(.Object) : invalid class “phyloseq” object:
Component sample names do not match.
Try sample_names()

sample_names(physeq)
[1] "2" "3" "4" "5" "6" "7" "8"

For some reason, the first line is not being read properly. Thanks again for your help!

Ben


(Matthew Ryan Dillon) #12

That is not a valid QIIME 2 identifier. Please see the list @Nicholas_Bokulich posted above.


(Jordan Bisanz) #13

I don’t think your problem actually has to do with the header; I think the issue may lie in the fact that your sample names are only numbers. In my opinion nothing good can come from using numbers as sample IDs, and I would recommend using a more descriptive alphanumeric unique identifier. Based on my playing around, it would appear that phyloseq does not take numbers as sample names, so this is where your issue is. I would recommend reading your objects in separately and prepending a prefix to the sample names, as below:

#not actually tested, may need to fix a typo or two
table<-read_qza("table.qza")$data
meta<-read_q2metadata("met.tsv")
meta$SampleID<-paste0("Sample_", meta$SampleID)
colnames(table)<-paste0("Sample_", colnames(table))

#Then build your phyloseq object manually

See more info here: