Phyloseq to QIIME2

I have finished processing my samples and now have multiple phyloseq objects that I would like to export to QIIME2 for easier visualization and for building phylogenetic trees. I found this forum post and tried the posted code, but the import couldn’t find the file even though I gave the full path. I then copied the code from the updated phyloseq2qiime2.R posted here, but got the following error:
There was a problem importing feature-table.biom:
feature-table.biom is not a(n) BIOMV210Format file
I’m not sure what to do about this. Could you please help?

Hi @rlhughes,
It’s hard to troubleshoot this properly without seeing the commands you’ve used, but I’m guessing the issue is just that the wrong biom format is being used. You may have to convert your biom file to a different version before importing. See this post for an example and more details.
Also, if you are running R on Windows and importing into QIIME on Mac/Linux (maybe through a virtual machine), you might find this recent post about changing line endings useful too.
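To illustrate the line-ending point, here is a minimal Python sketch that rewrites a text file with Unix (LF) line endings. The function name and file names are hypothetical, and it is only meant for text files exported from R (TSV metadata or taxonomy, FASTA, JSON biom), not for binary HDF5 biom files.

```python
from pathlib import Path


def to_unix_line_endings(src: str, dst: str) -> int:
    """Copy a text file, converting Windows CRLF line endings to LF.

    Returns the number of CRLF pairs that were replaced.
    Text files only: do not run this on binary (HDF5) biom files.
    """
    data = Path(src).read_bytes()
    n_crlf = data.count(b"\r\n")
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
    return n_crlf
```

A return value of 0 means the file already had Unix line endings, so line endings are probably not the cause of the import problem.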

If these don’t resolve your issue, could you please provide us with the exact commands you’ve used since exporting the biom table out of R?


Hi @rlhughes,

If you tried the code here with any version of QIIME2 after 2018.8, it won’t work. I just added an update to the post, but basically you have to change --source-format to --input-format in the QIIME2 commands.

The phyloseq2qiime2.R code creates a BIOMV100Format biom file, so that may be the issue there.
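If it is unclear which biom version a file actually is, a quick check of the first bytes can help. The sketch below relies on two facts: BIOM 1.0 files are JSON text, and BIOM 2.x files are HDF5, which normally start with a fixed 8-byte signature. The function name and return labels are made up for illustration, and this is only a rough check, not a full format validation.

```python
# HDF5 files normally begin with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_biom_format(path: str) -> str:
    """Guess whether a biom file is JSON (BIOM 1.0) or HDF5 (BIOM 2.x)."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "BIOM 2.x (HDF5)"
    # A JSON biom table is a single JSON object, so it starts with "{".
    if head.lstrip().startswith(b"{"):
        return "BIOM 1.0 (JSON)"
    return "unknown"
```

A JSON result here would be consistent with importing the file as BIOMV100Format rather than BIOMV210Format.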


Thank you @ChristianEdwardson, this worked! Stupid mistake on my part. The only issue I am having now is that it works on a couple of my phyloseq objects but, on one of them, I get the following error (traceback shown):
Error in .Call2("new_XStringSet_from_CHARACTER", ans_class, ans_elementType, :
key 51 (char '3') not in lookup table
9. .Call2("new_XStringSet_from_CHARACTER", ans_class, ans_elementType,
x, start(solved_SEW), width(solved_SEW), get_seqtype_conversion_lookup("B",
seqtype), PACKAGE = "Biostrings")
8. .charToXStringSet(seqtype, x, start, end, width, use.names)
7. XStringSet("DNA", x, start = start, end = end, width = width,
use.names = use.names)
6. XStringSet("DNA", x, start = start, end = end, width = width,
use.names = use.names)
5. DNAStringSet(names(unqs))
4. ShortRead(sread = DNAStringSet(names(unqs)), id = BStringSet(ids))
3. writeFasta(object = ShortRead(sread = DNAStringSet(names(unqs)),
id = BStringSet(ids)), file = fout, mode = mode, width = width,
...)
2. uniquesToFasta(t(otu), fout = paste0(ps_name, "_ref-seqs.fasta"),
ids = rownames(otu))
1. phyloseq2qiime2(phy_merged_F)

To be honest, I’m not sure what to do with this…

Does your phyloseq object contain reference sequences? The error looks like it comes from the Biostrings package, which is trying to write the reference sequences to a FASTA file. It might have something to do with non-standard characters in your reference sequences, such as gaps (- or _, if the sequences are aligned) or other unusual characters (*). Also, the Biostrings call I’m using assumes that the refseq object is in FASTA format.

Christian

The @refseq slot is NULL in all of the phyloseq objects.

Hmm not sure then. If you can send me one of your phyloseq objects that is giving you the error as an .RDS file (with function ‘saveRDS’), I can try to troubleshoot on my end.

I tried uploading the RDS file but it was not an allowable extension. Is there another way I can send the RDS file to you?

OK I got your RDS containing the phyloseq object and examined it.

The reason it is failing is that part of my R function (phyloseq2qiime2) checks for sequences (ATCG only) before writing a representative/reference sequence FASTA file. It takes those sequences from either the row names or the column names, depending on whether taxa_are_rows is TRUE or FALSE for your phyloseq object.

In your case, your phyloseq object contained both sequences and MD5 hashes in the colnames. This is probably because you ran dada2 in QIIME2 for some of your samples, where the default parameter is --p-hashed-feature-ids. You can add the argument --p-no-hashed-feature-ids to output the ASV sequences instead of hashes.
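A quick way to see this kind of mix is to sort the feature IDs into DNA sequences, MD5-style hashes, and everything else. The sketch below is only an illustration: the function name and patterns are my own, and it assumes the IDs have already been exported from R as plain strings (for example, one per line in a text file). It also fits the error message above, since character code 51 is the digit 3, which can appear in a hex hash but not in a DNA sequence.

```python
import re

DNA_RE = re.compile(r"[ACGT]+")        # uppercase, unambiguous bases only
MD5_RE = re.compile(r"[0-9a-f]{32}")   # 32 lowercase hex characters


def classify_feature_ids(ids):
    """Group feature IDs into 'sequence', 'md5', and 'other'."""
    groups = {"sequence": [], "md5": [], "other": []}
    for feature_id in ids:
        if DNA_RE.fullmatch(feature_id):
            groups["sequence"].append(feature_id)
        elif MD5_RE.fullmatch(feature_id):
            groups["md5"].append(feature_id)
        else:
            groups["other"].append(feature_id)
    return groups
```

If both the sequence and md5 groups are non-empty, the table mixes the two ID styles, and writing a FASTA file from the IDs would be expected to fail on the hashed ones.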

Unfortunately, I don’t have a fix for getting you a FASTA file right now using my script. The other files should be created correctly. If you need a FASTA file, you can export a representative set of sequences from QIIME2 and run uniquesToFasta from dada2 in R. You could then combine the two on the command line using cat and filter for unique sequences using one of the various command-line tools for FASTA file manipulation.
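For the combine-and-filter step, here is a rough Python sketch as an alternative to cat plus a separate FASTA tool. The function names and file names are hypothetical. It keeps the first record seen for each distinct sequence, so the headers in the output are a mix of whatever IDs the input files used.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_unique_fasta(inputs, output):
    """Concatenate FASTA files, keeping one record per distinct sequence.

    Returns the number of unique sequences written.
    """
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            for header, seq in read_fasta(path):
                key = seq.upper()  # compare sequences case-insensitively
                if key in seen:
                    continue
                seen.add(key)
                out.write(f">{header}\n{seq}\n")
    return len(seen)
```

For example, merge_unique_fasta(["qiime2_rep_seqs.fasta", "dada2_seqs.fasta"], "merged.fasta") would write the combined set. The headers still need to match the feature IDs in the table you import, so check them before importing.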

Hope this helps!
Christian

Ok, sorry, I just want to make sure I understand correctly. The phyloseq object I have is a merged object containing the same set of samples run through 1) DADA2 in R and 2) Deblur in QIIME2. In order to export the merged phyloseq object to a QIIME2 object, I need to either A) go back and redo the Deblur denoising step with --p-no-hashed-feature-ids (then use the downstream output to merge with the DADA2 results before exporting back to QIIME2) or B) not sure I fully understand the process here in terms of which files need to be manipulated. Thank you for all of your help!

OK that makes sense.

Yes, if you want to export the representative sequences in this manner. Alternatively, you could just export the DADA2 run from R, import it into QIIME2 and do the merging in QIIME2 (example here).

Perfect, thank you so much!
