I'm having trouble fetching the sequence data for PRJNA688998. I've validated the accession ID in the NCBI database and obtained the metadata.qza file, but I'm getting the following error:
Object of type set is not JSON serializable
I'm attaching the qza file and a screenshot for reference:
metadata_file_runs.qza (5.4 KB)
Hey @Ayazaeroth, welcome to the forum!
I'm sorry to hear you're experiencing trouble while fetching your sequences. I have been trying to reproduce your issue, which pulled me down a rabbit hole of SRA data retrieval issues that are not necessarily related to q2-fondue itself. It would appear that the data you are trying to fetch does not actually exist on the NCBI servers. There are BioSample/SRA sample IDs linked to this BioProject ID (which you can find using the SRA Browser); however, when I try to identify the SRA run IDs that should be linked to those, I can't find any (check here). Basically, whatever I do, I cannot find the linked SRA run IDs.
Do you know that these runs exist? As in, have you (or anyone you know) ever worked with those sequences before?
We will try to improve the handling of cases like this within q2-fondue itself (so thanks for bringing this to our attention!). However, at this time we are not able to conjure up any non-existent data... (if my theory above is correct, that is).
Let me know whether you can confirm my findings - if the data does exist after all, we will need to look into what is actually happening in more detail.
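For context on the message itself: this is the text Python's standard json module produces whenever it is asked to serialize a set. Below is a minimal sketch of that general behaviour, not q2-fondue's actual code. The helper try_serialize and the "run_ids" key are hypothetical names, and the idea that an empty set of run IDs is involved is only an assumption consistent with the theory above.

```python
import json


def try_serialize(value):
    """Return the JSON string, or the error message if serialization fails."""
    try:
        return json.dumps(value)
    except TypeError as err:
        return str(err)


# Hypothetical payload: a set (here empty) stored under a made-up key.
# json.dumps cannot handle sets, so the error message is returned instead.
print(try_serialize({"run_ids": set()}))

# Converting the set to a sorted list first makes the payload serializable.
print(try_serialize({"run_ids": sorted(set())}))
```

The first call prints the same kind of message as the one reported, while the second prints a regular JSON string with an empty list.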
@misialq Thank you for going to the trouble.
I think your findings are justified. I just collected the accession ID from a paper and got confused when the NCBI database showed the existence of the biosamples but I couldn't fetch their data. Since I am not directly involved with the data, it is possible that the author may have removed it.
I'm also having an issue with another ID, which does exist in the SRA browser. For this one I'm getting a different error:
/tmp/q2-SRAMetadataFormat-ci0_vw48 is not a(n) SRAMetadataFormat file:
Some required columns are missing from the metadata file: Organism, Instrument, Platform, Bases, Bytes, Public, Library Selection, Library Source, Library Layout, ID, Biosample ID, Bioproject ID, Experiment ID, Study ID, Sample Accession.
Would you kindly look into this one as well and let me know what the problem is?
I'm attaching the metadata file below.
Accession ID: PRJNA191059
metadata_file_runs.qza (4.9 KB)
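The error above says that required columns are absent from the metadata table. The following is a rough sketch of that kind of check, assuming the table is tab-separated with a single header row. It is not q2-fondue's own validation code: the function missing_columns and the example table are hypothetical, and only the column names are taken from the error message.

```python
import csv
import io

# Column names copied from the error message above.
REQUIRED_COLUMNS = [
    "Organism", "Instrument", "Platform", "Bases", "Bytes", "Public",
    "Library Selection", "Library Source", "Library Layout", "ID",
    "Biosample ID", "Bioproject ID", "Experiment ID", "Study ID",
    "Sample Accession",
]


def missing_columns(tsv_text):
    """Return the required columns absent from the header row of a TSV table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # an empty table yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical table (made-up run ID) with only two of the required columns.
example = "ID\tOrganism\nSRR0000001\tmetagenome\n"
print(missing_columns(example))
```

A table that is empty or has an unexpected header would report every column as missing, much like the message above.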
I tried fetching metadata (and sequences) for the BioProject ID which you provided but everything worked for me... Could you please check (and share) which version of q2-fondue you are using? (You can run qiime info to do that.) Maybe you can try creating a new environment with a fresh installation of fondue, by following the instructions from our GitHub repo, and then re-try fetching your metadata?
Let me know how it went.
I completely removed Miniconda, reinstalled everything, and now it is working. Thanks <3