Repositories to upload microbiota datasets

MarwaTawfik · November 23, 2023, 9:31am

Hi guys

Thanks for all the support we have received over the years. I have reached the stage where I need to upload my microbiota datasets (16S) to an online repository before submitting the manuscript. Based on your experience, what are the pros and cons of the available tools? I was told that SRA is tricky if you have many samples and need to split them into batches.
Thanks in advance
Marwa

timanix · November 23, 2023, 12:24pm

Hi!
I only have experience with ENA, so I cannot say anything about other repos. I just uploaded 892 paired samples, pair by pair, with the CLI and Python scripts. But I know that they also have a tool for Windows for bulk uploading.

colinbrislawn · November 23, 2023, 8:44pm

Yeah, I've heard that too.

Yet, the last time I submitted to SRA, it was super easy and streamlined!

Here are my notes about submitting to SRA:

Go to the Submission Portal.
On step 4 (Biosample type), choose NCBI packages > Metagenome or environmental.
For step 5 (Biosample Attributes), build the .tsv file for upload.
- See https://gensc.org/mixs/ and the NCBI BioSample Attributes page.

The big thing SRA needs is the Biosample Attributes .tsv file, which I build from my metadata.
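
As a rough illustration of building that .tsv from existing metadata, here is a minimal sketch using only the Python standard library. The metadata column names on the left of `COLUMN_MAP`, the sample values, and the constant `organism` value are all made up for the example. The attribute names on the right are assumptions that should be checked against the template downloaded for the chosen package.

```python
import csv
import io

# Hypothetical mapping: my own metadata column names (left) to
# BioSample attribute column names (right). Verify the right-hand
# names against the template for your package before uploading.
COLUMN_MAP = {
    "sample_id": "sample_name",
    "date": "collection_date",
    "country": "geo_loc_name",
    "coordinates": "lat_lon",
}

# Attributes that are identical for every sample in this made-up study.
CONSTANTS = {"organism": "gut metagenome"}


def build_biosample_rows(metadata_rows):
    """Rename metadata columns and add the constant attributes."""
    rows = []
    for meta in metadata_rows:
        row = {target: meta[source] for source, target in COLUMN_MAP.items()}
        row.update(CONSTANTS)
        rows.append(row)
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    fieldnames = list(COLUMN_MAP.values()) + list(CONSTANTS)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up metadata rows, for illustration only.
metadata = [
    {"sample_id": "S1", "date": "2022-05-01", "country": "Egypt",
     "coordinates": "30.04 N 31.24 E"},
    {"sample_id": "S2", "date": "2022-05-02", "country": "Egypt",
     "coordinates": "30.04 N 31.24 E"},
]

tsv_text = to_tsv(build_biosample_rows(metadata))
print(tsv_text)
```

Writing `tsv_text` to a file gives something to upload at the attributes step. The portal's own validation remains the authority on which columns are required.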

You could deposit each sequencing run as a batch or each paper as a batch. Both are pretty easy.

MarwaTawfik · November 24, 2023, 9:55am

Thanks for the details about SRA procedures, will look into it now.
How long might it take for them to become public? I have over 100 samples for the first study and fewer than 50 for the other, and some samples have been used in both manuscripts. All samples were on one sequencing run.
Is it better to split them? Is it OK to have the same samples (18 samples) in both?
Each study has its own metadata, since we were answering different questions.

MarwaTawfik · November 24, 2023, 9:56am

Thanks @timanix
Are those scripts available online, or can you share them?

timanix · November 24, 2023, 10:15am

I would split them by paper and make them public before or after submitting the manuscript.

The scripts are not online, since it is just my own script to upload samples one by one in a loop. I can share my notebook with the code I used if you are going to submit to ENA.

For ENA, you will need to:

Register a study (paper)
Register all samples from it (by uploading the corresponding table)
Get the accession numbers of the study and samples
Upload the samples by the accession numbers of the sample and study.

Drop me a PM if you decide to upload there and I will send it.
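
For readers who want a feel for the "one by one in a loop" part, below is a minimal sketch. It is not the script mentioned above. It assumes the upload goes through ENA's Webin-CLI, which takes one manifest file per read submission, and it only generates the manifest text. The accession numbers, file names, and library values are placeholders, and the actual submission call is left out.

```python
# Placeholder accessions and file names, for illustration only.
STUDY_ACCESSION = "PRJEB00000"

SAMPLES = [
    {"accession": "ERS0000001", "name": "S1",
     "fastq": ("S1_R1.fastq.gz", "S1_R2.fastq.gz")},
    {"accession": "ERS0000002", "name": "S2",
     "fastq": ("S2_R1.fastq.gz", "S2_R2.fastq.gz")},
]


def build_manifest(study, sample):
    """Return manifest text for one paired-end sample.

    The field names follow the Webin-CLI reads manifest as I understand it;
    the values are assumptions suited to a 16S amplicon run on a MiSeq.
    """
    lines = [
        f"STUDY {study}",
        f"SAMPLE {sample['accession']}",
        f"NAME {sample['name']}_16S",
        "INSTRUMENT Illumina MiSeq",
        "LIBRARY_SOURCE METAGENOMIC",
        "LIBRARY_SELECTION PCR",
        "LIBRARY_STRATEGY AMPLICON",
    ]
    # One FASTQ line per read file (forward and reverse).
    lines.extend(f"FASTQ {path}" for path in sample["fastq"])
    return "\n".join(lines) + "\n"


# One manifest per sample, keyed by sample name.
manifests = {s["name"]: build_manifest(STUDY_ACCESSION, s) for s in SAMPLES}

for name, text in manifests.items():
    print(f"--- {name}.manifest ---")
    print(text)
```

Each manifest would then be written to disk and passed to the submission tool in turn. Running the tool in validate-only mode first, if available, is a cheap way to catch vocabulary errors before submitting hundreds of samples.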

wasade · November 27, 2023, 7:52pm

Hi @MarwaTawfik,

I recommend considering deposition into Qiita [ref]. It already houses hundreds of thousands of 16S and metagenomic samples, and provides an easy automated mechanism to deposit in ENA, which satisfies journal data deposition requirements.

Best
Daniel

MarwaTawfik · December 1, 2023, 9:27am

Thanks so much @wasade, very good to know about this tool and the possibility to run meta-analyses

MarwaTawfik · December 18, 2023, 4:29pm

From your experience, how long does it take after submission to get a project ID to share in my manuscript, whether for ENA-EBI, SRA-NCBI, or Qiita? I submitted to ENA two weeks ago via the Windows File Explorer option ("Uploading Files To ENA" in the ENA documentation) and got no information or update when I contacted the ENA team.
I could withdraw my files and look for another repository if that is faster. Please let me know how long it takes to get the ID (which I will add to my manuscript) after sending your files.

timanix · December 19, 2023, 5:42am

In ENA, you usually register the project first (the project ID is issued immediately) and then upload your sequences to the project. If you have uploaded your sequences, then I guess you should already have a project ID.

MarwaTawfik · January 4, 2024, 1:24pm

Thanks for all of your responses. I don't know, but perhaps the way I submitted my sequences needs a follow-up from the ENA side to create a project ID. I created a BioProject ID instead to add to my manuscript, and I can then upload the datasets later today.
@colinbrislawn
I tried Qiita but couldn't manage it. ENA didn't respond for a long time.
The only one that responded, once I created my BioProject, was NCBI.
I am now trying to do the final step (BioSample attributes upload). I used the template file and filled it in using my metadata, but I am getting an error!
Not sure if you have encountered that error before? I emailed NCBI help hoping they can help, but I'm not sure.
I have a unique ID for each sample, but some samples are replicates, so naturally some rows will be identical in the other columns.
Attached are a screenshot of the error and my biosample_attribute file (I tried two versions, with and without "1" in the name of the Excel file).
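
If the error is the one BioSample validation raises when several samples share identical attribute values (an assumption, since the error text is not quoted here), a quick check like the sketch below lists which sample IDs collide. The set of ignored columns and the example rows are hypothetical. Adjust them to match whatever the error message and your template actually name.

```python
import csv
import io
from collections import defaultdict

# Columns assumed NOT to count when samples are compared (an assumption).
IGNORED = {"sample_name", "sample_title", "description"}


def find_identical_samples(tsv_text):
    """Group sample names whose remaining attribute values are all equal."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        # The key is every (column, value) pair outside the ignored columns.
        key = tuple(sorted(
            (col, val) for col, val in row.items() if col not in IGNORED
        ))
        groups[key].append(row["sample_name"])
    return [names for names in groups.values() if len(names) > 1]


# Made-up attribute table: two replicates differ only in sample_name.
example = (
    "sample_name\torganism\tcollection_date\thost\n"
    "S1_rep1\tgut metagenome\t2022-05-01\tMus musculus\n"
    "S1_rep2\tgut metagenome\t2022-05-01\tMus musculus\n"
    "S2\tgut metagenome\t2022-06-01\tMus musculus\n"
)

collisions = find_identical_samples(example)
print(collisions)  # [['S1_rep1', 'S1_rep2']]
```

One possible way to separate such rows is to add a column that genuinely distinguishes them, for example a replicate number. Whether that resolves this particular submission is not something the sketch can confirm.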

MarwaTawfik · January 5, 2024, 7:12pm

Solution, for those looking for it.