What is the correct way to submit amplicon sequencing data to NCBI

shira · August 22, 2019, 8:07am

This is a basic question but I couldn't find this topic in the forum so here goes -

When you are ready to submit a paper that includes amplicon sequencing data - how do you submit the sequences/data to NCBI to get accession numbers? Back in the day you would do individual/batch submission, but then the assumption was that each sequence was obtained independently. Can you still use batch submission for amplicon data? If so, what do you put under 'isolate' (it's supposed to be a unique alphanumeric code describing the sample from which each sequence was obtained)? Alternatively - are you supposed to submit the whole project, including the raw data? Wouldn't this, in fact, lead to a less informative/accessible submission, or are the amplicons then somehow incorporated into GenBank?

In our case we are really tempted to submit individually since we used targeted amplicon sequencing, so we only have 25 sequences, all from the same genus.

I would be happy to hear from people who have experience dealing with NCBI submission of amplicon data - many thanks!

colinbrislawn · August 22, 2019, 1:33pm

This is a really good question! I hope people can share their methods about how they do it!

Here's my solution: Don't use NCBI. I use other databases instead.
Here are two of my papers on Qiita (which can be mirrored to ENA):
https://qiita.ucsd.edu/study/description/1191
https://qiita.ucsd.edu/study/description/10481
Here are two more of my papers on OSF.io:
OSF | Zegeye - NAG consortia
OSF | bernstein-2016-productivity-and-diversity

Note that the data for that last paper is on three platforms, Qiita, OSF, and ENA.
More places is better!

EDIT: It's now the future, and SRA works great for me!

How I post to SRA today

Go to the Submission portal.
On step 4 (BioSample type), choose NCBI packages > Metagenome or environmental.
On step 5 (BioSample attributes), build a .tsv file for upload.
- See https://gensc.org/mixs/ and Biosample Attributes - BioSample - NCBI

The big thing SRA needs is the Biosample Attributes .tsv file, which I build from my metadata.
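
For anyone who wants to script that step, here is a minimal Python sketch of turning a metadata table into an attributes .tsv. It is an illustration under assumptions, not NCBI's tooling: the left-hand column names (SampleID, Organism, ...) are hypothetical, the right-hand attribute names and the "not collected" placeholder should be checked against the template the Submission portal gives you for your package, and the sample rows are made up.

```python
import csv
import io

# Hypothetical mapping: my metadata column -> BioSample attribute name.
# Verify the attribute names against the template for your chosen package.
COLUMN_MAP = {
    "SampleID": "sample_name",
    "Organism": "organism",
    "Date": "collection_date",
    "Location": "geo_loc_name",
    "LatLon": "lat_lon",
}


def build_attributes_tsv(rows, column_map=COLUMN_MAP, missing="not collected"):
    """Return a tab-separated attributes table as a string.

    rows: list of dicts keyed by the original metadata column names.
    The first entry in column_map is treated as the sample name and
    must be unique. Empty or absent values are replaced with `missing`.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(column_map.values())  # header row of attribute names
    seen = set()
    for row in rows:
        record = [str(row.get(src) or "").strip() or missing for src in column_map]
        name = record[0]
        if name in seen:
            raise ValueError(f"duplicate sample name: {name}")
        seen.add(name)
        writer.writerow(record)
    return out.getvalue()


# Made-up example rows, only to show the shape of the input.
metadata = [
    {"SampleID": "S1", "Organism": "soil metagenome", "Date": "2019-06-01",
     "Location": "USA: Washington", "LatLon": "46.28 N 119.28 W"},
    {"SampleID": "S2", "Organism": "soil metagenome", "Date": "2019-06-02",
     "Location": "USA: Washington", "LatLon": ""},
]

with open("biosample_attributes.tsv", "w", newline="") as handle:
    handle.write(build_attributes_tsv(metadata))
```

Failing loudly on duplicate sample names is a deliberate choice in this sketch, since it is easier to catch that in a script than after an upload is rejected.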

Colin

Nicholas_Bokulich · August 22, 2019, 1:58pm

Great question @shira!

Just want to add a few nuggets of information to @colinbrislawn's advice:

Some journals (or reviewers!) will ask for specific data repositories to be used. ENA and SRA are pretty commonly accepted, and I have been asked to deposit in one of these instead of/in addition to QIITA in the past (that was when QIITA was still young, so times may have changed).

BUT:

QIITA will automatically submit your data to ENA for you, if you ask it to (and provide some specific metadata that ENA requires). This really streamlines submission and you get your data deposited in two locations!

ALSO

If you are unfamiliar with QIITA, read more about it. It is not only a place to deposit your data, but it facilitates re-use of those data in the future for meta-analysis and all sorts of great uses. So you help the community, and help your work get a little more attention! (I have had a couple studies of mine cited pretty much just because they were used in a meta-analysis on QIITA).