Manually Adding Positive Controls to classifier or reference database

I'm wondering if there is an easy way to add sequences/taxonomy to a reference database created with QIIME.

I'm working on an eDNA metabarcoding project and want to detect species not present on NCBI's GenBank database. I have tissue samples from those species, and can thus sequence them with the same metabarcoding primers that I'm using for my study.

I would either Sanger sequence them with the metabarcoding primers, or would sequence them on the Illumina platform, then get a consensus sequence for that individual (although that seems a bit overkill).

I could submit these directly to GenBank, then use rescript get-ncbi-data to download them in the proper format, but I'm wondering if there's an easy way to add them manually instead.

Thanks for your help! I'd love to know if there is a standard procedure for this.

Hi @alexkrohn ,

  1. Use rescript to download your database, etc.
  2. Import your new sequences and taxonomy as FeatureData[Sequence] and FeatureData[Taxonomy].
  3. Use qiime feature-table merge-seqs to add the new sequences to your database.
  4. Use qiime feature-table merge-taxa to add the new taxonomies to your database (rescript also has a merge-taxa action that could be used for this, but that is for a more advanced case).

Good luck!

Hi @Nicholas_Bokulich. Finally getting around to testing this.

I've imported a multi-FASTA of reference samples, generated by Sanger sequencing, using qiime tools import --type 'FeatureData[Sequence]'.

Here's a screenshot of the first few sequences in the FASTA:
[Image: Screenshot 2024-08-30 at 11.17.57 AM]

Based on these instructions, I created a headerless taxonomy file with just the accession number from my FASTA. As with NCBI, I assumed QIIME would consider everything left of the first space in the header to be the accession number.

Both the taxonomy and sequences import without a problem, but if I just merge them with my GenBank taxonomy and sequences, I don't really have a way of knowing whether the new taxonomy and sequences that I imported are properly connected.

Is there a way to verify that? Or can you affirm that this should work?

Alternatively, it seems you can use the feature IDs of the sequences, or the accession number in the taxonomy file. Is there an easy way to find the feature IDs from a FASTA that I just imported?

Thanks as always for your help!

Hi @alexkrohn, one quick way would be to create a visualization of your final taxonomy file using qiime metadata tabulate ... Then you can use the search box in the upper right of the visualization to search for your newly added sequence IDs.

You can also do this with the sequence file, but there is no search box for this visualization. You can simply use the browser search function to search for your newly added IDs.
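
As a complementary check before merging, the IDs in the new FASTA can be compared against the IDs in the headerless taxonomy file. The following is only a minimal sketch in plain Python, not a QIIME 2 feature: it assumes the sequence ID is everything between ">" and the first whitespace in each header (the assumption stated earlier in this thread) and that the taxonomy file is tab-separated with the ID in the first column. The function names, the REF001-style IDs, and the taxonomy strings are hypothetical placeholders.

```python
import io


def fasta_ids(handle):
    """Yield each record ID: header text after '>' up to the first whitespace."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if parts:  # skip headers that are empty after '>'
                yield parts[0]


def taxonomy_ids(handle):
    """Yield the first tab-separated column of each non-empty line."""
    for line in handle:
        if line.strip():
            yield line.rstrip("\n").split("\t", 1)[0].strip()


def compare_ids(fasta_handle, taxonomy_handle):
    """Report IDs that appear in only one of the two files."""
    seq_ids = set(fasta_ids(fasta_handle))
    tax_ids = set(taxonomy_ids(taxonomy_handle))
    return {
        "missing_taxonomy": sorted(seq_ids - tax_ids),
        "missing_sequence": sorted(tax_ids - seq_ids),
    }


# Hypothetical example data standing in for the real files.
example_fasta = io.StringIO(
    ">REF001 voucher A\nACGTACGT\n>REF002 voucher B\nTTGACCAA\n"
)
example_taxonomy = io.StringIO(
    "REF001\tk__Animalia; p__Chordata\nREF003\tk__Animalia; p__Chordata\n"
)

report = compare_ids(example_fasta, example_taxonomy)
print(report)
```

To run this on real data, pass open file handles for the FASTA and the taxonomy TSV to compare_ids instead of the in-memory examples. If both lists in the report are empty, every new sequence has a taxonomy entry and vice versa.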

Confirmed! Thank you so much for the guidance!
