Creating QIIME-Compatible 23S Plastid Gene Database

Hello,

I am analyzing 23S plastid data to identify the algal species present in a water sample based on their plastid rRNA. I found a 23S plastid reference database here: http://microgreen-23sdatabase.ea.inra.fr/ressources.html, and I would like to make the version that uses the PR2/SILVA taxonomy QIIME-compatible.

I am very new to this and appreciate any help or suggestions you have! Thank you so much.

Welcome to the forum @Chantel!

It sounds like they may have already done the hard work (converting to a compatible taxonomy format). You will just need to make sure you have the correct files:

  1. Sequences in fasta format like so:
>sequence1
ACGTACGTAGTCA
>sequence2
ACAGTGTGATTATA
...
  2. Taxonomy in a TSV (tab-separated values) file, formatted as <sequence ID> [tab] <semicolon-delimited taxonomy>, like so:
sequence1    blah;blah;blah;blah
sequence2    blah;blabbity;blabbity;blab
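
Before importing, it can help to confirm that the two files refer to the same sequence IDs. The following is a minimal sketch, not part of QIIME 2: the function names are hypothetical, and it assumes a headerless taxonomy file and FASTA headers whose ID is the text between '>' and the first whitespace.

```shell
# Print the unique sequence IDs found in a FASTA file.
# Assumption: the ID is the header text after '>' up to the first whitespace.
fasta_ids() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print the unique IDs from the first column of a headerless taxonomy TSV.
taxonomy_ids() {
  awk -F'\t' '{ print $1 }' "$1" | sort -u
}

# Print every FASTA ID that has no row in the taxonomy file.
# Arguments: <sequences.fasta> <taxonomy.txt>
ids_missing_taxonomy() {
  tmp_dir=$(mktemp -d)
  fasta_ids "$1" > "$tmp_dir/fasta_ids"
  taxonomy_ids "$2" > "$tmp_dir/taxonomy_ids"
  # comm -23 keeps lines that appear only in the first (sorted) file
  comm -23 "$tmp_dir/fasta_ids" "$tmp_dir/taxonomy_ids"
  rm -rf "$tmp_dir"
}
```

Running ids_missing_taxonomy sequences.fasta taxonomy.txt should print nothing if every sequence has a taxonomy entry; any IDs it does print are worth investigating before you import.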

One thing to look out for: the taxonomy strings should all have the same number of ranks. QIIME 2 itself does not require this, but some methods will raise errors if the ranks are uneven (i.e., if the strings have different numbers of semicolon-delimited taxonomic levels).
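
One way to spot uneven ranks is to tally how many taxonomy strings have each number of levels. This is only a sketch under a few assumptions: rank_counts is a hypothetical helper name, the file is headerless and tab-separated, and a trailing semicolon would be counted as an extra (empty) rank.

```shell
# Tally how many taxonomy strings have each number of ranks.
# Argument: <taxonomy.txt> (headerless: <sequence ID><TAB><taxonomy>)
# Output: one line per distinct rank count: <number of ranks><TAB><number of rows>
rank_counts() {
  awk -F'\t' '
    { n = split($2, ranks, ";"); seen[n]++ }
    END { for (n in seen) print n "\t" seen[n] }
  ' "$1" | sort -n
}
```

If rank_counts taxonomy.txt prints a single line, every string has the same number of ranks. More than one line means the ranks are uneven, and the left-hand column shows which depths occur.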

Once you have those files prepared, you can import to QIIME 2 like this:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path sequences.fasta \
  --output-path sequences.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path taxonomy.txt \
  --output-path taxonomy.qza

(Note: in the second command I imported as HeaderlessTSVTaxonomyFormat, but if your taxonomy file has a header line, you should import it as TSVTaxonomyFormat instead.)
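
If you are unsure which format applies, a small check of the first line can help. This sketch rests on an assumption worth verifying against your QIIME 2 version: that a header line starts with the columns "Feature ID" and "Taxon". The function name taxonomy_format is hypothetical.

```shell
# Suggest an --input-format value based on the first line of a taxonomy file.
# Assumption: a header line begins with "Feature ID<TAB>Taxon".
taxonomy_format() {
  awk -F'\t' 'NR == 1 {
    if ($1 == "Feature ID" && $2 == "Taxon")
      print "TSVTaxonomyFormat"
    else
      print "HeaderlessTSVTaxonomyFormat"
    exit
  }' "$1"
}
```

You could then pass the result to the import command shown earlier, for example with --input-format "$(taxonomy_format taxonomy.txt)". Looking at the first line yourself with head -n 1 taxonomy.txt is an equally reasonable check.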

Good luck!

Wow, thank you so much for the informative reply, I’ll get started!
