separate archaea-only database from Silva database

Susun · January 8, 2021, 3:32pm

Hi! I ran a 16S analysis on environmental samples and hit a problem: the sequences amplified with archaeal primers were mostly annotated as bacteria. Someone suggested that I extract an archaea-only database from the SILVA database and re-annotate against it.
I don’t know whether this is the right approach.
I used MATLAB to extract the archaeal sequences and taxonomy, but importing the result always fails with an error like this:
time qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva_bacteria.fasta \
  --output-path silva_bacteria.qza
There was a problem importing silva_bacteria.fasta:

silva_bacteria.fasta is not a(n) DNAFASTAFormat file:

' at position 0 on line 25 (does not match IUPAC characters for a DNA sequence).
Which tool should I use to extract the archaeal sequences so that they can be imported into QIIME 2 successfully?
Thank you in advance for your help!

Nicholas_Bokulich · January 8, 2021, 3:38pm

Hi @Susun,

Who recommended this? This sounds an awful lot like this recent conversation, which you can read for more details. In my opinion, making an archaea-only database might not be explicitly wrong but it could really open up a new set of unintended issues so you should be careful if you want to do this. See this conversation:

Don't use MATLAB... it looks like it is mangling the format and inserting special characters. That is probably easy to fix, but there are easier options:

qiime taxa filter-seqs would do what you need, you can see the documentation for more usage details.
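As a rough sketch of what that call could look like: the artifact names below are hypothetical, and the snippet assumes the full SILVA sequences and taxonomy have already been imported as FeatureData[Sequence] and FeatureData[Taxonomy] artifacts. It also assumes the domain label in the taxonomy strings contains the text "Archaea", which is worth checking against your own taxonomy file first. The guard only keeps the snippet harmless on a machine where QIIME 2 or the inputs are missing.

```shell
# Hypothetical artifact names; substitute your own imported SILVA files.
SEQS=silva-138-seqs.qza
TAXA=silva-138-tax.qza
OUT=silva-138-archaea-seqs.qza

if command -v qiime >/dev/null 2>&1 && [ -f "$SEQS" ] && [ -f "$TAXA" ]; then
  # Keep only sequences whose taxonomy annotation contains "Archaea".
  qiime taxa filter-seqs \
    --i-sequences "$SEQS" \
    --i-taxonomy "$TAXA" \
    --p-include Archaea \
    --o-filtered-sequences "$OUT"
else
  echo "qiime or the input artifacts were not found; nothing was run" >&2
fi
```

Because the filtering happens on QIIME 2 artifacts, no FASTA file is edited by hand, so the import error from the original post should not come up on this route.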

You might also need to trim the taxonomy file to match the filtered sequence file. We have a tool for that too, but it is not part of the QIIME 2 2020.11 standard release and needs to be installed separately:
https://library.qiime2.org/plugins/rescript/27/

If you install and use that, the method you want is called "filter-taxa". Use it like so:

qiime rescript filter-taxa --i-taxonomy taxa.qza --m-ids-to-keep-file filtered-sequences.qza --o-filtered-taxonomy filtered-taxonomy.qza
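Separately, if a hand-edited FASTA file still has to be imported, the "special characters" mentioned above can be checked for before running the import. The sketch below assumes they are Windows-style carriage returns (\r), which is only a guess based on the error pointing at position 0 of a line; the thread does not confirm it. The helper names and the demo data are made up for illustration.

```shell
# Sketch: detect and strip carriage returns from a FASTA file.
# Assumption: the offending characters are \r line endings (not confirmed).

cr=$(printf '\r')   # a literal carriage return, built portably

count_cr_lines() {
  # Print the number of lines in file $1 that contain a carriage return.
  grep -c "$cr" "$1" || true
}

strip_cr() {
  # Write a copy of file $1 to $2 with every carriage return removed.
  tr -d '\r' < "$1" > "$2"
}

# Self-contained demo on a throwaway file with CRLF line endings.
demo_dir=$(mktemp -d)
printf '>seq1\r\nACGT\r\n>seq2\r\nGGCC\r\n' > "$demo_dir/mangled.fasta"

before=$(count_cr_lines "$demo_dir/mangled.fasta")
strip_cr "$demo_dir/mangled.fasta" "$demo_dir/clean.fasta"
after=$(count_cr_lines "$demo_dir/clean.fasta")

echo "lines with CR before: $before, after: $after"
```

On a real file, a nonzero count from count_cr_lines would support the carriage-return guess; a count of zero would mean the problem lies elsewhere.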

Good luck!

Susun · January 8, 2021, 3:53pm

The sequencing company recommended it.

Susun · January 8, 2021, 3:56pm

Nicholas_Bokulich:

This sounds an awful lot like this recent conversation, which you can read for more details. In my opinion, making an archaea-only database might not be explicitly wrong but it could really open up a new set of unintended issues so you should be careful if you want to do this. See this conversation:

I am using qiime 2 for taxonomic classification of archaea 16 v4-v5 amplicons. When I used silva 138(including bacteria and archaea) as the reference database, most reads were classified as bacteria. It resulted in less than 2,000 reads kept as archaea in some samples. When I used silva 138(only including archaea) as the reference database, many reads classified as bacteria before were classified as archaea. And all the samples had more than 10,000 reads as archaea. I’m confused that which on…

Thank you!
I will seriously consider your suggestions!
