Hi! I ran a 16S analysis on environmental samples, but I ran into a problem. The sequences amplified with archaeal primers were mostly annotated as bacteria, so someone suggested that I extract an archaea-only database from the SILVA database and re-annotate with it.
Is this the right approach?
I used MATLAB to separate the archaeal sequences and taxonomy, but I always get an error when importing the result, like this:
time qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva_bacteria.fasta \
  --output-path silva_bacteria.qza

There was a problem importing silva_bacteria.fasta:
  silva_bacteria.fasta is not a(n) DNAFASTAFormat file:
  ' at position 0 on line 25 (does not match IUPAC characters for a DNA sequence).
I would like to know which tool I should use to separate the archaeal sequences so that they can be imported into QIIME 2 successfully.
Thank you in advance for your help!
Who recommended this? This sounds an awful lot like this recent conversation, which you can read for more details. In my opinion, making an archaea-only database is not necessarily wrong, but it could open up a new set of unintended issues, so be careful if you want to do this. See this conversation:
Don't use MATLAB for this... it looks like it is mangling the format and inserting special characters. That is probably an easy fix, but there are easier fixes:
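On the mangled file itself: the error points at an invisible character at the start of line 25, which is what I would expect from stray carriage returns (Windows-style line endings), though that is a guess rather than something the message confirms. A minimal shell sketch for checking and stripping them, where the function names and the uppercase IUPAC character set are my own assumptions, not anything taken from QIIME 2:

```shell
# Count sequence lines (non-header lines) that contain anything other than
# uppercase IUPAC DNA characters. The character set is an assumption about
# what the importer accepts; adjust it if your data legitimately differ.
count_bad_lines() {
    # $1 = FASTA file to check
    grep -v '^>' "$1" | grep -c -v -E '^[ACGTRYKMSWBDHVN]+$' || true
}

# Write a copy of the FASTA file with all carriage returns removed.
clean_fasta() {
    # $1 = input FASTA, $2 = output FASTA
    tr -d '\r' < "$1" > "$2"
}

# Example calls (hypothetical output name):
# count_bad_lines silva_bacteria.fasta
# clean_fasta silva_bacteria.fasta silva_bacteria.clean.fasta
```

If the count drops to zero after cleaning, line endings were the culprit; if not, something else in the file is off and the count tells you how many lines to look at.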
qiime taxa filter-seqs will do what you need. See the documentation for more usage details.
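A rough sketch of that call, assuming you first import the unmodified SILVA sequences and taxonomy as FeatureData[Sequence] and FeatureData[Taxonomy] artifacts. The wrapper function and all file names are placeholders of mine; the search term Archaea relies on the default substring matching, as I understand it, so check it against how your SILVA release labels the domain:

```shell
# Sketch only: keep the sequences whose taxonomy annotation contains "Archaea".
# Both inputs must already be imported QIIME 2 artifacts (.qza).
filter_archaea() {
    # $1 = sequences .qza, $2 = taxonomy .qza, $3 = output .qza
    qiime taxa filter-seqs \
        --i-sequences "$1" \
        --i-taxonomy "$2" \
        --p-include Archaea \
        --o-filtered-sequences "$3"
}

# Example call (hypothetical file names):
# filter_archaea silva-seqs.qza silva-tax.qza silva-archaea-seqs.qza
```

Filtering inside QIIME 2 this way avoids hand-editing the FASTA file, so the import problem never comes up.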
You might also need to trim the taxonomy file to match the filtered sequence file. We have a tool for that too, but it is not part of the standard QIIME 2 2020.11 release and needs to be installed separately: https://library.qiime2.org/plugins/rescript/27/
If you install and use that, the method you want is called "filter-taxa". Use it like so: