How to merge separate sequence and taxonomy artifacts output by RESCRIPt

Hi, greetings to everyone.
I'm currently working with a CO1 database downloaded from NCBI using [RESCRIPt], but it comes with the taxonomy and sequences as separate artifacts, and I need them together to run a BLAST search using QIIME. So for now I'm trying to merge the separate sequences and taxonomy using the suggested commands below:

  1. Import the Data: Import your sequence and taxonomy artifacts into QIIME 2 as separate data files. You can use the qiime tools import command. The exact commands may vary depending on your data formats. Below are example commands:

qiime tools import \
  --input-path sequences.fasta \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

qiime tools import \
  --input-path taxonomy.tsv \
  --output-path taxonomy.qza \
  --type 'FeatureData[Taxonomy]'

  2. Merge the Data: You can merge the sequence and taxonomy artifacts using the qiime rescript merge-taxa command. Here's an example command:
    qiime rescript merge-taxa \
      --i-data sequences.qza \
      --i-taxonomy taxonomy.qza \
      --o-merged-sequences taxonomy_and_sequences.qza

  3. Export Merged Data: Once the data is merged, you can export it to a format of your choice. In your case, you want a FASTA file with taxonomy annotations. You can export the merged artifact to a BIOM format, and then use biom convert to convert it to FASTA.

qiime tools export --input-path taxonomy_and_sequences.qza --output-path merged_data_export

biom convert -i merged_data_export/feature-table.biom -o merged_sequences.fasta --to-fasta

However, I'm already stuck at step 1. When I import the data using the command qiime tools import --type 'FeatureData[Sequence]' --input-path dna-sequences.fasta --output-path sequence.new.qza, the following error comes out:

An unexpected error has occurred:

  BLAST6 is not a variant of SampleData.field['type']

See above for debug info.

If anyone could help me solve this error, it would be very helpful.
Thank you.

Hello @fatihahnajihah,

It looks like you are trying to import a file that is not in .fasta format into an artifact that expects such a format. What does head dna-sequences.fasta output?
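
For anyone who wants a programmatic version of that quick check, here is a minimal Python sketch. It is not part of QIIME 2, and the helper name `looks_like_fasta` is hypothetical. It only tests that the first non-blank line is a `>` header and that each header is followed by at least one sequence line, so it does not validate the alphabet or anything else that the importer checks.

```python
def looks_like_fasta(path, max_records=5):
    """Hypothetical helper: rough check that a file starts like FASTA.

    Looks at up to `max_records` records and returns False if there is
    content before the first '>' header, or a header with no sequence.
    """
    headers = 0
    pending_header = False  # True while a header still lacks a sequence line
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if pending_header:
                    return False  # two headers in a row
                if headers == max_records:
                    break  # inspected enough records
                headers += 1
                pending_header = True
            elif headers == 0:
                return False  # content before the first header
            else:
                pending_header = False
    return headers > 0 and not pending_header
```

A tab-separated file such as BLAST tabular output would fail this check on its first line, while a normal `dna-sequences.fasta` would pass.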

Hi, thank you for your reply.
Attached is a screenshot of the output of the command "head dna-sequences.fasta" for you to review.
Thank you.

My main purpose in merging the sequences and taxonomy is to get a FASTA file that looks like this example.

Hello @fatihahnajihah,

That definitely looks like fasta. Could you provide the entire output of the error you get when you use the import command on this file?

I had to use a different version of Ubuntu (Ubuntu 20.04.6 LTS) to import the file from FASTA format to .qza.
The error occurred when using Ubuntu 22.04.2 LTS.

Hello @fatihahnajihah,

Are you saying that changing OS versions fixed this problem for you?

Yes. The Ubuntu 22.04.2 LTS installation also has the RESCRIPt package installed, whereas the Ubuntu 20.04.6 LTS installation does not. Does that affect anything?
On Ubuntu 22.04.2 LTS, I also tried the command with the latest version, qiime2-2023.7, but got another error:
Traceback (most recent call last):
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/q2cli/builtin/tools.py", line 49, in export_data
    result = qiime2.sdk.Result.load(input_path)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/result.py", line 75, in load
    peek = cls.peek(filepath)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/sdk/result.py", line 59, in peek
    return ResultMetadata(*archive.Archiver.peek(filepath))
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 336, in peek
    archive = cls.get_archive(filepath)
  File "/home/fatihahnajihah/miniconda3/envs/qiime2-2023.7/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 322, in get_archive
    raise ValueError("%s is not a QIIME archive." % filepath)
ValueError: dna-sequences.fasta is not a QIIME archive.

Hello @fatihahnajihah,

When posting error outputs please also copy the command that you entered in the terminal. Otherwise it's impossible to troubleshoot what's going on.

I'm sorry for not being clear about the commands.
Attached are the command I used and the error that came out.
Thank you.

Hello @fatihahnajihah,

It looks like you used export instead of import.
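
To illustrate the difference: `qiime tools export` expects a QIIME 2 artifact (.qza/.qzv), which is a zip archive, while `qiime tools import` takes a raw file such as a FASTA. A minimal Python sketch of that distinction follows. The helper name `suggested_tool` is hypothetical, and it checks only the zip container, not whether the archive is a valid QIIME 2 artifact.

```python
import zipfile


def suggested_tool(path):
    """Hypothetical helper: guess which `qiime tools` subcommand fits a file.

    QIIME 2 artifacts are zip archives, so a zip file is a candidate for
    `export`; anything else (for example a plain FASTA) needs `import` first.
    """
    return "export" if zipfile.is_zipfile(path) else "import"
```

Running this on dna-sequences.fasta would suggest "import", which matches the "is not a QIIME archive" error raised by export.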

Thank you for all your suggestions. I finally managed to import the files into sequences.qza and taxonomy.qza after installing the new version of QIIME 2 (qiime2-2023.7).

However, I still cannot merge the two files, which sit in separate sequence and taxonomy folders as shown below, into one compiled file that contains both the sequences and the taxonomic information, so that I can run BLAST on Ubuntu.

  1. folder of sequences only:
    (I am showing this one in FASTA format, but I have converted it to .qza format.)

  2. folder of taxonomy:
    "C:\Users\User\Pictures\Screenshots\Screenshot (86).png"

  3. example of outcomes as I expected to generate:

As far as I could find in my own search, these are the commands that can be used to combine the files:

  1. Merge the Data: You can merge the sequence and taxonomy artifacts using the command below:

qiime rescript merge-taxa \
  --i-data sequences.qza \
  --i-taxonomy taxonomy.qza \
  --o-merged-sequences taxonomy_and_sequences.qza

However, qiime rescript merge-taxa does not have an --i-taxonomy option.
The merge-taxa command only involves --i-data and --o-merged-sequences.

Therefore, is there any other way to merge the sequences and taxonomy from the NCBI database so that I can run a BLAST-style search using vsearch?
Thank you.

Hello @fatihahnajihah,

Can you provide a link to the tutorial you're following?

https://chat.openai.com/share/6418d3be-af23-44e7-afc7-3d8d7f6b0dc1
I retrieved the commands from this website.
In case the link is restricted to its users only, I have taken screenshots of the contents of the website.



thank you.

Hello @fatihahnajihah,

That software frequently "hallucinates" things, and this is one of those cases.

So, are there any other methods to merge the sequences and taxonomy into one FASTA file?

Hello @fatihahnajihah,

When you say merge the sequence and taxonomy artifacts, what you want to create is a fasta file wherein each sequence has its taxonomic assignment as its header, is that correct?

Yes, correct. vsearch needs a --db file to run the search, but the database I got from NCBI using RESCRIPt was output as separate sequence and taxonomy artifacts.

So I have to merge the files into one FASTA file to act as the database, as in the example below, in order to run the command vsearch --usearch_global FILENAME --db FILENAME --id 0.97 --alnout FILENAME:

Hello @fatihahnajihah,

What is the link between your FeatureData[Sequence] and FeatureData[Taxonomy]? It seems like you still need to classify your sequences in some way, otherwise how do you know which sequences get which taxonomy headers, right? Let me know if I'm missing something.

The sequence and taxonomy files both use the same headers as the feature ID.

  1. sequences

  2. taxonomy
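
Since both files share the feature ID, one possible way to build such a FASTA file is to export each artifact with qiime tools export (which writes dna-sequences.fasta for FeatureData[Sequence] and taxonomy.tsv for FeatureData[Taxonomy]) and then join the two on that ID. The Python sketch below is only an illustration under those assumptions and is not a RESCRIPt or QIIME 2 command. The function names and the `>ID taxonomy` header layout are hypothetical choices, so adjust the header to whatever your vsearch workflow expects.

```python
def read_taxonomy(tsv_path):
    """Map feature ID -> taxonomy string from a tab-separated taxonomy file."""
    taxonomy = {}
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            # Skip short lines and the "Feature ID<TAB>Taxon" header row.
            if len(fields) < 2 or fields[0] == "Feature ID":
                continue
            taxonomy[fields[0]] = fields[1]
    return taxonomy


def merge_fasta_with_taxonomy(fasta_path, taxonomy_path, out_path):
    """Write a FASTA whose headers are '>feature_id taxonomy'.

    Assumed layout, not an official format. Returns the feature IDs that
    had no taxonomy entry; their headers are copied unchanged.
    """
    taxonomy = read_taxonomy(taxonomy_path)
    missing = []
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                parts = line[1:].split()
                feature_id = parts[0] if parts else ""
                taxon = taxonomy.get(feature_id)
                if taxon is None:
                    missing.append(feature_id)
                    dst.write(line if line.endswith("\n") else line + "\n")
                else:
                    dst.write(f">{feature_id} {taxon}\n")
            else:
                dst.write(line)  # sequence lines pass through untouched
    return missing
```

For example, `merge_fasta_with_taxonomy("dna-sequences.fasta", "taxonomy.tsv", "merged.fasta")` would write merged.fasta and return a list of any IDs that had no taxonomy, which is worth checking before using the file as a --db.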