How do I make a .qza artifact from the ref-seqs that I have collected?

Hi!,

I am trying to make a classifier for taxonomic assignment of the marker genes mcrA and dsrB.

For this work, I collected ref-seqs from the NCBI database using RESCRIPt, followed by its filtering steps. I then examined the ref-seqs with BioEdit and ClustalX on my computer and decided to improve the ref-seqs file by trimming out redundant seqs that I had downloaded from NCBI and by adding some more seqs that I would like to include.

I can do this editing in BioEdit using FASTA-format seqs, and I can save the seqs as a tsv file and the corresponding ID table as a second tsv file.

Can I convert those self-made tsv files to qza files so that I can use them to build a better classifier?

I would appreciate your kind advice on this question.

Hee-Sung

Hi @baehsung ,
You can just re-download the final sequences from NCBI with RESCRIPt, using your list of accessions (the same as you did originally, only now with a specific set of sequences).

Or you can check the importing tutorials at docs.qiime2.org/ to learn how to import various file formats into QIIME 2.

Good luck!

Thanks Nicholas,

I read the importing tutorials, but I could not clearly understand them.

The screenshot below shows the txt files that I converted from the qza files obtained with the RESCRIPt command.
After editing those txt files (removing and adding some seqs), how can I convert them back to qza files?

Best regards,

Hee-Sung

Hi!,

I am struggling with converting txt to qza while preparing ref-seqs for making a classifier. I have the ref-seqs in txt format (as shown below); they were retrieved from NCBI using RESCRIPt and then edited by me. Now I would like to convert them back to a qza file.

Following a link (Converting between file formats — biom-format.org), I first converted the txt file to a biom file using the following command.

biom convert \
  -i dsrB-ref-seqs.txt \
  -o dsrB-ref-seqs.biom \
  --table-type "OTU table" --to-hdf5

Then I tried to convert the biom file to qza format:

qiime tools import \
  --input-path dsrB-ref-seqs.biom \
  --output-path dsrB-ref-seqs.qza \
  --type 'FeatureTable[Sequence]'

This command produced an error message like "biom file is not DNA FASTAFormat file".

Does someone know what type of import file is appropriate in this case?

Thanks,

Hee-Sung

Hello!
To import ref-seqs into QIIME 2, you should save them as a FASTA file. There is no need to convert them to the biom format, since reference sequences are not a feature table. Then import the file as FeatureData[Sequence], not FeatureTable[Sequence].
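A minimal sketch of that import, assuming the edited sequences are unaligned DNA saved as `dsrB-ref-seqs.fasta` (a file name chosen here for illustration). The two short records written at the top are made-up placeholders so the snippet runs on its own, and the `qiime` call is skipped when QIIME 2 is not on the PATH.

```shell
set -eu

# Placeholder FASTA for illustration only; use your own edited ref-seqs instead.
printf '>seq1\nATGCGTACGTTAGC\n>seq2\nATGCCTACGATAGC\n' > dsrB-ref-seqs.fasta

# Import the FASTA file directly as FeatureData[Sequence]; no biom step is needed.
if command -v qiime >/dev/null 2>&1; then
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path dsrB-ref-seqs.fasta \
    --output-path dsrB-ref-seqs.qza
else
  echo "qiime not found on PATH; skipping import" >&2
fi
```

The type is quoted so that the shell does not try to interpret the square brackets.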

Thanks Timur!!

Your advice saves me a huge amount of the time that I was spending on this issue.

As far as I know, each record in a FASTA file consists of two lines: a line starting with ">" followed by the seq ID, and then the sequence on the next line.
So, to save my file in FASTA format, I have to add ">" in front of each seq ID. Am I right?

Yeah, you are right. Usually it is a plain text file with the extension ".fasta" or ".fa". Each ID line starts with ">", and the sequence follows on the next line.
Here is a link for more details.
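As a rough sketch of that conversion, assuming the edited txt file has two tab-separated columns (seq ID, then sequence) and no header line. The actual layout of the file in this thread is not shown, so the input written below is a hypothetical placeholder. The `tr` step strips Windows line endings, which may be present if the file was saved on Windows.

```shell
set -eu

# Hypothetical two-column input: seq ID, a tab, then the sequence.
printf 'seq1\tATGCGTACGTTAGC\nseq2\tATGCCTACGATAGC\n' > dsrB-ref-seqs.txt

# Drop carriage returns, then write ">ID" on one line and the sequence on the next.
# Lines with fewer than two columns (for example blank lines) are skipped.
tr -d '\r' < dsrB-ref-seqs.txt |
  awk -F '\t' 'NF >= 2 { print ">" $1; print $2 }' > dsrB-ref-seqs.fasta

cat dsrB-ref-seqs.fasta
```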

Thanks timanix,

I edited the ref-seqs FASTA file and then converted the edited file back to qza. I hope that it will work for making the classifier.

Subsequently, I edited the "ref-tax" file, which contains the Feature ID and Taxon columns, and saved it as a txt file, as shown below.

Then I tried to convert it to qza using the following commands.

biom convert \
  -i dsrB-refseqs-txid.txt \
  -o dsrB-refseqs-txid.biom \
  --table-type "OTU table" --to-hdf5

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-path dsrB-refseqs-txid.biom \
  --output-path dsrB-refseqs-txid.qza

But this attempt resulted in an error (see below).
Could you figure out what is wrong?

Best regards,

HS

@baehsung,

This error is usually due to either file corruption or a file not being correctly zipped. You will need to use gzip to re-zip your edited .fasta file. Then try running your import again :slightly_smiling_face:
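For the taxonomy file itself, here is a sketch that skips the biom step, in line with the earlier advice for the sequences: a taxonomy table is not a feature table, so `qiime tools import` can read the tab-separated text directly. It assumes the file starts with a `Feature ID` and `Taxon` header line, as described above. The two rows written here are placeholders, not real taxonomy. For a file without a header line, adding `--input-format HeaderlessTSVTaxonomyFormat` is the usual variant.

```shell
set -eu

# Placeholder taxonomy table for illustration only; use your edited ref-tax file instead.
printf 'Feature ID\tTaxon\nseq1\tk__Bacteria; p__Placeholder\nseq2\tk__Bacteria; p__Placeholder\n' \
  > dsrB-refseqs-txid.txt

# Sanity check: every line must have at least two tab-separated columns.
awk -F '\t' 'NF < 2 { bad = 1; print "line " NR " has fewer than 2 columns" } END { exit bad }' \
  dsrB-refseqs-txid.txt

# Import the text file directly as FeatureData[Taxonomy].
if command -v qiime >/dev/null 2>&1; then
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-path dsrB-refseqs-txid.txt \
    --output-path dsrB-refseqs-txid.qza
else
  echo "qiime not found on PATH; skipping import" >&2
fi
```

The seq IDs in the first column should match the IDs in the FASTA file that was imported as FeatureData[Sequence].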
