I am trying to make a classifier for the taxonomic assignment of the marker genes mcrA and dsrB.
For this work, I collected the reference sequences (ref-seqs) from the NCBI database using RESCRIPt, followed by filtering. I examined the ref-seqs using BioEdit and ClustalX on my computer, and decided to improve the ref-seqs file by trimming redundant sequences that I had downloaded from NCBI and by adding some necessary sequences that I would like to include.
I can do this work in BioEdit on FASTA-format sequences, and I can save the sequences as a TSV file and the corresponding ID table as another TSV file.
Can I convert those self-made TSV files to qza files, so that I can use them to train a better classifier?
I would appreciate your kind advice on this question.
Hi @baehsung ,
You can just re-download the final sequences from NCBI using the list of accessions and RESCRIPt (the same as you did originally, only now with a specific set of sequences).
Or you can check the importing tutorials at docs.qiime2.org/ to learn how to import various file formats into QIIME 2.
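For the first option (re-downloading by accession), here is a minimal sketch of how the accession list could be written as a QIIME 2 metadata file. The function name, the example accessions, and the file name `accession-ids.tsv` are hypothetical. I believe the resulting file can then be passed to `qiime rescript get-ncbi-data` through its accession-IDs metadata option (`--m-accession-ids-file`, as far as I recall), but please confirm with `qiime rescript get-ncbi-data --help` for your version.

```python
def accessions_to_metadata(accessions):
    """Return QIIME 2 metadata text: an 'id' header plus one accession per line."""
    seen = set()
    lines = ["id"]  # 'id' is one of the ID column headers QIIME 2 metadata accepts
    for accession in accessions:
        accession = accession.strip()
        # Skip blanks and duplicates so each accession is listed once.
        if not accession or accession in seen:
            continue
        seen.add(accession)
        lines.append(accession)
    return "\n".join(lines) + "\n"


def write_metadata(accessions, path="accession-ids.tsv"):
    # Hypothetical default file name; use whatever fits your project.
    with open(path, "w") as handle:
        handle.write(accessions_to_metadata(accessions))
```

Calling `write_metadata(["AB000001.1", "AB000002.1"])` (placeholder accessions) would write a three-line file: the `id` header followed by the two accessions.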
I read the importing tutorials, but could not clearly understand them.
The screenshot below shows the txt files that I converted from the qza files obtained with the RESCRIPt command.
After editing those txt files (removing and adding some sequences), how can I convert them back to qza files?
I am struggling with converting txt to qza while preparing ref-seqs for making a classifier. I have ref-seqs in txt format (as shown below) that were retrieved from NCBI using RESCRIPt and edited by myself. Now I would like to convert them back to a qza file.
Hello!
In order to import ref-seqs into QIIME 2, you should save them as a FASTA file. There is no need to convert them to the biom format, since they are not a feature table. Then import the file as FeatureData[Sequence], not FeatureTable[Sequence].
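As a rough sketch of the import step, the snippet below only assembles the `qiime tools import` command lines for the sequences and the taxonomy table. The file names are placeholders, and `HeaderlessTSVTaxonomyFormat` assumes your edited taxonomy file has two tab-separated columns (ID, taxonomy string) and no header row; drop that option if your file does have a header.

```python
import shutil
import subprocess


def build_import_command(input_path, output_path, semantic_type, input_format=None):
    """Assemble a `qiime tools import` command as an argument list."""
    command = [
        "qiime", "tools", "import",
        "--type", semantic_type,
        "--input-path", input_path,
        "--output-path", output_path,
    ]
    if input_format is not None:
        command += ["--input-format", input_format]
    return command


def run_import(command):
    """Run the command, but only if the QIIME 2 environment is active."""
    if shutil.which("qiime") is None:
        raise RuntimeError("qiime not found on PATH; activate QIIME 2 first")
    subprocess.run(command, check=True)


# Placeholder file names; replace them with your own edited files.
seqs_cmd = build_import_command(
    "ref-seqs.fasta", "ref-seqs.qza", "FeatureData[Sequence]"
)
tax_cmd = build_import_command(
    "ref-taxonomy.tsv", "ref-taxonomy.qza", "FeatureData[Taxonomy]",
    input_format="HeaderlessTSVTaxonomyFormat",
)
```

With QIIME 2 activated, `run_import(seqs_cmd)` and `run_import(tax_cmd)` would perform the two imports; the same arguments can also be typed directly in the terminal.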
Your advice is saving me a huge amount of the time that I have been spending on this issue.
As far as I know, each record in a FASTA file consists of two lines: a line starting with ">" followed by the sequence ID, and then the sequence on the next line.
So, to save the file in FASTA format, I have to add ">" in front of each sequence ID. Am I right?
Yeah, you are right. Usually it is a plain-text file with the extension ".fasta" or ".fa". Each record starts with ">" followed by the ID, and the sequence goes on the next line.
Here is a link for more details.
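If the edited sequences are in a two-column, tab-separated text file, something like the sketch below could add the ">" for you. It assumes column 1 is the sequence ID and column 2 is the sequence, with no header row; the function names are made up for illustration, and removing "-" characters is my assumption about alignment gaps left over from BioEdit or ClustalX.

```python
import csv
import io


def tsv_to_fasta(tsv_text: str) -> str:
    """Convert 'id<TAB>sequence' rows into FASTA text."""
    records = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and rows missing either column.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        # Avoid a doubled '>' if the ID already carries one.
        seq_id = row[0].strip().lstrip(">")
        # Assumption: '-' characters are alignment gaps and can be dropped.
        sequence = row[1].strip().replace("-", "")
        records.append(f">{seq_id}\n{sequence}\n")
    return "".join(records)


def convert_file(tsv_path: str, fasta_path: str) -> None:
    """Read a TSV file and write the equivalent FASTA file."""
    with open(tsv_path, newline="") as handle:
        fasta_text = tsv_to_fasta(handle.read())
    with open(fasta_path, "w") as handle:
        handle.write(fasta_text)
```

For example, `convert_file("ref-seqs.txt", "ref-seqs.fasta")` (placeholder file names) would produce a FASTA file with one header line and one sequence line per record.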
This error is usually due to either file corruption or a file not being correctly zipped. You will need to use gzip to re-zip your edited .fasta file. Then try running your import again.
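If the command-line gzip tool is not convenient, the re-zipping can also be sketched with Python's standard library. The function names here are illustrative only; the second helper simply checks for the two-byte gzip signature, which may help tell a properly compressed file from one that was merely renamed to ".gz".

```python
import gzip
import shutil


def gzip_file(source_path: str, target_path: str) -> None:
    """Compress source_path into a gzip file at target_path."""
    with open(source_path, "rb") as source, gzip.open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)


def looks_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

For example, `gzip_file("ref-seqs.fasta", "ref-seqs.fasta.gz")` (placeholder names) writes the compressed copy, and `looks_gzipped("ref-seqs.fasta.gz")` should then return True.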