Error while making BOLD reference database

Hi!

I am currently facing the same problem as @John while trying to import my BOLD database...

I installed seqkit and tried to run your command lines (@cjfields), but nothing seems to happen after entering the command:

seqkit seq -w0 -g bold_rawSeqs_forQiime.fasta > bold_rawSeqs_forQiime.degapped.fasta

The import command then does exactly the same: nothing.
[screenshot of the import command]

Could it be because of my database format (.fas)?

I am new to qiime2 and have been struggling with the database import for some days.
I would really appreciate some help!

I think you have an argument at the end without a closing quote.

See the last line in your screenshot with "--type"…


I struggled so much with my data before that I didn't think the solution could be that obvious. (There was also an "R" after "Sequence", but it got added when I reran the code to take a screenshot of what was happening.)

Thank you, and so sorry about this post!


I would also like to ask a question about the seqkit command used before.

After using it and importing the database into qiime2, I noticed that my file size went from 115 MB to 21 MB.
So a lot of information from the database seems to have been lost. Is there anything in this command that I could modify? Or is it totally normal?

Another thing: I still get this error message when trying to import the database:

database/fasta.degapped.fa is not a(n) AlignedDNAFASTAFormat file:

The sequence starting on line 6 was length 408. All previous sequences were length 659. All sequences must be the same length for AlignedDNAFASTAFormat.

The error is telling you that the FASTA file you are trying to import is no longer in the form of an alignment. In an alignment, each sequence (nucleotide bases, indels, etc.) should be the same length, like this:

ACTGCA--ACTG--CAC
ACTGCATTACTGACCAC

The name of your file, fasta.degapped.fa, tells me that you removed the gap and missing-data characters (- and .) from your sequences. That means they won't be the same length anymore and will look like this:

ACTGCAACTGCAC
ACTGCATTACTGACCAC

Thus you can only import them as DNAFASTAFormat, not AlignedDNAFASTAFormat.
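
If you want to check this yourself before importing, here is a minimal Python sketch (not part of QIIME 2 or seqkit). It counts the records, the distinct sequence lengths, and the gap characters in a FASTA file. The function names read_fasta, summarize and degap, and the sequence IDs seq1 and seq2, are made up for illustration; the two sequences are the ones from the example above.

```python
from io import StringIO

GAP_CHARS = "-."  # gap and missing-data characters mentioned above


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return record count, set of sequence lengths, and gap-character count."""
    n_records, lengths, n_gaps = 0, set(), 0
    for _, seq in read_fasta(handle):
        n_records += 1
        lengths.add(len(seq))
        n_gaps += sum(seq.count(c) for c in GAP_CHARS)
    return {"records": n_records, "lengths": lengths, "gaps": n_gaps}


def degap(seq):
    """Remove gap and missing-data characters from one sequence."""
    return "".join(c for c in seq if c not in GAP_CHARS)


# Hypothetical two-record file built from the example sequences above.
aligned = ">seq1\nACTGCA--ACTG--CAC\n>seq2\nACTGCATTACTGACCAC\n"
degapped = "".join(
    f">{h}\n{degap(s)}\n" for h, s in read_fasta(StringIO(aligned))
)

before = summarize(StringIO(aligned))
after = summarize(StringIO(degapped))
print(before)
print(after)
```

For a real file, pass open("your_file.fasta") to summarize instead of the StringIO objects. If "lengths" contains more than one value, the file is not an alignment and will not validate as AlignedDNAFASTAFormat. If "records" is the same before and after degapping and only "gaps" went down, the smaller file comes from the removed gap characters and not from lost sequences.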

Thank you for your answers! In the end I used another program to rename the duplicates and to align the sequences before importing them. And it worked!