Importing COI data from BOLD

I ran into the same issue and fixed it with the following steps:

  1. Check the maximum allowed size of command-line arguments: getconf ARG_MAX
  2. Check the length of myval: echo ${#myval}
  3. Write the contents of $myval to a separate .sed file: echo "$myval" > commands.sed
  4. Run sed with the -f (script file) option: sed -f commands.sed bold_allrawSeqs.fasta > cleaned_bold_allrawSeqs.fasta

This last command takes about 10 minutes to run.
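
In case it helps, the steps above can be wrapped in a small shell function. This is only a sketch: sed_from_file is a hypothetical helper name, and it assumes $myval already holds a valid sed script and that mktemp is available on your system.

```shell
# Hypothetical helper (not part of sed or QIIME 2): run a long sed script
# from a temporary file instead of passing it on the command line, which
# avoids the "argument list too long" limit reported by getconf ARG_MAX.
sed_from_file() {
    script_text="$1"   # the sed commands, e.g. the contents of $myval
    input="$2"         # FASTA file to clean
    output="$3"        # where to write the cleaned FASTA

    script_file=$(mktemp) || return 1
    # printf is used instead of echo so backslashes are written unchanged
    printf '%s\n' "$script_text" > "$script_file"

    sed -f "$script_file" "$input" > "$output"
    status=$?

    rm -f "$script_file"
    return "$status"
}

# Example call with the file names from this thread:
# sed_from_file "$myval" bold_allrawSeqs.fasta cleaned_bold_allrawSeqs.fasta
```

Writing the script to a file first means the script length no longer matters, so the getconf check is only needed for diagnosis.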


Hi @devonorourke and @Toan,

I also used Chris Fields' suggestion with seqkit: seqkit seq -w0 -g bold_rawSeqs_forQiime.fasta > bold_rawSeqs_forQiime.degapped.fasta

But when I try to import the file into QIIME 2 with the type 'FeatureData[Sequence]' and these commands, I get an error message.

qiime tools import \
--input-path /workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta \
--output-path bold_rawSeqs.qza \
--type 'FeatureData[Sequence]'

There was a problem importing /workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta:
/workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta is not a(n) DNAFASTAFormat file:

Multiple consecutive descriptions starting on line 1096

I also tried importing it following the “EMP protocol” multiplexed single-end fastq format, but I got an error message saying that the quality scores do not match the length of the sequences. That makes sense, because when I check the file there are no quality scores.

I have been struggling with this for the last few days and still have no clue. Any help would be highly appreciated.

FYI, I am using a cluster with 24 cores and 128 RAM, which I access through PuTTY, and this version: qiime2-amplicon-2024.10

Hi @kcn35,

I think the issue is this:

Multiple consecutive descriptions starting on line 1096

This tells me it is a parsing error of some kind. Often this results from multiple ID lines without a sequence between them, i.e.:

>seq01
>seq02
...

It could be something else too. Can you take a look at that line, and paste it along with several of the following lines of output here?
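
If it is useful, here is a rough shell sketch for pulling out those lines and for scanning the whole file for ID lines that have no sequence after them. The function names are hypothetical, and the line range in the example call is just a guess at a useful window around line 1096.

```shell
# Hypothetical helpers for inspecting a FASTA file; not QIIME 2 commands.

# Print lines START..END of a file, e.g. the region around a reported error.
show_lines() {
    sed -n "${2},${3}p" "$1"
}

# Report every header line ('>...') that is directly followed by another
# header, or that ends the file, i.e. a record with no sequence line.
# Output format: "<line number>: <header>".
# Note: a blank line between two headers counts as a non-header line here,
# so such a pair is not reported.
find_headers_without_sequence() {
    awk '
        /^>/ {
            if (prev_header) print NR - 1 ": " prev_line
            prev_header = 1
            prev_line = $0
            next
        }
        { prev_header = 0 }
        END { if (prev_header) print NR ": " prev_line }
    ' "$1"
}

# Example calls with the file from this thread:
# show_lines bold_rawSeqs_forQiime.degapped.fasta 1090 1105
# find_headers_without_sequence bold_rawSeqs_forQiime.degapped.fasta
```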

The FASTA labels 1.1e+07 & 1.7e+07 also make me think this is a parsing issue.
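
To see how many labels look like that, something along these lines might help. It is a sketch under the assumption that the affected ID is the first token on the header line and is written entirely in scientific notation; find_sci_notation_labels is a hypothetical name.

```shell
# Hypothetical helper: list header lines whose ID looks like a number in
# scientific notation (e.g. 1.1e+07), with their line numbers.
# Assumption: the ID is the first whitespace-delimited token after '>'.
find_sci_notation_labels() {
    grep -n -E '^>[0-9]+(\.[0-9]+)?[eE][+-]?[0-9]+([[:space:]]|$)' "$1"
}

# Example call with the file from this thread:
# find_sci_notation_labels bold_rawSeqs_forQiime.degapped.fasta
```

grep exits with status 1 and prints nothing when no such labels are found.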

You can also DM me a link to the FASTA file you are trying to import.

Importing via the “EMP protocol” will not work as that is only for FASTQ files.

-Mike
