kcn35
(Kimberly)
March 5, 2025, 5:05pm
1
I ran into the same issue and fixed it with these commands:
Check the maximum allowed size of arguments: getconf ARG_MAX
Check the length of myval: echo ${#myval}
Write the contents of $myval into a separate .sed file: echo "$myval" > commands.sed
Then use sed with the -f (file) option: sed -f commands.sed bold_allrawSeqs.fasta > cleaned_bold_allrawSeqs.fasta
This last command takes a few minutes to run (about 10 min).
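For anyone who wants to try the idea on a small scale first, here is a minimal sketch of the steps above. The sed expression and the tiny FASTA file are hypothetical stand-ins (the real contents of $myval are not shown in this thread), and everything is written to a temporary directory.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Hypothetical stand-in for the long expression held in $myval.
# This one strips gap characters from every non-header line.
myval='/^>/!s/-//g'

# Tiny hypothetical FASTA file standing in for bold_allrawSeqs.fasta.
printf '>seq01\nAC--GT\n>seq02\nT-TGA\n' > "$workdir/input.fasta"

# Compare the expression length with the system limit on argument size.
arg_max=$(getconf ARG_MAX)
echo "myval length: ${#myval}, ARG_MAX: $arg_max"

# Store the expression in a file and let sed read it with -f, so its
# size no longer counts against the command-line limit.
printf '%s\n' "$myval" > "$workdir/commands.sed"
sed -f "$workdir/commands.sed" "$workdir/input.fasta" > "$workdir/cleaned.fasta"

cat "$workdir/cleaned.fasta"
```

With a real $myval, only the two file paths and the contents of commands.sed would change.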
kcn35
(Kimberly)
March 8, 2025, 10:02pm
2
Hi @devonorourke and @Toan.
I also used Chris Fields' suggestion with seqkit: seqkit seq -w0 -g bold_rawSeqs_forQiime.fasta > bold_rawSeqs_forQiime.degapped.fasta
But when I try to import the file into QIIME 2 as type 'FeatureData[Sequence]' with the command below, I get an error message:
qiime tools import \
--input-path /workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta \
--output-path bold_rawSeqs.qza \
--type 'FeatureData[Sequence]'
There was a problem importing /workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta:
/workdir/kcn35/bold_rawSeqs_forQiime.degapped.fasta is not a(n) DNAFASTAFormat file:
Multiple consecutive descriptions starting on line 1096
I also tried importing it following the “EMP protocol” for multiplexed single-end FASTQ, but I got an error message saying that the quality scores do not match the length of the sequences. That makes sense, because when I check the file there are no quality scores.
I have been struggling with this for the last few days and still have no clue. Any help would be highly appreciated.
FYI, I am using a cluster with 24 cores and 128 RAM, which I access with PuTTY, and this version: qiime2-amplicon-2024.10
Hi @kcn35 ,
I think the issue is this:
Multiple consecutive descriptions starting on line 1096
This tells me it is a parsing error of some kind. Often this results from multiple ID lines without a sequence between them, i.e.:
>seq01
>seq02
...
It could be something else too. Can you take a look at that line, and paste it here along with several of the following lines?
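One way to look for this pattern, sketched here on a small hypothetical file rather than the real one: the awk script below reports every description line that is followed by another description line with no sequence in between, and sed then prints a range of lines for inspection. The file name and its contents are made up for illustration.

```shell
#!/bin/sh
# Build a small hypothetical FASTA file with one header that has no sequence.
workdir=$(mktemp -d)
fasta="$workdir/example.fasta"
printf '>seq01\n>seq02\nACGT\n>seq03\nTTGA\n' > "$fasta"

# Report each header that is followed by another header before any
# non-empty sequence line appears.
hits=$(awk '
  /^>/ {
    if (pending) print pending_nr ": " pending_line
    pending = 1; pending_nr = NR; pending_line = $0
    next
  }
  NF { pending = 0 }   # a non-empty, non-header line counts as sequence
' "$fasta")
echo "$hits"

# Print a range of lines around a reported position for a closer look.
sed -n '1,3p' "$fasta"
```

For the file in this thread, the same sed call with the range 1096,1105p would show the line named in the error and the ones after it.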
The FASTA labels 1.1e+07 & 1.7e+07 also make me think this is a parsing issue.
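A quick way to list labels like these is a grep for description lines that look like numbers in scientific notation. This is only a sketch on a hypothetical example file, and the pattern is an assumption about what such labels look like, so it may need adjusting for the real data.

```shell
#!/bin/sh
# Hypothetical FASTA file in which two labels look like 1.1e+07.
workdir=$(mktemp -d)
fasta="$workdir/labels.fasta"
printf '>1.1e+07\nACGT\n>seq02\nTTGA\n>1.7e+07\nGGCC\n' > "$fasta"

# List description lines (with line numbers) whose label starts with a
# number written in scientific notation.
sci_labels=$(grep -nE '^>[0-9]+(\.[0-9]+)?[eE][+-]?[0-9]+' "$fasta")
echo "$sci_labels"
```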
You can also DM me a link to the FASTA file you are trying to import.
Importing via the “EMP protocol” will not work as that is only for FASTQ files.
-Mike