Updating to a new version of QIIME 2: importing classifier reference sequences fails on "-" characters

We are trying to update to qiime2-2023.5. When we run the following command, we get the error message below.

Command run: qiime tools import --type 'FeatureData[Sequence]' --input-path /adata/myseqdata/FF_Qiime_Databases/ITS2/ITS2_BOLD_230623_qiime_2023.5/ITS2160323_FASTA.fasta --output-path /adata/myseqdata/FF_Qiime_Databases/ITS2/ITS2_BOLD_230623_qiime_2023.5/Custom_FASTA_reads.qza

Error message: There was a problem importing /adata/myseqdata/FF_Qiime_Databases/ITS2/ITS2_BOLD_230623_qiime_2023.5/ITS2160323_FASTA.fasta:

/adata/myseqdata/FF_Qiime_Databases/ITS2/ITS2_BOLD_230623_qiime_2023.5/ITS2160323_FASTA.fasta is not a(n) DNAFASTAFormat file:

Invalid character '.' at position 365 on line 4 (does not match IUPAC characters for this sequence type). Allowed characters are ACGTRYKMSWBDHVN.

Hello @ethanstrak,

Did you open the fasta file and check that position? Is there in fact a period there?
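If the file is large, a short script can list every offending character instead of checking by eye. The following is a minimal Python sketch that compares each sequence line against the allowed characters quoted in the error message (ACGTRYKMSWBDHVN). The function names are hypothetical, and the sketch reports 1-based line numbers and positions; we have not confirmed whether QIIME's "position 365" is 0- or 1-based, so an off-by-one difference is possible.

```python
# Allowed IUPAC DNA characters, copied from the QIIME 2 error message.
ALLOWED = set("ACGTRYKMSWBDHVN")


def find_invalid_chars(lines):
    """Yield (line_number, position, char) for each disallowed sequence character.

    Line numbers and positions are 1-based. Header lines (starting with '>')
    and blank lines are skipped. Lowercase letters are reported too, because
    the check uses exactly the uppercase list from the error message.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith(">"):
            continue
        for pos, char in enumerate(line, start=1):
            if char not in ALLOWED:
                yield line_no, pos, char


def report(path):
    """Print every disallowed character found in the FASTA file at `path`."""
    with open(path) as fh:
        for line_no, pos, char in find_invalid_chars(fh):
            print(f"line {line_no}, position {pos}: {char!r}")
```

Calling `report("ITS2160323_FASTA.fasta")` would print one line per disallowed character, which makes it easy to see whether the problem is a single stray symbol or a character used throughout the file.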

I'm not sure what this has to do with updating QIIME. Do you mean that this import worked in an older version of QIIME but no longer works?

Hi @colinvwood,

Yes. The fasta file has a "-" there and in many other places. We use it to show variable regions in a sequence or regions that may not have been fully characterised as part of a larger known sequence. This has worked fine on all the previous versions of qiime we have used. The issue has only arisen in running this command as part of preparing our classifiers for use in the newer version.

Hello @ethanstrak,

The traceback says there is a . (period) in that file, but you're saying it's a - (hyphen). Not sure what's going on there.

Do you know which version of qiime it was that allowed you to import such sequences?

Hi @colinvwood

Sorry about the confusion over the period versus the hyphen. We tried the hyphen first, as that was what was in the FASTA file originally, but then we tried changing it to a period to see if that changed anything (it didn't). I must have copied across the second error message, from when we tried it with the period rather than the hyphen (the error messages were otherwise the same).

We were previously using qiime2-2019.4.

Hello @ethanstrak,

This was changed in 2021 to no longer allow periods or hyphens in the alphabet. I'm unsure of the reasoning behind this decision. I will reach out to one of the developers who worked on this and get back to you.

For context, are you importing these sequences to later train a classifier on?

Thanks @colinvwood

Yes. We're trying to train our new classifiers.

Hello @ethanstrak,

We noticed that your description of what you're using the hyphens to represent might be better represented as 'N', unless we're missing something. Hyphens generally indicate gaps in sequence alignments.
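A minimal Python sketch of that substitution, assuming every '-' (and '.') in the sequence lines should become 'N' and that header lines must be left untouched so sequence IDs containing hyphens or periods are not changed. The function names and output filename are hypothetical. Replacing one-for-one keeps each sequence the same length; if some hyphens are genuine alignment gaps rather than unknown bases, deleting those instead may be more appropriate.

```python
# Map both placeholder characters to the IUPAC "any base" code 'N'.
# Assumption: '-' and '.' are only used as placeholders in sequence lines.
PLACEHOLDERS = str.maketrans({"-": "N", ".": "N"})


def replace_placeholders(lines):
    """Yield lines with '-' and '.' replaced by 'N' in sequence lines only.

    Header lines (starting with '>') are yielded unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(PLACEHOLDERS)


def convert_fasta(in_path, out_path):
    """Write a copy of the FASTA file at in_path with placeholders replaced."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(replace_placeholders(src))
```

For example, `convert_fasta("ITS2160323_FASTA.fasta", "ITS2160323_FASTA_N.fasta")` writes a new file that can then be passed to the same `qiime tools import --type 'FeatureData[Sequence]'` command as the `--input-path`.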


Hi @colinvwood,

Thank you for this. We replaced the hyphens with 'N', and everything is working now.

