Invalid character in sequence: b'U'

I am trying to train a classifier. The sequence and taxonomy files import successfully, but when I extract the reads, I always get the error shown in the screenshot. This still happens after repeated revisions. Why does this happen, and how can I fix it? Any help would be appreciated.

Hi @LiyingXie,
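This is because your sequences are RNA, not DNA. The U in the error message is uracil, which appears in RNA sequences where DNA has thymine (T). It looks like you are trying to import and use the raw SILVA sequences, which are distributed as RNA, so you are importing them as the wrong data type.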
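To illustrate what the fix involves, here is a minimal Python sketch that back-transcribes an unaligned FASTA file by replacing U with T in the sequence lines while leaving the header lines untouched, so that the IDs still match your taxonomy file. It is not part of QIIME 2 or RESCRIPt, and the function and file names are hypothetical. The RESCRIPt workflow recommended below is still the better route, because it also takes care of formatting and filtering SILVA.

```python
def back_transcribe_fasta(lines):
    """Yield FASTA lines with RNA bases (U/u) replaced by DNA bases (T/t).

    Header lines (starting with '>') are passed through unchanged so that
    sequence IDs still match the taxonomy file.
    """
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(table)


def convert_file(src_path, dst_path):
    # Hypothetical helper: stream an RNA FASTA file into a DNA FASTA file.
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(back_transcribe_fasta(src))


if __name__ == "__main__":
    rna = [">seq1 Uncultured bacterium\n", "ACGUUAGC\n", ">seq2\n", "uuaGGC\n"]
    print(list(back_transcribe_fasta(rna)))
    # [">seq1 Uncultured bacterium\n", "ACGTTAGC\n", ">seq2\n", "ttaGGC\n"]
```

The converted file could then be imported as DNA sequences in place of the original.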

As you are working with SILVA, I recommend using this tutorial for the RESCRIPt plugin, which will make downloading, formatting, and using SILVA much easier:

The outputs of this tutorial (RESCRIPt-formatted SILVA sequences, taxonomy, and pre-trained taxonomy classifiers) are also available here, which will save you a lot of time:
https://docs.qiime2.org/2020.11/data-resources/

Good luck!
