How to deal with '-' in reference sequences when creating a custom database with RESCRIPt

Dear all!
I am trying to create a custom database based on 16S rRNA sequences from our assemblies, and I have noticed that some of the sequences contain '-' characters, either one or several in a row. Which approach is better for dealing with them: deleting them completely, or replacing them with specific IUPAC characters? The assemblies were obtained with SPAdes (contigs).

Hi @timanix !

It sounds like you are starting with aligned sequences, in which case the '-' characters are alignment gaps rather than ambiguous bases. You can import these as `FeatureData[AlignedSequence]` and then degap them with RESCRIPt.
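
To make clear what degapping does to the sequences, here is a minimal plain-Python sketch. It is not RESCRIPt's implementation (in QIIME 2 the step corresponds to RESCRIPt's `degap-seqs` action); the function names, the `min_length` parameter, and the example records are assumptions made up for illustration. It treats both `-` and `.` as gap characters, removes them, and drops any sequence that ends up shorter than `min_length`.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap(sequence, gap_chars="-."):
    """Return the sequence with all gap characters removed."""
    return sequence.translate(str.maketrans("", "", gap_chars))


def degap_fasta(text, min_length=1):
    """Degap every record; keep only those with at least min_length bases.

    min_length is a hypothetical filter for this sketch, used to drop
    records that were all gaps (or nearly so) in the alignment.
    """
    kept = []
    for header, sequence in parse_fasta(text):
        ungapped = degap(sequence)
        if len(ungapped) >= min_length:
            kept.append((header, ungapped))
    return kept


# Hypothetical aligned records: contig_2 consists only of gaps.
example = ">contig_1\nACG--TTA\n-GC\n>contig_2\n----\n"

for header, sequence in degap_fasta(example):
    print(f">{header}\n{sequence}")
```

Note that the gaps are simply deleted rather than replaced with IUPAC ambiguity codes such as `N`: a gap in an alignment means "no base at this column", not "unknown base".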


Thank you for your reply! I will do it this way.
