How to deal with '-' in reference sequences when creating a custom database with RESCRIPt

Dear all!
I am trying to create a custom database based on 16S rRNA sequences from our assemblies, and I have noticed that some of the sequences contain '-' characters, either singly or several in a row. Which approach is better for dealing with them: deleting them completely, or replacing them with specific IUPAC characters? The assemblies were obtained with SPAdes (contigs).

Hi @timanix !

Sounds like you are starting with aligned sequences. You can import these as FeatureData[AlignedSequence] and then degap with RESCRIPt.
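
To illustrate what degapping does to the data, here is a minimal pure-Python sketch. It is not RESCRIPt's implementation, only an assumption-laden illustration: the helper names `read_fasta`, `degap`, and `degap_records`, the choice of `-` and `.` as gap characters, and the `min_length` filter are all hypothetical. The point it shows is that gap characters are removed outright rather than replaced with IUPAC ambiguity codes.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: '-' and '.' are the gap characters used in the alignment.
GAP_CHARS = "-."


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap(seq: str, gap_chars: str = GAP_CHARS) -> str:
    """Remove every gap character from a single aligned sequence."""
    return seq.translate(str.maketrans("", "", gap_chars))


def degap_records(
    records: Iterable[Tuple[str, str]], min_length: int = 1
) -> Iterator[Tuple[str, str]]:
    """Degap each record and drop sequences shorter than min_length."""
    for header, seq in records:
        ungapped = degap(seq)
        if len(ungapped) >= min_length:
            yield header, ungapped


# Hypothetical example input: two aligned records, one consisting only of gaps.
aligned = """>contig_1
ACG--TAC-GT
>contig_2
----
""".splitlines()

result = dict(degap_records(read_fasta(aligned)))
print(result)  # {'contig_1': 'ACGTACGT'}
```

The all-gap record is dropped because nothing is left of it after degapping. For a real reference database, the import and degap steps mentioned above are the better route, since they keep the sequences as QIIME 2 artifacts.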

Thank you for your reply! I will do it that way.
