How to deal with '-' in reference sequences when creating a custom database with RESCRIPt

timanix · December 17, 2021, 12:22pm

Dear all!
I am trying to create a custom database based on 16S rRNA sequences from our assemblies, and I have noticed that some of the sequences contain '-', either one or several in a row. Which approach is better for dealing with them: deleting them completely, or replacing them with specific IUPAC characters? The assemblies were obtained with SPAdes (contigs).

Nicholas_Bokulich · December 17, 2021, 12:31pm

Hi @timanix !

Sounds like you are starting with aligned sequences. You can import these as FeatureData[AlignedSequence] and then degap with RESCRIPt.
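
To illustrate what degapping does to the sequences themselves, here is a minimal pure-Python sketch. It is not RESCRIPt's own code, only an approximation of the idea: the gap characters are deleted rather than replaced with IUPAC codes. The names `degap` and `degap_fasta`, the `min_length` parameter, and the choice to treat both '-' and '.' as gaps are assumptions made for this example.

```python
def degap(seq: str, gap_chars: str = "-.") -> str:
    """Return seq with alignment gap characters removed (hypothetical helper)."""
    return seq.translate(str.maketrans("", "", gap_chars))


def degap_fasta(lines, min_length: int = 1):
    """Yield (header, degapped_sequence) pairs from aligned FASTA lines.

    Records whose degapped sequence is shorter than min_length are dropped.
    min_length is an assumption of this sketch, not a documented default.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                seq = degap("".join(chunks))
                if len(seq) >= min_length:
                    yield header, seq
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        seq = degap("".join(chunks))
        if len(seq) >= min_length:
            yield header, seq


# Made-up example records, not real data.
aligned = [">contig_1", "ACG--TTA-G", ">contig_2", "----"]
print(list(degap_fasta(aligned)))
```

In this sketch a record that consists only of gaps is dropped, since nothing is left of it after degapping. For a real database, the import and degap steps in QIIME 2 and RESCRIPt suggested above are the better route.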

timanix · December 17, 2021, 12:41pm

Thank you for your reply! I will do it this way.
