The get-ncbi-data function returns a FeatureData[Sequence] formatted object, correct? If the NCBI data contains any gap (-) characters, I’m wondering how to get rid of those with RESCRIPt/QIIME functions.
One option might be to use qiime rescript degap-seqs, but the input for that is a different format (FeatureData[AlignedSequence]). Perhaps @SoilRotifer might find a way to allow that function to accept both the AlignedSequence and Sequence types as input?
Thanks for the assistance!
The sequences should not have any gaps, and QIIME 2 should prevent that action from saving a FeatureData[Sequence] artifact if there are any gaps. Are you finding otherwise? If so, this is something we can fix in
I haven’t received any data yet, and I’m hoping for the best.
Nevertheless, I’m raising this concern only because when I pulled COI data from BOLD, there were plenty of instances of gap characters in those sequences. I’ll let you know about the sequence composition of NCBI data once I get it all downloaded.
I’ll also look for any potential errors regarding gap characters when downloading with get-ncbi-data. Thanks!
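For anyone who wants to check sequence composition by hand, here is a minimal sketch in plain Python. It is not a RESCRIPt or QIIME 2 function; it assumes the sequences have already been written out as a plain FASTA file, and that '-' and '.' are the only gap symbols of interest. The names read_fasta, count_gaps, and degap are hypothetical helpers made up for this illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: '-' and '.' are the gap symbols worth flagging.
GAP_CHARS = "-."


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def count_gaps(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each header that has at least one gap character to its gap count."""
    counts = {}
    for header, seq in records:
        n = sum(seq.count(ch) for ch in GAP_CHARS)
        if n:
            counts[header] = n
    return counts


def degap(seq: str) -> str:
    """Return the sequence with all gap characters removed."""
    return seq.translate(str.maketrans("", "", GAP_CHARS))


# Small in-memory example standing in for a real FASTA file.
example = [">seq1 demo", "ACGT-AC", "GT--A", ">seq2 demo", "ACGTACGT"]
records = list(read_fasta(example))
print(count_gaps(records))
print(degap(records[0][1]))
```

To run this against real data, the example list could be swapped for an open file handle, since read_fasta accepts any iterable of lines. If count_gaps comes back empty, the file contains none of the gap symbols listed in GAP_CHARS.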