The get-ncbi-data function returns a FeatureData[Sequence] formatted object, correct? If the NCBI data contains any gap (-) characters, I’m wondering how to get rid of those with RESCRIPt/QIIME functions.
One option might be to use qiime rescript degap-seqs, but the input for that is a different type (FeatureData[AlignedSequence]). Perhaps @SoilRotifer might find a way to allow that function to accept both AlignedSequence and Sequence types as input?
Thanks for the assistance!
The sequences should not have any gaps, and QIIME 2 should prevent that action from saving a FeatureData[Sequence] artifact if there are any gaps. Are you finding otherwise? If so, this is something we can fix in get-ncbi-data.
Haven’t received any data yet, and I’m hoping for the best.
Nevertheless, I’m raising this concern only because when I pulled COI data from BOLD, there were plenty of instances of gap characters in those sequences. I’ll let you know about the sequence composition of NCBI data once I get it all downloaded.
I’ll look for any potential error when downloading with get-ncbi-data regarding the gap artifacts too. Thanks
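For anyone who wants to run the same check on their own download, here is a minimal sketch in plain Python. It is not part of RESCRIPt or QIIME 2: it assumes the sequences have already been written out as FASTA text, and the helper names (read_fasta, find_gapped, degap) are made up for illustration. It reports which records contain the gap character (-) and produces degapped copies.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may wrap over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_gapped(records: Iterable[Record], gap: str = "-") -> Dict[str, int]:
    """Map each header whose sequence contains `gap` to its gap count."""
    return {h: s.count(gap) for h, s in records if gap in s}


def degap(records: Iterable[Record], gap: str = "-") -> Iterator[Record]:
    """Yield records with every occurrence of `gap` removed."""
    for h, s in records:
        yield h, s.replace(gap, "")


# Toy input (hypothetical data, not real NCBI or BOLD records).
example = """>seq1
ACGT-ACGT
>seq2
ACGTACGT
>seq3
AC--GT
-A
""".splitlines()

records = list(read_fasta(example))
gapped = find_gapped(records)
cleaned = list(degap(records))

for name, count in gapped.items():
    print(f"{name}: {count} gap character(s)")
```

To use it on a real file, pass an open file handle to read_fasta instead of the toy lines. If find_gapped returns an empty dict, the sequences contain no gap characters and nothing needs degapping.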