During dereplication with VSEARCH (qiime vsearch dereplicate-sequences), the feature IDs are replaced with hashes of the sequences. I’m using Q2 to work with Sanger sequences. If that behaviour could be turned off selectively, so I don’t have to reconnect my sequences to their original IDs later on, that would be very useful.
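Until such an option exists, one way to reconnect the IDs is to join on the sequence itself instead of the hash. This is a minimal sketch under stated assumptions: the dereplicated sequences have been exported from QIIME 2 to FASTA, the original Sanger reads are in a second FASTA, and dereplication was full-length, so each distinct sequence maps to exactly one feature ID. The helper name `map_feature_ids` and the file names are hypothetical. Uppercasing and treating U as T is an assumption about how vsearch compares sequences. The sketch prints one `feature_id<TAB>original_id` line per original read; reads dropped during dereplication produce no line.

```shell
# Hypothetical helper: map_feature_ids DEREP_FASTA ORIGINAL_FASTA
# Prints "feature_id<TAB>original_id" for every original read whose
# sequence appears in the dereplicated file.
map_feature_ids() {
  awk '
    # Normalise a sequence: uppercase, and treat U as T (assumption).
    function norm(s) { s = toupper(s); gsub(/U/, "T", s); return s }

    # Emit or store the record collected so far, then reset.
    function flush() {
      if (id != "") {
        key = norm(seq)
        if (src == ARGV[1]) feature[key] = id                # dereplicated file: remember hash ID
        else if (key in feature) print feature[key] "\t" id  # original file: report the match
      }
      id = ""; seq = ""
    }

    # Header line: finish the previous record, keep the ID up to the first whitespace.
    /^>/ { flush(); id = substr($0, 2); sub(/[ \t\r].*/, "", id); src = FILENAME; next }

    # Sequence line(s): concatenate, dropping whitespace and carriage returns.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }

    END { flush() }
  ' "$1" "$2"
}
```

Usage, with placeholder file names: `map_feature_ids dna-sequences.fasta sanger-reads.fasta > id-map.tsv`. Reading the dereplicated file first keeps only one small lookup table in memory while the originals are streamed in their input order.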
Thanks for the suggestion @sformel! I have opened an issue to get that feature added some time in the future. Contributions are always very welcome if you want to take a swing at it.
While q2-vsearch changes the feature IDs, vsearch itself does not. Have you considered running vsearch directly?
vsearch --derep_fulllength FILENAME --output FILENAME --sizeout
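With `--sizeout`, vsearch appends the abundance to each output label as `;size=N`, and the label in front of it is the original ID of one representative read from that group. A minimal sketch for turning those headers into an `id<TAB>size` table, assuming the output FASTA from the command above; the helper name `sizeout_table` is hypothetical, and it accepts the annotation with or without a trailing semicolon to be safe across vsearch versions. It lists only the representative IDs; the other reads collapsed into each group are not named in this output.

```shell
# Hypothetical helper: sizeout_table DEREP_FASTA
# Prints "label<TAB>size" for each header; size is empty if no ;size= tag is present.
sizeout_table() {
  awk '
    /^>/ {
      label = substr($0, 2)
      size = ""
      # Match a trailing ";size=N" (optionally followed by ";").
      if (match(label, /;size=[0-9]+;?$/)) {
        size = substr(label, RSTART + 6, RLENGTH - 6)  # skip the 6 chars of ";size="
        sub(/;$/, "", size)
        label = substr(label, 1, RSTART - 1)
      }
      print label "\t" size
    }
  ' "$1"
}
```

Usage, with a placeholder file name: `sizeout_table derep.fasta > abundances.tsv`.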
Let me know if that’s helpful.
It hadn’t occurred to me, although that makes a lot of sense. I’ll give it a shot, thanks!