I’m trying to compare the sequences produced by a few different filtering pipelines. I’ve found that identical sequences get the same HashID (good!) in the Deblur and DADA2 outputs, but I can’t find those same HashIDs in the Vsearch output.
I can confirm that the Vsearch, Deblur, and DADA2 representative sequences are all trimmed to the same length, and that all three pipelines were given the same raw input fastq files to work with. What’s strange is that the HashIDs differ in length between the Vsearch rep seqs and the Deblur/DADA2 rep seqs: DADA2 and Deblur HashIDs are always 32 characters, while Vsearch HashIDs are always 40. I also noticed that the representative sequences in Vsearch’s fasta file are wrapped at 80 characters, but the DADA2 and Deblur fasta sequences are unwrapped.
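One observation that may help: 32 and 40 characters are the lengths of hexadecimal MD5 and SHA-1 digests, respectively, so my guess is that the pipelines hash the sequence with different algorithms. The Python sketch below only illustrates that guess. The `hash_ids` helper and the example sequence are made up, and I have not confirmed that this is how any of the three tools actually builds its IDs.

```python
import hashlib


def hash_ids(sequence: str) -> dict:
    """Return MD5 and SHA-1 hex digests of a sequence (hypothetical helper).

    Assumption: the ID is a digest of the bare sequence string, so any
    line wrapping from the fasta file is removed before hashing.
    """
    unwrapped = "".join(sequence.split())  # drop newlines and other whitespace
    data = unwrapped.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


# Made-up 100 nt sequence, once wrapped at 80 characters and once unwrapped.
wrapped = "ACGT" * 20 + "\n" + "ACGT" * 5
unwrapped = "ACGT" * 25

ids = hash_ids(wrapped)
print(len(ids["md5"]), len(ids["sha1"]))  # prints "32 40"
print(hash_ids(wrapped) == hash_ids(unwrapped))  # prints "True"
```

If that guess is right, the IDs would never match across the two groups even for identical sequences, and the 80-character wrapping would not matter as long as the line breaks are stripped before hashing.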
I’m curious where exactly these labels are assigned - thanks for your comments!