Lost almost all the sequences during the vsearch join-pairs command

rahul_gandhi · February 3, 2021, 3:32pm

Hi everyone,
I hope everyone is safe.

I am a beginner in microbial diversity analysis and amplicon studies.
For my graduate studies, I need to study the algal diversity of a particular environment. My dataset consists of paired-end sequences. After importing the dataset, I visualized the sequence quality and the sequence counts. I then ran the vsearch join-pairs command to merge the forward and reverse reads, and it removed most of the sequences from the samples. When I raised the minimum overlap length to 50 bp, the sequence count dropped even further. I want to understand why most of the sequences were removed.

NOTE: My amplicon size is 350 bp, and the forward and reverse reads are 300 bp each, so the overlap should be ~250 bp. However, joining the paired-end sequences produces reads of ~600 bp.
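
As a quick check of the arithmetic in this note, here is a minimal Python sketch. It assumes both reads lie fully within the amplicon, so that merged length = forward length + reverse length - overlap. The function names are illustrative only and are not part of vsearch or QIIME 2.

```python
def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Overlap (bp) expected when two reads together span one amplicon."""
    return fwd_len + rev_len - amplicon_len


def implied_overlap(fwd_len, rev_len, merged_len):
    """Overlap (bp) implied by an observed merged-read length."""
    return fwd_len + rev_len - merged_len


# Values taken from the note above.
overlap_from_amplicon = expected_overlap(300, 300, 350)
overlap_from_merged = implied_overlap(300, 300, 600)

print(f"Overlap expected for a 350 bp amplicon: {overlap_from_amplicon} bp")
print(f"Overlap implied by a ~600 bp merged read: {overlap_from_merged} bp")
```

Under that assumption, a 350 bp amplicon gives a 250 bp overlap and a 350 bp merged read, whereas a ~600 bp merged read implies an overlap close to 0 bp. The two numbers cannot both hold, so either the sequenced fragment is longer than 350 bp or the reads are being joined on a very short overlap.
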
Pre-joined sequence count:
demux_seqs.qzv (317.3 KB)

joined-sequence count:
demux-joined.qzv (298.9 KB)

ChrisKeefe · February 4, 2021, 10:52pm

Hi @rahul_gandhi,
It looks like you've got deeply sequenced data with good quality scores, but on top of the length issue, you seem to be losing the majority of your reads.

Have you considered re-running this command with --verbose? That will give you a description of how many reads were dropped from each sample, and will describe why. That may help you figure out what's going on.

It doesn't surprise me that raising the minimum overlap to 50 bp reduced the sequence count further. You're asking vsearch to match 5x more nucleotides in order to join a read - that's a more stringent requirement, especially if you don't increase the number of mismatches allowed. I'd probably leave this parameter at its default setting, unless you have a good reason to think you have too many false-positive joined reads.
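
To illustrate why a larger minimum overlap can only keep the same number of pairs or fewer, here is a toy Python sketch. The overlap values are made up, the default of 10 bp is an assumption inferred from the "5x" comparison with 50 bp, and the filter is a simplification: real merging also weighs mismatches and quality, which this ignores.

```python
def pairs_passing(overlaps, min_overlap):
    """Count read pairs whose overlap meets the minimum-overlap threshold."""
    return sum(1 for ov in overlaps if ov >= min_overlap)


# Hypothetical overlap lengths (bp) for six read pairs; not real data.
overlaps = [5, 12, 30, 48, 60, 250]

default_pass = pairs_passing(overlaps, 10)  # assumed default threshold
strict_pass = pairs_passing(overlaps, 50)   # threshold tried in the question

print(f"Pairs kept at min overlap 10 bp: {default_pass}")
print(f"Pairs kept at min overlap 50 bp: {strict_pass}")
```

Any pair that passes the 50 bp threshold also passes the 10 bp one, so raising the threshold cannot recover reads that were dropped for other reasons.
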

Let us know what you learn,
Chris
