Losing many reads after running cutadapt and join-pairs

Hello everyone!

I am new to the QIIME2 forum and am hoping to get some assistance with a preliminary dataset. I have demultiplexed paired reads from the fungal ITS1 region and I am using QIIME2 on a VirtualBox. I imported my data into an artifact and began to join the forward and reverse reads using vsearch join-pairs. I generated a visualization and my number of joined reads looks good.
demux.qzv (315.3 KB)
hudprelim-joined-no-cut20210418.qzv (295.6 KB)

However, I realized that my reads still had primers attached at the 5' ends. So, I used cutadapt trim-paired to remove the primers from both forward and reverse reads, then joined the trimmed reads:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences hudprelim-pe33-demux.qza \
--p-front-f <FORWARD_PRIMER> \
--p-front-r <REVERSE_PRIMER> \
--o-trimmed-sequences hudprelim-cut20210418.qza

qiime vsearch join-pairs \
--i-demultiplexed-seqs hudprelim-cut20210418.qza \
--o-joined-sequences hudprelim-joined-cut20210418.qza

However, I lose most of my reads when I do this.
hudprelim-joined-cut20210418.qzv (300.6 KB)

I checked to see if I lost any during the cutadapt step, but it looks as though I kept all my reads post-trimming. hudprelim-cut20210418.qzv (321.1 KB)
Something is happening when I go to join my trimmed forward and reverse reads that is causing me to lose a lot of data, but I cannot figure out what the issue may be. I would greatly appreciate any help or insight anyone has.


I am not sure if this is the case, but since you successfully joined most of the pairs while the primers were still attached, the issue may be the shorter length of the reads after trimming.
Could you try adding the --p-allowmergestagger parameter to the vsearch command?
As far as I know, ITS1 is quite variable in length, and this parameter will allow you to join staggered pairs, in which the reads overlap completely.
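With that suggestion, the join step would look something like the sketch below (the artifact names follow the ones you posted above; adjust to your files):

```shell
# Re-join the primer-trimmed reads, allowing staggered pairs --
# pairs whose reads overlap completely and extend past each other.
qiime vsearch join-pairs \
  --i-demultiplexed-seqs hudprelim-cut20210418.qza \
  --p-allowmergestagger \
  --o-joined-sequences hudprelim-joined-cut20210418.qza
```

Staggering is common for ITS1 because short amplicons can be fully read through by both the forward and reverse read.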

Hi @sjcochran, to add to @timanix's excellent advice, I often recommend adding the following flags when running cutadapt:

--p-match-read-wildcards
--p-match-adapter-wildcards
--p-discard-untrimmed

Finally, if these do not help, you may want to consider using only the forward reads, as outlined in Nguyen et al. 2015.
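Putting those flags together, the trimming step might look like this sketch (the primer sequences are placeholders; substitute your own ITS1 primers):

```shell
# Trim primers from both reads, tolerating IUPAC wildcards (e.g. N, Y, R)
# in both the reads and the primer sequences, and discard any read pair
# in which the primer was not found.
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences hudprelim-pe33-demux.qza \
  --p-front-f <FORWARD_PRIMER> \
  --p-front-r <REVERSE_PRIMER> \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences hudprelim-cut20210418.qza
```

--p-discard-untrimmed is especially useful here: read pairs that never contained the primer (e.g. off-target amplification) are dropped rather than carried forward untrimmed.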


Thank you so much! It looks like --p-allowmergestagger took care of my issue completely. I am retaining 10x the number of reads now. Your advice also helped me find another topic thread, which used vsearch join-pairs --verbose to estimate the number of reads that failed to join specifically because of staggering. Linking it here so others can find it:

I tried this without --p-allowmergestagger and it looks like staggered read pairs were the primary reason my joins failed.


Thank you for the insight! @timanix's suggestion worked great for my dataset, and my next step is to re-run cutadapt with these parameters. When using --p-match-read-wildcards, --p-match-adapter-wildcards, and --p-discard-untrimmed, should I indicate True for each of them, in order to increase my matching potential?

Just list them; these are value-less flags, and order does not matter. :slight_smile:

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.