Losing many reads after running cutadapt and join-pairs

Hello everyone!

I am new to the QIIME2 forum and am hoping to get some assistance with a preliminary dataset. I have demultiplexed paired reads from the fungal ITS1 region and I am using QIIME2 on a VirtualBox. I imported my data into an artifact and began to join the forward and reverse reads using vsearch join-pairs. I generated a visualization and my number of joined reads looks good.
demux.qzv (315.3 KB)
hudprelim-joined-no-cut20210418.qzv (295.6 KB)

However, I realized that my reads still contained primers attached at the 5' ends. So, I used cutadapt trim-paired to remove the primers from both forward and reverse reads, then joined my now-trimmed reads:

qiime cutadapt trim-paired \
--i-demultiplexed-sequences hudprelim-pe33-demux.qza \
--p-front-f CTTGGTCATTTAGAGGAAGTAA \
--p-front-r GCTGCGTTCTTCATCGATGC \
--o-trimmed-sequences hudprelim-cut20210418.qza 

qiime vsearch join-pairs \
--i-demultiplexed-seqs hudprelim-cut20210418.qza \
--o-joined-sequences hudprelim-joined-cut20210418.qza

However, I lose most of my reads when I do this.
hudprelim-joined-cut20210418.qzv (300.6 KB)

I checked to see if I lost any during the cutadapt step, but it looks as though I kept all my reads post-trimming.
hudprelim-cut20210418.qzv (321.1 KB)
Something is happening when I go to join my trimmed forward and reverse reads that is causing me to lose a lot of data, but I cannot figure out what the issue may be. I would greatly appreciate any help or insight anyone has.

Thanks!
Sam

Hello!
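I am not sure this is the cause, but since you successfully joined most of the pairs while the primers were still attached, the issue may be that the reads are shorter after trimming.
Could you try adding the
--p-allowmergestagger
parameter to the vsearch command?
As far as I know, ITS1 is quite variable in length, so some amplicons are shorter than the reads themselves. Once the 5' primers are trimmed, the 3' end of one read can extend past the start of its mate. vsearch calls such pairs staggered and discards them by default; this parameter allows you to join pairs that overlap completely in this way.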
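
For reference, the suggestion above might look like the following when applied to the trimmed artifact from the original post. This is only a sketch: the function name and the output file name are made up for illustration, and the command runs only if QIIME 2 is on the PATH and the input file exists.

```shell
#!/bin/sh
# Sketch: join primer-trimmed reads while allowing staggered pairs.
# join_with_stagger is a hypothetical helper name, not part of QIIME 2.
join_with_stagger() {
  # $1: trimmed paired-end artifact, $2: output artifact for joined reads
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$1" \
    --p-allowmergestagger \
    --o-joined-sequences "$2"
}

trimmed="hudprelim-cut20210418.qza"            # from the original post
joined="hudprelim-joined-cut-stagger.qza"      # hypothetical output name

# Only run when QIIME 2 is installed and the input is present.
if command -v qiime >/dev/null 2>&1 && [ -f "$trimmed" ]; then
  join_with_stagger "$trimmed" "$joined"
fi
```

Summarizing the output artifact afterwards, as was done for the earlier .qzv files, should show whether the joined read counts recover.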

Hi @sjcochran, to add to @timanix's excellent advice, I often recommend adding the following flags when running cutadapt:

--p-match-read-wildcards
--p-match-adapter-wildcards
--p-discard-untrimmed

Finally, if these do not help, you may want to consider using only the forward reads, as outlined in Nguyen et al. 2015.

-Mike

Thank you so much! It looks like --p-allowmergestagger took care of my issue completely. I am retaining 10x the number of reads now. Your advice also helped me find another topic thread which used vsearch join-pairs --verbose to estimate the number of reads that failed to join, specifically because of staggering. Linking it here so others can find it:

I tried this without --p-allowmergestagger and it looks like staggered read pairs were the primary reason my joins failed.
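
In case it helps others, here is a rough sketch of that check. It assumes the vsearch merge summary printed by --verbose contains a line mentioning "staggered" (the exact wording may differ between vsearch versions). The helper name, log file name, and output artifact name are hypothetical.

```shell
#!/bin/sh
# Sketch: capture the --verbose output of join-pairs and pull out the
# line that reports staggered pairs.
# report_stagger is a hypothetical helper name.
report_stagger() {
  # $1: log file saved from `qiime vsearch join-pairs --verbose`
  grep -i 'staggered' "$1" || echo "no staggered-pair line found in $1"
}

trimmed="hudprelim-cut20210418.qza"     # from the original post
log="join-pairs-verbose.log"            # hypothetical log file name

# Only run when QIIME 2 is installed and the input is present.
if command -v qiime >/dev/null 2>&1 && [ -f "$trimmed" ]; then
  # Deliberately run WITHOUT --p-allowmergestagger to see why pairs fail.
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$trimmed" \
    --o-joined-sequences hudprelim-joined-check.qza \
    --verbose 2>&1 | tee "$log"
  report_stagger "$log"
fi
```

If the staggered count accounts for most of the failed pairs, that points to the same cause discussed above.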

Thanks!
-Sam

Thank you for the insight! @timanix's suggestion worked great for my dataset, and my next step is to re-run cutadapt with these parameters. When using --p-match-read-wildcards, --p-match-adapter-wildcards, and --p-discard-untrimmed, should I indicate True for each of them in order to increase my matching potential?

Just list them. These are value-less flags, and order does not matter. :)
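
Put together with the primers from the original post, the re-run might look roughly like this. It is a sketch only: the function name and the output file name are hypothetical, and the command runs only if QIIME 2 is on the PATH and the input file exists.

```shell
#!/bin/sh
# Sketch: trim primers with the three boolean flags listed without values.
# trim_primers is a hypothetical helper name, not part of QIIME 2.
trim_primers() {
  # $1: demultiplexed paired-end artifact, $2: output trimmed artifact
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences "$1" \
    --p-front-f CTTGGTCATTTAGAGGAAGTAA \
    --p-front-r GCTGCGTTCTTCATCGATGC \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences "$2"
}

demux="hudprelim-pe33-demux.qza"            # from the original post
trimmed="hudprelim-cut-wildcards.qza"       # hypothetical output name

# Only run when QIIME 2 is installed and the input is present.
if command -v qiime >/dev/null 2>&1 && [ -f "$demux" ]; then
  trim_primers "$demux" "$trimmed"
fi
```

Note that with --p-discard-untrimmed, read pairs in which no primer is found are dropped, so the read count after trimming may be lower than before.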
