I am new to the QIIME2 forum and am hoping to get some assistance with a preliminary dataset. I have demultiplexed paired reads from the fungal ITS1 region and I am using QIIME2 on a VirtualBox. I imported my data into an artifact and began to join the forward and reverse reads using vsearch join-pairs. I generated a visualization and my number of joined reads looks good. demux.qzv (315.3 KB) hudprelim-joined-no-cut20210418.qzv (295.6 KB)
However, I realized that my reads still had primers attached at the 5' ends. So, I used cutadapt trim-paired to remove the primers from both forward and reverse reads, then joined my now-trimmed reads:
I checked to see if I lost any reads during the cutadapt step, but it looks as though I kept all of them post-trimming. hudprelim-cut20210418.qzv (321.1 KB)
Something is happening when I go to join my trimmed forward and reverse reads that is causing me to lose a lot of data, but I cannot figure out what the issue may be. I would greatly appreciate any help or insight anyone has.
Hello!
I am not sure this is the cause, but since you successfully joined most of the pairs while the primers were still attached, the issue may be that the reads are shorter after trimming.
Could you try adding the --p-allowmergestagger parameter to the vsearch command?
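As far as I know, ITS1 is quite variable in length, and this parameter lets vsearch join staggered pairs, i.e. pairs in which the 3' end of the reverse read overhangs the 5' end of the forward read, which can happen when the amplicon is shorter than the reads.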
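For anyone following along, here is a rough sketch of the join step with the flag added. The function name and the file names in the usage note are placeholders, not taken from the original posts. The option names are the ones `qiime vsearch join-pairs` used around the time of this thread, so it is worth checking `qiime vsearch join-pairs --help` on your own install, since action and option names can differ between releases.

```shell
# Sketch only: join primer-trimmed paired-end reads while allowing staggered pairs.
# $1 = trimmed paired-end artifact (.qza), $2 = output artifact for joined reads
join_pairs_staggered() {
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$1" \
    --p-allowmergestagger \
    --o-joined-sequences "$2" \
    --verbose
}
```

It could then be called as `join_pairs_staggered trimmed-seqs.qza joined-seqs.qza` (both file names hypothetical), followed by `qiime demux summarize` on the output to compare read counts with the earlier run.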
Thank you so much! It looks like --p-allowmergestagger took care of my issue completely. I am now retaining 10x as many reads. Your advice also helped me find another topic thread which used vsearch join-pairs --verbose to estimate the number of reads that failed to join, specifically because of staggering. Linking it here so others can find it:
I tried this without --p-allowmergestagger and it looks like staggered read pairs were the primary reason my joins failed.
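A minimal sketch of that check, in case it helps others: run the join without --p-allowmergestagger, save the verbose output, and pull out the lines that mention staggering. The function name and log handling are assumptions of this sketch, and the exact wording of the vsearch summary lines may vary by version, so the search pattern is deliberately loose.

```shell
# Sketch only: run the join WITHOUT --p-allowmergestagger and report stagger failures.
# $1 = trimmed paired-end artifact, $2 = output artifact, $3 = log file to write
report_stagger_failures() {
  # --verbose makes QIIME 2 print the underlying tool's stdout/stderr; keep both in the log
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$1" \
    --o-joined-sequences "$2" \
    --verbose > "$3" 2>&1
  # Print only the log lines that mention staggered pairs (case-insensitive)
  grep -i "stagger" "$3"
}
```

Because the join runs sample by sample, expect one matching line per sample rather than a single total.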
Thank you for the insight! @timanix’s suggestion worked great for my dataset, and my next step is to re-run cutadapt with these parameters. When using --p-match-read-wildcards, --p-match-adapter-wildcards, and --p-discard-untrimmed, should I indicate True for them in order to increase my matching potential?
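As a point of reference for the syntax, the QIIME 2 command line treats these three parameters as Boolean switches: including the flag turns the option on, and no `True` value is passed. Below is a hedged sketch of how the call might look. The function name is made up for illustration, the primer sequences are passed in as arguments because the thread does not say which primers were used, and --p-front-f / --p-front-r are assumed because the primers sit at the 5' ends of the reads.

```shell
# Sketch only: trim 5' primers from paired-end reads with the three switches enabled.
# $1 = demultiplexed paired-end artifact, $2 = forward primer sequence,
# $3 = reverse primer sequence, $4 = output artifact for trimmed reads
trim_primers() {
  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences "$1" \
    --p-front-f "$2" \
    --p-front-r "$3" \
    --p-match-read-wildcards \
    --p-match-adapter-wildcards \
    --p-discard-untrimmed \
    --o-trimmed-sequences "$4"
}
```

Each switch has a `--p-no-...` counterpart (for example `--p-no-match-read-wildcards`) to turn it off explicitly. Note that --p-discard-untrimmed drops read pairs in which no primer was found, so read counts can fall after this step.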