cutadapt demultiplexing error "Reads are improperly paired" after rearranging read direction

Hi everyone,
I am having trouble demultiplexing my files with cutadapt.

My data come from metabarcoding sequencing (Illumina). They look like this:
Paired-end reads: 2 files, R1 and R2
In each file we can find:
ADAPTER_BARCODE-FORWARD1_DNAsequence_BARCODE-REVERSE_ADAPTER
ADAPTER_BARCODE-REVERSE_DNAsequence_BARCODE-FORWARD1_ADAPTER1

I have several adapters and barcodes in each file, and my sequences come in both directions, so it's a bit tricky.

The first step I did was to rearrange my sequences so that they all have one direction:
ADAPTER_BARCODE-FORWARD1_DNAsequence_BARCODE-REVERSE_ADAPTER

Script for the case where I have 3 forward/reverse barcodes:
AMORCE… = the sequence of my adapter plus the barcode

AMORCEF1=$(awk '($14 == "1") && ($15 == "F") {print $16}' $NAMESEQTXT)
AMORCER1=$(awk '($14== "1") && ($15 == "R") {print $16}' $NAMESEQTXT)
...
FICHIER_R1=$(find -name "*R1*")
FICHIER_R2=$(find -name "*R2*")
FUNTRIM_R1_STEP1="Banque${i}_R1_untrimmed_step1.fastq.gz"
FUNTRIM_R2_STEP1="Banque${i}_R2_untrimmed_step1.fastq.gz"
...
# 1st primer
echo "Etape 1/6"
echo $FICHIER_R1 $FICHIER_R2 >>$SUMMARY 2>&1
cutadapt -g $AMORCER1 -G $AMORCEF1 $FICHIER_R1 $FICHIER_R2 \
	--untrimmed-output $FUNTRIM_R1_STEP1 \
	--untrimmed-paired-output $FUNTRIM_R2_STEP1 \
	--action=none -o $FTRIM_R1_STEP1 -p $FTRIM_R2_STEP1 \
	>>$SUMMARY 2>&1

echo "Etape 2/6"
echo $FUNTRIM_R1_STEP1 $FUNTRIM_R2_STEP1 >>$SUMMARY 2>&1
cutadapt -g $AMORCEF1 -G $AMORCER1 $FUNTRIM_R1_STEP1 $FUNTRIM_R2_STEP1 \
	--untrimmed-output $FUNTRIM_R1_STEP2 \
	--untrimmed-paired-output $FUNTRIM_R2_STEP2 \
	--action=none -o $FTRIM_R1_STEP2 -p $FTRIM_R2_STEP2 \
	>>$SUMMARY 2>&1
					
# 2nd primer
echo "Etape 3/6"
echo $FUNTRIM_R1_STEP2 $FUNTRIM_R2_STEP2 >>$SUMMARY 2>&1
cutadapt -g $AMORCER2 -G $AMORCEF2 $FUNTRIM_R1_STEP2 $FUNTRIM_R2_STEP2 \
	--untrimmed-output $FUNTRIM_R1_STEP3 \
	--untrimmed-paired-output $FUNTRIM_R2_STEP3 \
	--action=none -o $FTRIM_R1_STEP3 -p $FTRIM_R2_STEP3 \
	>>$SUMMARY 2>&1
...
gunzip $FTRIM_R1_STEP2 ; gunzip $FTRIM_R2_STEP2
UNZIPR1="Banque${i}_R1_trimmed_step2.fastq"
UNZIPR2="Banque${i}_R2_trimmed_step2.fastq"
R1RECCOMP="Banque${i}_R1_trimmed_step2_rev_comp.fastq.gz"
R2RECCOMP="Banque${i}_R2_trimmed_step2_rev_comp.fastq.gz"

gunzip $FTRIM_R1_STEP4 ; gunzip $FTRIM_R2_STEP4
UNZIPR14="Banque${i}_R1_trimmed_step4.fastq"
UNZIPR24="Banque${i}_R2_trimmed_step4.fastq"
R1RECCOMP4="Banque${i}_R1_trimmed_step4_rev_comp.fastq.gz"
R2RECCOMP4="Banque${i}_R2_trimmed_step4_rev_comp.fastq.gz"

gunzip $FTRIM_R1_STEP6 ; gunzip $FTRIM_R2_STEP6
UNZIPR16="Banque${i}_R1_trimmed_step6.fastq"
UNZIPR26="Banque${i}_R2_trimmed_step6.fastq"
R1RECCOMP6="Banque${i}_R1_trimmed_step6_rev_comp.fastq.gz"
R2RECCOMP6="Banque${i}_R2_trimmed_step6_rev_comp.fastq.gz"

fastx_reverse_complement -z -i $UNZIPR1 -o $R1RECCOMP >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR2 -o $R2RECCOMP >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR14 -o $R1RECCOMP4 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR24 -o $R2RECCOMP4 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR16 -o $R1RECCOMP6 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR26 -o $R2RECCOMP6 >>$SUMMARY 2>&1
RetourCode=${?}
if (( $RetourCode == 0 ))
then
cat $FTRIM_R1_STEP1 >forward.fastq.gz ;cat $R1RECCOMP >>forward.fastq.gz
cat $FTRIM_R1_STEP3 >>forward.fastq.gz;cat $R1RECCOMP4 >>forward.fastq.gz
cat $FTRIM_R1_STEP5 >>forward.fastq.gz;cat $R1RECCOMP6 >>forward.fastq.gz
cat $FTRIM_R2_STEP1 >reverse.fastq.gz ;cat $R2RECCOMP >>reverse.fastq.gz
cat $FTRIM_R2_STEP3 >>forward.fastq.gz;cat $R2RECCOMP4 >>forward.fastq.gz
cat $FTRIM_R2_STEP5 >>forward.fastq.gz;cat $R2RECCOMP6 >>forward.fastq.gz
mv forward.fastq.gz $NAMESEQ ; mv reverse.fastq.gz $NAMESEQ 

This step works fine, I have no errors.

Then I import my data :

(i=40)
printf '\n========== Banque %s ============' "$i" >>$SUMMARY2 2>&1
printf '\n IMPORTATION \n' >>$SUMMARY2 2>&1
qiime tools import \
	--type MultiplexedPairedEndBarcodeInSequence \
	--input-path $NAMESEQ \
	--output-path ${NAMESEQ}.qza \
	>>$SUMMARY2 2>&1

RESULT : Imported B40seq as MultiplexedPairedEndBarcodeInSequenceDirFmt to B40seq.qza

The problem appears when I try to demultiplex:

printf '\n DEMULTIPLEXAGE \n' >>$SUMMARY2 2>&1
qiime cutadapt demux-paired \
	--i-seqs ${NAMESEQ}.qza \
	--m-forward-barcodes-file $NAMESEQTXT \
	--m-forward-barcodes-column Demul_seq \
	--p-error-rate 0 \
	--o-per-sample-sequences demultiplexed-seqs-${NAME}.qza \
	--o-untrimmed-sequences unknown_tag_${NAME}.qza \
	--verbose \
	>>$SUMMARY2 2>&1
Command: cutadapt --front file:/tmp/tmpgyaizcep --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/reverse.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/forward.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/reverse.fastq.gz

This is cutadapt 1.18 with Python 3.6.7
Command line parameters: --front file:/tmp/tmpgyaizcep --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/reverse.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/forward.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/reverse.fastq.gz
Processing reads on 1 core in paired-end legacy mode ...
**cutadapt: error: Reads are improperly paired. There are more reads in file 1 than in file 2.**

I understand that I don't have the same number of sequences in my R1 and R2 files, but I don't know how to fix it.
(I have looked at the forward and reverse fastq files, and the names of my sequences still correspond.)

Do you have any idea how I can handle this?

PS: When I demultiplex my files without rearranging my data, it works. It also works when I rearrange my data but there is just one forward and one reverse barcode.


Hi @marioncdl, I am reclassifying this to “other bioinformatics tools”, because it looks like the script you provided above is generating invalid fastq file-pairs:

Once you get your data in order we can help you with any issues you might be having in QIIME 2, but for now, this error appears to be caused by the script generating mismatched file pairs.
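
If it helps, here is a minimal sketch for confirming the mismatch before importing. It assumes standard 4-line fastq records (no wrapped sequence lines), and `count_reads` and `check_pair` are hypothetical helper names, not part of cutadapt or QIIME 2. Note that in the script above, `$FTRIM_R2_STEP3`, `$R2RECCOMP4`, `$FTRIM_R2_STEP5` and `$R2RECCOMP6` are appended to `forward.fastq.gz` rather than `reverse.fastq.gz`, which on its own would give file 1 more reads than file 2.

```shell
# Count fastq records in a gzipped file.
# Assumption: every record is exactly 4 lines long.
count_reads() {
    gzip -cd "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Print the record count of both files of a pair and
# return success only when the two counts are equal.
check_pair() (
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
)
```

Running `check_pair forward.fastq.gz reverse.fastq.gz` on the concatenated files should report equal counts. Equal counts alone do not prove that the reads are in the same order in both files, so treat this only as a first check.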