Hi everyone,
I am having trouble demultiplexing my files with cutadapt.
My data come from metabarcoding sequencing (Illumina) and look like this:
Paired-end reads: 2 files, R1 and R2
In each file we can find:
ADAPTER_BARCODE-FORWARD1_DNAsequence_BARCODE-REVERSE_ADAPTER
ADAPTER_BARCODE-REVERSE_DNAsequence_BARCODE-FORWARD1_ADAPTER1
…
Each file contains several adapters and barcodes, and the sequences appear in more than one orientation, so it is a bit tricky.
The first step I did was to reorient my sequences so that they all have the same direction:
ADAPTER_BARCODE-FORWARD1_DNAsequence_BARCODE-REVERSE_ADAPTER
Script for the case where I have 3 forward/reverse barcodes:
AMORCE… ("amorce" is French for "primer") = the sequence of my adapter plus the barcode
AMORCEF1=$(awk '($14 == "1") && ($15 == "F") {print $16}' $NAMESEQTXT)
AMORCER1=$(awk '($14== "1") && ($15 == "R") {print $16}' $NAMESEQTXT)
...
FICHIER_R1=$(find -name "*R1*")
FICHIER_R2=$(find -name "*R2*")
FUNTRIM_R1_STEP1="Banque${i}_R1_untrimmed_step1.fastq.gz"
FUNTRIM_R2_STEP1="Banque${i}_R2_untrimmed_step1.fastq.gz"
...
# 1st primer
echo "Step 1/6"
echo $FICHIER_R1 $FICHIER_R2 >>$SUMMARY 2>&1
cutadapt -g $AMORCER1 -G $AMORCEF1 $FICHIER_R1 $FICHIER_R2 \
--untrimmed-output $FUNTRIM_R1_STEP1 \
--untrimmed-paired-output $FUNTRIM_R2_STEP1 \
--action=none -o $FTRIM_R1_STEP1 -p $FTRIM_R2_STEP1 \
>>$SUMMARY 2>&1
echo "Step 2/6"
echo $FUNTRIM_R1_STEP1 $FUNTRIM_R2_STEP1 >>$SUMMARY 2>&1
cutadapt -g $AMORCEF1 -G $AMORCER1 $FUNTRIM_R1_STEP1 $FUNTRIM_R2_STEP1 \
--untrimmed-output $FUNTRIM_R1_STEP2 \
--untrimmed-paired-output $FUNTRIM_R2_STEP2 \
--action=none -o $FTRIM_R1_STEP2 -p $FTRIM_R2_STEP2 \
>>$SUMMARY 2>&1
# 2nd primer
echo "Step 3/6"
echo $FUNTRIM_R1_STEP2 $FUNTRIM_R2_STEP2 >>$SUMMARY 2>&1
cutadapt -g $AMORCER2 -G $AMORCEF2 $FUNTRIM_R1_STEP2 $FUNTRIM_R2_STEP2 \
--untrimmed-output $FUNTRIM_R1_STEP3 \
--untrimmed-paired-output $FUNTRIM_R2_STEP3 \
--action=none -o $FTRIM_R1_STEP3 -p $FTRIM_R2_STEP3 \
>>$SUMMARY 2>&1
...
gunzip $FTRIM_R1_STEP2 ; gunzip $FTRIM_R2_STEP2
UNZIPR1="Banque${i}_R1_trimmed_step2.fastq"
UNZIPR2="Banque${i}_R2_trimmed_step2.fastq"
R1RECCOMP="Banque${i}_R1_trimmed_step2_rev_comp.fastq.gz"
R2RECCOMP="Banque${i}_R2_trimmed_step2_rev_comp.fastq.gz"
gunzip $FTRIM_R1_STEP4 ; gunzip $FTRIM_R2_STEP4
UNZIPR14="Banque${i}_R1_trimmed_step4.fastq"
UNZIPR24="Banque${i}_R2_trimmed_step4.fastq"
R1RECCOMP4="Banque${i}_R1_trimmed_step4_rev_comp.fastq.gz"
R2RECCOMP4="Banque${i}_R2_trimmed_step4_rev_comp.fastq.gz"
gunzip $FTRIM_R1_STEP6 ; gunzip $FTRIM_R2_STEP6
UNZIPR16="Banque${i}_R1_trimmed_step6.fastq"
UNZIPR26="Banque${i}_R2_trimmed_step6.fastq"
R1RECCOMP6="Banque${i}_R1_trimmed_step6_rev_comp.fastq.gz"
R2RECCOMP6="Banque${i}_R2_trimmed_step6_rev_comp.fastq.gz"
fastx_reverse_complement -z -i $UNZIPR1 -o $R1RECCOMP >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR2 -o $R2RECCOMP >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR14 -o $R1RECCOMP4 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR24 -o $R2RECCOMP4 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR16 -o $R1RECCOMP6 >>$SUMMARY 2>&1
fastx_reverse_complement -z -i $UNZIPR26 -o $R2RECCOMP6 >>$SUMMARY 2>&1
RetourCode=${?}
if (( $RetourCode == 0 ))
then
cat $FTRIM_R1_STEP1 >forward.fastq.gz ;cat $R1RECCOMP >>forward.fastq.gz
cat $FTRIM_R1_STEP3 >>forward.fastq.gz;cat $R1RECCOMP4 >>forward.fastq.gz
cat $FTRIM_R1_STEP5 >>forward.fastq.gz;cat $R1RECCOMP6 >>forward.fastq.gz
cat $FTRIM_R2_STEP1 >reverse.fastq.gz ;cat $R2RECCOMP >>reverse.fastq.gz
cat $FTRIM_R2_STEP3 >>forward.fastq.gz;cat $R2RECCOMP4 >>forward.fastq.gz
cat $FTRIM_R2_STEP5 >>forward.fastq.gz;cat $R2RECCOMP6 >>forward.fastq.gz
mv forward.fastq.gz $NAMESEQ ; mv reverse.fastq.gz $NAMESEQ
fi
This step works fine; I get no errors.
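Since paired-end tools expect R1 and R2 to hold the same number of records, one way to double-check the reorientation and concatenation is to count the reads in each pair of files. The following is only a minimal sketch: the function names `count_fastq_reads` and `report_pair` are hypothetical, and it assumes standard 4-line FASTQ records compressed with gzip.

```shell
# Count the reads in a gzipped FASTQ file (assumes 4 lines per record).
# gzip -cd also reads files built by concatenating several .gz files with cat.
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print the read counts of an R1/R2 pair.
# Returns 0 when both counts are equal, 1 otherwise.
report_pair() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" = "$n2" ]; then
        echo "OK: $1 ($n1 reads) / $2 ($n2 reads)"
        return 0
    fi
    echo "MISMATCH: $1 ($n1 reads) / $2 ($n2 reads)"
    return 1
}
```

For example, `report_pair "$FTRIM_R1_STEP1" "$FTRIM_R2_STEP1"` could be run after each cutadapt step, and `report_pair forward.fastq.gz reverse.fastq.gz` right after the concatenation, to see at which point the two files stop having the same number of reads.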
Then I import my data:
(here i=40)
printf '\n========== Banque %s ============' "$i" >>$SUMMARY2 2>&1
printf '\n IMPORTATION \n' >>$SUMMARY2 2>&1
qiime tools import \
--type MultiplexedPairedEndBarcodeInSequence \
--input-path $NAMESEQ \
--output-path ${NAMESEQ}.qza \
>>$SUMMARY2 2>&1
Result: Imported B40seq as MultiplexedPairedEndBarcodeInSequenceDirFmt to B40seq.qza
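Before importing, it can help to confirm that the input directory really holds the two files moved into `$NAMESEQ` above, `forward.fastq.gz` and `reverse.fastq.gz`, and that both are intact gzip files. This is a minimal sketch: `check_import_dir` is a hypothetical helper, not part of QIIME 2, and it assumes the directory is meant to contain only these two files.

```shell
# Check that a directory contains forward.fastq.gz and reverse.fastq.gz,
# that both are non-empty and pass gzip's integrity test, and that nothing
# else is in the directory. Returns 0 if all checks pass, 1 otherwise.
check_import_dir() {
    dir=$1
    for f in forward.fastq.gz reverse.fastq.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
        if ! gzip -t "$dir/$f" 2>/dev/null; then
            echo "not a valid gzip file: $dir/$f" >&2
            return 1
        fi
    done
    # Count every entry in the directory (hidden files included).
    n=$(ls -A "$dir" | wc -l | tr -d ' ')
    if [ "$n" -ne 2 ]; then
        echo "expected 2 files in $dir, found $n" >&2
        return 1
    fi
    return 0
}
```

It could be called as `check_import_dir "$NAMESEQ"` just before `qiime tools import`.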
The problem appears when I try to demultiplex:
printf '\n DEMULTIPLEXAGE \n' >>$SUMMARY2 2>&1
qiime cutadapt demux-paired \
--i-seqs ${NAMESEQ}.qza \
--m-forward-barcodes-file $NAMESEQTXT \
--m-forward-barcodes-column Demul_seq \
--p-error-rate 0 \
--o-per-sample-sequences demultiplexed-seqs-${NAME}.qza \
--o-untrimmed-sequences unknown_tag_${NAME}.qza \
--verbose \
>>$SUMMARY2 2>&1
Command: cutadapt --front file:/tmp/tmpgyaizcep --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/reverse.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/forward.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/reverse.fastq.gz
This is cutadapt 1.18 with Python 3.6.7
Command line parameters: --front file:/tmp/tmpgyaizcep --error-rate 0.0 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.1.fastq.gz --untrimmed-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/forward.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-xauw14d7/{name}.2.fastq.gz --untrimmed-paired-output /tmp/q2-MultiplexedPairedEndBarcodeInSequenceDirFmt-780z1m65/reverse.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/forward.fastq.gz /tmp/qiime2-archive-nd_84ip6/71c7dec0-9f63-48fd-a815-835290a57175/data/reverse.fastq.gz
Processing reads on 1 core in paired-end legacy mode ...
**cutadapt: error: Reads are improperly paired. There are more reads in file 1 than in file 2.**
I understand that my R1 and R2 files do not contain the same number of sequences, but I don't know how to fix it…
(I looked at the forward and reverse fastq files, and the names of my sequences still correspond…)
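Checking the names by eye only covers a few records, so a position-by-position comparison of the read IDs may be more reliable. The sketch below is an illustration under stated assumptions: `read_ids` and `compare_read_ids` are hypothetical helper names, the files are gzipped 4-line FASTQ, and the ID is taken to be the header text before the first space, without a trailing `/1` or `/2`.

```shell
# Print the read IDs of a gzipped FASTQ file, one per line, in file order.
# Keeps only the header text before the first space and drops a trailing /1 or /2.
read_ids() {
    gzip -cd "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Compare the read IDs of two files position by position.
# Returns 0 if the ID lists are identical; otherwise prints the start of
# the diff and returns 1.
compare_read_ids() {
    ids1=$(mktemp) || return 2
    ids2=$(mktemp) || return 2
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then
        rc=0
    else
        diff "$ids1" "$ids2" | head -n 4 || true
        rc=1
    fi
    rm -f "$ids1" "$ids2"
    return "$rc"
}
```

Calling `compare_read_ids forward.fastq.gz reverse.fastq.gz` would then show the first position where the two files disagree, including the case where one file simply has extra reads at the end.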
Do you have any idea how I can handle this?
PS: Demultiplexing works when I do not reorient my data first, and also when I reorient the data but there is only one forward and one reverse barcode…