Cutadapt plug-in error: Reads are improperly paired

Hi, wondering if anyone can help me.

I'm running QIIME 2 2023.2 in a Conda environment on my university's Linux HPC cluster.

I've downloaded some SRA data using SraToolKit and Sradownloader, and have managed to create a demux qza using a manifest file. So far so good.
Then I've gone ahead with trimming the primers, as provided in the research paper.

Command used was:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences $HOME/data/SRA/zhang/demux/zhang_paired_demux.qza \
  --o-trimmed-sequences $HOME/data/SRA/zhang/demux/zhang_trimmed_1.qza \
  &> primer_trimming.log

The verbose log I got showed that the trimming appears to be working well, then all of a sudden I get this error:
Command line parameters: --cores 1 --error-rate 0.1 --times 1 --overlap 3 --minimum-length 1 -q 0,0 --quality-base 33 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-z4huq7ej/SRR12115264_105_L001_R1_001.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-z4huq7ej/SRR12115264_225_L001_R2_001.fastq.gz --front ACTCCTACGGGAGGCAGCA -G GGACTACHVGGGTWTCTAAT --match-read-wildcards --discard-untrimmed /tmp/qiime2/rjs202/data/95092ed2-783f-4794-b454-6abbc734b63c/data/SRR12115264_105_L001_R1_001.fastq.gz /tmp/qiime2/rjs202/data/95092ed2-783f-4794-b454-6abbc734b63c/data/SRR12115264_225_L001_R2_001.fastq.gz

Processing paired-end reads on 1 core ...
ERROR: Error in sequence file at unknown line: Reads are improperly paired. Read name 'SRR12115264.30992 30992/2' in file 1 does not match 'SRR12115264.1 1/2' in file 2.

So it looks like the read names in the two files of this pair stop matching at this point, but I can't understand why or how to fix it, unless I just remove this one sample from the batch and hope that the others work OK.
I have rechecked my manifest file and it points to the right pairs, and the naming conventions do not differ as far as I can see. I wonder if it has something to do with how the files were downloaded from the SRA, but that is a bit beyond my knowledge.
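
One way to check this directly would be to compare the read names in the forward and reverse FASTQ files of a sample, record by record. The following is only a minimal Python sketch, not part of QIIME 2 or cutadapt. The function names are made up for illustration, and the comparison rule (first whitespace-separated token of the header, with any trailing /1 or /2 removed) is an assumption that approximates, rather than reproduces, what cutadapt checks.

```python
import gzip
from itertools import zip_longest


def read_ids(lines):
    """Yield one read ID per FASTQ record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only header lines carry the read name
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0].lstrip("@")
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # drop an old-style mate suffix
        yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_number, id1, id2) for the first disagreement, else None.

    If one file has fewer records, the missing ID is reported as None.
    """
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return n, id1, id2
    return None


def check_files(r1_path, r2_path):
    """Convenience wrapper for two gzipped FASTQ files (hypothetical paths)."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        return first_mismatch(r1, r2)
```

Calling check_files on the R1 and R2 files listed in the manifest for one sample returns None when every record lines up, or the position and the two names at the first disagreement, which shows whether the files are out of order, of different lengths, or mixed up.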

Thanks in advance for any help anyone can offer. I know similar threads have been posted on this subject, but couldn't see a problem that was exactly like mine.


Can you post the commands for your import and any other actions you may have run on your data?

Sure, the commands for import were:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path $HOME/data/SRA/zhang/demux/ZhangManifest2.txt \
  --output-path $HOME/data/SRA/zhang/demux/zhang_demux2.qza \
  --input-format PairedEndFastqManifestPhred33V2

Since posting, I tried removing the sample whose pair of read files was causing the trouble, to see what would happen. When I ran cutadapt again, it failed on a different sample. I looked more closely at these samples in the manifest file (which I'd combined with the accession table from SRA). There is a field called "AvgSpotLen". Most samples have a value of 494 here, but the samples reported as improperly paired had lower values, mostly 382.
I've read that spot length is Illumina-specific and relates to read length, counting biological, adapter and primer sequences. Since about 5 samples had a lower AvgSpotLen, I removed them all from the manifest file, and this time cutadapt ran without errors. I'm not sure why this would cause a problem though!
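
For anyone wanting to find such samples without scanning the table by eye, here is a rough Python sketch of the check I did by hand. It assumes a tab-separated table that has both the manifest's "sample-id" column and the SRA "AvgSpotLen" column; the function names are made up, and treating the most common value as the "typical" one is just my assumption, not an SRA or QIIME 2 rule.

```python
import csv
from collections import Counter


def odd_spot_length_samples(rows, id_col="sample-id", len_col="AvgSpotLen"):
    """Return (typical_value, sample_ids_that_differ).

    `rows` is a sequence of dicts, one per sample. The typical value is
    simply the most frequent AvgSpotLen in the table (an assumption).
    """
    rows = list(rows)
    if not rows:
        return None, []
    counts = Counter(row[len_col] for row in rows)
    typical, _ = counts.most_common(1)[0]
    outliers = [row[id_col] for row in rows if row[len_col] != typical]
    return typical, outliers


def load_rows(path):
    """Read a tab-separated table into a list of dicts (hypothetical path)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Running odd_spot_length_samples(load_rows(...)) on the combined table lists the samples to inspect before import. It only flags them; it does not explain why their read files are mispaired.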
Thanks again for taking the time to look.


Hi @owlpen,

Have you tried q2-fondue to fetch SRA data? It might make your life much easier, by making sure the data is formatted correctly for other downstream QIIME 2 processing. :beach_umbrella:


Hi Mike,
I'd not heard of fondue before now, but I will definitely go ahead and read up and install this.
Thanks very much for the heads up.
What a brilliant community!

