These samples do not have matching pairs of forward and reverse reads


While importing my data, I encountered an error saying "These samples do not have matching pairs of forward and reverse reads", together with the IDs of the samples. I tried again with a subset of the problematic samples, and it worked fine. Any ideas what the problem could be?



Hello @Parix

Welcome to the forums! :qiime2:

Could you tell me a little more about how you imported your data? For example, what command did you run and what files did you pass? Did your sequencing core or another researcher do any other filtering on the data before you imported it using Qiime 2?



So I am importing the data using a manifest file with this command:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--output-path demux.qza \
--input-format PairedEndFastqManifestPhred64V2

I have already imported other files (from other cohorts in the same study) from the same sequencing center with a similar approach. I also tried with a toy subset of my files with a manifest and it worked just fine.



Because you are importing paired reads, Qiime assumes you have a forward and a reverse file for each sample. It also assumes that the forward and reverse files have the same number of reads.

Looks like some of the files are missing reads from one or both ends. Lots of things could cause this, including a partial download, or a prefiltering step that removed reads from one file but not the other. (Some programs also check that the read names match, but I'm not sure if that's checked here.)

This is great! We know the script is working on your system. Now we just need to figure out which files are missing reads and why.
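
The assumption described above (equal read counts in the forward and reverse file of each sample) can be checked directly from the manifest. The following is a minimal sketch, not QIIME 2's own validation: it assumes a tab-separated manifest with a header row and three columns (sample ID, forward path, reverse path), gzipped FASTQ files with four lines per record, and the helper names `count_reads` and `check_pairs` are made up for illustration.

```shell
# Count FASTQ records in a gzipped file, assuming 4 lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Print one line for every sample whose forward and reverse counts differ.
# check_pairs is a hypothetical helper, not part of QIIME 2.
check_pairs() {
    local manifest="$1" id fwd rev n_fwd n_rev tab cr
    tab=$(printf '\t')
    cr=$(printf '\r')
    while IFS="$tab" read -r id fwd rev || [ -n "$id" ]; do
        rev=${rev%"$cr"}    # tolerate Windows line endings
        # Skip the header row, comment lines, and blank lines.
        case "$id" in sample-id|'#'*|'') continue ;; esac
        n_fwd=$(count_reads "$fwd")
        n_rev=$(count_reads "$rev")
        if [ "$n_fwd" -ne "$n_rev" ]; then
            printf '%s\tforward=%s\treverse=%s\n' "$id" "$n_fwd" "$n_rev"
        fi
    done < "$manifest"
}

# Usage (prints nothing when every pair has matching counts):
#   check_pairs manifest.tsv
```

Matching counts do not prove that the read names line up between the two files, so a clean result here only rules out one possible cause.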

Thank you.
The thing that I don't understand is that I got the error for the following samples (which are not all of my files, but a big part of them):
{'13', '34', '33', '12', '1', '38', '32', '17', '24', '15', '10', '18', '19', '35', '26', '30', '27', '22', '21', '29', '14', '37', '36', '16', '20', '11', '3', '28', '25', '31', '23', '2'}
However, when I tried with the first five samples ('13', '34', '33', '12', '1'), it worked just fine. So, I was wondering what could be the reason that they popped up as problematic when importing all of the files, but they were fine separately?
Because when the error specifies file IDs, I assume there should be a problem with all of the reported IDs.


You could possibly have an "off-by-one" type of error, which basically cascades. Could you share your manifest file with us? Maybe we can tune up the validation routine a bit to provide a cleaner error message. Thanks!
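
A shifted or incomplete row in the manifest is one way such a cascading error could arise. As a rough sketch (an illustration only, not the validation routine QIIME 2 runs), the function below checks that every data row has exactly three tab-separated fields, that both files exist, and that the forward and reverse paths differ. The name `validate_manifest` and the message wording are hypothetical.

```shell
# Report structural problems in a paired-end manifest (tab-separated,
# header row starting with "sample-id"). Returns non-zero if any are found.
# validate_manifest is a hypothetical helper, not part of QIIME 2.
validate_manifest() {
    local manifest="$1" status=0 lineno=0 line nf id fwd rev tab cr
    tab=$(printf '\t')
    cr=$(printf '\r')
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        line=${line%"$cr"}    # tolerate Windows line endings
        # Skip the header row, comment lines, and blank lines.
        case "$line" in "sample-id$tab"*|'#'*|'') continue ;; esac
        nf=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')
        if [ "$nf" -ne 3 ]; then
            echo "line $lineno: expected 3 tab-separated fields, found $nf"
            status=1
            continue
        fi
        id=$(printf '%s\n' "$line" | cut -f1)
        fwd=$(printf '%s\n' "$line" | cut -f2)
        rev=$(printf '%s\n' "$line" | cut -f3)
        [ -f "$fwd" ] || { echo "line $lineno: sample $id: forward file not found: $fwd"; status=1; }
        [ -f "$rev" ] || { echo "line $lineno: sample $id: reverse file not found: $rev"; status=1; }
        if [ "$fwd" = "$rev" ]; then
            echo "line $lineno: sample $id: forward and reverse paths are identical"
            status=1
        fi
    done < "$manifest"
    return "$status"
}

# Usage:
#   validate_manifest manifest.tsv && echo "manifest structure looks fine"
```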


Sure. Here is the manifest file: manifest.tsv (3.4 KB)


Just curious, but were the runs/samples already quality filtered in some way? Ben

Yes, the sequencing center has already done some quality filtering (trimming, removal of adapters, ...). However, I haven't had any problems with the reads of other cohorts that have been sequenced in the same center.


Thanks @Parix, the manifest looks okay. Can you run this command in your raw/ directory?

for f in *.fq.gz; do r=$(( $(zcat "$f" | wc -l | tr -d '[:space:]') / 4 )); echo "$r $f"; done

This should give you something like the following:

11340 L1S105_9_L001_R1_001.fastq.gz
9738 L1S140_6_L001_R1_001.fastq.gz
11337 L1S208_10_L001_R1_001.fastq.gz
8216 L1S257_11_L001_R1_001.fastq.gz
8907 L1S281_5_L001_R1_001.fastq.gz

which is a count of records in each file. This will help us identify any issues (I hope!). :crossed_fingers:
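
The record count above divides the line count by four, so it silently rounds down if a file was cut off partway through a record. A small complementary sketch, assuming gzipped FASTQ with four lines per record: it first tests the gzip stream with `gzip -t`, then checks that the line count is a multiple of four. The function name `check_fastq_gz` is hypothetical.

```shell
# Check one gzipped FASTQ file for truncation.
# check_fastq_gz is a hypothetical helper name.
check_fastq_gz() {
    local f="$1" lines
    # gzip -t verifies the compressed stream without writing output.
    if ! gzip -t "$f" 2>/dev/null; then
        echo "$f: gzip integrity check failed (possibly a partial download)"
        return 1
    fi
    lines=$(gzip -dc "$f" | wc -l)
    if [ $((lines % 4)) -ne 0 ]; then
        echo "$f: $((lines)) lines is not a multiple of 4"
        return 1
    fi
    echo "$f: OK, $((lines / 4)) records"
}

# Usage, run in the directory that holds the reads:
#   for f in *.fq.gz; do check_fastq_gz "$f"; done
```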

Hi all!

I'm facing a similar issue when I try to import my paired-end data in qiime2. The code that I'm running is:

srun -p q2 -n1 --mem 50000 --export=All qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path 100_seqs --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path Anas_16s.qza

The error message is:

There was a problem importing 100_seqs/:

/tmp/q2-SingleLanePerSamplePairedEndFastqDirFmt-nu7og_q9 is not a(n) SingleLanePerSamplePairedEndFastqDirFmt:

These samples do not have matching pairs of forward and reverse reads: {'.A.striata6'}

srun: error: compute1: task 0: Exited with exit code 1.

Previously, I analyzed this data in R with the DADA2 pipeline. As I need a biom file for PICRUSt analysis, I tried to get a biom file with my ASVs, but I ran into format trouble when trying to export the phyloseq object to a biom file. I want to reanalyze my data with Qiime2 so that I can get a biom file without format issues. I reran the same command without the unmatched file '.A.striata6', but the result was the same.

Any help with overcoming this issue would be really appreciated, thanks in advance!!
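
For a Casava-style input directory like the one above, one way to see which files lack a mate is sketched below. It is only an illustration and makes two assumptions: file names end in `_R1_001.fastq.gz` or `_R2_001.fastq.gz` (as in the example listing earlier in the thread), and hidden files whose names start with a dot are worth scanning too, since the reported sample ID begins with a dot. Whether a hidden file is actually the cause here is a guess. The function name `find_unpaired` is hypothetical.

```shell
# List R1/R2 files in a directory whose mate file is missing.
# find_unpaired is a hypothetical helper, not part of QIIME 2.
find_unpaired() {
    local dir="$1" f base mate
    # The second pattern also picks up hidden files (names starting with a dot).
    for f in "$dir"/*_R[12]_001.fastq.gz "$dir"/.*_R[12]_001.fastq.gz; do
        [ -e "$f" ] || continue    # the pattern matched nothing
        base=$(basename "$f")
        case "$base" in
            *_R1_001.fastq.gz) mate="${base%_R1_001.fastq.gz}_R2_001.fastq.gz" ;;
            *)                 mate="${base%_R2_001.fastq.gz}_R1_001.fastq.gz" ;;
        esac
        [ -e "$dir/$mate" ] || echo "$base has no mate (expected $mate)"
    done
}

# Usage:
#   find_unpaired 100_seqs
```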