These samples do not have matching pairs of forward and reverse reads

Parix · October 30, 2019, 12:34pm

Hi,

While importing my data, I encountered an error saying "These samples do not have matching pairs of forward and reverse reads", together with the IDs of the samples. I tried importing a subset of the problematic samples, and that worked fine. Any ideas what the problem could be?

Thanks

colinbrislawn · October 30, 2019, 1:33pm

Hello @Parix

Welcome to the forums! :qiime2:

Could you tell me a little more about how you imported your data? Like, what command did you run and what files did you pass? Did your sequencing core or another researcher do any other filtering on the data before you imported it using Qiime 2?

Thanks,
Colin

Parix · October 30, 2019, 1:49pm

Hi!

So I am importing the data using a manifest file with this command:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path demux.qza \
  --input-format PairedEndFastqManifestPhred64V2

I have already imported other files (from other cohorts in the same study) from the same sequencing center with a similar approach. I also tried with a toy subset of my files with a manifest and it worked just fine.

colinbrislawn · October 30, 2019, 1:53pm

Thanks!

Because you are importing paired reads, Qiime assumes you have a forward and a reverse file for each sample. It also assumes that the forward and reverse files have the same number of reads.

Looks like some of the files are missing reads from one or both ends. Lots of things could cause this, including a partial download, or a prefiltering step that removed reads from one file but not the other. (Some programs also check that the read names match, but I'm not sure if that's checked here.)

This is great! We know the script is working on your system. Now we just need to figure out which files are missing reads and why.

Parix · October 30, 2019, 2:55pm

Thank you.
What I don't understand is that I got the error for the following samples (not all of my files, but a big part of them):
{'13', '34', '33', '12', '1', '38', '32', '17', '24', '15', '10', '18', '19', '35', '26', '30', '27', '22', '21', '29', '14', '37', '36', '16', '20', '11', '3', '28', '25', '31', '23', '2'}
However, when I tried with the first five samples ('13', '34', '33', '12', '1'), it worked just fine. So, I was wondering what could be the reason that they popped up as problematic when importing all of the files, but they were fine separately?
Since the error lists specific sample IDs, I assumed there would be a problem with every one of the reported IDs.

thermokarst · October 30, 2019, 2:57pm

You could possibly have an "off-by-one" type of error, which basically cascades. Could you share your manifest file with us? Maybe we can tune up the validation routine a bit to provide a cleaner error message. Thanks!
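
In the meantime, a manifest can be screened for common structural slips before importing. The following is only a rough sketch: `check_manifest` is a hypothetical helper (not part of QIIME 2), and it assumes a tab-separated manifest with one header row followed by three columns: sample ID, forward file path, reverse file path.

```shell
#!/bin/sh
# check_manifest FILE: print one line per problem found in a paired-end manifest.
# Hypothetical helper, not a QIIME 2 command.
# Assumption: tab-separated, one header row, then
#   sample-id <TAB> forward path <TAB> reverse path
check_manifest() {
    manifest=$1

    # Structural checks: field count, empty fields, repeated sample IDs,
    # and the same file listed as both forward and reverse.
    awk -F '\t' '
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        NR == 1 || /^#/ || /^$/ { next }     # skip header, comments, blank lines
        NF != 3 { print "line " NR ": expected 3 fields, found " NF; next }
        $1 == "" || $2 == "" || $3 == "" { print "line " NR ": empty field"; next }
        seen[$1]++ { print "line " NR ": duplicate sample-id " $1 }
        $2 == $3   { print "line " NR ": forward and reverse are the same file" }
    ' "$manifest"

    # File checks: every listed path should exist and be non-empty.
    awk -F '\t' '
        { sub(/\r$/, "") }
        NR > 1 && NF == 3 && !/^#/ { print $2; print $3 }
    ' "$manifest" |
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        [ -s "$path" ] || echo "missing or empty file: $path"
    done
}
```

After defining the function, `check_manifest manifest.tsv` prints nothing when no problem is found. It only looks at the manifest and at whether the files exist, so it says nothing about the reads inside them.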

Parix · October 30, 2019, 5:58pm

Sure. Here is the manifest file: manifest.tsv (3.4 KB)

ben · October 31, 2019, 4:19am

Just curious, but were the runs/samples already quality filtered in some way?
Ben

Parix · October 31, 2019, 10:10am

Yes, the sequencing center has already done some quality filtering (trimming, adapter removal, ...). However, I haven't had any problems with the reads of other cohorts that were sequenced at the same center.

thermokarst · November 6, 2019, 11:38pm

Thanks @Parix --- manifest looks okay. Can you run this command in your raw/ directory?

for f in *.fq.gz; do r=$(( $(gzip -dc "$f" | wc -l | tr -d '[:space:]') / 4 )); echo "$r $f"; done

This should give you something like the following:

11340 L1S105_9_L001_R1_001.fastq.gz
9738 L1S140_6_L001_R1_001.fastq.gz
11337 L1S208_10_L001_R1_001.fastq.gz
8216 L1S257_11_L001_R1_001.fastq.gz
8907 L1S281_5_L001_R1_001.fastq.gz
...

which is a count of records in each file. This will help us identify any issues (I hope!).
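
To put the forward and reverse counts side by side, the same counting idea can be driven from the manifest. This is a sketch under assumptions rather than a tested recipe for your data: `count_records` and `compare_pairs` are hypothetical helper names, the files are assumed to be gzipped FASTQ with four lines per record, and the manifest is assumed to be tab-separated with a header row followed by sample ID, forward path, reverse path.

```shell
#!/bin/sh
# count_records FILE: number of FASTQ records in a gzipped file,
# assuming exactly four lines per record.
count_records() {
    lines=$(gzip -dc "$1" | wc -l | tr -d '[:space:]')
    echo $(( ${lines:-0} / 4 ))
}

# compare_pairs MANIFEST: print the samples whose forward and reverse
# files contain a different number of records.
# Assumption: tab-separated manifest, header row, then
#   sample-id <TAB> forward path <TAB> reverse path
compare_pairs() {
    # Keep only complete data rows so that splitting on tabs is safe.
    awk -F '\t' '
        { sub(/\r$/, "") }                   # tolerate Windows line endings
        NR > 1 && NF == 3 && $1 != "" && $2 != "" && $3 != "" { print }
    ' "$1" |
    while IFS="$(printf '\t')" read -r sample fwd rev; do
        n_fwd=$(count_records "$fwd")
        n_rev=$(count_records "$rev")
        if [ "$n_fwd" -ne "$n_rev" ]; then
            echo "$sample: forward=$n_fwd reverse=$n_rev"
        fi
    done
}
```

Running `compare_pairs manifest.tsv` after defining both functions lists only the samples whose two counts disagree. A file that cannot be read is counted as 0 records, so it shows up as a mismatch as well.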