Vsearch join-pairs uneven # of reads error

Hello,

I am trying to merge reads for 943 samples, and I keep getting the same error with some fastqs that I concatenated because some samples were sequenced more than once:

Fatal error: More reverse reads than forward reads
Traceback (most recent call last):
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in join_pairs
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor
    output_views = self._callable(**view_args)
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 57, in join_pairs
    qmax, qmaxout)
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 141, in _join_pairs_w_command_output
    run_command(cmd)
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
    subprocess.run(cmd, check=True)
  File "/Users/mteachey/miniconda3/envs/qiime2-2018.6/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/var/folders/v4/06g73zxx0_3315vbt6xc4869ch0sww/T/qiime2-archive-dhiivbem/dff142a2-6b76-42f1-acec-c7fd4f97f2f8/data/SP13-MIDO806_54_L001_R1_001.fastq.gz', '--reverse', '/var/folders/v4/06g73zxx0_3315vbt6xc4869ch0sww/T/qiime2-archive-dhiivbem/dff142a2-6b76-42f1-acec-c7fd4f97f2f8/data/SP13-MIDO806_55_L001_R2_001.fastq.gz', '--fastqout', '/var/folders/v4/06g73zxx0_3315vbt6xc4869ch0sww/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-gdul6_qr/SP13-MIDO806_256_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41', '--fastq_allowmergestagger']' returned non-zero exit status 1

I have tried to reconcile the apparent differences in sequence reads in the following ways:

  1. re-concatenating the files and replacing the old ones (this didn’t work)
  2. removing the problem sample entirely (this worked until it hit another sample that it deemed to have different numbers of reads)

I checked the file sizes for the sample in the error above: R1 was 88.2 MB and R2 was 88.5 MB. While there is a difference in size, other samples for which multiple fastqs were concatenated, and whose files also differ slightly in size, passed through fine. For example, a sample from the same collection date and MiSeq run with an R1 of 123.2 MB and an R2 of 123.6 MB merged at 100%. Can someone help me figure out how to resolve this without removing each sample that fails in this way, as I have so many samples to deal with?

Thanks in advance for the help!

Hello Morgan,

Thanks for posting your full log file. I have also received this error before, but I would love to know how to filter out the paired reads that are missing their other pair.

I tried using tools from the bbmap software collection, but didn’t have any luck. I think either reformat.sh or repair.sh could do this, but I couldn’t get it to work.

Let’s see what @ebolyen recommends.

Colin

Hi again,

I’m still stuck on this issue and wanted to see if @ebolyen had been able to make any headway on it. It’s been a little while, so I wanted to make sure it didn’t get lost in the chaos.

Thanks!

@ebolyen has been working at the STAMPS 2018 class the last week or so, and is OOO the rest of this week - keep your eyes peeled for a reply some time next week. Thanks for your patience @morgan_t!

I’m back! I’ll be looking into this shortly, sorry for the delay @morgan_t!

Great, thanks so much, @ebolyen! Wanted to give you a quick update: I checked to see if there were actually more forward reads than reverse in the sample that was generating the error and there are not ¯\_(ツ)_/¯

Hey @morgan_t,

I wonder if something strange has happened between records (perhaps an extra newline). What does running wc -l on your data produce?

For example, for a given pair, does running the following give you the same results?

zcat SP13-MIDO806_54_L001_R1_001.fastq.gz | wc -l
zcat SP13-MIDO806_54_L001_R2_001.fastq.gz | wc -l

How exactly did you concatenate the files?
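
The line-count check above can also be wrapped in a small function. This is only a sketch: `count_reads` and `compare_pair` are hypothetical helper names, not part of QIIME 2 or vsearch. It assumes four lines per FASTQ record, and it uses `gunzip -c` because on macOS (which the paths in the traceback suggest) `zcat` may look for a `.Z` suffix instead of `.gz`.

```shell
# Hypothetical helper: count FASTQ records in a gzipped file.
# Assumes the standard 4-lines-per-record FASTQ layout.
count_reads() {
    local lines
    # gunzip -c streams the decompressed data to stdout on Linux and macOS
    lines=$(gunzip -c "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Hypothetical helper: compare the read counts of a forward/reverse pair.
compare_pair() {
    local fwd rev
    fwd=$(count_reads "$1")
    rev=$(count_reads "$2")
    if [ "$fwd" -eq "$rev" ]; then
        echo "OK: $fwd reads in both files"
    else
        echo "MISMATCH: R1 has $fwd reads, R2 has $rev reads"
        return 1
    fi
}
```

It would be called as `compare_pair sample_R1.fastq.gz sample_R2.fastq.gz` (file names are placeholders). Comparing read counts rather than file sizes avoids being misled by differences in compression.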

Hey,

So I tried that and both R1 and R2 have 630324 lines.

I merged the files just by using cat:
cat MIDO806A-SP13-S14-L001-R1-001.fastq >> MIDO806-SP13-S13-L001-R1-001.fastq

Then I ran it again, just replacing R1 with R2, so that everything’s in the same order.

These same files were previously merged in the same way and then run successfully through mothur. So I’m not sure what’s going on but in the event that we can’t find a fix, are there any external programs you would recommend to merge my reads that integrate back into QIIME2 easily and work well with deblur?

Thanks again for the help with this!

A safer way to join multiple files is to do the following:

cat file1 file2 > file3

The method you used changes one of the files permanently, so if for some reason the command ran twice it would double up the number of reads. I suppose that isn’t really consistent with your file sizes, so that’s probably not what’s happening, but I wanted to show you this way for future use :)
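
As a rough sketch of that safer pattern, the function below writes the concatenation to a new file and refuses to overwrite an existing one, so an accidental second run cannot double up reads. `concat_fastq` is a hypothetical helper name, and the file names in the usage note are placeholders.

```shell
# Hypothetical helper: concatenate FASTQ files into a NEW output file
# without modifying any of the inputs.
# Usage: concat_fastq OUTPUT INPUT1 INPUT2 [...]
concat_fastq() {
    local out=$1
    shift
    # Refuse to overwrite, so re-running the command cannot clobber
    # or duplicate previously merged reads
    if [ -e "$out" ]; then
        echo "Refusing to overwrite existing file: $out" >&2
        return 1
    fi
    cat "$@" > "$out"
}
```

For paired data the inputs need to be listed in the same order for both directions, for example `concat_fastq merged_R1.fastq runA_R1.fastq runB_R1.fastq` followed by `concat_fastq merged_R2.fastq runA_R2.fastq runB_R2.fastq`.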


As for what’s really going on, I’m a bit stumped. Could you provide an ls or screenshot of your directory that you are importing? Also how are you importing this data?

Once I know that I’ll send you a little bash command to test some other things…

Thanks for the tip! When I re-merged these files, I used new ones that I downloaded again to avoid that. I checked these new merged files using mothur's make.contigs command and had no problems.

However, I took that one set of files out and it ran fine. Still doesn't make sense but I'm happy it worked.

Here's a screenshot of the folder that my fastqs are in:

Hey @morgan_t,

Sorry for not noticing this earlier, but your files are using - instead of _. Are you using a fastq manifest file to import? If so, could you attach it? I wonder if something just got transposed in there…

In any case!

Here is a script to run inside the extracted artifact:

qiime tools export yourdemux.qza --output-dir testing-dir/
cd testing-dir/
for file in *_R1_001.fastq.gz; do f=${file%_R1_001.fastq.gz}; FWD=$(zcat "$f"_R1_001.fastq.gz | wc -l); REV=$(zcat "$f"_R2_001.fastq.gz | wc -l); [[ $FWD -ne $REV ]] && echo Problem with $f, R1 has $FWD lines, but R2 has $REV; done

That will either complete silently or report the problematic samples. If it’s silent, then there’s something else happening disguised as a mismatch.
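
A more verbose sketch of the same check is below, written as a function that also reports forward files with no matching reverse file. That case may be worth ruling out, since the failing command in the traceback pairs `SP13-MIDO806_54_L001_R1_001.fastq.gz` with `SP13-MIDO806_55_L001_R2_001.fastq.gz`. `check_pairs` is a hypothetical name, and the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` suffixes are assumed from the file names in this thread.

```shell
# Hypothetical helper: compare line counts for every R1/R2 pair in a directory.
# Usage: check_pairs [DIRECTORY]   (defaults to the current directory)
check_pairs() {
    local dir=${1:-.}
    local status=0
    local fwd rev prefix n_fwd n_rev
    for fwd in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$fwd" ] || continue   # pattern matched nothing
        prefix=${fwd%_R1_001.fastq.gz}
        rev=${prefix}_R2_001.fastq.gz
        if [ ! -e "$rev" ]; then
            echo "Missing reverse file for $fwd"
            status=1
            continue
        fi
        # $(( ... )) strips the padding some wc implementations add
        n_fwd=$(( $(gunzip -c "$fwd" | wc -l) ))
        n_rev=$(( $(gunzip -c "$rev" | wc -l) ))
        if [ "$n_fwd" -ne "$n_rev" ]; then
            echo "Problem with $prefix: R1 has $n_fwd lines, but R2 has $n_rev"
            status=1
        fi
    done
    return $status
}
```

Running `check_pairs testing-dir/` prints nothing and returns 0 when every pair matches.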
