Demux plugin error

Hi all,

I've been struggling with a colleague's dataset and I'm hoping for some help! I imported the files (Casava format) and ran the following command, per this tutorial:

qiime vsearch join-pairs --i-demultiplexed-seqs demux-paired-end.qza --o-joined-sequences demux-joined.qza --verbose

I got the following error:

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --fastq_mergepairs /var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R1_001.fastq.gz --reverse /var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R2_001.fastq.gz --fastqout /var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-l0gz9m84/LBSK6_0_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout 41

vsearch v2.6.0_macos_x86_64, 8.0GB RAM, 4 cores

Merging reads 86%

Fatal error: Invalid line 85257600 in FASTQ file: Sequence and quality lines must be equally long
Traceback (most recent call last):
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in call
results = action(**arguments)
File "", line 2, in join_pairs
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
output_views = self._callable(**view_args)
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 57, in join_pairs
qmax, qmaxout)
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 141, in _join_pairs_w_command_output
run_command(cmd)
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/Users/carolbucking/miniconda3/envs/qiime2-2017.12/lib/python3.5/subprocess.py", line 398, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R1_001.fastq.gz', '--reverse', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R2_001.fastq.gz', '--fastqout', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-l0gz9m84/LBSK6_0_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41']' returned non-zero exit status 1

Plugin error from vsearch:

Command '['vsearch', '--fastq_mergepairs', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R1_001.fastq.gz', '--reverse', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/qiime2-archive-m04zyy6o/4a0e158f-7ad9-4ce5-a8d1-8f63b75d4fcd/data/LBSK6_04_L001_R2_001.fastq.gz', '--fastqout', '/var/folders/2f/wntdzmgd7vn5s7zcnkdqv36r0000gn/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-l0gz9m84/LBSK6_0_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41']' returned non-zero exit status 1

See above for debug info.

Any help would be greatly appreciated! I ran this same command on my own data from the same service provider and it ran without issue. My colleague's data is from a HiSeq, whereas mine was from a MiSeq. Could this be the source of the error?

Many thanks!

Leah

This is a vsearch error message, so please take my assessment with a grain of salt, but I think it means that vsearch ran into a FASTQ record that has a different number of quality scores than nucleotides. Here is an example where there are more nts than quality scores:

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>
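To make the mismatch concrete, here is a small Python sketch that checks one four-line FASTQ record and runs it on the example above. The helper `check_fastq_record` is hypothetical (not part of vsearch or QIIME 2), and it assumes the standard layout of header, sequence, `+` separator, quality:

```python
from typing import List


def check_fastq_record(header: str, seq: str, plus: str, qual: str) -> List[str]:
    """Hypothetical helper: return the problems found in one 4-line FASTQ record."""
    problems = []
    if not header.startswith("@"):
        problems.append("header line does not start with '@'")
    if not plus.startswith("+"):
        problems.append("separator line does not start with '+'")
    # This is the condition behind vsearch's
    # "Sequence and quality lines must be equally long".
    if len(seq) != len(qual):
        problems.append(
            f"sequence has {len(seq)} bases but quality has {len(qual)} scores"
        )
    return problems


# The malformed example record from above.
record = (
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>",
)
print(check_fastq_record(*record))
# → ['sequence has 60 bases but quality has 51 scores']
```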

Dissecting the error message a bit, it looks like the bad record is in the pair LBSK6_04_L001_R1_001.fastq.gz/LBSK6_04_L001_R2_001.fastq.gz, maybe around line 85257600 (it isn’t clear to me whether the forward or the reverse file is the problem). I would crack open those files and take a peek at that line number (and the surrounding lines, too).
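A minimal Python sketch of that "take a peek" step, assuming vsearch counts lines from 1 and every record is exactly four lines long (so line N belongs to record (N - 1) // 4 + 1). Under those assumptions, line 85257600 is the quality line of record 21,314,400, which fits the "Sequence and quality lines must be equally long" message. The helper names are hypothetical:

```python
import gzip
from itertools import islice
from typing import Iterable, List, Tuple

FIELDS = ("header", "sequence", "separator", "quality")


def locate_fastq_line(line_number: int) -> Tuple[int, str]:
    """Map a 1-based line number to (1-based record number, field name),
    assuming every FASTQ record is exactly four lines long."""
    record_index, offset = divmod(line_number - 1, 4)
    return record_index + 1, FIELDS[offset]


def peek_lines(lines: Iterable[str], line_number: int, context: int = 4) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs for lines line_number +/- context (1-based)."""
    start = max(1, line_number - context)
    window = islice(lines, start - 1, line_number + context)
    return [(start + i, line.rstrip("\n")) for i, line in enumerate(window)]


def peek_gzipped(path: str, line_number: int, context: int = 4) -> List[Tuple[int, str]]:
    """Stream a .fastq.gz file so it never has to be fully decompressed to disk."""
    with gzip.open(path, "rt") as fh:
        return peek_lines(fh, line_number, context)


print(locate_fastq_line(85257600))  # → (21314400, 'quality')
```

Something like `peek_gzipped("LBSK6_04_L001_R1_001.fastq.gz", 85257600)` (and the same for the R2 file) would show the suspect record. It has to decompress everything before that line, so expect it to take a while.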

As for your HiSeq vs. MiSeq question: I don’t think so. It sounds like this is just related to a malformed FASTQ record (see above).

Keep us posted and let us know how it goes! :t_rex:

Thanks for your help! I tried extracting the file to have a look and I keep getting an error message. I’m going to download the fastq.gz file for that sample again and see if I have any luck!

I’ll let you know how it goes! :grin:

Hmm, strange - how did you try extracting it? Did you see the exporting tutorial? Feel free to post the commands you ran and the errors you saw, we can give you a hand. Thanks! :t_rex:

I just right-clicked and chose “extract” within the file explorer in VirtualBox. Was that wrong? The error message is “An error occurred while extracting files”, which is not very informative, I’m afraid!

I looked at the exporting tutorial, but I’m not sure how to apply it to this particular data. I’m a beginner, so I’m still trying to get a handle on QIIME :)

You can export any kind of QIIME 2 data. The examples on that page are just some of the common ones we see come up often, but you can pass any QZA into the export command. Give it a shot! :skier:
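Side note on what export is pulling out: `qiime tools export` is the supported route, but it may help to know that a .qza file is a zip archive whose payload lives under a `<uuid>/data/` folder (the same `.../data/LBSK6_04_L001_R1_001.fastq.gz` paths show up in the vsearch log above). A minimal Python sketch that extracts only that payload with the standard-library `zipfile` module; the helper names are hypothetical and the layout is assumed from those log paths, so treat it as a sketch, not a replacement for the export command:

```python
import zipfile
from pathlib import PurePosixPath
from typing import BinaryIO, List, Union


def list_data_files(qza: Union[str, BinaryIO]) -> List[str]:
    """Hypothetical helper: list archive members under '<uuid>/data/'."""
    with zipfile.ZipFile(qza) as zf:
        names = []
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            # Keep files (not directory entries) whose second path component is 'data'.
            if len(parts) > 2 and parts[1] == "data" and not name.endswith("/"):
                names.append(name)
        return names


def extract_data_files(qza: Union[str, BinaryIO], out_dir: str) -> List[str]:
    """Hypothetical helper: extract only the data/ payload into out_dir."""
    members = list_data_files(qza)
    with zipfile.ZipFile(qza) as zf:
        # Archive paths are kept, so files land in out_dir/<uuid>/data/.
        zf.extractall(out_dir, members=members)
    return members
```

For example, `extract_data_files("demux-paired-end.qza", "exported")` would leave the per-sample .fastq.gz files under `exported/<uuid>/data/`.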

Sorry it took me so long to reply to your last message! I was able to use the command, but I'll be completely honest: I have no idea what I'm looking for! I downloaded the file again, tried the same command, and got a different result. Looking at the output, it was able to merge the reads (previously it stalled out at 86%), but it looks like it wasn't able to merge all of them!

To refresh, this is the original command:

qiime vsearch join-pairs --i-demultiplexed-seqs demux-paired-end.qza --o-joined-sequences demux-joined.qza --verbose

And here is what I got:

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --fastq_mergepairs /tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSK6_04_L001_R1_001.fastq.gz --reverse /tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSK6_04_L001_R2_001.fastq.gz --fastqout /tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-mlobyng3/LBSK6_0_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout 41

vsearch v2.6.0_linux_x86_64, 5.2GB RAM, 2 cores

Merging reads 100%
57672470 Pairs
6425970 Merged (11.1%)
51246500 Not merged (88.9%)

Pairs that failed merging due to various reasons:
31523566 too few kmers found on same diagonal
8296825 potential tandem repeat
9413 too many differences
10950844 alignment score too low, or score drop to high
180694 overlap too short
285158 staggered read pairs

Statistics of merged reads:
163.67 Mean fragment length
21.00 Standard deviation of fragment length
0.17 Mean expected error in forward sequences
0.19 Mean expected error in reverse sequences
0.23 Mean expected error in merged sequences
0.11 Mean observed errors in merged region of forward sequences
0.10 Mean observed errors in merged region of reverse sequences
0.21 Mean observed errors in merged region
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: gzip /tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-mlobyng3/LBSK6_0_L001_R1_001.fastq

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --fastq_mergepairs /tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_01_L001_R1_001.fastq.gz --reverse /tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_001_L001_R2_001.fastq.gz --fastqout /tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-mlobyng3/LBSM4_1_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout 41

vsearch v2.6.0_linux_x86_64, 5.2GB RAM, 2 cores

Merging reads 100%

Fatal error: Invalid line 58147546 in FASTQ file: Unexpected end of file
Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/commands.py", line 224, in call
results = action(**arguments)
File "", line 2, in join_pairs
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 57, in join_pairs
qmax, qmaxout)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 141, in _join_pairs_w_command_output
run_command(cmd)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/subprocess.py", line 398, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_01_L001_R1_001.fastq.gz', '--reverse', '/tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_001_L001_R2_001.fastq.gz', '--fastqout', '/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-mlobyng3/LBSM4_1_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41']' returned non-zero exit status 1

Plugin error from vsearch:

Command '['vsearch', '--fastq_mergepairs', '/tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_01_L001_R1_001.fastq.gz', '--reverse', '/tmp/qiime2-archive-pwpkvasv/55ab4b46-d21d-4967-b3c3-692ad7eaec6e/data/LBSM4_001_L001_R2_001.fastq.gz', '--fastqout', '/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-mlobyng3/LBSM4_1_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41']' returned non-zero exit status 1

See above for debug info.

I'm not sure if this is a related error, or a whole new error but I would really appreciate your input!

Thanks so much :grinning:

Leah
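A quick sanity check on the LBSK6_04 merge summary in the log above. This is plain arithmetic on numbers copied from that output (nothing here calls vsearch); it shows that merged plus unmerged pairs equal the total, and that the six failure categories add up exactly to the "Not merged" count:

```python
# Counts copied verbatim from the vsearch --fastq_mergepairs summary for LBSK6_04.
pairs = 57672470
merged = 6425970
not_merged = 51246500
failure_reasons = {
    "too few kmers found on same diagonal": 31523566,
    "potential tandem repeat": 8296825,
    "too many differences": 9413,
    "alignment score too low, or score drop to high": 10950844,
    "overlap too short": 180694,
    "staggered read pairs": 285158,
}

# Every pair is reported as either merged or not merged ...
assert merged + not_merged == pairs
# ... and the per-reason counts sum to the unmerged total.
assert sum(failure_reasons.values()) == not_merged

print(f"merged: {100 * merged / pairs:.1f}%")  # → merged: 11.1%
for reason, count in sorted(failure_reasons.items(), key=lambda kv: -kv[1]):
    print(f"{count:>10}  {100 * count / pairs:5.1f}% of pairs  {reason}")
```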

Thanks for the follow-up @leahtee! I feel bad: I completely forgot about this handy “validation” tool that we recently added to QIIME 2. Can you please run the following and post the output here?

qiime tools validate demux-paired-end.qza

This will perform a thorough validation of these data, and should be able to point you at which sample/direction/read is funky.
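For intuition about what a check like this looks for, here is a rough standalone Python analogue. It is not QIIME 2's implementation, and `scan_fastq_gz` is a hypothetical name. It streams a gzipped FASTQ four lines at a time, reports the first malformed record with its line number, and reports a truncated gzip stream (the `EOFError` seen in the traceback below) instead of crashing:

```python
import gzip
from itertools import zip_longest
from typing import BinaryIO, Optional, Union


def scan_fastq_gz(source: Union[str, BinaryIO]) -> Optional[str]:
    """Hypothetical checker: describe the first problem in a gzipped FASTQ,
    or return None if nothing looks wrong."""
    try:
        with gzip.open(source, "rt") as fh:
            # zip_longest over four references to the same iterator groups the
            # stream into 4-line records; a short final record is padded with None.
            for index, record in enumerate(zip_longest(*[fh] * 4)):
                first_line = 4 * index + 1
                if None in record:
                    return f"line {first_line}: incomplete record at end of file"
                header, seq, plus, qual = (line.rstrip("\n") for line in record)
                if not header.startswith("@"):
                    return f"line {first_line}: header does not start with '@'"
                if not plus.startswith("+"):
                    return f"line {first_line + 2}: separator does not start with '+'"
                if len(seq) != len(qual):
                    return (f"line {first_line + 3}: {len(seq)} bases but "
                            f"{len(qual)} quality scores")
    except EOFError:
        # Raised by the gzip module when the compressed stream stops early.
        return "gzip stream ended early (truncated or incompletely downloaded file)"
    return None
```

Running it on each R1 and R2 .fastq.gz file in turn would point at the affected sample and direction.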

:crossed_fingers: :pray: :t_rex:

Done and done @thermokarst!

Please see the result below. I still keep getting the “compressed file ended…” message, so maybe it is my files after all?

Thanks again :slight_smile:

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2cli/tools.py", line 271, in validate
artifact.validate(level)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/sdk/result.py", line 283, in validate
self.format.validate(self.view(self.format), level)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 171, in validate
getattr(self, field)._validate_members(collected_paths, level)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 101, in _validate_members
self.format(path, mode='r').validate(level)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 24, in validate
self.validate(level)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 159, in validate
self._check_n_records(record_count_map[level])
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 119, in check_n_records
for i, record in file:
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/gzip.py", line 287, in read1
return self._buffer.read1(size)
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/home/qiime2/miniconda/envs/qiime2-2017.12/lib/python3.5/gzip.py", line 480, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

An unexpected error has occurred while attempting to validate artifact demux-paired-end-all.qza:

Compressed file ended before the end-of-stream marker was reached

See above for debug info.

Hi @leahtee!

The error message is making me lean towards there being some kind of issue with your source data. We have seen two different error messages related to these data in this thread:

  • Invalid sequence records (the number of nts in the read was different than the number of quality scores)
  • Invalid gzip files (the EOFError: Compressed file ended error you just reported)

It might be worth making sure that the source data is squared away: maybe refetch it from the FTP server / DVD / resource that you originally grabbed it from? Then re-import and try again in Q2.

Also, checking that md5sums match is a good first step toward making sure that two copies of a file are identical. If your collaborator knows these data to be okay, they could send you the md5sums of these files, and you could verify the md5sums locally to ensure you have a byte-for-byte copy on hand.
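A minimal Python sketch of that md5 check, which should produce the same hex digest that `md5sum` prints for a file. The function names are hypothetical, and the expected checksum would come from your collaborator or the service provider:

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: MD5 hex digest of a file, read in chunks so large
    .fastq.gz files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Hypothetical helper: compare against a provider-supplied checksum,
    ignoring case and surrounding whitespace."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_md5("LBSK6_04_L001_R1_001.fastq.gz", checksum_from_provider)` returning False would mean the local copy differs from the original.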

Keep us posted! :t_rex:

Great news @thermokarst! I re-downloaded the data from the service provider and all seems well (for now). I ran qiime tools validate and it’s looking good. I really appreciate all your help!

