If there's one thing I'm learning, it's that I can always find a new way to create an error message. I recently ran into a situation where I was importing some paired-end samples (via a manifest file), trimming them (with the cutadapt plugin), and then merging them (with the vsearch plugin). When making the manifest file, I typically run a command on the directory of fastq files to make sure there is some data in each .fastq file by deleting any that are empty:
find . -type f -name "*.gz" -empty -delete
For whatever reason, this failed to recognize that several of my fastq files apparently had zero reads. At first, this didn't seem to matter (and I didn't notice because I didn't get any error!). The import step ran without flagging any error, and cutadapt then ran without issue. Interestingly, though, vsearch failed when it tried to merge an empty file:
Command: vsearch --fastq_mergepairs /tmp/qiime2-archive-s1v74pfi/53390f42-dd28-451b-9205-9c77e4dbf7b2/data/nau1304194_264_L001_R1_001.fastq.gz --reverse /tmp/qiime2-archive-s1v74pfi/53390f42-dd28-451b-9205-9c77e4dbf7b2/data/nau1304194_265_L001_R2_001.fastq.gz --fastqout /tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-1_6mf45b/nau1304194_129_L001_R1_001.fastq --fastq_ascii 33 --fastq_minlen 1 --fastq_minovlen 10 --fastq_maxdiffs 10 --fastq_qmin 0 --fastq_qminout 0 --fastq_qmax 41 --fastq_qmaxout 41 --fastq_allowmergestagger
vsearch v2.7.0_linux_x86_64, 251.7GB RAM, 24 cores
https://github.com/torognes/vsearch
Fatal error: File too small
Traceback (most recent call last):
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "<decorator-gen-130>", line 2, in join_pairs
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
output_views = self._callable(**view_args)
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 57, in join_pairs
qmax, qmaxout)
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_vsearch/_join_pairs.py", line 141, in _join_pairs_w_command_output
run_command(cmd)
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
subprocess.run(cmd, check=True)
File "/mnt/lustre/macmaneslab/devon/.conda/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 398, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/tmp/qiime2-archive-s1v74pfi/53390f42-dd28-451b-9205-9c77e4dbf7b2/data/nau1304194_264_L001_R1_001.fastq.gz', '--reverse', '/tmp/qiime2-archive-s1v74pfi/53390f42-dd28-451b-9205-9c77e4dbf7b2/data/nau1304194_265_L001_R2_001.fastq.gz', '--fastqout', '/tmp/q2-SingleLanePerSampleSingleEndFastqDirFmt-1_6mf45b/nau1304194_129_L001_R1_001.fastq', '--fastq_ascii', '33', '--fastq_minlen', '1', '--fastq_minovlen', '10', '--fastq_maxdiffs', '10', '--fastq_qmin', '0', '--fastq_qminout', '0', '--fastq_qmax', '41', '--fastq_qmaxout', '41', '--fastq_allowmergestagger']' returned non-zero exit status 1
This got me wondering about a few things:
- Any idea why the find command isn't flagging these empty files? I ended up adding on a -size parameter to ensure it ditches the empty files, but can't figure out why the bash shell isn't thinking these files are empty.
- I know we can filter sequences and features in QIIME, but is there a way to remove samples from the initial demux.seqs.qza artifact I generate from the manifest importing step? That's a SampleData[PairedEndSequencesWithQuality] type... It would be great to have the same basic feature where you can just drop entire samples from that file if there was a way to count the number of sequences for each sample.
- Because our compute cluster uses SLURM, I end up generating my .log files via the program's SBATCH --output= command. Perhaps it's just the nature of our cluster, but I can't ever access the temporary files it generates in the .log to look at. If I run the same command on the head node, then I capture the message and can access the /tmp/pathtomessage to see what really went wrong. Any idea if this is a SLURM thing, or something about the setup of our compute cluster?
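To illustrate the kind of per-sample read count meant in the second point, here is a minimal Python sketch that could be run on the fastq directory before building the manifest. It is not a QIIME 2 feature: the function names count_fastq_reads and find_empty_fastqs are hypothetical, and it assumes standard four-line FASTQ records in files named *.fastq.gz. It also prints each file's size on disk, because a gzip file with no content still carries a header and trailer and so is not zero bytes, which may be why find -empty skips such files.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def find_empty_fastqs(directory, pattern="*.fastq.gz"):
    """Return the sorted paths of gzipped FASTQ files that contain zero reads.

    The default pattern is an assumption about how the files are named.
    """
    return sorted(
        p for p in Path(directory).glob(pattern) if count_fastq_reads(p) == 0
    )


if __name__ == "__main__":
    # Report (rather than delete) files that decompress to zero reads.
    for path in find_empty_fastqs("."):
        print(f"{path}\t{path.stat().st_size} bytes on disk, 0 reads")
```

The sketch only reports the files and leaves deletion to the user, since dropping one read file of a pair without its mate would leave the manifest inconsistent.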
Cheers!