Q2-ITSxpress File too small?

Hey all, I'm trying to test my pipeline on published fungal ITS results.
My pipeline imports the reads, trims them with ITSxpress, and then denoises with DADA2.
This pipeline served me well in my mock community analysis and also on our own large data set.
But when I try to run it on another data set, ITSxpress crashes.

The main difference I can think of is the way I imported the sequences into QIIME.
The mocks and my own data set were imported as: Casava 1.8 paired-end demultiplexed fastq

The published sequences were downloaded as SRA files from NCBI with the SRA Toolkit.
Then I converted them to FASTQ with fastq-dump (also part of the SRA Toolkit).
I prepared a manifest file (54 paired-end samples) exactly as in the importing tutorial.
manifest4.csv (7.8 KB)

The import was successful with Phred33 (I also tested Phred64 just in case and got an out-of-range error).
After importing I checked the qzv file and it looked great!

I ran the ITSxpress trimming command:
(qiime2-2018.11) bash-4.2$ qiime itsxpress trim-pair-output-unmerged --i-per-sample-sequences paired-end-demux.qza --p-region ITS2 --p-taxa F --p-threads 20 --verbose --o-trimmed trimmed_exact2.qza

Plugin error from itsxpress:

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_kuop981q/seq.fq.gz 100%
1444 nt in 5 seqs, min 284, max 290, avg 289
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 1, avg 1.0
Singletons: 5, 100.0% of seqs, 100.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_c_6qy3_6/seq.fq.gz 100%
1152 nt in 4 seqs, min 282, max 290, avg 288
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 4 Size min 1, max 1, avg 1.0
Singletons: 4, 100.0% of seqs, 100.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_ed2b71fw/seq.fq.gz 100%
469 nt in 2 seqs, min 187, max 282, avg 234
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 2 Size min 1, max 1, avg 1.0
Singletons: 2, 100.0% of seqs, 100.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_gs_aovye/seq.fq.gz 100%
1335 nt in 5 seqs, min 183, max 290, avg 267
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 1, avg 1.0
Singletons: 5, 100.0% of seqs, 100.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_lzwa8c43/seq.fq.gz 100%
1160 nt in 4 seqs, min 290, max 290, avg 290
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 4 Size min 1, max 1, avg 1.0
Singletons: 4, 100.0% of seqs, 100.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Reading file /tmp/itsxpress_xln3x3wx/seq.fq.gz 100%
30530 nt in 106 seqs, min 288, max 290, avg 288
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 20 Size min 1, max 87, avg 5.3
Singletons: 19, 17.9% of seqs, 95.0% of clusters

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Fatal error: File too small

ERROR:root:Could not perform clustering with Vsearch. Error from Vsearch was:
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Fatal error: File too small
Traceback (most recent call last):
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
p2.check_returncode()
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1
Traceback (most recent call last):
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "", line 2, in trim_pair_output_unmerged
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
output_views = self._callable(**view_args)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 242, in trim_pair_output_unmerged
cluster_id=cluster_id)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 301, in main
sobj.cluster(threads=threads, cluster_id=cluster_id)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 560, in cluster
raise e
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
p2.check_returncode()
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1

Plugin error from itsxpress:

Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1

See above for debug info.

I searched the forum for similar errors, and it seems this error is sometimes related to insufficient memory. I tried importing only half the data and then trimming, and nothing changed.

Any ideas would be highly appreciated.

Hi! Did you try using fewer threads?

Yes, I tried 100, 20, and 5 :frowning:
@Adam_Rivers maybe you can help?

Here's the main error:

vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores

Fatal error: File too small

And that error came from this vsearch command:

'['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz',

This makes me think that one of the files was too small or something got messed up in a previous step causing this file to be empty.

Can you check on that file for me and let us know what you find?

Colin

This is the qzv after importing: demux.qzv (289.4 KB)
The minimum sequence count there is 188k, so I assume the importing process went well.

This command:
'['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz',

is a mystery to me. I have no file with that name or anything like it.

Check on this file specifically (before importing): itsxpress_k3_dis_u/seq.fq.gz

It sounds like that file may not have any sequences?
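
One rough way to check whether a gzipped FASTQ file actually contains sequences is to count its records. The sketch below assumes plain four-line FASTQ records; the function name `count_fastq_records` is made up for illustration. The ITSxpress temporary file may no longer exist after the run, so the same check can be pointed at the FASTQ files listed in the manifest instead.

```python
import gzip
import os
import tempfile


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            "{}: {} lines is not a multiple of 4".format(path, n_lines)
        )
    return n_lines // 4


if __name__ == "__main__":
    # Tiny made-up example file, only to show the call.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "demo.fq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
    print(demo_path, count_fastq_records(demo_path))
```

A count of zero (or a handful of reads) for a file that should hold hundreds of thousands points at a problem upstream of the clustering step.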

cc: @Adam_Rivers

ITSxpress first merges the reads and then clusters them with Vsearch. If you have a high number of input reads but no reads left for clustering, it means your reads did not merge successfully. I've seen this happen for a couple of reasons: 1) the sample names in the manifest file pair the wrong files together, or 2) the sequence quality is too poor to merge with BBmerge, which by default has more stringent merging cutoffs than PEAR. seq.fq.gz is the name of the temporary file of merged reads that Vsearch tries to cluster.

Looking at the output, it appears that almost all of your samples have hardly any reads that merged (n = 5, 4, 2, 5, 4, 106, 0).
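
Those per-sample counts come from the "nt in N seqs" lines that Vsearch prints for each sample. As a small sketch of how to pull them out of a saved log, the helper below (the name `merged_read_counts` is hypothetical) just matches that line pattern; a sample that fails with "File too small" prints no such line, so it does not show up in the list.

```python
import re

# Matches Vsearch summary lines such as:
# "1444 nt in 5 seqs, min 284, max 290, avg 289"
SEQ_LINE = re.compile(r"^(\d+) nt in (\d+) seqs")


def merged_read_counts(log_text):
    """Return the number of sequences Vsearch read for each clustered sample."""
    counts = []
    for line in log_text.splitlines():
        match = SEQ_LINE.match(line.strip())
        if match:
            counts.append(int(match.group(2)))
    return counts


if __name__ == "__main__":
    # Excerpt copied from the log in this thread.
    example_log = (
        "Reading file /tmp/itsxpress_kuop981q/seq.fq.gz 100%\n"
        "1444 nt in 5 seqs, min 284, max 290, avg 289\n"
        "Reading file /tmp/itsxpress_xln3x3wx/seq.fq.gz 100%\n"
        "30530 nt in 106 seqs, min 288, max 290, avg 288\n"
        "Fatal error: File too small\n"
    )
    print(merged_read_counts(example_log))
```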

You can verify this by running BBmerge directly on your read pairs:
bbmerge.sh in=seq.R1.fastq.gz in2=seq.R2.fastq.gz
That will print the merging statistics for your reads.
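
For the first cause (the manifest pairing the wrong files together), a quick sanity check along these lines might help. It is only a sketch: it assumes the comma-separated manifest layout with the header `sample-id,absolute-filepath,direction`, and it assumes that the forward and reverse files of one sample share the same prefix before the first underscore (for example `SRR001_1.fastq` and `SRR001_2.fastq`). The function name and the file names are hypothetical, so adjust the prefix rule to your own naming scheme.

```python
import csv
import os
from collections import defaultdict


def check_manifest(manifest_text):
    """Return a list of problems found in a paired-end manifest."""
    # Skip blank lines and comment lines before handing rows to the CSV parser.
    rows = [
        line for line in manifest_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    files = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(rows):
        sample = row["sample-id"].strip()
        direction = row["direction"].strip()
        files[sample][direction].append(row["absolute-filepath"].strip())

    problems = []
    for sample, by_direction in files.items():
        fwd = by_direction.get("forward", [])
        rev = by_direction.get("reverse", [])
        if len(fwd) != 1 or len(rev) != 1:
            problems.append(
                "{}: expected 1 forward and 1 reverse file, got {} and {}"
                .format(sample, len(fwd), len(rev))
            )
            continue
        # Assumption: paired files share the prefix before the first "_".
        fwd_prefix = os.path.basename(fwd[0]).split("_")[0]
        rev_prefix = os.path.basename(rev[0]).split("_")[0]
        if fwd_prefix != rev_prefix:
            problems.append(
                "{}: forward prefix {} does not match reverse prefix {}"
                .format(sample, fwd_prefix, rev_prefix)
            )
    return problems


if __name__ == "__main__":
    # Made-up manifest in which sample1 pairs files from two different runs.
    demo = (
        "sample-id,absolute-filepath,direction\n"
        "sample1,/data/SRR001_1.fastq,forward\n"
        "sample1,/data/SRR002_2.fastq,reverse\n"
    )
    for problem in check_manifest(demo):
        print(problem)
```

An empty list does not prove the pairing is right, it only rules out the most obvious mix-ups.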


@Adam_Rivers I found a minor issue with my manifest file that might explain the low percentage of merged reads.
I will update ASAP.

The process now runs without any errors, but the output is a 42 KB file (I started with 15 GB).
The manifest looks OK, so I don't understand why there's still a problem with merging.
Any suggestions for how to test the new problem?
@Adam_Rivers

Hello Arnon,

I'm glad you are trying to merge by running the underlying programs directly, as this gives you additional settings to modify. For example, if you merge with vsearch, you can try passing the --fastq_allowmergestagger flag, which does this:

allow to merge staggered read pairs. Staggered pairs are pairs where the 3’ end of the reverse read has an overhang to the left of the 5’ end of the forward read. This situation can occur when a very short fragment is sequenced.

BBmerge might have something similar. It also might have a setting that allows more mismatches in the area of overlap.
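
To make the idea of "mismatches in the area of overlap" concrete, here is a toy sketch of read merging. It is not how BBmerge or vsearch actually work (both use quality scores and more careful scoring), it does not handle staggered pairs, and the names `best_overlap`, `min_overlap` and `max_mismatch_frac` are invented for this example. It only shows why a stricter mismatch cutoff can turn a mergeable pair into an unmerged one.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def best_overlap(forward, reverse, min_overlap=10, max_mismatch_frac=0.0):
    """Find the longest overlap between the end of the forward read and the
    start of the reverse-complemented reverse read.

    Returns (overlap_length, mismatches), or None if no overlap of at least
    min_overlap bases stays within the allowed mismatch fraction.
    """
    rc = reverse_complement(reverse)
    longest = min(len(forward), len(rc))
    for length in range(longest, min_overlap - 1, -1):
        tail = forward[-length:]
        head = rc[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * length:
            return length, mismatches
    return None


if __name__ == "__main__":
    # Made-up 30 bp fragment sequenced as two 20 bp reads (10 bp overlap).
    fragment = "ACGTTGCAAGGCTTAACCGGATCGTACGAT"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[10:])
    print(best_overlap(fwd, rev, min_overlap=10))

    # Same pair with one sequencing error inside the overlap.
    rev_err = reverse_complement("GCTTATCCGGATCGTACGAT")
    print(best_overlap(fwd, rev_err, min_overlap=10))
    print(best_overlap(fwd, rev_err, min_overlap=10, max_mismatch_frac=0.1))
```

In this toy model the pair with a single error in the overlap only merges once some mismatches are tolerated, which is the kind of effect a looser setting in a real merger is meant to have.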

Keep playing with those settings and see what you find!

Colin
