Hey all, I'm trying to test my pipeline on published fungal ITS results.
My pipeline trims with ITSxpress after importing and then denoises with DADA2.
This pipeline served me well in my mock community analysis and also on our own large data set.
But when I try to run it on another data set, ITSxpress crashes.
The main difference I can think of is the way I imported the sequences into QIIME.
The mocks and my own data set were imported as: Casava 1.8 paired-end demultiplexed fastq
The published sequences were downloaded as SRA files from NCBI via the SRA Toolkit.
Then I converted them to FASTQ with fastq-dump (also part of the SRA Toolkit).
I prepared a manifest file (54 paired-end samples) exactly as in the importing tutorial.
manifest4.csv (7.8 KB)
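For readers without the attachment, here is a minimal sketch of how a manifest of this shape could be generated. It assumes the `sample-id,absolute-filepath,direction` header from the importing tutorial for this QIIME 2 release, and gzipped files named `<accession>_1.fastq.gz` and `<accession>_2.fastq.gz`. Both the file naming and the `make_manifest` helper are assumptions for illustration and are not taken from the attached manifest.

```shell
#!/usr/bin/env bash
# make_manifest (hypothetical helper): print a paired-end manifest in CSV form
# for every *_1.fastq.gz / *_2.fastq.gz pair found in a directory.
# The directory must be given as an absolute path, because the manifest
# column is "absolute-filepath".
make_manifest() {
    local dir="$1" f sample
    echo "sample-id,absolute-filepath,direction"
    for f in "$dir"/*_1.fastq.gz; do
        # Skip the unexpanded glob when no file matches.
        [ -e "$f" ] || continue
        # Sample ID = file name without the _1.fastq.gz suffix (assumed naming).
        sample=$(basename "$f" _1.fastq.gz)
        echo "$sample,$dir/${sample}_1.fastq.gz,forward"
        echo "$sample,$dir/${sample}_2.fastq.gz,reverse"
    done
}
```

It could be called as `make_manifest /absolute/path/to/fastq > manifest.csv`, where the path is a placeholder. The sketch does not check that the reverse file exists for each sample.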
The import was successful with Phred33 (I also tested Phred64 just in case and got an out-of-range error).
After importing, I checked the qzv file and it looked great.
I ran the ITSxpress trimming command:
(qiime2-2018.11) bash-4.2$ qiime itsxpress trim-pair-output-unmerged --i-per-sample-sequences paired-end-demux.qza --p-region ITS2 --p-taxa F --p-threads 20 --verbose --o-trimmed trimmed_exact2.qza
Plugin error from itsxpress:
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_kuop981q/seq.fq.gz 100%
1444 nt in 5 seqs, min 284, max 290, avg 289
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 1, avg 1.0
Singletons: 5, 100.0% of seqs, 100.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_c_6qy3_6/seq.fq.gz 100%
1152 nt in 4 seqs, min 282, max 290, avg 288
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 4 Size min 1, max 1, avg 1.0
Singletons: 4, 100.0% of seqs, 100.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_ed2b71fw/seq.fq.gz 100%
469 nt in 2 seqs, min 187, max 282, avg 234
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 2 Size min 1, max 1, avg 1.0
Singletons: 2, 100.0% of seqs, 100.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_gs_aovye/seq.fq.gz 100%
1335 nt in 5 seqs, min 183, max 290, avg 267
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5 Size min 1, max 1, avg 1.0
Singletons: 5, 100.0% of seqs, 100.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_lzwa8c43/seq.fq.gz 100%
1160 nt in 4 seqs, min 290, max 290, avg 290
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 4 Size min 1, max 1, avg 1.0
Singletons: 4, 100.0% of seqs, 100.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Reading file /tmp/itsxpress_xln3x3wx/seq.fq.gz 100%
30530 nt in 106 seqs, min 288, max 290, avg 288
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 20 Size min 1, max 87, avg 5.3
Singletons: 19, 17.9% of seqs, 95.0% of clusters
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Fatal error: File too small
ERROR:root:Could not perform clustering with Vsearch. Error from Vsearch was:
vsearch v2.7.0_linux_x86_64, 376.4GB RAM, 72 cores
Fatal error: File too small
Traceback (most recent call last):
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
p2.check_returncode()
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1
Traceback (most recent call last):
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "", line 2, in trim_pair_output_unmerged
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
output_views = self._callable(**view_args)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 242, in trim_pair_output_unmerged
cluster_id=cluster_id)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 301, in main
sobj.cluster(threads=threads, cluster_id=cluster_id)
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 560, in cluster
raise e
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
p2.check_returncode()
File "/home/arnonm/miniconda2/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1
Plugin error from itsxpress:
Command '['vsearch', '--cluster_size', '/tmp/itsxpress_k3_dis_u/seq.fq.gz', '--centroids', '/tmp/itsxpress_k3_dis_u/rep.fa', '--uc', '/tmp/itsxpress_k3_dis_u/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1
See above for debug info.
I searched the forum for similar errors, and it seems this bug is sometimes related to insufficient memory. I tried importing only half the data and then trimming, but nothing changed.
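Since the vsearch output above reports as few as 2 to 5 sequences in some of the temporary files before the "File too small" error, one thing that may be worth ruling out is a sample with very few or no reads. This is a guess from the log, not a confirmed cause. A minimal sketch for counting reads per gzipped FASTQ file follows; the `count_reads` and `report_small` helpers and the threshold are hypothetical, and the count assumes the usual four lines per record.

```shell
#!/usr/bin/env bash
# count_reads (hypothetical helper): print the number of FASTQ records in a
# gzipped file, assuming 4 lines per record (no wrapped sequence lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# report_small (hypothetical helper): print "file<TAB>count" for every file
# with fewer reads than the given minimum. The minimum is an arbitrary
# illustration value, not a limit documented by ITSxpress or vsearch.
report_small() {
    local min_reads="$1"; shift
    local f n
    for f in "$@"; do
        n=$(count_reads "$f")
        if [ "$n" -lt "$min_reads" ]; then
            printf '%s\t%s\n' "$f" "$n"
        fi
    done
}
```

For example, `report_small 100 /absolute/path/to/fastq/*.fastq.gz` would list the files with fewer than 100 reads; the path is a placeholder. Any sample flagged this way could be left out of the manifest to see whether the trimming step then completes.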
Any ideas would be highly appreciated.