I'm curious whether anyone has a quick suggestion for troubleshooting the following error. I'm trying to run rescript dereplicate to cluster a set of sequences while retaining the best taxonomy strings:
qiime rescript dereplicate \
--i-sequences repSeqs.qza \
--i-taxa sklearn_tax.qza \
--p-mode 'super' \
--p-perc-identity 0.985 \
--p-threads 4 \
--p-derep-prefix \
--output-dir NB_p985
The initial VSEARCH clustering appears to run fine, but the program then crashes once it reaches the RESCRIPt section (I think?). The error message is specific enough to tell me that I'm passing two pieces of information where only one is expected, but I don't know what those two pieces are!
Plugin error from rescript:
Wrong number of items passed 2, placement implies 1
Maybe @SoilRotifer has seen this kind of message before? The full error message is below.
Thanks very much for any advice you can offer!
(rescript_2020.6) dorourke@bio653:clustseqs$ qiime rescript dereplicate --i-sequences /home/dorourke/projects/coi_diet/paper3/qiime/select_libs/reads/NHsampleOnly_nobatASV_repSeqs.qza --i-taxa /home/dorourke/projects/coi_diet/paper3/qiime/select_libs/tax/nbayes/tmp.raw_bigDB_NBtax.qza --p-mode 'super' --p-perc-identity 0.985 --p-threads 4 --p-rank-handles 'greengenes' --p-derep-prefix --output-dir naiveBayes_clust --verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --derep_prefix /tmp/qiime2-archive-taibf4fc/772fc70b-e23c-40eb-9509-27cdcbcb614e/data/dna-sequences.fasta --output /tmp/tmpz39bwqsp --uc /tmp/tmp9jo0o8pt --qmask none --xsize --threads 4
vsearch v2.7.0_linux_x86_64, 251.9GB RAM, 16 cores
https://github.com/torognes/vsearch
Reading file /tmp/qiime2-archive-taibf4fc/772fc70b-e23c-40eb-9509-27cdcbcb614e/data/dna-sequences.fasta 100%
1659488 nt in 9134 seqs, min 181, max 200, avg 182
Sorting by length 100%
Dereplicating 100%
Sorting 100%
9127 unique sequences, avg cluster 1.0, median 1, max 2
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --cluster_size /tmp/tmpz39bwqsp --id 0.985 --centroids /tmp/q2-DNAFASTAFormat-ovbrxohu --uc /tmp/tmp9jo0o8pt --qmask none --xsize --threads 4
vsearch v2.7.0_linux_x86_64, 251.9GB RAM, 16 cores
https://github.com/torognes/vsearch
Reading file /tmp/tmpz39bwqsp 100%
1658210 nt in 9127 seqs, min 181, max 200, avg 182
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 5483 Size min 1, max 65, avg 1.7
Singletons: 3874, 42.4% of seqs, 70.7% of clusters
Traceback (most recent call last):
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Taxon'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1069, in set
loc = self.items.get_loc(item)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Taxon'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
results = action(**arguments)
File "<decorator-gen-151>", line 2, in dereplicate
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in _callable_executor_
output_views = self._callable(**view_args)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/rescript/dereplicate.py", line 54, in dereplicate
taxa, sequences, clustered_seqs, uc, mode=mode)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/rescript/dereplicate.py", line 116, in _dereplicate_taxa
uc['Taxon'] = uc['seqID'].apply(lambda x: taxa.loc[x])
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/frame.py", line 3487, in __setitem__
self._set_item(key, value)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/frame.py", line 3565, in _set_item
NDFrame._set_item(self, key, value)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/generic.py", line 3381, in _set_item
self._data.set(key, value)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1072, in set
self.insert(len(self.items), item, value)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1181, in insert
block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 3284, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 2792, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "/home/dorourke/miniconda/envs/rescript_2020.6/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 128, in __init__
"{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 2, placement implies 1
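
To make the "two bits versus one" idea concrete, here is a minimal sketch of how the failing line in the traceback (uc['Taxon'] = uc['seqID'].apply(lambda x: taxa.loc[x])) could raise this kind of error. It assumes, without confirmation from the log, that the taxonomy input carries two columns per feature ID (for example Taxon and Confidence), so each lookup returns two values where one is expected. The tables, IDs, taxonomy strings, and the assign_taxon helper are all made up for illustration and are not RESCRIPt code.

```python
import pandas as pd

# Hypothetical stand-in for the taxonomy input: two columns per feature ID.
# IDs, taxonomy strings, and confidence values are invented for this sketch.
taxa = pd.DataFrame(
    {
        "Taxon": ["k__Animalia; p__Arthropoda", "k__Animalia; p__Chordata"],
        "Confidence": [0.99, 0.87],
    },
    index=pd.Index(["asv1", "asv2"], name="Feature ID"),
)

# Hypothetical stand-in for the parsed VSEARCH uc table.
uc = pd.DataFrame({"seqID": ["asv1", "asv2"]})


def assign_taxon(uc, taxa):
    """Mimic the assignment from the traceback; return the error text or None."""
    uc = uc.copy()
    try:
        # With a two-column table, each taxa.loc[x] is a two-item Series,
        # so apply() builds a two-column DataFrame for a one-column slot.
        uc["Taxon"] = uc["seqID"].apply(lambda x: taxa.loc[x])
    except ValueError as err:
        return str(err)
    return None


two_col_error = assign_taxon(uc, taxa)           # two values per ID
one_col_error = assign_taxon(uc, taxa["Taxon"])  # one value per ID

print("two-column taxonomy:", two_col_error)
print("Taxon column only:  ", one_col_error)
```

If this assumption holds, the two-column case fails with a ValueError (the exact wording depends on the pandas version) while the single-column case succeeds, which would suggest checking whether the taxonomy artifact contains extra columns beyond Taxon.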