Hi,
I'm working through the RESCRIPt SILVA tutorial ("Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt") and am hitting an error at the dereplication step. I'm using QIIME 2 2022.11 and installed RESCRIPt via pip, as suggested:
pip install git+https://github.com/bokulich-lab/RESCRIPt.git
Here are the commands I ran:
qiime rescript get-silva-data \
--p-version '138.1' \
--p-target 'SSURef_NR99' \
--p-include-species-labels \
--o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
--o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza
qiime rescript reverse-transcribe \
--i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
--o-dna-sequences silva-138.1-ssu-nr99-seqs.qza
qiime rescript cull-seqs \
--i-sequences silva-138.1-ssu-nr99-seqs.qza \
--o-clean-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza
qiime rescript filter-seqs-length-by-taxon \
--i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
--i-taxonomy silva-138.1-ssu-nr99-tax.qza \
--p-labels Archaea Bacteria Eukaryota \
--p-min-lens 900 1200 1400 \
--o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
--o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza
qiime rescript dereplicate \
--i-sequences silva-138.1-ssu-nr99-seqs-filt.qza \
--i-taxa silva-138.1-ssu-nr99-tax.qza \
--p-rank-handles 'silva' \
--p-mode 'uniq' \
--o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
--o-dereplicated-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza
All steps up to dereplicate completed. The dereplicate command failed, and its error log contains the following:
$ cat /scratch/rlampe/30545035.tscc-mgr7.local/qiime2-q2cli-err-tij6a2zb.log
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: vsearch --derep_fulllength /scratch/rlampe/30545035.tscc-mgr7.local/qiime2/rlampe/data/dab83010-1036-49df-9f40-b75d7c4d9da7/data/dna-sequences.fasta --output /scratch/rlampe/30545035.tscc-mgr7.local/tmpcfgsrr46 --uc /scratch/rlampe/30545035.tscc-mgr7.local/tmp00vq71bj --xsize --threads 1
vsearch v2.22.1_linux_x86_64, 1007.2GB RAM, 64 cores
https://github.com/torognes/vsearch
Dereplicating file /scratch/rlampe/30545035.tscc-mgr7.local/qiime2/rlampe/data/dab83010-1036-49df-9f40-b75d7c4d9da7/data/dna-sequences.fasta 100%
699161179 nt in 477562 seqs, min 900, max 3983, avg 1464
Sorting 100%
435502 unique sequences, avg cluster 1.1, median 1, max 893
Writing FASTA output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/rescript/dereplicate.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
uc['Taxon'] = uc['seqID'].apply(lambda x: taxa.loc[x])
/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/rescript/dereplicate.py:116: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
uc['centroidtaxa'] = uc['centroidID'].apply(lambda x: taxa.loc[x])
Traceback (most recent call last):
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "<decorator-gen-490>", line 2, in dereplicate
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 408, in _callable_executor_
artifact = qiime2.sdk.Artifact._from_view(
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/result.py", line 356, in _from_view
artifact._archiver = archive.Archiver.from_data(
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 408, in from_data
Format.write(rec, type, format, data_initializer,
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/format/v5.py", line 20, in write
super().write(archive_record, type, format, data_initializer,
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/format/v1.py", line 25, in write
provenance_capture.finalize(
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 320, in finalize
self.write_citations_bib()
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 311, in write_citations_bib
self.citations.save(str(self.path / self.CITATION_FILE))
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/cite.py", line 71, in save
bp.dump(db, f, writer=writer)
File "/projects/ps-allenlab/rlampe/bin/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/bibtexparser/__init__.py", line 108, in dump
bibtex_file.write(writer.write(bib_database))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0161' in position 2884: ordinal not in range(256)
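For what it's worth, the final error can be reproduced outside QIIME 2. The sketch below is only my guess at the mechanism: it assumes the citations file is written with the environment's default text encoding and that this resolves to latin-1 on my cluster. The BibTeX-like string and the `try_write` helper are made up for illustration; only the character U+0161 ('š') comes from the traceback.

```python
import locale
import os
import tempfile

# Hypothetical BibTeX-like snippet; U+0161 is the character named in the traceback.
text = "author = {Nov\u0161ak, Example}"


def try_write(text, encoding):
    """Write text to a temporary file with the given encoding.

    Returns None on success, or the UnicodeEncodeError message on failure.
    """
    fd, path = tempfile.mkstemp(suffix=".bib")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as fh:
            fh.write(text)
        return None
    except UnicodeEncodeError as err:
        return str(err)
    finally:
        os.remove(path)


latin1_error = try_write(text, "latin-1")
utf8_error = try_write(text, "utf-8")

# The encoding Python picks when open() is called without an explicit encoding.
print("preferred encoding:", locale.getpreferredencoding(False))
print("latin-1 result:", latin1_error)
print("utf-8 result:", utf8_error)
```

If the preferred encoding printed inside the job environment is not UTF-8, that would be consistent with the traceback above, and it may mean the locale settings on the compute node are what need changing rather than anything in the RESCRIPt commands.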
Thanks in advance!
Rob