vsearch index error

After reading all of the help I could find (here and elsewhere), I can’t seem to figure out this vsearch index error.

I ran: qiime vsearch dereplicate-sequences --i-sequences seqs.qza --o-dereplicated-table table.qza --o-dereplicated-sequences rep-seqs.qza --verbose

and received:
Command: vsearch --derep_fulllength /tmp/qiime2-archive-zdqiqr2f/55fcdc9d-52b0-4b62-b47b-fe94e9f5bdbc/data/seqs.fna --output /tmp/q2-DNAFASTAFormat-732l_xsf --relabel_sha1 --relabel_keep --uc /tmp/tmp76vcgrpk --qmask none --xsize

vsearch v2.7.0_linux_x86_64, 3.9GB RAM, 2 cores

Reading file /tmp/qiime2-archive-zdqiqr2f/55fcdc9d-52b0-4b62-b47b-fe94e9f5bdbc/data/seqs.fna 100%
408249818 nt in 1630324 seqs, min 150, max 543, avg 250
Dereplicating 100%
Sorting 100%
507284 unique sequences, avg cluster 3.2, median 1, max 152400
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-129>", line 2, in dereplicate_sequences
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_vsearch/_cluster_sequences.py", line 134, in dereplicate_sequences
    table = _parse_uc(out_uc)
  File "/home/qiime2/miniconda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_vsearch/_cluster_sequences.py", line 70, in _parse_uc
    observation_id = fields[9].split()[0]
IndexError: list index out of range

The first few lines of my input file are:

>Cd10_20CZN0000700524
TACGGTAGGGGCTAGCGTTATCCGGATTACTGGGCGTAAAGGGTGCGTAGGTGGTTTTTAAGTCAGAAGTGAAAGGCTACGGCTCAACCGTAGTAAGCTTTTGAAACTAGAGAACTTGAGTGCAGGAGAGGAGAGTAGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAATACCAGTAGCGAAGGCGCTCTCTGGACTGTAACTGACGCTGAGGCTCGGAAAAGCGT
>Cd10_20CZN0000700970
TACGTAGGTGGCGAGCGTTGTCCGGATTACTGGCGTAAAGGGGAGCGTAGGCGGATTTTAAGTGGATGTGAAATACCGGCTCAACCTGGGGTGCTGCATTCCAACTGGAATCTAGAGTGCAGGAGGGGGAGAGTGGATTCCTAGTGTAGCGGTGAATGCGTAGAGATTAGGAAGAACACCAGTGGCGAAGGCGACTCTCTGGACTGTAACTGACGCTGAGGCTCGAAGCGTGGGGGAGCGAACAGGATTAGATACCCCGT
>Cd10_20CZN0000900614
TACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGTGGTTTCTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAACTGGGAGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAATGCGTAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTCTGGACTGTAACTGACGTGAGGCGCGAAAGCGTGGGGAGCAAACG

Any help is appreciated as I move from qiime1 to qiime2. TIA
Russ

Welcome to the forum @Russell_Minton! Glad to hear you are making the transition from Q1 -> Q2 :smile:

This error has been reported a few times on the forum, and each time (so far) the cause has turned out to be Windows-style (CRLF) line breaks. VSEARCH doesn't read these correctly, and the plugin then fails while parsing VSEARCH's uc output (the IndexError in _parse_uc in your traceback).

See this topic for an example:

As a first step, use qiime tools validate seqs.qza to check the file.

If that does not turn up anything useful, check the raw FASTA file (i.e., the file as it was before you imported it into QIIME 2). Run file seqs.fasta; the output will indicate which type of line endings the file has (for a CRLF file it typically includes "with CRLF line terminators").
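If the file utility isn't available, or you want actual counts, the same check can be sketched in Python. This is only a rough illustration, not part of QIIME 2 or VSEARCH: the function name count_line_endings is made up for this example, and seqs.fasta is assumed to be your raw, pre-import FASTA file.

```python
def count_line_endings(path):
    """Count CRLF, LF-only, and stray CR line endings in a file.

    Hypothetical helper for illustration; not a QIIME 2 or VSEARCH function.
    """
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    with open(path, "rb") as fh:
        # In binary mode, iteration splits on b"\n" only, so any b"\r"
        # bytes are still present and can be inspected.
        for line in fh:
            if line.endswith(b"\r\n"):
                counts["CRLF"] += 1
                body = line[:-2]
            elif line.endswith(b"\n"):
                counts["LF"] += 1
                body = line[:-1]
            else:
                body = line  # final line without a terminator
            # Any remaining b"\r" is a bare carriage return (old Mac style).
            counts["CR"] += body.count(b"\r")
    return counts
```

Calling count_line_endings("seqs.fasta") returns a dictionary of counts; a nonzero "CRLF" or "CR" value would suggest the file needs converting before import.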

If you do have CRLF line endings, convert them to LF (dos2unix is one common tool for this), then re-import the converted file and rerun the command.
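For anyone without a conversion tool at hand, here is a minimal Python sketch of the CRLF -> LF conversion. The function name crlf_to_lf and the output filename seqs.unix.fasta are assumptions made for this example; it writes a new file rather than editing the original in place, and it only rewrites CRLF endings (bare CR endings are left untouched).

```python
def crlf_to_lf(src_path, dst_path):
    """Copy src_path to dst_path, rewriting CRLF line endings as LF.

    Hypothetical helper for illustration. Returns the number of lines
    whose ending was converted.
    """
    converted = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Binary mode keeps the original bytes, so nothing is translated
        # behind our back; iteration splits on b"\n".
        for line in src:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
                converted += 1
            dst.write(line)
    return converted
```

For example, crlf_to_lf("seqs.fasta", "seqs.unix.fasta") would write the converted copy and report how many lines were changed. The converted file then needs to be imported into QIIME 2 again before rerunning dereplicate-sequences.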

Please let us know what you find and if that solves your issue.


That appears to be it. I changed the line breaks to Unix and it ran smoothly.
