Vsearch dereplicate seqs - list index out of range

Hello qiime2 users and team,

I am relatively new to this programme. After going through the main tutorials, I tried to use qiime2 (2018.8) to import a big FASTA file and create a feature table with vsearch dereplicate-sequences (since, as far as I know, dada2 and deblur are not an option for a FASTA file without quality scores).

Importing the file went fine, but when I tried to dereplicate I ran into this error, which I don't know how to interpret: IndexError: list index out of range. I searched this forum for similar errors but found none. Sorry if it's a duplicate.

Here's the code I used:

> qiime vsearch dereplicate-sequences\
>  --i-sequences CoDL-CH0001.qza\
>  --o-dereplicated-sequences CoDL-CH0001-derep.qza\
>  --o-dereplicated-tab\
>  --verbose

And this is the full output:

> Running external command line application. This may print messages to stdout and/or stderr.
> The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
>
> Command: vsearch --derep_fulllength /tmp/qiime2-archive-063t149_/cacfdd7b-9595-431e-9dcf-e800bef4d9d7/data/seqs.fna --output /tmp/q2-DNAFASTAFormat-ktw_h99w --relabel_sha1 --relabel_keep --uc /tmp/tmpygkkud8x --qmask none --xsize
>
> vsearch v2.7.0_linux_x86_64, 7.7GB RAM, 4 cores
> https://github.com/torognes/vsearch
>
> Reading file /tmp/qiime2-archive-063t149_/cacfdd7b-9595-431e-9dcf-e800bef4d9d7/data/seqs.fna 100%
> 58586982 nt in 156872 seqs, min 356, max 440, avg 373
> Dereplicating 100%
> Sorting 100%
> 54354 unique sequences, avg cluster 2.9, median 1, max 15423
> Writing output file 100%
> Writing uc file, first part 100%
> Writing uc file, second part 100%
> Traceback (most recent call last):
>   File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
>     results = action(**arguments)
>   File "<decorator-gen-128>", line 2, in dereplicate_sequences
>   File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
>     output_types, provenance)
>   File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
>     output_views = self._callable(**view_args)
>   File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 134, in dereplicate_sequences
>     table = _parse_uc(out_uc)
>   File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 70, in _parse_uc
>     observation_id = fields[9].split()[0]
> IndexError: list index out of range

Any idea of what may be going on here?

Thank you all,

Giacomo
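
For readers landing here with the same traceback: the failing expression is `fields[9].split()[0]`, and in plain Python it can raise IndexError in two ways. The sketch below is not the plugin's code. It assumes that `fields` comes from splitting a .uc line on tabs and that a complete .uc record has ten columns; the function names and the sample lines in the comments are hypothetical.

```python
def observation_id_from_uc_line(line):
    """Mimic the expression from the traceback on a single .uc line."""
    # Assumption: the columns of a .uc record are tab-separated.
    fields = line.rstrip("\n").split("\t")
    return fields[9].split()[0]


def explain_failure(line):
    """Return why the expression above would raise IndexError, or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # Case 1: the line has fewer than ten columns.
        return "only %d tab-separated fields, so fields[9] does not exist" % len(fields)
    if not fields[9].split():
        # Case 2: the tenth column is empty or whitespace only.
        return "fields[9] is empty or whitespace, so split() returns an empty list"
    return None


# Hypothetical usage on a made-up line with nine columns:
# explain_failure("S\t0\t373\t*\t*\t*\t*\t*\tsome_label")
```

Running `explain_failure` over the lines of the .uc file would show which of the two cases applies, but the temporary .uc file from the failed run is deleted, so this only helps if vsearch is run by hand.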

Hey there @Giacomo!

I suspect something might be off with the FASTA header ID format. Can you copy and paste the first few lines?
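
If the file is too big to open comfortably, a few lines of Python can pull out the first headers. This is only a sketch: the function names are made up for illustration, and it assumes a plain-text FASTA file whose header lines start with ">".

```python
import itertools


def first_headers(fasta_path, n=5):
    """Return the first n header lines (those starting with '>') of a FASTA file."""
    with open(fasta_path) as handle:
        headers = (line.rstrip("\n") for line in handle if line.startswith(">"))
        return list(itertools.islice(headers, n))


def non_alphanumeric_chars(headers):
    """Collect the non-alphanumeric characters used in the sequence IDs."""
    chars = set()
    for header in headers:
        parts = header[1:].split()  # the ID is the first word after '>'
        if parts:
            chars.update(c for c in parts[0] if not c.isalnum())
    return chars


# Hypothetical usage (the path is a placeholder):
# headers = first_headers("my-seqs.fasta")
# print(headers)
# print(non_alphanumeric_chars(headers))
```

The second helper does not decide which characters are acceptable; it just lists the punctuation found in the IDs so that anything unusual stands out.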

Hi @thermokarst,

Thanks for the quick answer. Here are the first 2 sequences. The FASTA file is formatted so that each sequence is on a single line; I'm not sure whether the forum is splitting it.

>11607782|DCO_HAN_Bv4v5--CH_0021|1
ACGGGGGGAGCAAGCGTTGTTCGGATTTACTGGGCGTAAAGGGCGTCTAGGCGGACCAGCAAGTCAGATGTGAAATCCCACGGCTCAACCGTGGAACTGCATTTGAAACTGCTGGTATTGAGTATGGAAGAGGAAAGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAAGAACACCGGTGGCGAAGGCGGCTTTCTGGTCCAATACTGACGCTAAAGCGCGAAAGTGTGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCACACTGTAAACGATGGATACTTGGTGTCGGGGATCCGACCTCTTCGGTGCCGAAGCTAACGCATTAAGTATCCCGCCTGGGGAGTACGATCGCAAGGTTGAA
>11608132|DCO_HAN_Bv4v5--CH_0021|1
ACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGTAAGACAGTTGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGTGACTGCATAGCTAGAGTACGGTAGAGGGGGATGGAATTCCGTGTGTAGCAGTGAAATGCGTAGATATGCGGAGGAACACCGATGGCGAAGGCAATCCCCTGGACCTGTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCAACTGGTTGTTGGGTCTTCACTGACTCAGTAACGAAGCTAACGCGTGAAGTTGACCGCCTGGGGAGTACGGCCGCAAGGTTGAA

PS: Sorry, I realised the verbose output is very difficult to read the way I quoted it. I hope this looks better:

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --derep_fulllength /tmp/qiime2-archive-j05q6si0/cacfdd7b-9595-431e-9dcf-e800bef4d9d7/data/seqs.fna --output /tmp/q2-DNAFASTAFormat-vn7444rl qmask none --xsize

vsearch v2.7.0_linux_x86_64, 7.7GB RAM, 4 cores
https://github.com/torognes/vsearch

Reading file /tmp/qiime2-archive-j05q6si0/cacfdd7b-9595-431e-9dcf-e800bef4d9d7/data/seqs.fna 100%
58586982 nt in 156872 seqs, min 356, max 440, avg 373
Dereplicating 100%
Sorting 100%
54354 unique sequences, avg cluster 2.9, median 1, max 15423
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
Traceback (most recent call last):
  File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in dereplicate_sequences
  File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 134, in dereplicate_sequences
    table = _parse_uc(out_uc)
  File "/home/giacomo_vitali/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_vsearch/_cluster_sequences.py", line 70, in _parse_uc
    observation_id = fields[9].split()[0]
IndexError: list index out of range

Plugin error from vsearch:

  list index out of range

See above for debug info.

Giacomo

Hi again,

I solved the problem by simply replacing each "|" character in the sequence IDs with a "-". It worked fine. I wouldn't have gotten there without the tip about the header ID, so thanks a ton!

Giacomo
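
For anyone who wants to apply the same fix, here is a minimal Python sketch of the replacement described above. It is one possible way to do it, not necessarily what was run in this thread. The function name and file names are hypothetical, and it assumes a plain-text FASTA file in which only lines starting with ">" are headers.

```python
def replace_pipes_in_headers(in_path, out_path, old="|", new="-"):
    """Copy a FASTA file, replacing `old` with `new` in header lines only.

    Returns the number of header lines that were changed.
    """
    changed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Sequence lines are written through untouched.
            if line.startswith(">") and old in line:
                line = line.replace(old, new)
                changed += 1
            dst.write(line)
    return changed


# Hypothetical usage (file names are placeholders):
# n = replace_pipes_in_headers("seqs.fasta", "seqs-fixed.fasta")
# print(n, "headers rewritten")
```

The rewritten file then needs to be imported into qiime2 again before re-running the dereplication. If two IDs differ only by "|" versus "-", they would collide after the replacement, so it is worth checking that the IDs are still unique.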
