vsearch classification error

I re imported the SILVA database exactly as instructed and re-ran the feature-classifier. Now I'm getting this error:


qiime feature-classifier classify-consensus-vsearch \

--i-query $projID-rep-seqs_NoTrunc.qza
--i-reference-reads /SILVA_db/silva13.8_1/silva-138.1-ssu-ref-seqs.qza
--i-reference-taxonomy /SILVA_db/silva13.8_1/silva-138.1-ssu-FULL-tax.qza
--p-maxaccepts 'all'
--p-threads 78
--o-classification $projID-repSeqs-SILVA13_8.1-vsearch-classified.qza
--verbose
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/qiime2-archive-pfzyq8ys/58332ddd-a506-46f4-bf18-fbf16d4729db/data/dna-sequences.fasta --id 0.8 --query_cov 0.8 --strand both --maxaccepts 0 --maxrejects 0 --db /tmp/qiime2-archive-lzir31qs/633c5659-73c0-419a-af72-16e6cdbccc19/data/dna-sequences.fasta --threads 78 --output_no_hits --blast6out /tmp/tmpr1jwos1q

vsearch v2.7.0_linux_x86_64, 377.4GB RAM, 88 cores

Reading file /tmp/qiime2-archive-lzir31qs/633c5659-73c0-419a-af72-16e6cdbccc19/data/dna-sequences.fasta 100%
3183581141 nt in 2224740 seqs, min 900, max 4000, avg 1431
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 3568 of 4383 (81.41%)
Traceback (most recent call last):
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2cli/commands.py", line 339, in call
results = action(**arguments)
File "", line 2, in classify_consensus_vsearch
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in callable_executor
output_views = self._callable(**view_args)
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_vsearch.py", line 62, in classify_consensus_vsearch
consensus = _consensus_assignments(
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_consensus_assignment.py", line 29, in _consensus_assignments
obs_taxa = _import_blast_format_assignments(
File "/home/AnalysisTools/miniconda3/envs/qiime2-2022.2/lib/python3.8/site-packages/q2_feature_classifier/_consensus_assignment.py", line 99, in _import_blast_format_assignments
if fields[1] == '*':
IndexError: list index out of range

Plugin error from feature-classifier:

list index out of range

See above for debug info.


We are wondering if this could be an issue with the sequence label on the query read.

Do you think you could find read number 3568 from inside projID-rep-seqs_NoTrunc.qza and post it here?
EDIT: This line means that 81.41% of queries hit the database. It does not tell us the location of the error.

There may also be interesting clues inside of /tmp/tmpr1jwos1q if you want to post the last few lines of the vsearch temporary output file.

I think read number 3568 is:

49b7afe9a0e7db3f6a357ea0ce9b08ed
CGGTAATACATAGGGGGCGAGCGTTGTCCGGAATGACTGGGCGTAAAGGGAGTGTAGGCGGCTTGGTAAGTTATGAGTGAAAGCCCGCAGCTTAACTGCGGAACTGCTCATAAAACTGCTTGGCTTGAGTACAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGGAATGCGTAGATATTAGGAAGAACACCAGAGGCGAAGGCGGCTTACTGGACTGAAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
But I'll post 1 above and below that too just in case I mis-counted:

2530af14cac9e9a839af0c8305934509
CCGCCAGCGTTCCCACGGCTGCATTTTTCGGCGCCGCCCCGCACACCTTTTTGTATGCCGCAGGCGTCAGCCCTAATTCGCGCCAGTACGCAAGCGCGGTTTTGTTGAGCATATCCCACAGGTCGAGGTATGGGCTGCGCGTCATGTTGGTAGCCCCGGCCTTATTGGTGTGCTCCACGATGGGGCCGTCGCCGTCCTCCCGGTAGTCCGGCTGA
49b7afe9a0e7db3f6a357ea0ce9b08ed
CGGTAATACATAGGGGGCGAGCGTTGTCCGGAATGACTGGGCGTAAAGGGAGTGTAGGCGGCTTGGTAAGTTATGAGTGAAAGCCCGCAGCTTAACTGCGGAACTGCTCATAAAACTGCTTGGCTTGAGTACAGGAGAGGTAAGCGGAATTCCTAGTGTAGCGGTGGAATGCGTAGATATTAGGAAGAACACCAGAGGCGAAGGCGGCTTACTGGACTGAAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGG
ffe191c518bd9bfe712d8ab736b3b891
CGGTAATACGTAGGTGGCGAGCGTTATCCGGATTTACTGGGTGTAAAGGGCGTGTAGGCGGGAAAGCAAGTCAGATGTGAAAACTGTGGGCTCAACCCACAGCCTGCATTTGAAACTGTTTTTCTTGAGTACTGGAGAGGCAGATGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGATCTGCTGGACAGCAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGG


that log file /tmp/tmpr1jwos1q isn't there !

Thank you for posting that. It looks fine to me... :face_with_monocle:

_import_blast_format_assignments
if fields[1] == '*':
IndexError: list index out of range

I wonder if this is on the Qiime2 side after all.

I guess this makes sense, as it's a temp file. Liz, is there a good way to preserve temp files when running a qiime2 plugin?

Rerun that command with --verbose and post all the info it gives you so we can inspect if for clues. :female_detective:

1 Like

the command WAS run with -- verbose

--i-query $projID-rep-seqs_NoTrunc.qza
--i-reference-reads /SILVA_db/silva13.8_1/silva-138.1-ssu-ref-seqs.qza
--i-reference-taxonomy /SILVA_db/silva13.8_1/silva-138.1-ssu-FULL-tax.qza
--p-maxaccepts 'all'
--p-threads 78
--o-classification $projID-repSeqs-SILVA13_8.1-vsearch-classified.qza
--verbose

Ah, OK. Thank you for letting me know.

Is that the only error you saw? I'm looking for more clues, so anything else you found would be really helpful. :mag:

New idea from Nick:

Run vsearch global directly, outside of QIIME 2. The log seems to indicate that the output is being created so I doubt that it is empty, but this would at least rule out vsearch as the culprit.

See if you can craft a vsearch command and run it without Qiime2.

Maybe like this:

vsearch --usearch_global query-sequences.fasta \
  --id 0.8 --query_cov 0.8 --strand both --maxaccepts 0 --maxrejects 0  \
  --db silva-database-sequences.fasta \
  --threads 0 --output_no_hits --blast6out hits.blast6.tsv

Thanks for sticking with us while we figure this out. I appreciate your patience.

1 Like

I'll try that and let you know.

1 Like

--thread 0 (does that mean it'll use everything the server has ?) I have 88 threads available and use around 78 threads

1 Like

Yes, that should use all threads, which should maximize speed. If you notice any issues, you could use fewer threads!

vsearch --usearch_global $SampleID-rep-seqs_NoTrunc_dna-sequences.fasta \

--id 0.8 --query_cov 0.8 --strand both --maxaccepts 0 --maxrejects 0
--db /SILVA_db/silva13.8_1/SILVA_138.1_SSURef_tax_silva.fasta
--threads 80 --output_no_hits --blast6out hits.blast6.tsv

vsearch v2.7.0_linux_x86_64, 377.4GB RAM, 88 cores

Reading file /SILVA_db/silva13.8_1/SILVA_138.1_SSURef_tax_silva.fasta 100%
3183581141 nt in 2224740 seqs, min 900, max 4000, avg 1431
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 3568 of 4383 (81.41%)

Awesome, looks like it finished successfully!

vsearch v2.7.0_linux_x86_64 is the expected version of vsearch, so that should be fine.

What does the output hits.blast6.tsv look like? Maybe you could post the last few lines with tail hits.blast6.tsv?

188a21290cefc94a0bff971cbc66ec46 FJ678549.1.1399 80.0 330 60 3 1 326 1 1399 -1 0
188a21290cefc94a0bff971cbc66ec46 FJ512235.1.1377 80.0 330 61 2 1 326 1 1377 -1 0
188a21290cefc94a0bff971cbc66ec46 FPLO01005756.1.1409 80.0 330 61 2 1 326 1 1409 -1 0
188a21290cefc94a0bff971cbc66ec46 FJ681923.1.1406 80.0 330 61 2 1 326 1 1406 -1 0
188a21290cefc94a0bff971cbc66ec46 FJ512234.1.1376 80.0 330 61 2 1 326 1 1376 -1 0
188a21290cefc94a0bff971cbc66ec46 FJ679173.1.1390 80.0 330 61 2 1 326 1 1390 -1 0
188a21290cefc94a0bff971cbc66ec46 LC028757.1.1514 80.0 330 62 1 1 326 1 1514 -1 0
188a21290cefc94a0bff971cbc66ec46 FMDX01000032.25335.26845 80.0 330 61 2 1 326 1 1511 -1 0
e48fd9de444b8876ab6d95e01a373486 * 0.0 0 0 0 0 0 0 0 -1 0
20b6eac6d7b5a8d1084b4be267934311 * 0.0 0 0 0 0 0 0 0 -1 0

1 Like

Good Morning !

SO how do you propose I proceed after this vsearch_global worked ?

How do I import the hits.blast6.tsv file into qiime2 and continue so that I can get a $projID-repSeqs-SILVA13_8.1-vsearch-classified.qza file ??

So the qiime2 command I was using was:
--i-query $projID-rep-seqs_NoTrunc.qza
--i-reference-reads /SILVA_db/silva13.8_1/silva-138.1-ssu-ref-seqs.qza
--i-reference-taxonomy /SILVA_db/silva13.8_1/silva-138.1-ssu-FULL-tax.qza
--p-maxaccepts 'all'
--p-threads 78
--o-classification $projID-repSeqs-SILVA13_8.1-vsearch-classified.qza
--verbose


Could it be that there is something wrong with the way the taxonomy file was imported maybe or maybe the whole database importing procedure ?
I followed the instructions to the letter on how to import the SILVA database so maybe the import script needs to be examined a bit more closely ?

This is how I imported the taxanomy file:
qiime tools import
--type FeatureData[Taxonomy]
--input-format HeaderlessTSVTaxonomyFormat
--input-path /SILVA_db/silva13.8_1/FULL/taxmap_slv_ssu_ref_138.1.txt
--output-path slv_ssu_ref_138_1_FULL-tax.qza


1 Like

Good afternoon,

Thank you for your patience. Let's plan our next steps:

I like your idea of validating the output and database formats. While VSEARCH was able to finish, something could still be wrong with the output format or it's match against the database taxonomy.

This function has had a bit of a makeover with the (upcoming) 2022.8 release. Before we try a manual import, let's wait until the release comes out and rerun this pipeline. The latest release exposes separate actions for vsearch alignment and consensus classification steps (as well as the classify-consensus-vsearch pipeline chaining these together), so this would be useful for debugging if you still experiences issues with the new release.

OK ETA for the new release ?

You can track the dev of 2022.8 over here: [Preview] QIIME 2 2022.8 development changelog

Soon!

2 Likes

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.