plugin error from vsearch: No matches were identified to reference_sequences.

wbb_121 · June 10, 2019, 2:03pm

Hi to all!

I am running QIIME 2 2019.1 in VirtualBox to analyze a study downloaded from NCBI. When I ran qiime vsearch cluster-features-closed-reference, I got the error below.

Output with --verbose:
vsearch v2.7.0_linux_x86_64, 4.8GB RAM, 6 cores

Reading file /tmp/qiime2-archive-9w63scf6/4aa7bd0a-0b9b-44e7-b342-007854eb5248/data/dna-sequences.fasta 100%
142290491 nt in 99322 seqs, min 1254, max 2353, avg 1433
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching query sequences: 0 of 148682 (0.00%)
vsearch v2.7.0_linux_x86_64, 4.8GB RAM, 6 cores

Reading file /tmp/tmp8pr_na5i 100%
38310917 nt in 148682 seqs, min 54, max 896, avg 258
Getting sizes 100%
Sorting 100%
Median abundance: 1
Writing output 100%
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/tmp9dh7g16y --id 0.97 --db /tmp/qiime2-archive-9w63scf6/4aa7bd0a-0b9b-44e7-b342-007854eb5248/data/dna-sequences.fasta --uc /tmp/tmphpc6wmek --strand plus --qmask none --notmatched /tmp/tmp8pr_na5i --threads 1

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --sortbysize /tmp/tmp8pr_na5i --xsize --output /tmp/q2-DNAFASTAFormat-gk1csu50

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 275, in cluster_features_closed_reference
collapse_f = _collapse_f_from_sqlite(conn)
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 97, in _collapse_f_from_sqlite
raise ValueError("No sequence matches were identified by vsearch.")
ValueError: No sequence matches were identified by vsearch.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "</home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-122>", line 2, in cluster_features_closed_reference
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_vsearch/_cluster_features.py", line 278, in cluster_features_closed_reference
raise VSearchError('No matches were identified to '
q2_vsearch._cluster_features.VSearchError: No matches were identified to reference_sequences. This can happen if sequences are not homologous to reference_sequences, or if sequences are not in the same orientation as reference_sequences (i.e., if sequences are reverse complemented with respect to reference sequences). Sequence orientation can be adjusted with the strand parameter.

I searched for the problem on the forum but did not find a way to solve it.

By the way, I have two other questions:

The data were sequenced with 454-FLX Titanium chemistry (Roche), so I suppose DADA2 or Deblur cannot be applied?
I didn't do quality control or chimera filtering myself because, according to the original paper, the data I downloaded are already clean. Is that okay?

Thank you!

colinbrislawn · June 10, 2019, 6:51pm

Hello @wbb_121

Welcome to the QIIME 2 forums! Thank you for posting.

Here is the most important part of the error message.

q2_vsearch._cluster_features.VSearchError: No matches were identified to reference_sequences. This can happen if sequences are not homologous to reference_sequences, or if sequences are not in the same orientation as reference_sequences (i.e., if sequences are reverse complemented with respect to reference sequences). Sequence orientation can be adjusted with the strand parameter.

So the reads in stat_3000_rep-seqs.qza don't match your database. Are these reads from the 16S region? While the 454-FLX should still work, using a different region would not.

Let me know what you find,
Colin

wbb_121 · June 11, 2019, 2:09am

Hi Colin. @colinbrislawn

Thank you so much for your attention to this matter. The reads in stat_3000_rep-seqs.qza are from the 16S V3 region, and I previously succeeded in matching them to Greengenes ‘97_otus.fasta’ with the ‘pick_otus.py’ command in QIIME 1. So I cannot understand why it failed this time.

Could you give me some suggestions? Thanks again.

colinbrislawn · June 11, 2019, 5:52pm

Ah OK! This is a great clue:

These closed-ref search methods use different programs and may have different defaults... but they should be pretty similar. I wonder if it's the 97_otus.qza file. I know the QIIME 1 databases use the full-length 16S gene, but some of the QIIME 2 default databases use just a specific region of the gene.

Is there a chance that the 97_otus.qza file doesn't include the 16S V3 region you have in your reads?

Colin

ebolyen · June 11, 2019, 7:10pm

Hey @wbb_121,

I think @colinbrislawn is right that your reference probably isn't lining up. Regarding your other two questions:

DADA2 does work with 454 data in fact! You just need to use denoise-pyro in the QIIME 2 plugin instead of denoise-single or denoise-paired.

On skipping quality control and chimera filtering: that's fine if you believe the authors, but I kind of doubt the data are really clean. If you use DADA2's denoise-pyro, that will all get handled automatically.
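
For reference, a denoise-pyro call might look roughly like the dry run below. This is only a sketch: demux.qza and the output names are placeholders that do not appear in this thread, the input has to be demultiplexed reads with quality scores, and --p-trunc-len 0 (no truncation) is just a starting assumption.

```shell
# Dry run: print a denoise-pyro command instead of executing it.
# demux.qza and the output file names are placeholders, not files from this thread.
# --p-trunc-len 0 disables truncation; choose a real value from your quality plots.
print_denoise_pyro() {
    echo qiime dada2 denoise-pyro \
        --i-demultiplexed-seqs demux.qza \
        --p-trunc-len 0 \
        --o-table table.qza \
        --o-representative-sequences rep-seqs.qza \
        --o-denoising-stats denoising-stats.qza
}

print_denoise_pyro
```

Drop the echo once the file names match your own artifacts.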

wbb_121 · June 12, 2019, 4:41am

Thanks @colinbrislawn @ebolyen

Thanks for your suggestions, but actually the file 97_otus.qza is the full-length 16S rRNA gene reference, which I imported myself. Do you mean I should extract the V3 region according to the primers and use that as the reference?

It's really good news that DADA2 works with 454 data! I'll try it.

However, I am comparing different methods with my newly developed method for calculating beta diversity, so I guess I need to run OTU-based methods anyway.

colinbrislawn · June 12, 2019, 3:11pm

Thanks for telling me more.

Nope, you shouldn't need to extract the V3 region. The full-length reference should totally work.

You did everything right. I wonder why it’s not working. It’s a mystery!

Maybe it’s a bug. Have you tried doing this alignment using vsearch directly? I can help test it out to try and see what’s wrong.

Colin

wbb_121 · June 13, 2019, 2:33am

Thank you so much for all you have done. @colinbrislawn

I haven't tried vsearch directly. (In fact, I haven't used vsearch directly before.) If possible, could you please help me test it? The data are available from the NCBI under BioProject 168618.

Thanks again.

colinbrislawn · June 14, 2019, 3:50am

Sure thing.

I'm not sure what you are familiar with in terms of linux and vsearch and other things, so I don't want to go too slow or too fast. Here is the outline of what I would do (and let me know if you have any questions!).

First, activate your qiime2 conda environment and run this command

vsearch

It should show you the vsearch version and some example commands.

Second, let's collect the reads (queries) and database (target) needed to run vsearch.

The reads are the .fna or .fasta file you imported to make stat_3000_rep-seqs.qza.
The database, also in the .fna or .fasta format, is the

Now that you have the reads and the database, you can put them together with vsearch. Here is how I would do it:

vsearch --usearch_global stat_3000_rep-seqs.fna \
--db 97_otus.fna --blast6out stat_3000_rep-cr-97.txt \
--id 0.97 --threads 2

You can raise --threads if you have spare cores.
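
Once the search finishes, the --blast6out file is a tab-separated table with one line per hit and the read ID in the first column. A rough way to see how many reads matched anything is sketched below; count_matched_reads is just a helper name I made up, not a vsearch feature.

```shell
# Count distinct reads (column 1 of a blast6out table) with at least one hit.
# count_matched_reads is a hypothetical helper name used for illustration.
count_matched_reads() {
    awk -F '\t' '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Only run on the real output if it exists in the current directory.
if [ -f stat_3000_rep-cr-97.txt ]; then
    count_matched_reads stat_3000_rep-cr-97.txt
fi
```

An empty file gives 0, which is the "no matches" case QIIME 2 is complaining about.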

Also, try this command:

vsearch --usearch_global stat_3000_rep-seqs.fna \
--db 97_otus.fna --blast6out stat_3000_rep-cr-97.txt \
--id 0.97 --threads 2 --strand both

Here, I've added the --strand flag and set it to both. This makes vsearch compare each read to the database in both orientations (as given and reverse-complemented).
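
As a small illustration of what searching the reverse strand involves, the sketch below reverse-complements a short made-up sequence. The revcomp helper is hypothetical and only for illustration; vsearch handles this internally when --strand both is set.

```shell
# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
# revcomp is a hypothetical helper; the input sequence is made up.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACGTTG   # prints CAACGT
```

If your reads only match the database after this transformation, the default plus-strand search will report no hits.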

Let me know what you find!

Colin

wbb_121 · June 14, 2019, 12:36pm

Thank you, Colin.

I followed your instructions and got the following results.

The two commands had the same results: NO READS MATCH MY DATABASE.

OK. I began to doubt myself. The commands I used in QIIME 1 were

pick_otus.py -i ./stat_3000.fasta -r ./97_otus.fasta -m blast -o ./blast_picked_otus

make_otu_table.py -i ./blast_picked_otus/stat_3000_otus.txt -o stat_3000.biom

It does give me a reasonable OTU table. (At least it looked reasonable to me.)
Is there any problem? What else could I do?

Thanks again.

colinbrislawn · June 16, 2019, 4:55pm

Very strange! I have no idea why blast would be able to find hits but vsearch would not...

Any other ideas here?

wbb_121 · June 18, 2019, 1:57am

Oh, I just remembered that I first tried 'uclust' in QIIME 1 (the default for pick_otus.py) and it failed to match my sequences to Greengenes, so in the end I decided to use 'blast'.

Apart from that, I'm afraid I don't have any ideas either. Maybe I have a really unusual dataset?

colinbrislawn · June 18, 2019, 7:57pm

Ah OK. That makes more sense.

Blast does local alignment, whereas vsearch and uclust do global alignment (ignoring terminal gaps), so percent identity is computed over the whole read rather than just its best-matching stretch. This can lead to different results.
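
Here is a rough worked example of that difference. All the numbers are invented for illustration (only the 258 nt read length echoes the average in your log), and I'm assuming an alignment without internal gaps to keep the arithmetic simple.

```shell
# Percent identity = matching columns / alignment columns.
# pct_identity is a hypothetical helper; the match counts below are invented.
pct_identity() {
    awk -v m="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * m / n }'
}

# Local (BLAST-style): only the best-matching 200-column stretch is scored.
echo "local:  $(pct_identity 195 200)%"
# Global (vsearch-style): all 258 columns of the read are scored,
# including a poorly matching 58-column tail with only 30 matches.
echo "global: $(pct_identity 225 258)%"
```

With these made-up numbers the same read scores 97.5% locally and 87.2% globally, so it would pass a 0.97 cutoff in one tool and fail it in the other.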

I bet vsearch would work if you use --p-perc-identity 0.95 or --p-perc-identity 0.90. Have you tried that?
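
A sketch of how trying a few thresholds might look is below. It is a dry run that only prints the commands. I'm assuming the reference artifact is named 97_otus.qza, and stat_3000_table.qza plus the cr-* output names are placeholders, since the feature table isn't named in this thread.

```shell
# Dry run: print one closed-reference clustering command per identity threshold.
# stat_3000_table.qza and the cr-* output names are placeholders, not from this thread.
print_sweep() {
    for id in 0.97 0.95 0.90 0.85; do
        echo qiime vsearch cluster-features-closed-reference \
            --i-sequences stat_3000_rep-seqs.qza \
            --i-table stat_3000_table.qza \
            --i-reference-sequences 97_otus.qza \
            --p-perc-identity "$id" \
            --o-clustered-table "cr-$id-table.qza" \
            --o-clustered-sequences "cr-$id-seqs.qza" \
            --o-unmatched-sequences "cr-$id-unmatched.qza"
    done
}

print_sweep
```

Remove the echo to actually run them, then compare how many features end up in each unmatched-sequences artifact.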

Colin

wbb_121 · June 20, 2019, 7:19am

Wow, that's surprising!! I tried

--p-perc-identity 0.85

as you suggested and it gave me the results I expected.

I feel silly for not realizing this. Thank you so much!

colinbrislawn · June 20, 2019, 5:18pm

I'm glad this worked!

It's not you, it's the identity definition; 97% means something different for vsearch and blast. Confusing, right???

Let me know if you have any other questions.
Colin
