Clustering sequences into OTUs using q2-vsearch

chloewang · September 7, 2018, 1:27am

I am totally new to QIIME and QIIME 2. I am running the clustering step as below:

Import reference sequence

qiime tools import \
  --input-path silva.nr_v132.align \
  --output-path reference-seqs.qza \
  --type 'FeatureData[Sequence]'

Closed-reference clustering

qiime vsearch cluster-features-closed-reference \
  --i-table feature-frequency-filtered-table.qza \
  --i-sequences rep-seqs-filtered.qza \
  --i-reference-sequences reference-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table table-cr-97.qza \
  --o-clustered-sequences rep-seqs-cr-97.qza \
  --o-unmatched-sequences unmatched-cr-97.qza

But I always get this error:
Plugin Error from vsearch: Command '['vsearch', '--usearch_global',....returned non-zero exit status 1.

Here is the log info:

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /tmp/tmpbr_9gead --id 0.97 --db /tmp/qiime2-archive-xodkr3r6/fadebd06-b0b6-43e6-a3a7-2f731302270d/data/dna-sequences.fasta --uc /tmp/tmpcyed42k0 --strand plus --qmask none --notmatched /tmp/tmpl64pqv9h --threads 1

vsearch v2.7.0_linux_x86_64, 31.4GB RAM, 8 cores

Reading file /tmp/qiime2-archive-xodkr3r6/fadebd06-b0b6-43e6-a3a7-2f731302270d/data/dna-sequences.fasta

Fatal error: illegal character '.' on line 2 in FASTA file
Traceback (most recent call last):
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in cluster_features_closed_reference
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 256, in cluster_features_closed_reference
    run_command(cmd)
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2_vsearch/_cluster_features.py", line 33, in run_command
    subprocess.run(cmd, check=True)
  File "/home/xyz/miniconda3/envs/qiime2-2018.6/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['vsearch', '--usearch_global', '/tmp/tmpbr_9gead', '--id', '0.97', '--db', '/tmp/qiime2-archive-xodkr3r6/fadebd06-b0b6-43e6-a3a7-2f731302270d/data/dna-sequences.fasta', '--uc', '/tmp/tmpcyed42k0', '--strand', 'plus', '--qmask', 'none', '--notmatched', '/tmp/tmpl64pqv9h', '--threads', '1']' returned non-zero exit status 1

What can I do to debug?

Thank you very much.
Chloe

colinbrislawn · September 7, 2018, 4:40pm

Hello Chloe,

Thanks for posting the full script and error message. I think I found the important line in the middle of the error:

Fatal error: illegal character '.' on line 2 in FASTA file

Looks like silva.nr_v132.align is not quite in the right format... Let's see what the QIIME devs recommend to fix it!

Colin

ebolyen · September 7, 2018, 4:57pm

Hey @chloewang,

It looks like the issue is that you are using an aligned FASTA file (hence the dot gap character @colinbrislawn found in the error).

You need to use the representative sequences from SILVA instead.
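
A quick way to confirm this before importing is to look for gap characters in the sequence lines. The sketch below is only an illustration: `count_gapped_lines` is a hypothetical helper name, and it assumes that '.' and '-' are the gap characters used in the alignment. Header lines are skipped because sequence IDs can contain dots.

```shell
# Hypothetical helper (not part of QIIME 2 or vsearch):
# print how many sequence lines of a FASTA file contain '.' or '-'.
# Assumption: these two characters are the alignment gap symbols.
count_gapped_lines() {
    # Skip header lines (starting with '>'), count the rest that hold a gap.
    awk '!/^>/ && /[.-]/ { n++ } END { print n + 0 }' "$1"
}
```

For example, if `count_gapped_lines silva.nr_v132.align` prints a number above zero, that would suggest the file is an alignment rather than a set of unaligned reference sequences.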

Hope that helps!

chloewang · September 7, 2018, 6:44pm

Thank you @colinbrislawn and @ebolyen.
Now I am trying to use the QIIME-compatible SILVA release and am running the following commands:

Obtaining and importing reference data sets (works)

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path silva_132_99_16S.fna \
  --output-path silva_132_99_16S_otus.qza

# Here I get the error "no such option: --input-format". Needs debugging.

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path taxonomy_7_levels.txt \
  --output-path ref-taxonomy.qza

Extract reference reads (works)

qiime feature-classifier extract-reads \
  --i-sequences silva_132_99_16S_otus.qza \
  --p-f-primer GTGCCAGCMGCCGCGGTAA \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-trunc-len 120 \
  --o-reads ref-seqs.qza

Train the classifier (waiting to run)

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

Test the classifier (waiting to run)

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

I am stuck on importing the taxonomy file with QIIME 2 2018.6. Do you have any suggestions?
I appreciate your help.

Chloe

ebolyen · September 7, 2018, 6:47pm

Hey @chloewang,

In 2018.8 --source-format was changed to --input-format. Since you are running 2018.6 just change it back to --source-format (or use the 2018.6 docs instead of latest) and you should be good to go.
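
As a rough illustration of that rule, the flag can be chosen from the release string. This is a sketch only: `import_format_flag` is a hypothetical helper, not part of QIIME 2, and it encodes nothing beyond the 2018.8 rename described above. It assumes every release earlier than 2018.8 accepts --source-format, which this thread only confirms for 2018.6.

```shell
# Hypothetical helper: print the format flag that `qiime tools import`
# is assumed to expect for a release string such as "2018.6".
# Assumption: releases before 2018.8 use --source-format, later ones --input-format.
import_format_flag() {
    year=${1%%.*}    # text before the first dot, e.g. 2018
    month=${1#*.}    # text after the first dot, e.g. 6
    if [ "$year" -lt 2018 ] || { [ "$year" -eq 2018 ] && [ "$month" -lt 8 ]; }; then
        echo "--source-format"
    else
        echo "--input-format"
    fi
}
```

With this, `import_format_flag 2018.6` prints the older flag name, so the taxonomy import above would pass `--source-format HeaderlessTSVTaxonomyFormat` on that release.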

chloewang · September 7, 2018, 9:13pm

@ebolyen Thank you very much. It works.

chloewang · September 8, 2018, 10:23pm

@ebolyen I am sorry to keep asking questions. I am now running the “train the classifier” step. It has been running for hours and still hasn’t finished. It seems to use a lot of RAM, since my computer is almost frozen. Is this normal?

Mehrbod_Estaki · September 9, 2018, 1:14am

Hi @chloewang,

Training the classifier is one of the longer processes available through QIIME 2. Depending on the size of the database and how many resources you have dedicated to the process, it can certainly take a long time, and 3 hours is perfectly normal. As long as it is still processing, I would let it do its thing and wait!

chloewang · September 9, 2018, 11:00pm

Got it. Thank you @Mehrbod_Estaki
