Hello QIIME users,
I’m trying to analyze sequences with QIIME2 that a collaborator previously analyzed with QIIME1, but I’m having trouble with alignment. I’m starting out with file.fna that contains ~1.5 million sequences. I’m running QIIME2-2021.2 using miniconda3/4.8.5 via my institution’s HPC.
All signs point to success with the following:
qiime tools import
--input-path file.fna
--output-path seqs.qza
--type 'SampleData[Sequences]'qiime vsearch dereplicate-sequences
--i-sequences seqs.qza
--o-dereplicated-table table.qza
--o-dereplicated-sequences rep-seqs.qzaqiime feature-table tabulate-seqs
--i-data rep-seqs.qza
--o-visualization rep-seqs.qzv
My intention was to do this next:
qiime phylogeny align-to-tree-mafft-fasttree
--i-sequences rep-seqs.qza
--o-alignment aligned-rep-seqs.qza
--o-masked-alignment masked-aligned-rep-seqs.qza
--o-tree unrooted-tree.qza
--o-rooted-tree rooted-tree.qza
However, once I try to use mafft, I start getting error messages.
qiime alignment mafft
--verbose
--i-sequences rep-seqs.qza
--p-parttree
--o-alignment aligned-rep-seqs.qza
returns:
nseq = 1463689
groupsize = 1463690, partsize=50
mafft --dpparttree might give a better result, although slow.
mafft --fastaparttree is also available if you have fasta34.
splittbfast (nuc) Version 7.475
alg=M, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
1 thread(s)
picksize = 50
groupsize = -1
sueff_global = 0.100000
inputfile = infile
alloclen = 75324 in main
/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/bin/mafft: line 2747: 96023 Killed "$prefix/splittbfast" $legacygapopt -Z $algopt $splitopt $partorderopt $parttreeoutopt $memopt $seqtype $model -f "-"$gop -Q $spfactor -h $aof -p $partsize -s $groupsize $treealg $outnum -i infile > pre 2>> "$progressfile"
Traceback (most recent call last):
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in mafft
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_alignment/_mafft.py", line 128, in mafft
return _mafft(sequences_fp, None, n_threads, parttree, False)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
run_command(cmd, result_fp)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_alignment/_mafft.py", line 26, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '--parttree', '/tmp/qiime2-archive-d1_7r8b5/548013d5-28de-4c35-8a04-16698eb66cda/data/dna-sequences.fasta']' returned non-zero exit status 1.
Plugin error from alignment:
Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '--parttree', '/tmp/qiime2-archive-d1_7r8b5/548013d5-28de-4c35-8a04-16698eb66cda/data/dna-sequences.fasta']' returned non-zero exit status 1.
See above for debug info.
I was unable to work out how to add --dpparttree as suggested at the top of the output included above.
I’ve tried to use SEPP to get around this:
wget https://data.qiime2.org/2021.8/common/sepp-refs-silva-128.qza
qiime fragment-insertion sepp
--i-representative-sequences rep-seqs.qza
--i-reference-database sepp-refs-silva-128.qza
--o-tree insertion-tree.qza
--o-placements insertion-placements.qza
But that returns the following error:
Traceback (most recent call last):
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in sepp
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_fragment_insertion/_insertion.py", line 77, in sepp
debug)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/site-packages/q2_fragment_insertion/_insertion.py", line 53, in _run
subprocess.run(cmd, check=True, cwd=cwd)
File "/opt/apps/miniconda3/4.8.5/envs/qiime2-2021.2/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run-sepp.sh', '/tmp/qiime2-archive-i02jjmed/548013d5-28de-4c35-8a04-16698eb66cda/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '1', '-A', '1000', '-P', '5000', '-a', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/aligned-dna-sequences.fasta', '-t', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/tree.nwk', '-r', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/raxml-info.txt', '-b', '1']' returned non-zero exit status 1.
Plugin error from fragment-insertion:
Command '['run-sepp.sh', '/tmp/qiime2-archive-i02jjmed/548013d5-28de-4c35-8a04-16698eb66cda/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '1', '-A', '1000', '-P', '5000', '-a', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/aligned-dna-sequences.fasta', '-t', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/tree.nwk', '-r', '/tmp/qiime2-archive-d1k6qkge/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/raxml-info.txt', '-b', '1']' returned non-zero exit status 1.
Lastly, I’ve tried to use SINA outside of QIIME:
wget -O- https://www.arb-silva.de/fileadmin/arb_web_db/release_138_1/ARB_files/SILVA_138.1_SSURef_NR99_12_06_20_opt.arb.gz
| gunzip -c > SILVA_refnr.arbsina -i file.fna -r SILVA_refnr.arb -o aligned_seqs.fasta
Which appears to successfully run SINA 1.7.2 and return the output file. From there, I have not been able to import the aligned_seqs.fasta file into QIIME2 as an artifact. I’ve tried several different importable formats and types options, but not had any luck.
qiime tools import
--input-path aligned_seqs.fasta
--output-path aligned_rep_seqs.qza
--input-format AlignedDNAFASTAFormat
--type FeatureData[AlignedSequence]
There was a problem importing aligned_seqs.fasta:
aligned_seqs.fasta is not a(n) AlignedDNAFASTAFormat file:
Invalid character '' at position 37839 on line 1288468 (does not match IUPAC characters for this sequence type). Allowed characters are ACGTURYKMSWBDHVN.-.
qiime tools import
--input-path aligned_seqs.fasta
--output-path aligned_rep_seqs.qza
--type FeatureData[AlignedSequence]
There was a problem importing aligned_seqs.fasta:
aligned_seqs.fasta is not a(n) AlignedDNAFASTAFormat file:
Invalid character '' at position 37839 on line 1288468 (does not match IUPAC characters for this sequence type). Allowed characters are ACGTURYKMSWBDHVN.-.
qiime tools import
--input-path aligned_seqs.fasta
--output-path aligned-sequences.qza
--type 'SampleData[Sequences]'
There was a problem importing aligned_seqs.fasta:
aligned_seqs.fasta is not a(n) QIIME1DemuxFormat file
Is there a way forward with any of these paths? Thank you for your help!
Beck