QIIME 2 unable to perform fragment-insertion on clustered rep_seqs

Hello,
I've previously used the fragment-insertion plugin without any issues. However, recently, as I'm trying to generate the tree using the rep_seqs file generated from vsearch open-reference clustering, I'm getting the following error from this plugin. :sweat_smile:


And the log file says:

Removing /tmp/tmp.etXzw1y7jQ/sepp-tmp-CaHDDnOVOT
Traceback (most recent call last):
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in sepp
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_fragment_insertion/_insertion.py", line 71, in sepp
_run(str(representative_sequences.file.view(DNAFASTAFormat)),
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/site-packages/q2_fragment_insertion/_insertion.py", line 53, in _run
subprocess.run(cmd, check=True, cwd=cwd)
File "/home/turtle/miniconda3/envs/qiime2-2023.2/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run-sepp.sh', '/tmp/qiime2/turtle/data/3095dbf2-7de3-4f15-9049-89ca7ccc2969/data/dna-sequences.fasta', 'q2-fragment-insertion', '-x', '1', '-A', '1000', '-P', '5000', '-a', '/tmp/qiime2/turtle/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/aligned-dna-sequences.fasta', '-t', '/tmp/qiime2/turtle/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/tree.nwk', '-r', '/tmp/qiime2/turtle/data/e44b5e78-31e5-4a0f-9041-494bc3ca2df2/data/raxml-info.txt']' returned non-zero exit status 1.

But I did get it to run for the rep_seqs file from de novo clustering. Being a novice in metagenomic data analysis, can someone please explain what I'm doing wrong and how I can resolve this error? :face_holding_back_tears:

I'm attaching the files that I'm trying to run.
sotu_rep_seqs_0.97.qza (705.7 KB)
sotu_rep_seqs_0.99.qza (1.2 MB)

I used these rep_seqs files with the provided sepp-refs-silva-128.qza from the QIIME 2 data resources as the reference.
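
For reference, a quick way to double-check the artifact types before running sepp (a sketch, assuming both files are in the working directory):

qiime tools peek sotu_rep_seqs_0.97.qza
qiime tools peek sepp-refs-silva-128.qza

The rep seqs should report FeatureData[Sequence], and the reference should report a SEPP reference database type (something like SeppReferenceDatabase).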

Hello @Ayazaeroth,

Please rerun with the --verbose flag and post the output.

Here is the error after running with --verbose:

I've read in one post about the reference file being corrupted, and tried re-downloading it from the data resources, but it gives the same error.

Hello @Ayazaeroth,

How much memory is available on your machine? Can you also confirm that you're not running out of disk space?
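
For example, both can be checked from the shell with standard Linux tools (a sketch; adjust the path if your temp directory differs):

free -h        # total and available RAM plus swap
df -h /tmp     # free space on the partition SEPP writes its temp files to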

I don't think that's the case; here's what I can show:


Hello @Ayazaeroth,

Could you run the command with the --p-debug flag as well please?
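
For example (a sketch reusing the filenames from earlier in the thread; --p-debug is the boolean debug parameter of this action):

qiime fragment-insertion sepp \
  --i-representative-sequences sotu_rep_seqs_0.97.qza \
  --i-reference-database sepp-refs-silva-128.qza \
  --o-tree insertion-tree-97.qza \
  --o-placements insertion-placements-97.qza \
  --p-debug \
  --verbose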

Using --p-debug produced a huge error output, a summary of which I tried to capture in the following 5 screenshots:





Hello @Ayazaeroth,

What is the full command you used to generate the open reference OTUs?

Hey, sorry for the delay. Here's the code I used for clustering:

qiime vsearch cluster-features-open-reference \
  --i-table combined_deblur_table.qza \
  --i-sequences combo_rep_seqs.qza \
  --i-reference-sequences sOTU_tabs/silva-138-99-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table sOTU_tabs/sotu_table_0.97 \
  --o-clustered-sequences sOTU_tabs/sotu_rep_seqs_0.97.qza \
  --o-new-reference-sequences sOTU_tabs/new_ref_seqs_0.97.qza

Hello @Ayazaeroth,

What is the source of your input sequences? Some, if not all, appear to be coming from NCBI. It looks like you're trying to insert fragments that are already part of the reference database, and since these have the same names, you're getting an error. Obviously, there's not much to learn from performing fragment insertion with fragments that make up the reference, so I'm guessing this happened by accident somehow. Perhaps you merged your real sequences with the reference sequences at some point?
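
One way to check for this overlap yourself (a sketch; the exported_* directory names are just placeholders) is to export both artifacts and compare the FASTA IDs:

qiime tools export --input-path sotu_rep_seqs_0.97.qza --output-path exported_rep_seqs
qiime tools export --input-path sepp-refs-silva-128.qza --output-path exported_sepp_refs

grep '^>' exported_rep_seqs/dna-sequences.fasta | cut -d' ' -f1 | sort > rep_ids.txt
grep '^>' exported_sepp_refs/aligned-dna-sequences.fasta | cut -d' ' -f1 | sort > ref_ids.txt
comm -12 rep_ids.txt ref_ids.txt | head   # any lines printed here are IDs shared with the reference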

I've been using a single source table for open clustering.
Here are the table and seqs:
combined_deblur_table.qza (2.9 MB)
combo_rep_seqs.qza (2.0 MB)

qiime vsearch cluster-features-open-reference \
  --i-table combined_deblur_table.qza \
  --i-sequences combo_rep_seqs.qza \
  --i-reference-sequences silva-138-99-seqs.qza \
  --p-perc-identity 0.97 \
  --o-clustered-table sotu_table_0.97 \
  --o-clustered-sequences sotu_rep_seqs_0.97.qza \
  --o-new-reference-sequences new_ref_seqs_0.97.qza
  
qiime feature-table summarize \
      --i-table sotu_table_0.97.qza \
      --m-sample-metadata-file metadata_final.tsv \
      --o-visualization sotu_table_0.97.qzv
      
qiime feature-classifier classify-sklearn \
  --i-reads sotu_rep_seqs_0.97.qza \
  --i-classifier silva-138-99-nb-classifier.qza \
  --o-classification sotu_taxonomy_0.97.qza
  
qiime metadata tabulate \
  --m-input-file sotu_taxonomy_0.97.qza \
  --o-visualization sotu_taxonomy_0.97.qzv

(error)
qiime fragment-insertion sepp \
  --i-representative-sequences sotu_rep_seqs_0.97.qza \
  --i-reference-database sepp-refs-silva-128.qza \
  --o-tree insertion-tree-97.qza \
  --o-placements insertion-placements-97.qza \
  --verbose

That's all the code. Every step works fine until the last one, and the same files work completely fine for de novo clustering. Is there any way to work around this? Shall I try to build the tree from the main combined_deblur_table instead? I'm lost.

Hello @Ayazaeroth,

The sequences that make up your deblur tables, where did those come from?
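
If it helps narrow this down, here's a quick way to eyeball those feature IDs (a sketch using a standard action):

qiime feature-table tabulate-seqs \
  --i-data combo_rep_seqs.qza \
  --o-visualization combo_rep_seqs.qzv

Deblur normally produces hashed, MD5-style feature IDs, so accession-style IDs in that visualization would suggest the sequences were merged with a reference set at some point.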
