Problem with qza file for fragment-insertion sepp: Invalid value for '--i-representative-sequences'

Hello,

I'm sorry to say that I have searched for this issue on and off the forum, and I'm stuck. I am running QIIME 2 2021.8. I also have a conda env set up for qiime2_amplicon_2024_5, but this doesn't seem to help the problem. I'm running these on our university cluster (https://unity.rc.umass.edu/).

The immediate problem is that I'm running fragment-insertion sepp and am getting the error:
(1/1) Invalid value for '--i-representative-sequences': 'rep-seq-CH_.qza' is
not a QIIME 2 Artifact (.qza).

I downloaded the rep-seq.qza from this tutorial: QIIME 2 Library, and it's working, so something is wrong with my ARTIFACT FeatureData[Sequence] file.

Here's what happened. I am working with two sets of sequence data that I want to merge to create one phylogeny. To do this I used qiime feature-table merge-seqs. Before that, I'd removed sequences that were too long or too short using rescript filter-seqs-length (which is why I needed the conda env). I checked all files (the two trimmed files and the merged files) using feature-table tabulate-seqs and they all looked great.

I tried running the fragment-insertion sepp command with --verbose, but it doesn't get past this initial error. Any ideas? Thanks!

Hello Kristen,

Welcome to the forums! :qiime2:

I agree; that file must be broken. Let's start here.

Are you willing and able to post this .qza file here? If so, we can take a look at it.

You could also try downloading this file and opening it with https://view.qiime2.org.
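Since a .qza is a ZIP archive, a quick on-disk check can rule out a missing, empty, or truncated file before opening it anywhere else. Below is a minimal sketch, assuming a bash-like shell; `check_qza` is a hypothetical helper name, not a QIIME 2 command, and it only inspects the first two bytes of the file, so passing it does not prove the artifact is valid.

```shell
#!/usr/bin/env bash
# check_qza: hypothetical helper (not part of QIIME 2).
# Returns 0 if the file exists, is readable, is non-empty, and begins with
# the ZIP magic bytes "PK"; otherwise prints the reason and returns 1.
check_qza() {
  local f="$1"
  if [ ! -e "$f" ]; then
    echo "missing: $f (current directory is $PWD)"
    return 1
  fi
  if [ ! -r "$f" ]; then
    echo "not readable: $f"
    return 1
  fi
  if [ ! -s "$f" ]; then
    echo "empty file: $f"
    return 1
  fi
  # A .qza is a ZIP archive, so its first two bytes should be "PK".
  local magic
  magic="$(head -c 2 "$f")"
  if [ "$magic" != "PK" ]; then
    echo "not a zip archive: $f"
    return 1
  fi
  echo "looks like a zip archive: $f"
}
```

For example, `check_qza rep-seq-CH_.qza` could be run in the same session, and from the same directory, as the failing command. If it passes, `qiime tools peek` and `qiime tools validate` are the more thorough checks from within QIIME 2 itself.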

Hi! Thanks for the welcome. I love it here <3

I did a bit of troubleshooting while I was waiting, and the trouble seems to be with rescript. The output .qza file from rescript gets the same error message. So that tells me the issue is not with merge-seqs.

Sure, I'll upload the file.
rep-seq-C_filtered.qza (989.7 KB)

I also just tried the merged files that were not filtered with rescript, and I get the same error message. So maybe it's not rescript.

Now I'm wondering if there's something wrong with the way I'm trying to use fragment-insertion sepp.

qiime fragment-insertion sepp \
 --i-representative-sequences rep-seq-C_.qza \
 --i-reference-database sepp-refs-silva-128.qza \
 --p-threads 0 \
 --o-tree tree-rep-seq-C_filtered.qza \
 --o-placements placements-rep-seq-C_filtered.qza \
 --verbose

Yeah, that file opens in q2view no problem:
https://view.qiime2.org/provenance/?src=https://forum.qiime2.org/uploads/short-url/z4SGyJESN6bmRD0yKLZC3c79NLx.qza

Hold on...

Uh, those are three different files.

Which ones can you open with q2view?
Which ones make errors when running sepp?

Ah, sorry about that!

  • rep-seq-C_.qza is one half of the data set, before filtering w/rescript and merge
  • rep-seq-C_filtered.qza is one half of the data, after filtering w/rescript & before merge
  • rep-seq-CH_.qza is after merging (with the other half of the data) without filtering

I hadn't previously tried to visualize the .qza files, just the corresponding .qzv summaries. Here are the results.

name: "rep-seq-C_.qza"
uuid: "be3a9276-e64b-4023-88cd-624969424ef8"
type: "FeatureData[Sequence]"
format: "DNASequencesDirectoryFormat"

name: "rep-seq-C_filtered.qza"
uuid: "2854a4cc-d70c-4687-b267-02f335697f20"
type: "FeatureData[Sequence]"
format: "DNASequencesDirectoryFormat"

name: "rep-seq-CH_.qza"
uuid: "f993c1ea-88f4-432e-bcd9-1c492c0293f2"
type: "FeatureData[Sequence]"
format: "DNASequencesDirectoryFormat"

As for sepp, I only tried rep-seq-CH_.qza and rep-seq-C_filtered.qza with sepp, and both had the same 'not a QIIME 2 Artifact' error.

Good morning! I have an update. The TL;DR is that not all CPU partitions on a computer cluster are created equal, so this problem can likely be solved by quitting and starting a new interactive session, or by re-launching the sbatch job.

I figured this out because I got the same error running feature-table merge-seqs. That reminded me of something from running genome assemblies: one program needs something that only some of the CPU nodes have. I don't know what it is, but trying again in a new interactive session often works.
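To find out whether the failures really follow particular nodes, it can help to record where each attempt ran. Here is a minimal sketch, assuming a bash-like shell on a Slurm cluster; `node_info` is a hypothetical helper name. `SLURMD_NODENAME`, `SLURM_JOB_PARTITION`, and `SLURM_JOB_ID` are variables Slurm sets inside a job, and the helper prints "n/a" for any that are unset.

```shell
#!/usr/bin/env bash
# node_info: hypothetical helper that prints one line describing where the
# current shell is running. Outside a Slurm job the SLURM* variables are
# unset, so "n/a" is printed in their place.
node_info() {
  echo "host=$(uname -n) node=${SLURMD_NODENAME:-n/a} partition=${SLURM_JOB_PARTITION:-n/a} job=${SLURM_JOB_ID:-n/a}"
}
```

Calling `node_info` right before the qiime command in each interactive session or sbatch script, and noting whether the run then fails, would show whether the "not a QIIME 2 Artifact" error clusters on specific nodes or partitions.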

So ok it only worked this once, and I still haven't gotten sepp to run successfully, but I'm making a test rep-seq set now by filtering to work with just a fraction of the sequences.

Thanks for your help here!

And now sepp works! So I think it was a CPU issue.
