Not a qiime archive

Hi,

I am trying to run dada2 on a large number of samples, and after finishing the import step, it says paired-end-demux.qza is not a qiime archive. This is the command I used to generate it:

qiime tools import --type SampleData[PairedEndSequencesWithQuality] --input-path Manifest1000.csv --output-path paired-end-demux.qza --source-format PairedEndFastqManifestPhred33 

When I look at the object, it is 56 GB, so it’s clearly there and has data in it. Do you have any insight on what the problem could be?

Best,

Anna

qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza --o-table table.qza --o-representative-sequences rep-seqs.qza --p-trim-left-f 10 --p-trim-left-r 0 --p-trunc-len-f 290 --p-trunc-len-r 290 --p-n-threads 0
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/commands.py", line 187, in __call__
    arguments, missing_in, verbose = self.handle_in_params(kwargs)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/commands.py", line 261, in handle_in_params
    kwargs, fallback=cmd_fallback
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/handlers.py", line 255, in get_value
    return qiime2.Artifact.load(path)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 60, in load
    archiver = archive.Archiver.load(filepath)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 294, in load
    archive = cls.get_archive(filepath)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 259, in get_archive
    raise ValueError("%s is not a QIIME archive." % filepath)
ValueError: paired-end-demux.qza is not a QIIME archive.

Did you open paired-end-demux.qza in some kind of ZIP viewer program? Or unzip and zip the contents again? Those things can introduce extra files into the qza that make it unreadable to QIIME 2. It sounds like the file was corrupted somehow. Was it transferred between machines at some point? If you didn’t do any of those things, can you try editing your manifest to only include a few samples, and see if that works? Also, can you run qiime tools peek paired-end-demux.qza and print the results here? Thanks!
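If editing the manifest by hand is tedious, here is a rough sketch of one way to cut it down to the first few samples. It assumes the first non-comment line of the manifest is a header and that the first column holds the sample ID; the function name and the output filename in the usage note are made up for illustration.

```python
import csv


def subset_manifest(src, dst, n_samples):
    """Write a copy of a manifest that keeps only the first n_samples sample IDs.

    Assumes the first non-comment, non-blank line is the header and that
    column 0 is the sample ID. Returns the list of sample IDs that were kept.
    """
    kept = []
    header_seen = False
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for row in csv.reader(fin):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            if not header_seen:
                writer.writerow(row)  # copy the header through unchanged
                header_seen = True
                continue
            sample_id = row[0]
            if sample_id not in kept:
                if len(kept) >= n_samples:
                    continue  # already have enough samples, drop this row
                kept.append(sample_id)
            # keep every row (forward and reverse) for a kept sample
            writer.writerow(row)
    return kept
```

For example, subset_manifest("Manifest1000.csv", "Manifest10.csv", 10) would write a ten-sample manifest that can be passed to the same import command.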

The .qza file was created on Amazon; I didn’t move it. All of the fastq files were unzipped to begin with. It works fine with 250 samples, but doesn’t seem to work for 500 or 1000. Here is the output from that command:

qiime tools peek paired-end-demux.qza
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/q2cli/tools.py", line 74, in peek
    metadata = qiime2.sdk.Result.peek(path)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/sdk/result.py", line 51, in peek
    return ResultMetadata(*archive.Archiver.peek(filepath))
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 273, in peek
    archive = cls.get_archive(filepath)
  File "/home/ec2-user/miniconda3/envs/qiime2-2017.4/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 259, in get_archive
    raise ValueError("%s is not a QIIME archive." % filepath)
ValueError: paired-end-demux.qza is not a QIIME archive.

Hi @akknight216, before I address the error you brought up, I think it is worth mentioning that it looks like you may be combining multiple sequencing runs prior to denoising. This isn't recommended (more on why in a bit!).

Our FMT tutorial provides a good example of running an analysis on multiple sequencing runs. Basically, you import the demultiplexed data on a per-run basis (for example, our FMT tutorial has two runs, sounds like you might have more). Then, you can run qiime dada2 denoise-paired on each run, and finally merge the results. Once everything is merged, all of your downstream analyses should proceed as if you only had one sequencing run (as in, you shouldn't need to do anything different).

The DADA2 docs have this to say about why we use that strategy:

Large projects can span multiple sequencing runs, and because different runs can have different error profiles, it is recommended to learn the error rates for each run individually. Typically this means running the Sample Inference script once for each run or lane, and then merging those runs together into a full-study sequence table. If your study is contained on one run, that part of this script can be ignored.

So, in the short-term, it seems like you can keep moving forward with your analysis by adopting this per-sequencing-run strategy.

As for the ValueError: paired-end-demux.qza is not a QIIME archive error, would you be willing to share a minimal dataset that reproduces the problem? If so, I can follow up with you in a direct message about how we can transfer the data or arrange access to your server.

Thanks!

Just following up here,

We worked with @akknight216 offline in a DM and here’s what we think happened:

The import command never actually completed (it was in the middle of zipping everything into a .qza). If your SSH connection terminates for whatever reason, you’ll end up with a file whose name ends in .qza, but it’s incomplete. We were able to use the manifest for the 500-sample dataset, and the import worked once we ran it inside of screen (to ensure that it would keep running even if we disconnected). A good way to notice whether this has happened is to look for things like qiime2-archive-[gibberish] directories in your /tmp directory. Assuming you aren’t running anything at the moment, their presence probably means something crashed in an odd way.
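For anyone who wants to check for this without running QIIME 2, here is a minimal sketch using only the Python standard library. It relies on a .qza being a ZIP file: a ZIP that was cut off mid-write has no end-of-central-directory record, so zipfile.is_zipfile returns False for it. This only tells you the archive was finished, not that its contents are valid. The function names are my own, and the qiime2-archive-* pattern is taken from the directory names described above.

```python
import glob
import os
import tempfile
import zipfile


def looks_complete(qza_path):
    """Return True if the file has a readable ZIP end-of-central-directory record.

    A .qza whose import was interrupted while zipping will normally fail this
    check, however large the file is on disk.
    """
    return zipfile.is_zipfile(qza_path)


def leftover_archive_dirs(tmp_root=None):
    """List qiime2-archive-* directories under the temp directory.

    tmp_root defaults to the system temp directory (usually /tmp on Linux).
    """
    if tmp_root is None:
        tmp_root = tempfile.gettempdir()
    pattern = os.path.join(tmp_root, "qiime2-archive-*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
```

Calling looks_complete("paired-end-demux.qza") right after an import finishes is a quick sanity check before starting a long denoising run.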

I was able to use hexdump on the start of one of the incomplete .qza files to find its UUID, which matched the UUID in the root of one of the leftover qiime2-archive-[gibberish] directories (this is how you can confirm the identity of these directories if you’re not sure whether it is safe to delete them).
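The same lookup can be sketched in Python instead of reading hexdump output by eye. This is not the exact procedure we used, just an equivalent under one assumption: that the first entry written to the .qza sits under a top-level directory named after the artifact's UUID. A ZIP file starts with a 30-byte local file header followed by the first entry's filename, so the UUID can be recovered even when the rest of the file is missing. The function name is hypothetical.

```python
import struct


def first_entry_root(qza_path):
    """Return the top-level directory name of the first entry in a ZIP file.

    Only the first local file header is read, so this also works on a
    truncated archive that zip tools refuse to open.
    """
    with open(qza_path, "rb") as fh:
        header = fh.read(30)  # fixed-size part of the local file header
        if len(header) < 30 or header[:4] != b"PK\x03\x04":
            raise ValueError("%s does not start with a ZIP local file header" % qza_path)
        # the filename length is a little-endian uint16 at byte offset 26
        (name_len,) = struct.unpack("<H", header[26:28])
        name = fh.read(name_len).decode("utf-8", errors="replace")
    # entries look like "<root>/VERSION"; keep only the root component
    return name.split("/", 1)[0]
```

If the returned value matches the directory name found inside one of the leftover qiime2-archive-[gibberish] directories, the two belong to the same interrupted import.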
