Missing plugin to load artifact or BadZipFile?

I'm working through the “Atacama soil microbiome” tutorial (QIIME 2 2022.11.1 documentation) again, and I've run into some technical trouble I don't understand. I have a MiSeq fastq.gz dataset in EMPPairedEnd format that I was able to demultiplex, denoise, etc. just fine in a previous version of QIIME 2. In fact, just yesterday I successfully demultiplexed this same dataset, but today it won't work. I'm really stumped and would really appreciate some help!

  • Version qiime2-2022.11 installed using conda in WSL with Ubuntu 22.04.1 LTS (GNU/Linux x86_64)

  • Command and error message:

(qiime2-2022.11) [email protected]:~/qiime2-argonne2$ qiime demux emp-paired --m-barcodes-file sample-metadata4.2.2.tsv --m-barcodes-column barcode-sequence --p-no-golay-error-correction --i-seqs emp-paired-end-sequences2.2.qza --o-per-sample-sequences demux2.2.qza --o-error-correction-details demux-details2.2.qza
Traceback (most recent call last):
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/util.py", line 398, in _load_input_file
artifact = qiime2.sdk.Result.load(fp)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/result.py", line 79, in load
archiver = archive.Archiver.load(filepath)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 366, in load
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 198, in mount
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 212, in extract
zf.extract(name, path=str(filepath.parent))
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/zipfile.py", line 1630, in extract
return self._extract_member(member, path, pwd)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/zipfile.py", line 1702, in _extract_member
shutil.copyfileobj(source, target)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/shutil.py", line 205, in copyfileobj
buf = fsrc_read(length)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/zipfile.py", line 940, in read
data = self._read1(n)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/zipfile.py", line 1030, in _read1
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/zipfile.py", line 958, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '923c6d75-59d8-4fc7-b539-6dec9823a0f5/data/reverse.fastq.gz'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/click/type.py", line 112, in _convert_input
result, error = q2cli.util._load_input(value)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/util.py", line 352, in _load_input
artifact, error = _load_input_file(fp)
File "/home/nickbenn/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/util.py", line 404, in _load_input_file
raise ValueError(
ValueError: It looks like you have an Artifact but are missing the plugin(s) necessary to load it. Artifact has type 'EMPPairedEndSequences' and format 'EMPPairedEndDirFmt'

There was a problem loading 'emp-paired-end-sequences2.2.qza' as an artifact:

It looks like you have an Artifact but are missing the plugin(s) necessary to load it. Artifact has type 'EMPPairedEndSequences' and format 'EMPPairedEndDirFmt'

See above for debug info.

Hey @nickbenn,

I'm afraid something quite unusual has occurred. The "missing plugin" message is misleading here; the correct interpretation of the error is the BadZipFile, specifically this:

zipfile.BadZipFile: Bad CRC-32 for file '923c6d75-59d8-4fc7-b539-6dec9823a0f5/data/reverse.fastq.gz'

This says that the CRC-32 checksum recorded in the zip file itself does not match the data.

A checksum is a way of fingerprinting a bunch of data to make sure it has been reliably transferred (or stored). Since the fingerprint will always be the same when computed on the same data, matching fingerprints mean the data has almost certainly not changed. If they don't match, then either the fingerprint that was stored is corrupt, or the data is corrupt.

Since the CRC-32 checksum is stored in two locations in a zip file, it is likely that the corruption occurred in the reverse.fastq.gz file rather than the checksum/fingerprint itself.
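To make the fingerprint idea concrete, here is a minimal sketch in Python using the standard library's zlib.crc32. The FASTQ record is made up for illustration and has nothing to do with the real dataset; the point is only that flipping a single bit changes the CRC-32, which is the kind of mismatch the zip reader is reporting.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of `data` as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


# A made-up FASTQ record standing in for the real file contents.
original = b"@read1\nACGTACGT\n+\nIIIIIIII\n"
stored = crc32_hex(original)  # the fingerprint recorded at write time

# Simulate storage corruption by flipping one bit in one byte.
damaged = bytearray(original)
damaged[8] ^= 0x01
corrupted = bytes(damaged)

print("unchanged data matches:", crc32_hex(original) == stored)
print("corrupted data matches:", crc32_hex(corrupted) == stored)
```

CRC-32 is guaranteed to catch any single-bit change, so the second comparison fails. It is an error-detecting code only: it can tell you the data changed, but not what the original bytes were.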

As for what to do next:

Let's confirm this situation by running:

unzip -t  emp-paired-end-sequences2.2.qza

This lists every entry in the zip and reports whether or not each one's CRC-32 matches.
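If unzip isn't available, roughly the same check can be sketched with Python's standard zipfile module. The function name find_corrupt_members is hypothetical (it is not part of QIIME 2); it simply reads every member to the end, which is what triggers the CRC-32 comparison seen in the traceback.

```python
import zipfile
import zlib


def find_corrupt_members(archive_path):
    """Return the names of zip members that cannot be read back cleanly.

    Hypothetical helper for illustration; not a QIIME 2 function.
    """
    bad = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Reading to the end forces the CRC-32 comparison.
                    while member.read(1024 * 1024):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError):
                # CRC mismatch, damaged compressed stream, or truncated data.
                bad.append(info.filename)
    return bad
```

Calling find_corrupt_members("emp-paired-end-sequences2.2.qza") should, assuming the archive is in the state the traceback describes, include the data/reverse.fastq.gz entry in its result. Unlike ZipFile.testzip(), which stops at the first bad member, this sketch keeps going and reports all of them.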

Ideally you have this data in another location and can either re-import to create a new QZA file, or you have the QZA file itself in another location. If so, use that file instead.

Otherwise, I'm afraid there isn't anything you can do: the data has been lost and can't be recovered.

Since you mentioned having just used this data, it is likely that your storage device is starting to fail, so I would look into replacing that and backing up anything else as soon as you can.

Sorry, I know that's a pretty scary answer.

Hi @ebolyen
Thank you for the prompt reply!

I ran
"unzip -t emp-paired-end-sequences2.2.qza"

on the 3 artifacts I'm trying to work with and confirmed all the files are corrupted.
(qiime2-2022.11) [email protected]:~/qiime2-argonne2$ unzip -t emp-paired-end-sequences2.2.qza
Archive: emp-paired-end-sequences2.2.qza
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/metadata.yaml OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/VERSION OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/checksums.md5 OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/data/forward.fastq.gz OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/data/barcodes.fastq.gz OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/data/reverse.fastq.gz bad CRC 50eac8ce (should be a718f7af)
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/provenance/metadata.yaml OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/provenance/VERSION OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/provenance/citations.bib OK
testing: 923c6d75-59d8-4fc7-b539-6dec9823a0f5/provenance/action/action.yaml OK
At least one error was detected in emp-paired-end-sequences2.2.qza.
(qiime2-2022.11) [email protected]:~/qiime2-argonne2$ unzip -t emp-paired-end-sequences2.1.
Archive: emp-paired-end-sequences2.1.qza
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/metadata.yaml OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/VERSION OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/checksums.md5 OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/data/forward.fastq.gz bad CRC ba634ea0 (should be da732787)
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/data/barcodes.fastq.gz OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/data/reverse.fastq.gz bad CRC 6dacb16f (should be cb35a264)
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/provenance/metadata.yaml OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/provenance/VERSION OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/provenance/citations.bib OK
testing: 840654b8-f71c-410c-b3b7-fb7bc2f862cf/provenance/action/action.yaml OK
At least one error was detected in emp-paired-end-sequences2.1.qza.
(qiime2-2022.11) [email protected]:~/qiime2-argonne2$ unzip -t emp-paired-end-sequences1.qz
Archive: emp-paired-end-sequences1.qza
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/metadata.yaml OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/VERSION OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/checksums.md5 OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/data/forward.fastq.gz bad CRC db2c0ba8 (should be cc9b1727)
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/data/barcodes.fastq.gz OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/data/reverse.fastq.gz OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/provenance/metadata.yaml OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/provenance/VERSION OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/provenance/citations.bib OK
testing: 20e7784b-8a86-49a6-a4f1-8fd8a5a11b6a/provenance/action/action.yaml OK
At least one error was detected in emp-paired-end-sequences1.qza.

Unfortunately, the file emp-paired-end-sequences1.qza was also corrupted. This file was from demultiplexing I did over a year ago. I suppose it's about time I cloned my hard drive and replaced it.

Thanks again!

Sorry to hear that. It does seem like the HDD is the issue, then.

Hopefully you have the data elsewhere. I would be suspicious of any other larger files on your hard drive, as they pose a larger target (and are more likely to get hit by bitrot).

Good luck!

Thank you again for your help!

Quick update on my situation.
I looked into the health of my SSD using CrystalDiskInfo, and it reported "Good 94%" (see the screenshot for more details).

I cleaned out some space on my drive, backed everything up again, deleted all the fastq files I was working with, and downloaded them again from the sequencing center's server. I started over and everything worked, with none of the issues I encountered before. I don't know a ton about computing and hardware, but it seems like I just didn't have enough free space on my SSD.
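For anyone re-downloading raw files in a similar situation, a quick integrity check of each fastq.gz before importing might look like the sketch below. It relies on the fact that gzip files carry their own CRC-32 in a trailer; the function name gzip_is_intact is hypothetical, and gzip.BadGzipFile requires Python 3.8 or newer (the version in the traceback above).

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream decompresses and its CRC checks out.

    Hypothetical helper for illustration; a missing file still raises
    FileNotFoundError rather than returning False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so large FASTQ files are never held in memory.
            while handle.read(chunk_size):
                pass
    except (gzip.BadGzipFile, EOFError, zlib.error):
        # Bad header or CRC, truncated file, or damaged compressed data.
        return False
    return True
```

A file that passes this check decompressed cleanly, but that says nothing about whether its contents are the right reads, so it complements rather than replaces comparing against checksums published by the sequencing center, if any are available.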
