Denoise: Not a Qiime2 artifact

Hi folks,

I am trying to denoise a demux file. The demux file was made from 48 FASTQ files that look just like regular FASTQ files, except that I am starting with reads from which the primers/adapters were already cut out using cutadapt. I don't usually do it this way, but someone did this for me to help with the difficult removal of primers/adapters (something weird happened during the sequencing run). When I run denoising, I get a message stating that my file is not a QIIME 2 artifact (see below), yet the demultiplexing seems to work just fine. I ran qiime tools validate on the demultiplexed file, and the message (also below) seems to indicate that it "sees" .gz files, which is a little confusing because I had gunzipped the files before demultiplexing (or so I thought). This was done with QIIME 2 2019.10 on the CyVerse platform (I am checking with them too, just in case there is an issue with this app). Any thoughts on what might be going on here?

Alternatively, if anyone can recommend another way to quickly get the repseqs and table, that would be great as a quick fix. I have no fancy analysis in mind; I just need the repseqs so that I can search against NCBI. In the long run, it looks like I should switch to Docker. Thanks for any suggestions!

Bill

DADA2 COMMAND AND ERROR
I am using qiime2-2019.10 in CyVerse, but this seems to be an issue with every version in CyVerse.
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs /filepath/file.qza \
  --p-trunc-len 0 \
  --p-trim-left 0 \
  --p-max-ee 1 \
  --o-table /filepath/table \
  --o-representative-sequences /filepath/repseqs \
  --o-denoising-stats /filepath/denoise

(1/1) Invalid value for "--i-demultiplexed-seqs": '/filepath/file.qza' is not a QIIME 2 Artifact (.qza)

QIIME TOOLS VALIDATE RESULTS

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 409, in validate
    result = qiime2.sdk.Result.load(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/result.py", line 66, in load
    archiver = archive.Archiver.load(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 305, in load
    rec = archive.mount(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 204, in mount
    root = self.extract(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 215, in extract
    zf.extract(name, path=str(filepath))
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1507, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1579, in _extract_member
    shutil.copyfileobj(source, target)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 872, in read
    data = self._read1(n)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 962, in _read1
    self._update_crc(data)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 890, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file '7694313a-fa5d-4208-bcd8-bc6e5685ebe7/data/T2163_8_L001_R1_001.fastq.gz'

There was a problem loading /data-store/iplant/home/wlandesman/RO3-Plate4/12S-Trimmed-Nov5/demux-trimmed-nov5.qza as a QIIME 2 Result:

Bad CRC-32 for file '7694313a-fa5d-4208-bcd8-bc6e5685ebe7/data/T2163_8_L001_R1_001.fastq.gz'

See above for debug info.

Hi @wlandesman, this error most likely means your .qza is corrupted. Would it be possible for you to recreate it?
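Since a .qza is a ZIP archive, one way to confirm the corruption independently of QIIME 2 is to read every member with Python's standard zipfile module, which verifies each member's CRC-32 as it reads. This is a minimal sketch, not part of QIIME 2, and find_corrupt_members is a hypothetical helper name. It also catches zlib.error, because a damaged compressed stream can surface that way instead of as BadZipFile.

```python
import zipfile
import zlib


def find_corrupt_members(qza_path):
    """Read every member of a .qza (a ZIP archive) and report failures.

    Returns a list of (member name, error message) pairs; an empty list
    means every member decompressed and passed its CRC-32 check.
    Opening the archive itself raises zipfile.BadZipFile if the
    central directory is damaged.
    """
    problems = []
    with zipfile.ZipFile(qza_path) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Read in 1 MiB chunks; the CRC is checked when the end is reached.
                    while member.read(1 << 20):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                problems.append((info.filename, str(exc)))
    return problems
```

Calling find_corrupt_members on the demux .qza should list the damaged member (for a CRC failure, the same name that appears in the Bad CRC-32 message); an empty list means every member read back cleanly.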

Interesting. I did make the demux file twice. The first time it was with fastq.gz files and I got that message. So I gunzipped, thinking that was the issue, and got the same message. I could try again. Would you recommend working with fastq.gz or fastq? Or doesn't it matter? Thanks for your help!

Bill

You're going to want the files to be zipped when you import them. Since you were able to unzip them without an error, they must have been zipped to begin with (sometimes you pull data with a .gz extension that isn't actually zipped). Can you try creating the artifact using every file you have (in zipped form) except for the one referenced in the error "T2163_8_L001_R1_001.fastq.gz" and see if that works? If that works, there is most likely something wrong with that specific file. There may or may not be anything you can do about this. You can try unzipping and rezipping that specific file, or, if you downloaded the file from the person who created it, you can try redownloading it.
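To act on this before importing, each .fastq.gz could be checked in two steps: look for the gzip magic bytes (1f 8b) to catch files that carry a .gz extension without actually being compressed, then decompress the whole file to catch truncation or a bad CRC. A minimal sketch using only the Python standard library; check_fastq_gz is a hypothetical helper name.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def check_fastq_gz(path):
    """Return None if path is an intact gzip file, else a short reason."""
    with open(path, "rb") as fh:
        if fh.read(2) != GZIP_MAGIC:
            return "not gzip-compressed (missing 1f 8b magic bytes)"
    try:
        with gzip.open(path, "rb") as fh:
            # Decompress to the end so truncation and CRC errors surface.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        return "corrupt gzip stream: {}".format(exc)
    return None
```

Running it over every file before import and setting aside any file that returns a reason is one way to do the "everything except the bad file" test without waiting for the import to fail.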


Thanks, I will give that a try. It might take me a few days to get this done but I will check in and let you know how it goes!


I noticed that for the previous error, the file name that is referenced (T2163_12S_CGCAGTCTAT_L002_R1_001.fastq.gz) is not even the file name that I see; it is the file name that appears in my sequencing report as the demultiplexed file name (sampleID_index_lane#_read#_001.fastq). I guess the demultiplexing process somehow deciphers that from the header? I removed that T2163 file and this time got the following error with qiime tools validate:

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 409, in validate
    result = qiime2.sdk.Result.load(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/result.py", line 66, in load
    archiver = archive.Archiver.load(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 305, in load
    rec = archive.mount(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 204, in mount
    root = self.extract(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 215, in extract
    zf.extract(name, path=str(filepath))
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1507, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1577, in _extract_member
    with self.open(member, pwd=pwd) as source,
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1393, in open
    raise BadZipFile("Truncated file header")
zipfile.BadZipFile: Truncated file header

There was a problem loading /data-store/iplant/home/wlandesman/RO3-Plate4/12S-Trimmed/demux-trimmed-rem2163-redo.qza as a QIIME 2 Result:

Truncated file header

See above for debug info.
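On the file-name question: a name like T2163_12S_CGCAGTCTAT_L002_R1_001.fastq.gz follows the Casava 1.8 layout described above (sample ID, index, lane, read, then 001), so those fields can be recovered from the name alone, without looking at read headers. The sketch below splits a name into its fields, assuming the sample ID may itself contain underscores while the index field does not; parse_casava_name and CasavaName are hypothetical names, and this is not a description of QIIME 2's own importer.

```python
import os
import re
from typing import NamedTuple, Optional

# sampleID_index_L<lane>_R<read>_001.fastq[.gz]; the sample ID may contain
# underscores, the index field is assumed not to.
CASAVA_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[^_]+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)


class CasavaName(NamedTuple):
    sample: str
    index: str
    lane: int
    read: int


def parse_casava_name(path: str) -> Optional[CasavaName]:
    """Split a Casava 1.8 style FASTQ file name into its fields, or return None."""
    match = CASAVA_RE.match(os.path.basename(path))
    if match is None:
        return None
    return CasavaName(
        sample=match.group("sample"),
        index=match.group("index"),
        lane=int(match.group("lane")),
        read=int(match.group("read")),
    )
```

Because the sample group is greedy and the index cannot contain an underscore, T2163_12S_CGCAGTCTAT_L002_R1_001.fastq.gz splits into sample T2163_12S and index CGCAGTCTAT.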

This error is saying that it can't unzip your .qza because one of the files it's trying to extract from the .qza has a truncated header. When you say you removed the file, what exactly did you do? If you removed it from an existing .qza, I'd expect to see something like this, but if you imported a new .qza with all of your .fastq.gz files except for the one in discussion, I'm not sure what would cause this error aside from further data corruption.
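For the "import everything except one sample" test, a manifest file avoids deleting or renaming anything on disk. This sketch writes a tab-separated manifest with sample-id and absolute-filepath columns and skips excluded samples. It assumes one R1 file per sample, that the sample ID is the text before the first underscore, and the tab-separated manifest layout from the QIIME 2 importing tutorial (older releases used a comma-separated layout with a direction column, so check the docs for your version). write_manifest is a hypothetical helper.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, out_path, exclude=()):
    """Write a single-end manifest for every R1 file, skipping excluded samples."""
    rows = []
    for path in sorted(Path(fastq_dir).glob("*_R1_*.fastq.gz")):
        # Assumption: the sample ID is everything before the first underscore.
        sample_id = path.name.split("_", 1)[0]
        if sample_id in exclude:
            continue
        rows.append((sample_id, str(path.resolve())))
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
    return rows
```

The resulting file can then be passed to qiime tools import, choosing the manifest input format that matches your release.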

I did the whole thing from scratch, including re-downloading the data. I used tar -xvf so that I was working with the uncompressed files. Once again I got the same error. Here are the results of qiime tools validate on the demultiplexed .qza file. I guess this reconfirms that the file is corrupt, right?

Traceback (most recent call last):
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 409, in validate
    result = qiime2.sdk.Result.load(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/result.py", line 66, in load
    archiver = archive.Archiver.load(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 305, in load
    rec = archive.mount(path)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 204, in mount
    root = self.extract(filepath)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/core/archive/archiver.py", line 215, in extract
    zf.extract(name, path=str(filepath))
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1507, in extract
    return self._extract_member(member, path, pwd)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 1579, in _extract_member
    shutil.copyfileobj(source, target)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/shutil.py", line 79, in copyfileobj
    buf = fsrc.read(length)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 872, in read
    data = self._read1(n)
  File "/opt/conda/envs/qiime2-2019.10/lib/python3.6/zipfile.py", line 948, in _read1
    data = self._decompressor.decompress(data, n)
zlib.error: Error -3 while decompressing data: invalid stored block lengths

There was a problem loading /data-store/iplant/home/wlandesman/RO3-Plate4/Trimmed/demux-12s-r1-trimmed.qza as a QIIME 2 Result:

Error -3 while decompressing data: invalid stored block lengths

Yeah unfortunately it is really looking like that's the case.
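Since a fresh download reproduced the problem, one more thing worth checking is whether the downloaded archive matches a checksum published by whoever provided the data, if they publish one. A mismatch means your copy differs from what was sent; a match rules out the download and points to the source files or a later step. A minimal sketch with hashlib, assuming MD5 is the checksum the provider supplies; md5sum and matches_checksum are hypothetical helper names.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file against a published MD5 string (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte sequencing archives.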