Hi,
I'm experiencing an issue similar to the ones in this and this post: corrupted files.
I can import my .fastq.gz files into QIIME 2, but when I run
qiime demux summarize \
  --i-data raw_reads_.qza \
  --o-visualization raw_reads_SSD.qzv
I get this error:
Traceback (most recent call last):
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/util.py", line 492, in _load_input_file
artifact = qiime2.sdk.Result.load(fp)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 80, in load
archiver = archive.Archiver.load(filepath)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 367, in load
archive.mount(path)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 198, in mount
self.extract(filepath)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 212, in extract
zf.extract(name, path=str(filepath.parent))
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/zipfile.py", line 1630, in extract
return self._extract_member(member, path, pwd)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/zipfile.py", line 1702, in _extract_member
shutil.copyfileobj(source, target)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/shutil.py", line 205, in copyfileobj
buf = fsrc_read(length)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/zipfile.py", line 940, in read
data = self._read1(n)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/zipfile.py", line 1030, in _read1
self._update_crc(data)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/zipfile.py", line 958, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'ce11c3ae-864f-49af-a884-274406195176/data/804_802_L001_R1_001.fastq.gz'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/click/type.py", line 116, in _convert_input
result, error = q2cli.util._load_input(value)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/util.py", line 397, in _load_input
artifact, error = _load_input_file(fp)
File "/home/thebiobeast/miniforge3/envs/qiime2-amplicon-2024.2/lib/python3.8/site-packages/q2cli/util.py", line 498, in _load_input_file
raise ValueError(
ValueError: It looks like you have an Artifact but are missing the plugin(s) necessary to load it. Artifact has type 'SampleData[PairedEndSequencesWithQuality]' and format 'SingleLanePerSamplePairedEndFastqDirFmt'

There was a problem loading 'raw_reads_SSD.qza' as an artifact:
It looks like you have an Artifact but are missing the plugin(s) necessary to load it. Artifact has type 'SampleData[PairedEndSequencesWithQuality]' and format 'SingleLanePerSamplePairedEndFastqDirFmt'
It appears that some of the .fastq.gz files inside the artifact are corrupted.
I ran unzip -t raw_reads_.qza
and it does report a bad CRC for two files:
ce11c3ae-864f-49af-a884-274406195176/data/616_1498_L001_R2_001.fastq.gz bad CRC b506cd09 (should be 43f5c748)
&
ce11c3ae-864f-49af-a884-274406195176/data/804_802_L001_R1_001.fastq.gz bad CRC 7f231ee9 (should be 3fce1e29)
All of the other ~1700 lines were OK.
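For reference, the same integrity check can be sketched in Python with the standard-library zipfile module, which is also what raises the BadZipFile error in the traceback above. This is only a minimal sketch: find_bad_members is a hypothetical helper name, and it assumes the .qza is an ordinary zip archive that can be opened directly.

```python
import zipfile


def find_bad_members(archive_path):
    """Return the names of archive members that fail the CRC-32 check.

    find_bad_members is a hypothetical helper, not part of QIIME 2.
    """
    bad = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Reading each member to the end forces zipfile to
                    # verify its CRC-32; the data itself is discarded.
                    while member.read(1 << 20):
                        pass
            except zipfile.BadZipFile:
                bad.append(info.filename)
    return bad
```

Calling find_bad_members("raw_reads_.qza") should list the same members that unzip -t flags. Unlike ZipFile.testzip(), which stops at the first bad member, this loop keeps going and collects all of them.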
However, these file names (e.g. 616_1498_L001_R2_001.fastq.gz) are not the file names I actually have.
How can I find out which of my original .fastq.gz files these corrupted entries correspond to, so I can remove them?