I'm trying to import around 200 files with Phred64 scoring offset and after hours of running I got the following error:
"Compressed file ended before the end-of-stream marker was reached".
I know it is most likely because of a corrupted file, but I don't know how to identify the faulty one. I could divide the files into groups and see when the error happens, but since the scoring is Phred64 and the number of files is large, that will take a lot of time. Is there a better way to solve this issue?
Hopefully there is a detailed log file from the import script that will list which file was faulty during decompression. Can you post your full log files?
There also might be a different way to import all these files that's more efficient or will give you a more detailed log file. Can you post the full command you ran?
Navigate to your sequences directory and run the following (this assumes all files are in the same directory):
for f in *.fastq.gz; do gzip -tv "$f"; done
This should print something like the following:
L1S105_9_L001_R1_001.fastq.gz: OK
L1S140_6_L001_R1_001.fastq.gz: OK
L1S208_10_L001_R1_001.fastq.gz: OK
L1S257_11_L001_R1_001.fastq.gz: OK
L1S281_5_L001_R1_001.fastq.gz: OK
L1S57_13_L001_R1_001.fastq.gz: OK
L1S76_12_L001_R1_001.fastq.gz: OK
L1S8_8_L001_R1_001.fastq.gz: OK
L2S155_25_L001_R1_001.fastq.gz: OK
L2S175_27_L001_R1_001.fastq.gz: OK
L2S204_1_L001_R1_001.fastq.gz: OK
L2S222_23_L001_R1_001.fastq.gz: OK
Hopefully we will see which file isn't okay.
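With around 200 files, scanning a long list of OK lines can be tedious, so a variant that prints only the files failing the test may help. This is just a sketch: the function name `check_gz_files` and the `CORRUPT:` label are made up for illustration, and it relies only on `gzip -t` returning a non-zero exit status when a file does not pass the integrity test.

```shell
# Sketch: report only the gzip files that fail the integrity test.
# check_gz_files is a hypothetical helper name, not part of any tool.
check_gz_files() {
    local bad=0 total=0 f
    for f in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched glob).
        [ -e "$f" ] || continue
        total=$((total + 1))
        # gzip -t tests the archive without writing any output;
        # a non-zero exit status means the file did not pass.
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT: $f"
            bad=$((bad + 1))
        fi
    done
    echo "$bad of $total file(s) failed"
}
```

From the sequences directory it could be called as `check_gz_files *.fastq.gz`. Any file it flags can then be re-run with `gzip -tv` on its own to see the actual error message.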
Thanks!