I encountered a problem running cutadapt. I am sorry if this has been solved before, but I could not find a solution.
I ran the following command:
parallel --xapply --jobs 30 'cutadapt --pair-filter any --no-indels --discard-untrimmed -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -o 01_primer_trimmed_fastqs/cutadapt_{1/} -p 01_primer_trimmed_fastqs/cutadapt_`basename {=s/_1/_2/;s/\.fastq.gz//=}.fastq.gz` {1} {=s/_1/_2/=} > 01_primer_trimmed_fastqs/{1/}_cutadapt_log.txt' ::: raw_data/*_1.fastq.gz
and then I got the following error message:
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/bin/cutadapt", line 10, in <module>
sys.exit(main_cli())
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/cutadapt/cli.py", line 1149, in main_cli
main(sys.argv[1:])
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/cutadapt/cli.py", line 1243, in main
stats = runner.run(pipeline, progress, outfiles)
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/cutadapt/runners.py", line 423, in run
(n, total1_bp, total2_bp) = pipeline.process_reads(
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/cutadapt/pipeline.py", line 137, in process_reads
for reads in self._reader:
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/dnaio/pairedend.py", line 96, in __iter__
for r1, r2 in zip(self.reader1, self.reader2):
File "src/dnaio/_core.pyx", line 581, in dnaio._core.FastqIter.__next__
File "src/dnaio/_core.pyx", line 512, in dnaio._core.FastqIter._read_into_buffer
File "/home/gihyeon/anaconda3/envs/qiime2-amplicon-2024.5/lib/python3.9/gzip.py", line 300, in read
return self._buffer.read(size)
igzip_lib.IsalError: Error -1 Invalid deflate block found
The strange thing is that the output files were produced successfully. I also checked the log file and found no error.
'Deflate' is the compression method used in .gz files.
This means that one of your fastq.gz files was probably corrupted during download.
Redownloading it should fix this issue!
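To find out which file is affected before redownloading everything, you could test each archive with `gzip -t`, which decompresses the stream without writing any output and exits non-zero on a damaged file. Here is a minimal sketch; the function name `find_corrupt_gz` is just my own, and I am assuming the raw reads live in raw_data/ as in your command:

```shell
# Print the path of every gzip file that fails an integrity test.
# gzip -t reads the whole compressed stream without writing output,
# so a truncated or damaged file gives a non-zero exit status.
find_corrupt_gz() {
    rc=0
    for f in "$@"; do
        if ! gzip -t "$f" 2>/dev/null; then
            printf '%s\n' "$f"
            rc=1
        fi
    done
    # Return 1 if at least one file failed the test, 0 otherwise.
    return "$rc"
}
```

You would call it as `find_corrupt_gz raw_data/*.fastq.gz`. Any path it prints is a candidate for redownloading, and the trimmed outputs made from that sample are likely incomplete, since cutadapt stopped partway through the input.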
P.S. I've moved this question to 'other bioinformatic tools' as it looks like you are using cutadapt and GNU parallel.
Have you considered importing your data and using the q2-cutadapt plugin? :qiime2:
I appreciate your really fast answer!
(To clarify, I used the cutadapt package installed in the QIIME 2 environment, not the q2-cutadapt plugin.)
Unfortunately, if my fastq files were corrupted while downloading, there is no way for me to fix them.
I am trying to run qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./list.csv --output-path 01_importing_file/importing_data --input-format PairedEndFastqManifestPhred33.
The list.csv file contains the paths of the trimmed files produced by the previous command (which may be damaged fastq.gz files).
By the way, can I use the q2-cutadapt plugin with GNU parallel to shorten the running time? Or is there another method you would recommend?
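For reference, a manifest like this can be generated from the file names instead of being written by hand. The sketch below assumes the legacy CSV layout used by PairedEndFastqManifestPhred33 (header `sample-id,absolute-filepath,direction`) and the `_1.fastq.gz` / `_2.fastq.gz` naming from the command above. The function name `make_manifest` and the rule for deriving the sample ID are illustrative assumptions:

```shell
# Write a CSV manifest for every *_1.fastq.gz / *_2.fastq.gz pair in
# the directory given as $1.
# Assumption: the sample ID is the file name minus the _1.fastq.gz suffix.
make_manifest() {
    # The manifest needs absolute paths, so resolve the directory first.
    dir=$(cd "$1" && pwd) || return 1
    echo 'sample-id,absolute-filepath,direction'
    for fwd in "$dir"/*_1.fastq.gz; do
        [ -e "$fwd" ] || continue            # the glob matched nothing
        rev="${fwd%_1.fastq.gz}_2.fastq.gz"
        sample=$(basename "$fwd" _1.fastq.gz)
        if [ ! -e "$rev" ]; then
            # Skip unpaired forward reads and report them on stderr.
            echo "missing mate for $fwd" >&2
            continue
        fi
        printf '%s,%s,forward\n' "$sample" "$fwd"
        printf '%s,%s,reverse\n' "$sample" "$rev"
    done
}
```

Running `make_manifest 01_primer_trimmed_fastqs > list.csv` would list the trimmed files, while `make_manifest raw_data > list.csv` would list the raw reads instead.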
Lots of the QIIME 2 plugins include built-in methods to process things in parallel, including q2-cutadapt.
> Usage: qiime cutadapt trim-paired [OPTIONS]
>
> Search demultiplexed paired-end sequences for adapters and remove them. The
> parameter descriptions in this method are adapted from the official cutadapt
> docs - please see those docs at https://cutadapt.readthedocs.io for complete
> details.
>
> Inputs:
>   --i-demultiplexed-sequences ARTIFACT
>     SampleData[PairedEndSequencesWithQuality]
>     The paired-end sequences to be trimmed. [required]
> Parameters:
>   --p-cores NTHREADS  Number of CPU cores to use. [default: 1]
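
So instead of wrapping cutadapt in GNU parallel, you could import the raw reads once and let the plugin use several cores. Here is a rough sketch using the primers from your command. The wrapper name `trim_primers` and the artifact paths are placeholders I made up, and I have not reproduced equivalents of `--no-indels` or `--pair-filter`, so please check `qiime cutadapt trim-paired --help` for those:

```shell
# Hypothetical helper: trim primers from an imported paired-end artifact
# with q2-cutadapt, using several cores in a single invocation.
#   $1 = input artifact (.qza), $2 = output artifact (.qza),
#   $3 = number of cores (defaults to 1, like the plugin itself)
trim_primers() {
    qiime cutadapt trim-paired \
        --i-demultiplexed-sequences "$1" \
        --p-front-f CCTACGGGNGGCWGCAG \
        --p-front-r GACTACHVGGGTATCTAATCC \
        --p-discard-untrimmed \
        --p-cores "${3:-1}" \
        --o-trimmed-sequences "$2"
}
```

For example, `trim_primers importing_data.qza trimmed.qza 30` would use 30 cores, matching the `--jobs 30` you used before.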
Good luck sorting out the corrupted file. I've had luck with BBTools repair.sh in the past. That's another 3rd party tool!