Hello,
I've been attempting to trim primers and adapters from my fastq files without success. I've been using the following command on our server:
parallel --link --jobs 50 'cutadapt
--pair-filter any
--no-indels
--discard-untrimmed
-a TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
-A GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
-g CGAAATYGGTAGACGCTACG
-G CCDTYGAGTCTCTGCACCTATC
-o primer_trimmed_fastqs/{1/}
-p primer_trimmed_fastqs/{2/}
{1} {2}
> primer_trimmed_fastqs/{1/}_cutadapt_log.txt' ::: raw_reads/*_R1.fastq.gz ::: raw_reads/*_R2.fastq.gz
This has worked previously on other trnL sequences, but now I consistently get the same error message:
cutadapt: error: Reads are improperly paired. There are more reads in file 2 than in file 1.
(the same message repeats many more times)
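To double-check the read counts that the error refers to, here is a minimal sketch of a comparison I could run on one pair. The function names count_reads and compare_counts are made up for illustration, and it assumes strict 4-line FASTQ records (no wrapped sequence lines):

```shell
# count_reads FILE: number of records in a gzipped FASTQ file.
# Assumes every record is exactly 4 lines.
count_reads() {
    local lines
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( $lines / 4 ))
}

# compare_counts R1 R2: hypothetical helper; prints both counts and
# returns non-zero when they differ.
compare_counts() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" = "$n2" ]
}
```

For example, `compare_counts raw_reads/sample_R1.fastq.gz raw_reads/sample_R2.fastq.gz`, where "sample" is a placeholder for a real file prefix.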
I'm not sure why I'm getting this error. I came across a post, among some similar ones, that mentions header issues or read count discrepancies as possible causes.
However, when I inspect my fastq files, whitespace in the headers doesn't seem to be an issue. Here is an example from R1:
@M01666:85:000000000-BNT9H:1:1101:15781:1556 1:N:0:199
CGAAATCGGTAGACGCTACGGACTTAATTGGATTGAGCCTTGGTATGGAAACCTACTAAGTGATAACTTTCAAATTCAGAGAAACCCTGGAATTAACAATGGGCAATCCTGAGCCAAATCCTGGGTTACGCGAACAAACCGGAGTTTAGAA
+
?ABBBBFCCBCCGGGGGGGGGFEGHGFHHHHHHHHHHHHHGHHHGHGHGHGHHHHHHHGHGHHHHHHHHHGFHHGHHGHHHFFHHHGHHHGHHHHHHGGHHGGHHEHHHHHHHHHHHHGHHHHHEGGHHGGGGGGGHHHGGGGGFHHFGGH
@M01666:85:000000000-BNT9H:1:1101:13846:1644 1:N:0:199
CGAAATTGGTAGACGCTACGGACTTAATTGGATTGGGCCTTGGTATGGAAACCTGCTGAGTGAGAACTTTCAAATTCAGAGAAACCCTGGAATTAATAAAAAGGGGCAATCCTGAGCCAAATCCTATTTTTCGAAAACAAAGGTTTAGAAA
And the matching R2 reads:
@M01666:85:000000000-BNT9H:1:1101:15781:1556 2:N:0:199
CCATTGAGTCTCTGCACCTATCCCTTTTTTTCTCGCTTTCTAAACTCCGGTTTGTTCGCGTAACCCAGGATTTGGCTCAGGATTGCCCATTGTTAATTCCAGGGTTTCTCTGAATTTGAAAGTTATCACTTAGTAGGTTTCCATACCAAGG
+
BBBBBFFFFFFFGGGGGGGGGGHHHHHHHGGHHHGGGHGHHHFHHHHHGGGGGGFHHGGGGGGGHHGHGHHHHHHHHHHHHGHHHHHHGHHHHHGFHHHHHHHCGHEHHHHGGGHHHHHHHGEDHHHHHHHHHHHHGHHHHHHHHHHHHFF
@M01666:85:000000000-BNT9H:1:1101:13846:1644 2:N:0:199
CCTTTGAGTCTCTGCACCTATCCCCTTTTTCACTTTCTAAACCTTTGTTTTCGAAAAATAGGATTTGGCTCAGGATTGCCCCTTTTTATTAATTCCAGGGTTTCTCTGAATTTGAAAGTTCTCACTCAGCAGGTTTCCATACCAAGGCCCA
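Eyeballing two records only goes so far, so here is a rough sketch of how the read IDs could be compared over more records. It treats the ID as the header text before the first space, so the "1:N:0:199" versus "2:N:0:199" part is ignored. read_ids and ids_match are hypothetical helper names, and 4-line records are assumed:

```shell
# read_ids FILE [N]: print the ID (header text before the first space,
# without the leading @) of the first N records; N defaults to 1000.
read_ids() {
    gzip -cd "$1" |
        awk -v n="${2:-1000}" 'NR % 4 == 1 { print substr($1, 2); if (++c >= n) exit }'
}

# ids_match R1 R2 [N]: hypothetical helper; succeeds only when the first
# N read IDs are identical and in the same order in both files.
ids_match() {
    local a b
    a=$(read_ids "$1" "${3:-1000}")
    b=$(read_ids "$2" "${3:-1000}")
    [ "$a" = "$b" ]
}
```

Running `ids_match R1.fastq.gz R2.fastq.gz 100000 && echo paired` on a pair would only confirm the first 100000 IDs, not the whole file.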
Does anyone have any advice here? I'm baffled that this has worked on other sequences and now I can't get it to work to save my life. I'm guessing there's a user error somewhere, but I simply can't find it.
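One thing I can still rule out on my end: as far as I understand, --link matches the two input lists purely by position, so a missing or extra file on either side could pair an R1 with the wrong R2. Below is a minimal sketch of a name check, assuming the file names end in _R1.fastq.gz and _R2.fastq.gz (check_pairs is a made-up helper name, not part of cutadapt or parallel):

```shell
# check_pairs DIR: hypothetical helper; reports R1 files with no R2
# partner of the same name (and vice versa). Returns non-zero if any
# file is unpaired.
check_pairs() {
    local dir="$1" status=0 r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue          # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing R2 for $r1"
            status=1
        fi
    done
    for r2 in "$dir"/*_R2.fastq.gz; do
        [ -e "$r2" ] || continue
        r1="${r2%_R2.fastq.gz}_R1.fastq.gz"
        if [ ! -e "$r1" ]; then
            echo "missing R1 for $r2"
            status=1
        fi
    done
    return "$status"
}
```

Called as `check_pairs raw_reads`, it prints nothing when every file has a partner.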
Thanks,
Steve