This issue arose out of another problem I had, so I figured it would be best to create a separate ticket for it.
As the title states, I have FASTQ files that seem to have an extra character in the quality score line vs the sequence line. When I try to import my manifest file, I get this error:
There was a problem importing seqs.tsv:
/var/folders/zc/csj0fb595j98l9vn8xybjdr40000gp/T/q2-SingleLanePerSampleSingleEndFastqDirFmt-lp2cavnw/LCl-85_212_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:
Quality score length doesn’t match sequence length for record beginning on line 5.
So, I checked the file manually & there was one extra character in the quality score line (e.g. 316) vs the sequence line (e.g. 315). It seems others have run into this too: this person had the same issue after joining FASTA & quality scores with a converter. That got me thinking about the way Windows & Mac (or Unix) encode their line breaks/endings, as has been mentioned to me before.
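For anyone who wants to repeat that manual check without opening each file, here is a minimal sketch in Python. It assumes standard four-line FASTQ records (no wrapped sequences), and the function names `open_fastq` and `find_length_mismatches` are just ones I made up for illustration. It reads the file with universal newlines, so CR, LF, and CRLF endings are all treated as line breaks and are not counted as characters.

```python
import gzip
import sys


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode.

    newline=None turns on universal newlines, so \r, \n and \r\n
    are all read as a single line break.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline=None)
    return open(path, "r", newline=None)


def find_length_mismatches(path):
    """Return (header_line_number, seq_len, qual_len) for each bad record.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    with open_fastq(path) as handle:
        lines = [line.rstrip("\n") for line in handle]

    # Ignore blank lines at the very end of the file.
    while lines and lines[-1] == "":
        lines.pop()

    mismatches = []
    for start in range(0, len(lines) - 3, 4):
        seq = lines[start + 1]
        qual = lines[start + 3]
        if len(seq) != len(qual):
            # Report the 1-based line number of the record's header line.
            mismatches.append((start + 1, len(seq), len(qual)))
    return mismatches


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        for line_no, seq_len, qual_len in find_length_mismatches(fastq_path):
            print(f"{fastq_path}: record at line {line_no}: "
                  f"sequence={seq_len}, quality={qual_len}")
```

Saved under a name of your choosing (say `check_fastq.py`, a hypothetical filename), it could be run as `python check_fastq.py *.fastq.gz`. Only stripping the line break, and not other whitespace, is deliberate: a stray trailing space on the quality line would still show up as an extra character.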
Using BBEdit, I have gone through every fastq file in the folder holding my sequences & switched the line break types to Mac (CR) & made sure that each seq/fastq file had only four lines (all of them had an extra space that caused the file to have 5 lines, though only four lines had any data/info). I’m thankful I only have ~425 sequences.
Having done all of that, I am still getting this error. I have checked the number of characters in both the quality scores as well as the sequences themselves & they have an identical number of characters. So, I’m really not sure why this is still an issue. Would it help at all to change the line break type to Unix (LF)? I can’t imagine that would be the case, but I’m completely lost on this issue.
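On the line-ending question, here is a rough sketch of how the endings in a file could be counted and rewritten as Unix (LF). Whether LF endings actually resolve the import error is an assumption that has not been confirmed here, and the function names are hypothetical.

```python
import gzip


def count_line_endings(path):
    """Count CRLF, bare CR, and bare LF line endings in a plain or gzipped file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        data = handle.read()
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so only bare CR and bare LF remain.
    bare_cr = data.count(b"\r") - crlf
    bare_lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": bare_cr, "LF": bare_lf}


def to_unix_line_endings(data):
    """Return the bytes with CRLF and bare CR both converted to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

If `count_line_endings` reports anything under `"CR"` or `"CRLF"`, the file's bytes could be passed through `to_unix_line_endings` and written back out (re-gzipped for `.gz` files) before trying the import again.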
Thanks for your patience & wisdom!