Long time browser, first time poster. Thanks in advance for any input.
I have some sequence data that I am trying to improve the analysis of, as there have always been issues with its processing.
Unfortunately, there are no details of the sequencing machine, software versions, etc.
I import it as paired-end Casava 1.8, as the other batches in the project are in that format and QIIME2 accepts it fine.
When imported into QIIME2, at first glance, the quality scores are workable.
However, in the reverse reads, positions 22, 24, and 27 in all reads, and positions 33 and 37 in some reads, have rock-bottom quality scores (below). This obviously causes problems when running DADA2 with the default quality truncation, even more so as I am trying to automate pipelines across multiple sequencing runs. Is this an import error with the Phred encoding? Or has anyone seen something similar from sequencing runs?
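For anyone who wants to check the Phred-encoding question directly on the raw files, here is a rough Python sketch (not part of QIIME2; all function names are made up for illustration). It pulls the quality lines out of an uncompressed FASTQ, guesses the offset with the common "any character below ASCII 59 means Phred+33" heuristic, and averages the scores per position. It assumes plain four-line FASTQ records; under Phred+33 a score of 2 shows up as `#`.

```python
import io


def read_quality_strings(handle):
    """Yield the quality line (every 4th line) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(quality_strings):
    """Heuristic: characters below ASCII 59 only occur in Phred+33 data."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    return 33 if lowest < 59 else 64


def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean quality score at each read position (index 0 = position 1)."""
    totals, counts = [], []
    for q in quality_strings:
        for i, c in enumerate(q):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(c) - offset
            counts[i] += 1
    return [t / n for t, n in zip(totals, counts)]


# Tiny made-up example; for real data, open the reverse-read FASTQ instead.
example = io.StringIO("@r1\nACGTAC\n+\nIIII#I\n@r2\nACGTAC\n+\nIIHI#I\n")
quals = list(read_quality_strings(example))
offset = guess_phred_offset(quals)
for pos, q in enumerate(mean_quality_by_position(quals, offset), start=1):
    print(pos, q)
```

If the guessed offset is 33 and the dips are still there, the import encoding is probably not the culprit and the low scores were written that way by the sequencer or the upstream software.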
Welcome to the forum and thanks so much for your patience!
My initial thoughts are that it's difficult to nail down where these poor-quality scores are coming from without any information about your sequencing provider, etc. Depending on what your analysis pipeline looks like, one option would be to use DADA2 and trim the first ~37 bp from the reverse reads to remove those poor-quality positions. This could be a good solution if this is the only run where you are seeing this issue.
Alternatively, if you are seeing a lot of variance in your quality scores across multiple runs, you might consider using deblur instead of DADA2 for your denoising, since it uses a static error model (as opposed to DADA2, which uses the quality scores to inform the model).
Here are a couple of good forum posts that discuss using DADA2 vs. deblur in different situations, which might help clarify things further:
QIIMing back in here (pun intended!) after getting some input from @Nicholas_Bokulich - trimming the low-quality bases from the 5' ends should be a good way to move forward. Oftentimes, quality scores are low at the 5' end if primers are still attached, which could explain the unusual profile that you're seeing here.
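If you want to pick the 5' trim automatically across runs, something along these lines could work. It's a minimal sketch with a hypothetical helper name, and the threshold and window defaults are arbitrary assumptions: given per-position mean quality scores for the reverse reads, it returns how many leading bases to drop so that no low-quality position is left near the start. The result could then be passed to `--p-trim-left-r` in `qiime dada2 denoise-paired`.

```python
def suggest_trim_left(mean_quals, threshold=20.0, window=50):
    """Return how many leading bases to drop so that no position with a mean
    quality below `threshold` remains within the first `window` bases.
    Returns 0 if that region has no low-quality positions."""
    trim = 0
    for i, q in enumerate(mean_quals[:window]):
        if q < threshold:
            trim = i + 1  # drop everything up to and including this base
    return trim


# Made-up profile loosely mimicking the one described in this thread:
# quality 2 at positions 22, 24, 27 and a dip at positions 33 and 37.
profile = [38.0] * 150
for pos in (22, 24, 27):
    profile[pos - 1] = 2.0
for pos in (33, 37):
    profile[pos - 1] = 15.0

print(suggest_trim_left(profile))  # 37
```

The `window` limit is there so that the normal quality drop-off at the 3' end doesn't get counted; that end is better handled by truncation than by trimming.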
Thank you for this (also, pun excused).
Oops, the figures uploaded are indeed untrimmed. Unfortunately, the primers are 16 and 20 bp respectively, so removing them only gets rid of the usual initial low quality, still leaving the '2's in the reverse read, as they only start at pos. 22.
I will trim appropriately for this run and hope it doesn't happen again. I will also try to follow up with the sequencing provider, and I will update this thread with a solution if I hear back and they have found and resolved the issue.