Hi, sir,
If possible, I'd like to ask another question. It's really interesting to see per-base quality scores plotted like this; I have never seen that before. I want to know what the problem with it is, especially in the reverse reads: why do they seem to have no quality scores before 100 bp? Note that the barcode and primer, 26 bp in total, have not been removed yet.
Please always open unrelated questions in new topics. That will allow you to give them meaningful titles, and help others learn from your questions in the future.
If you click and drag over one section of that interactive plot, it will zoom in, and you'll be able to see that it is actually a box plot:
It looks like you might be using either a sequencer or data post-processing that "bins" quality scores. I'm guessing NovaSeq? Binning systems make it much more likely that regions with high quality scores will have very, very tiny boxes that could be misinterpreted as "no q score". On the contrary, your plot is saying that nearly everything at those positions clusters tightly at q=37.
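To make the "tiny boxes" point concrete, here is a minimal Python sketch. The score lists are made up for illustration (they are not taken from the data in this topic), and `box_stats` is a hypothetical helper, not part of QIIME 2. It only shows that when almost every read at a position falls in the same bin, the quartiles coincide and the box collapses to a line.

```python
import statistics

def box_stats(scores):
    """Return (q1, median, q3) for the quality scores seen at one read position."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return q1, median, q3

# Hypothetical binned scores at one position: most reads at Q37,
# a few in lower bins (assumed values, for illustration only).
binned = [37] * 960 + [23] * 30 + [12] * 10

# Hypothetical unbinned scores at one position: spread over many levels.
unbinned = [30 + (i % 10) for i in range(1000)]

binned_q1, binned_median, binned_q3 = box_stats(binned)
unbinned_q1, unbinned_median, unbinned_q3 = box_stats(unbinned)

# A box height of zero means the box is drawn as a flat line at the median.
print("binned box height:", binned_q3 - binned_q1)
print("unbinned box height:", unbinned_q3 - unbinned_q1)
```

With binned data the box has zero height even though every read still has a score, which is why zooming in on the plot is needed to see it.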
Note that DADA2 doesn't yet officially support denoising of NovaSeq data, so if that's what you're using, you may need to consider a workaround or another approach to denoising/merging. The developers of DADA2 are also looking for NovaSeq data from known communities they can use to add this functionality. If you have data that sounds like that and you want to support their efforts, consider getting in touch with them directly.
Hi @ChrisKeefe, first I would like to say that I am new to microbiome analysis; this is my first analysis and I am falling in love with it.
Now to the question, which I think is similar to the one asked earlier in this topic. My data is from an Illumina MiSeq, and I used the 515F and 806R primers.
I ran DADA2 and removed the primers using:
--p-trim-left-f 19
--p-trim-left-r 20
which correspond to the lengths of the primers.
But I couldn't tell from the quality plot whether I need to use:
--p-trunc-len-f
--p-trunc-len-r
to do some kind of quality control ...
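For readers unfamiliar with these options, here is a rough Python sketch of what trimming and truncation do to a single read. It is an illustration only: `trim_and_truncate` is a hypothetical function, not DADA2's or QIIME 2's implementation, and the read below is invented. It assumes the documented behavior that a truncation length of 0 disables truncation and that reads shorter than the truncation length are discarded.

```python
def trim_and_truncate(read, trim_left, trunc_len):
    """Mimic the effect of trim-left and trunc-len on one read (illustrative).

    Returns the processed read, or None if the read is shorter than
    trunc_len. A trunc_len of 0 means no truncation or length filtering.
    """
    if trunc_len > 0:
        if len(read) < trunc_len:
            return None  # read too short: it would be discarded
        read = read[:trunc_len]  # cut the 3' end at position trunc_len
    return read[trim_left:]  # remove trim_left bases from the 5' end

# Invented 250 nt read: a 19 nt "primer" followed by 231 nt of amplicon.
read = "A" * 19 + "C" * 231

no_truncation = trim_and_truncate(read, trim_left=19, trunc_len=0)
truncated = trim_and_truncate(read, trim_left=19, trunc_len=240)
too_short = trim_and_truncate(read[:100], trim_left=19, trunc_len=240)

print(len(no_truncation), len(truncated), too_short)
```

When both options are set, the surviving reads end up with length `trunc_len - trim_left`, so the two values need to be chosen together, leaving enough overlap for the forward and reverse reads to merge.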
If this is MiSeq data, then I would say this looks like it has already had some form of quality control applied to it, which will be very problematic when using a tool like q2-dada2, which relies on the "natural" signal/error profile to be present. If you don't know what forms of pre-screening were applied to these data then I strongly suggest you reach out to the sequencing center, or wherever you received these data from, to ask about what preliminary steps were performed (and see if you can get the "raw" data, if you plan to use q2-dada2).