Hey @stanislav.iablokov!
Looking through your logs (thanks for those!!):
Snippet from that file:
2) Learning Error Rates
Initializing error rates to maximum possible estimate.
Error rates could not be estimated.
Error in err[c(1, 6, 11, 16), ] <- 1 :
incorrect number of subscripts on matrix
Calls: dada
Execution halted
I've never seen this before. @benjjneb, do you know what causes this?
Looking at your sample fastq file, every quality score is `I` (a Phred score of 40 in the standard Phred+33 encoding), which is pretty unusual. It doesn't look like you actually have any quality information for DADA2 to use, which is probably why we get that error.
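If you want to check this yourself, here's a rough sketch that tallies every quality character in a FASTQ and decodes it as Phred+33. It assumes plain (uncompressed, unwrapped) FASTQ with four lines per record, and the function names are just hypothetical helpers for illustration, not part of QIIME 2 or DADA2:

```python
from collections import Counter
from typing import Iterable


def quality_char_counts(fastq_lines: Iterable[str]) -> Counter:
    """Count every quality character in a 4-line-per-record FASTQ (assumed, not wrapped)."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        # Record layout: header, sequence, '+', quality -> quality is line index 3 of each record.
        if i % 4 == 3:
            counts.update(line.rstrip("\r\n"))
    return counts


def phred33(char: str) -> int:
    """Decode one Phred+33 quality character into its numeric score."""
    return ord(char) - 33


def has_quality_information(fastq_lines: Iterable[str]) -> bool:
    """True if more than one distinct quality score appears in the reads."""
    return len(quality_char_counts(fastq_lines)) > 1


# Two toy records where every base is scored 'I', like the sample file.
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "IIII\n",
]
counts = quality_char_counts(example)
print({phred33(c): n for c, n in counts.items()})  # {40: 8}
print(has_quality_information(example))  # False
```

A file like that has only one distinct score, so there is no variation across quality levels for an error model to learn from.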
Deblur doesn't use quality scores to denoise the sequence data, so having all of your quality scores written as the letter `I` doesn't bother it.
The files are so small because we find the representative sequences and then count the number of times each one occurs in every sample, storing those counts in the feature table. That way we don't have to hold onto a bunch of reads that say the same thing, which makes the output much smaller than your raw data.
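To make the "count instead of store" idea concrete, here's a toy sketch. It assumes the reads have already been denoised (the hard part, which this skips entirely), and naming each feature by the MD5 hash of its sequence is just an illustrative choice here; `build_feature_table` is a hypothetical helper, not a QIIME 2 function:

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def build_feature_table(
    reads_by_sample: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, Dict[str, int]]]:
    """Collapse identical (already denoised) reads into features and count them per sample.

    Returns (rep_seqs, table): rep_seqs maps feature ID -> sequence, and
    table maps feature ID -> {sample ID: count}.
    """
    rep_seqs: Dict[str, str] = {}
    table: Dict[str, Dict[str, int]] = {}
    for sample_id, reads in reads_by_sample.items():
        # Each unique sequence is stored once; only its per-sample count is kept.
        for seq, n in Counter(reads).items():
            # Illustrative choice: name the feature by the MD5 hash of its sequence.
            feature_id = hashlib.md5(seq.encode("ascii")).hexdigest()
            rep_seqs[feature_id] = seq
            table.setdefault(feature_id, {})[sample_id] = n
    return rep_seqs, table


reads = {
    "sample-A": ["ACGT", "ACGT", "ACGT", "TTGA"],
    "sample-B": ["ACGT", "TTGA", "TTGA"],
}
rep_seqs, table = build_feature_table(reads)
for feature_id, seq in rep_seqs.items():
    print(seq, table[feature_id])
# ACGT {'sample-A': 3, 'sample-B': 1}
# TTGA {'sample-A': 1, 'sample-B': 2}
```

Seven reads shrink to two sequences plus four counts; with real data, where thousands of reads share the same sequence, the savings are much larger.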
I think there are strong arguments for using either tool. Here are their respective papers:
- Callahan et al. (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods.
- Amir et al. (2017). Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns. mSystems.
Let me know if that helps!