Loss of data with dada2

jairideout · October 10, 2017, 8:03pm

Hi @blau! From the demux.qzv you posted for mock-13, it looks like the reverse reads are pretty poor quality across the length of the sequences. My guess is that many of the reverse reads are being discarded during the denoising step, and there's not much left over to merge.

You might try the following:

Process only the mock-13 forward reads and see how many sequences are discarded. @Nicholas_Bokulich has analyzed only the mock-13 forward reads with DADA2 and obtained reasonable results.
The first several positions in the mock-13 sequences are also low quality. You could try trimming off the first several positions from the forward and reverse reads using --p-trim-left-f and --p-trim-left-r. For example, you might try trimming the first 12 positions of the forward reads, and the first 5 positions of the reverse reads.
I think that sequencing artifacts have already been removed from the mock-13 reads. When you're processing your own data, make sure that all sequencing artifacts (e.g. adapters, primers, barcodes) have been removed from the sequences before processing with DADA2.
The DADA2 functionality available in qiime2 assumes that you have amplicon data and that it was produced on an Illumina platform (the error model DADA2 builds depends on this). mock-13 fits these requirements, but you'll want to verify that with your own data set.
If you have ITS data, you may need to trim off the reverse primers from the forward reads if they're present (in addition to the other sequencing artifacts mentioned above).
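As a rough sketch of the first two suggestions, the commands might look something like the following. The trim values (12 and 5) are the ones suggested above; everything else is an assumption for illustration: the input filename `demux.qza`, the output directory names, and especially the truncation lengths, which are placeholders you would pick from your own quality plots. Flag names can differ between QIIME 2 releases, so check `qiime dada2 denoise-single --help` and `qiime dada2 denoise-paired --help` for your version. By default the script only prints the commands; set `RUN=1` to actually run them.

```shell
#!/bin/sh
set -eu

# Assumed input artifact name; point this at your own demultiplexed reads.
DEMUX="${DEMUX:-demux.qza}"

# Placeholder truncation lengths, NOT recommendations.
# Choose them from the quality plots in your demux.qzv.
TRUNC_F="${TRUNC_F:-150}"
TRUNC_R="${TRUNC_R:-150}"

# Print a command, and execute it only when RUN=1.
run_or_print() {
  echo "$*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Suggestion 1: denoise the forward reads only.
# (Assumes denoise-single accepts the paired-end artifact and
# uses just its forward reads; verify with --help.)
run_or_print qiime dada2 denoise-single \
  --i-demultiplexed-seqs "$DEMUX" \
  --p-trunc-len "$TRUNC_F" \
  --output-dir dada2-forward-only

# Suggestion 2: paired-end denoising, trimming the low-quality
# first positions (12 forward, 5 reverse, as suggested above).
run_or_print qiime dada2 denoise-paired \
  --i-demultiplexed-seqs "$DEMUX" \
  --p-trim-left-f 12 \
  --p-trim-left-r 5 \
  --p-trunc-len-f "$TRUNC_F" \
  --p-trunc-len-r "$TRUNC_R" \
  --output-dir dada2-paired-trimmed
```

Note that `--output-dir` expects a directory that does not exist yet, so remove or rename the output directories before re-running. Comparing how many sequences survive in each run should tell you whether the reverse reads are what's costing you data.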

Let us know how it goes! Sorry not to have a clearer, more straightforward answer for you.

Thanks @ebolyen and @Nicholas_Bokulich for these suggestions!