DADA2 filtering stats

nerdynella · April 11, 2018, 4:56pm

Continuing the discussion from Summary statistics after dada2:

@ebolyen does this mean we need to rerun DADA2 in version 2017.12 or later to get the stats? Is there a way to still obtain these stats from outputs produced by an earlier QIIME 2 version? DADA2 took forever to run on my data and we'd rather not rerun it just to obtain the stats. Please tell me there's another way.

P.S. could you please update the tutorial page to show that this option is now available?

Nsa

nerdynella · April 11, 2018, 6:53pm

While I await a response, I have gone ahead and downloaded the freq/sample csv from my resulting DADA2 table.

I am multiplying the reported seq/sample by 2 (since my seqs were PE) to obtain the reads retained per sample after DADA2. Is this approach correct? Does the seq/sample reported in the csv file account for reads that were combined into unique sequences during the dereplication stage?

nerdynella · April 12, 2018, 6:09pm

I think I got one answer.
Using the Moving Pictures tutorial data, I ran DADA2 and compared the output obtained by passing --verbose to the values in the resulting DADA2 table (freq/sample csv). It appears that the data presented in the freq/sample csv is the output of the final DADA2 step: filtered, dereplicated, non-chimeric reads.
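For anyone who wants to repeat this comparison without eyeballing it, here is a rough Python sketch. It assumes the freq/sample csv has two columns (sample ID, frequency) and that the non-chimeric counts were copied by hand from the --verbose output. The sample IDs and numbers below are made up for illustration, and the function names are hypothetical helpers, not part of QIIME 2.

```python
import csv
import io

# Hypothetical contents of the freq/sample csv (sample ID, frequency).
# IDs and counts are invented for illustration only.
FREQ_CSV = """L1S8,7068
L1S57,8756
L1S76,7922
"""

# Hypothetical "non-chimeric" counts copied by hand from --verbose output.
VERBOSE_NONCHIMERIC = {"L1S8": 7068, "L1S57": 8756, "L1S76": 7922}


def read_sample_frequencies(text):
    """Parse two-column CSV text into {sample_id: frequency}."""
    freqs = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        sample_id, value = row
        try:
            freqs[sample_id] = int(float(value))
        except ValueError:
            continue  # skip a header row, if one is present
    return freqs


def mismatches(table_freqs, verbose_counts):
    """Return samples whose table frequency differs from the verbose count."""
    return {
        sample: (table_freqs.get(sample), count)
        for sample, count in verbose_counts.items()
        if table_freqs.get(sample) != count
    }


table_freqs = read_sample_frequencies(FREQ_CSV)
print(mismatches(table_freqs, VERBOSE_NONCHIMERIC))
```

An empty result means every sample's frequency in the table matches its non-chimeric count, which is what I saw for the samples that --verbose printed.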

Now, --verbose only shows stats for the first 6 samples. How do you view the stats for the remaining samples?

I am still hoping that there's a way to obtain these stats from the outputs of a previous DADA2 run. Ours took about a week to complete on our largest high-mem node, and I don't think I'll be lucky enough to have access to this node for that long again.

Thanks,
Nsa

ebolyen · April 16, 2018, 10:30pm

Hi @nerdynella!

Sorry for the very delayed response.

I'm afraid not; there isn't a way to recover those stats from the outputs of an earlier run.

The --verbose output was very much a stopgap; we've got a better plan coming.

No, you shouldn't multiply by 2; the frequencies provided for denoise-paired are what you should use. The PE data is merged into a single sequence, which is then counted once each time it is observed.
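To make the counting concrete, here is a toy Python sketch (not DADA2's actual implementation). Each tuple stands for one forward/reverse pair that has already been merged into a single sequence; the sample names and sequences are invented, and `build_table` and `sample_frequencies` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical merged reads: (sample_id, merged_sequence).
# Each entry represents ONE forward/reverse pair merged into one sequence.
merged_reads = [
    ("sampleA", "ACGTACGT"),
    ("sampleA", "ACGTACGT"),
    ("sampleA", "TTGACCAA"),
    ("sampleB", "ACGTACGT"),
]


def build_table(reads):
    """Count each merged sequence once per observation, per sample."""
    table = {}
    for sample, seq in reads:
        table.setdefault(sample, Counter())[seq] += 1
    return table


def sample_frequencies(table):
    """Sum the per-sequence counts to get one frequency per sample."""
    return {sample: sum(counts.values()) for sample, counts in table.items()}


table = build_table(merged_reads)
freqs = sample_frequencies(table)
print(freqs)
```

Here sampleA ends up with a frequency of 3 because three merged pairs were observed, not 6. Identical sequences collapse into one feature, but their counts still add up, so the per-sample total already reflects every retained pair.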

Just to double check, what sequencing instrument did you use? On Illumina the primers aren't independent, so your forward and reverse reads represent the same "sampling event" on the instrument.

That will be coming soon once we've fixed this issue. The plan is to have that information be an artifact which you can visualize as metadata or use otherwise.

nerdynella · April 17, 2018, 3:49pm

That's what I thought, but since the last column (non-chimeric reads) is what is presented in the freq/sample csv, I was hoping that it could somehow be used to calculate the filtered/denoised data.

nerdynella · April 17, 2018, 3:50pm

Yes, we used the Illumina HiSeq platform, and thanks for the heads up.

nerdynella · April 17, 2018, 3:50pm

oh well...thanks anyway.
