Output stats for DADA2

JenKelly · April 11, 2018, 11:52am

Dear Qiime2 developers,

I have just begun working through the Moving Pictures tutorial, and after running DADA2 I want to know how many sequences were input, filtered, denoised, and non-chimeric for each sample. I used the "verbose" option, which printed some information to the screen, but only for the following samples:

Sample                            input  filtered  denoised  non-chimeric
L1S105_9_L001_R1_001.fastq.gz     11340      8571      8571          7865
L1S140_6_L001_R1_001.fastq.gz      9736      7676      7676          7245
L1S208_10_L001_R1_001.fastq.gz    11335      9260      9260          8270
L1S257_11_L001_R1_001.fastq.gz     8216      6705      6705          6486
L1S281_5_L001_R1_001.fastq.gz      8904      7066      7066          6755
L1S57_13_L001_R1_001.fastq.gz     11750      9298      9298          8756

Do you know why there is only info for a few samples? Is a log file created that tracks samples through each step?

Also, after carrying out the DADA2 denoising, I would like to look at the ASV table directly and quickly see how many features there are per sample and the abundance of each feature in each sample. There seems to be no way to simply open this table. I have generated the summary files, but the information they provide is limited. Is the only way to view this table to first export it to biom and then convert it to text?

Thank you for your help,
Jen

ebolyen · April 16, 2018, 10:08pm

Hi @JenKelly!

Sorry for the very delayed response on my part.

It looks like we're using R's head function to print only the first few samples, which is why you see just six rows (six is head's default). There isn't a log yet, but we have an issue tentatively planned for this upcoming release to turn that information into an artifact (viewable as metadata), letting you track this for each sample (and even use it downstream if you need to).

As for the feature table, it typically contains hundreds of thousands (to millions) of values, so looking at it directly isn't the most informative. If you want to open it in Excel or similar, then you will need to export it and use biom to convert it, like you described. We do have plans to make exporting to other formats simpler (skipping the biom step), but that's probably not going to make it into this release.
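
If you do go the export-and-convert route, here is a rough sketch of how the resulting text file could be summarised in Python without opening it in a spreadsheet. It assumes the tab-separated layout that `biom convert --to-tsv` writes (a comment line, then a `#OTU ID` header row with one column per sample). The function names and the tiny example table are made up for illustration and are not part of QIIME 2 or biom.

```python
import csv
import io


def read_feature_table(handle):
    """Parse a biom-style TSV into (sample_ids, {feature_id: [counts]})."""
    reader = csv.reader(handle, delimiter="\t")
    samples = None
    table = {}
    for row in reader:
        if not row:
            continue
        if row[0].startswith("#"):
            # The header is the comment row that carries one column per sample.
            if len(row) > 1:
                samples = row[1:]
            continue
        # Counts are commonly written as floats such as "12.0".
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table


def per_sample_summary(samples, table):
    """Count observed features and total reads for each sample."""
    summary = {}
    for i, sample in enumerate(samples):
        counts = [row[i] for row in table.values()]
        summary[sample] = {
            "features": sum(1 for c in counts if c > 0),
            "reads": int(sum(counts)),
        }
    return summary


# Hypothetical two-feature, two-sample table, only to show the layout.
EXAMPLE_TSV = (
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "feat1\t10.0\t0.0\n"
    "feat2\t5.0\t3.0\n"
)

samples, table = read_feature_table(io.StringIO(EXAMPLE_TSV))
summary = per_sample_summary(samples, table)
for sample, stats in summary.items():
    print(sample, stats["features"], stats["reads"])
```

For a real table you would pass an open file handle instead of the `io.StringIO` example.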

JenKelly · April 17, 2018, 1:44pm

Hi @ebolyen

Thanks for the reply!

So is it correct that there is currently no way of determining how many reads were removed due to low quality, phiX, and chimeras during the DADA2 denoising step?

Thanks again,
Jen

ebolyen · April 17, 2018, 8:47pm

Not at this time. Once we return the artifact containing these stats, you should be able to sum up all the columns however you please.
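
To give an idea of what summing the columns could look like, here is a minimal Python sketch using the six rows from the verbose output in the first post. The dictionary layout and the `summarize` function are purely illustrative; the eventual artifact may be organised differently. Note that these four columns cannot split the filtering loss any further (for example, quality versus phiX), since only one "filtered" count is reported.

```python
# Per-sample read counts copied from the verbose output in the first post.
# Column order: input, filtered, denoised, non-chimeric.
STATS = {
    "L1S105_9_L001_R1_001.fastq.gz": (11340, 8571, 8571, 7865),
    "L1S140_6_L001_R1_001.fastq.gz": (9736, 7676, 7676, 7245),
    "L1S208_10_L001_R1_001.fastq.gz": (11335, 9260, 9260, 8270),
    "L1S257_11_L001_R1_001.fastq.gz": (8216, 6705, 6705, 6486),
    "L1S281_5_L001_R1_001.fastq.gz": (8904, 7066, 7066, 6755),
    "L1S57_13_L001_R1_001.fastq.gz": (11750, 9298, 9298, 8756),
}


def summarize(stats):
    """Sum each column over all samples and report the reads lost per step."""
    # zip(*...) turns the per-sample rows into per-column tuples.
    n_input, n_filtered, n_denoised, n_nonchimeric = (
        sum(column) for column in zip(*stats.values())
    )
    return {
        "input": n_input,
        "removed_by_filtering": n_input - n_filtered,
        "removed_by_denoising": n_filtered - n_denoised,
        "removed_as_chimeric": n_denoised - n_nonchimeric,
        "retained": n_nonchimeric,
    }


totals = summarize(STATS)
for name, value in totals.items():
    print(f"{name}: {value}")
```

For those six samples this works out to 61281 input reads, of which 12705 were dropped by filtering and 3199 as chimeric, leaving 45377.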
