Does the DADA2 error-rate learning step need to use all samples in the study?

Dear Friends,

I ran DADA2 on single-end cDNA data sequenced on Ion Torrent. The denoising log is below:

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.1 / RcppParallel: 4.4.2

  1. Filtering …
  2. Learning Error Rates
    279982055 total bases in 1191413 reads from 3 samples will be used for learning the error rates.
  3. Denoise samples …
  4. Remove chimeras (method = consensus)
  5. Report read numbers through the pipeline
  6. Write output
    Running external command line application(s). This may print messages to stdout and/or stderr.
    The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /data/data/NPRP-11-data/qiime-results/qiime-tmp/qiime2-archive-k1k9jgmh/6b3f03d0-f354-465d-935e-1bb073c642d9/data /data/data/NPRP-11-data/qiime-results/qiime-tmp/tmp8zc83ut3/output.tsv.biom /data/data/NPRP-11-data/qiime-results/qiime-tmp/tmp8zc83ut3/track.tsv /data/data/NPRP-11-data/qiime-results/qiime-tmp/tmp8zc83ut3 235 0 2.0 2 Inf consensus 1.0 20 1000000 NULL 16

Saved FeatureTable[Frequency] to: Ionexpress_1to11-dada2-rep-seqs-table2.qza
Saved FeatureData[Sequence] to: Ionexpress_1to11_dada2-rep-seqs2.qza
Saved SampleData[DADA2Stats] to: Ionexpress_1to11_dada2-rep-seqs-stats2.qza

As the log shows, the error-rate learning step used only 3 of my 10 samples. Does that matter, or can I move on with my analysis?
Thanks!

Hello Daniel,

When you ran this command, did you pass in just 3 of your samples or did you run dada2 on all 10 samples?

(It’s better to learn error rates from your full data set, but dada2 is pretty flexible!)

Colin

Thanks! I passed all 10 samples, and then performed diversity analysis on the denoised result, which was obtained with error rates calculated from 3 samples. As you said, dada2 is flexible. Should I leave the analysis as performed, or try to get the error rates from all 10 samples? Thanks

I would run dada2 on all 10 samples at once! Then you can always use qiime feature-table filter-samples to keep just the three samples later on.
https://docs.qiime2.org/2018.11/tutorials/filtering/#metadata-based-filtering

Thanks, but I think you misunderstood. I ran dada2 on the demux file containing all 10 samples. The dada2 log says the error rates were learned from 3 samples. Does it really matter if the error rates are calculated from only 3 samples? Thanks.

Got it!

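Ok, I think everything is fine. Dada2 takes samples in sorted order and keeps adding them until it has accumulated enough reads to train the error model. That threshold is the --p-n-reads-learn setting (the 1000000 in your command), and in your log it was reached after the first 3 samples (1191413 reads).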

Because error rates are similar across samples from the same sequencing run, learning them from a subset is the default behaviour and should work well. You are good to go :)
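
In case it helps to see why the log reports 3 samples, here is a minimal Python sketch of that selection logic. This is not the actual DADA2 or QIIME 2 code: the function name, the sample IDs, and the per-sample read counts are all made up for illustration, and the exact stopping comparison is an assumption.

```python
def select_learning_samples(read_counts, n_reads_learn=1_000_000):
    """Pick samples (in sorted order) until enough reads are accumulated.

    read_counts: dict mapping sample ID -> number of reads (hypothetical data).
    n_reads_learn: read threshold, mirroring the 1000000 seen in the command.
    Returns (list of sample IDs used, total reads used).
    """
    used = []
    total = 0
    for name in sorted(read_counts):
        used.append(name)
        total += read_counts[name]
        # Assumption: stop as soon as the running total reaches the threshold.
        if total >= n_reads_learn:
            break
    return used, total


# Hypothetical read counts for 10 samples (NOT the real data from this thread).
read_counts = {
    "S01": 400_000, "S02": 350_000, "S03": 300_000, "S04": 500_000,
    "S05": 450_000, "S06": 380_000, "S07": 420_000, "S08": 310_000,
    "S09": 390_000, "S10": 360_000,
}

used, total = select_learning_samples(read_counts)
print(f"{total} reads from {len(used)} samples will be used for learning the error rates.")
```

With these made-up counts the threshold is crossed at the third sample, so the remaining seven are never used for learning, although all ten are still denoised afterwards. If the threshold were larger than the total number of reads, the loop would simply use every sample.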

Colin

Thanks @colinbrislawn!