Question about analyzing many samples


#1

I have 180 samples that were analyzed by a bioinformatics center using QIIME 1.9 about a year and a half ago (V3-V4, MiSeq, 2x300). I am interested in learning more about analyzing the data myself, comparing QIIME 2 results to the older version, and potentially developing an undergraduate course using this data set, so I am running through QIIME 2 in VirtualBox. I can get the complete data set through importing and primer trimming, up to the denoising step, with no issues, but my limited RAM can’t handle that many samples in DADA2. I have found that I can process 12 samples at a time in DADA2, then merge the results for downstream analysis.

My question is how I should determine my truncation lengths (--p-trunc-len-f and --p-trunc-len-r) for each DADA2 run. Would it be better to look at the quality plots from the entire pooled data set (180 samples) to determine forward and reverse truncations for DADA2, or is it better to look at the quality plots for just the 12 samples that will be grouped in each DADA2 run?

From the few runs of 12 samples I have completed, it seems like the truncation values for each run are about the same, which makes me think that there isn’t a difference between a pooled vs individual group approach.


(Justine) #2

Hi @Matt1,

If you’re analyzing your data in parallel, you want to keep your parameters constant across all datasets. So your truncation lengths should be the same for every batch. If you’re doing DADA2, you may also want to consider how you’re handling chimeras. I believe DADA2 identifies chimeras by consensus across the group of samples you include in a run. With runs of only 12 samples, you may have funky things happening with chimera handling. (You may find deblur more robust here, because it uses different filtering criteria.)
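To make the "same parameters for every batch" idea concrete, here is a rough Python sketch that builds the command lines for each 12-sample batch and for the final merge. It only generates strings; it doesn't run QIIME 2. The truncation values (270 and 210) are placeholders rather than recommendations, and the per-batch file names (batch01-demux.qza and so on) are made up for illustration. It also assumes you already have one demultiplexed artifact per batch.

```python
from typing import List

# Placeholder truncation lengths (NOT recommendations): choose real values
# from the quality plots once, then reuse them unchanged for every batch.
TRUNC_LEN_F = 270
TRUNC_LEN_R = 210


def make_batches(sample_ids: List[str], batch_size: int) -> List[List[str]]:
    """Split sample IDs into consecutive batches of at most batch_size."""
    return [sample_ids[i:i + batch_size]
            for i in range(0, len(sample_ids), batch_size)]


def denoise_command(batch_no: int, trunc_f: int, trunc_r: int) -> str:
    """Build one denoise-paired command; file names are hypothetical."""
    prefix = f"batch{batch_no:02d}"
    return " ".join([
        "qiime dada2 denoise-paired",
        f"--i-demultiplexed-seqs {prefix}-demux.qza",
        f"--p-trunc-len-f {trunc_f}",
        f"--p-trunc-len-r {trunc_r}",
        f"--o-table {prefix}-table.qza",
        f"--o-representative-sequences {prefix}-rep-seqs.qza",
        f"--o-denoising-stats {prefix}-stats.qza",
    ])


def merge_commands(n_batches: int) -> List[str]:
    """Build the commands that merge per-batch tables and sequences."""
    numbers = range(1, n_batches + 1)
    tables = " ".join(f"--i-tables batch{i:02d}-table.qza" for i in numbers)
    seqs = " ".join(f"--i-data batch{i:02d}-rep-seqs.qza" for i in numbers)
    return [
        f"qiime feature-table merge {tables} --o-merged-table merged-table.qza",
        f"qiime feature-table merge-seqs {seqs} --o-merged-data merged-rep-seqs.qza",
    ]


if __name__ == "__main__":
    sample_ids = [f"sample{i}" for i in range(1, 181)]  # stand-in IDs
    batches = make_batches(sample_ids, 12)
    for number, _ in enumerate(batches, start=1):
        print(denoise_command(number, TRUNC_LEN_F, TRUNC_LEN_R))
    for command in merge_commands(len(batches)):
        print(command)
```

Because the truncation lengths live in one place, every batch gets identical settings, which is what lets the merged table be treated as a single analysis.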

Best,
Justine


#3

Justine,

Thank you for the response and suggestions. After beating my head against the wall for a while and then realizing the sequencing center gave me the wrong primer sequences, I have gotten my sequences to the point where the entire set can be processed through DADA2 within a few hours. Amazing how much difference proper primer removal can make!

Matt