Splitting datasets for processing

Jenna_Shelton · September 1, 2017, 1:32am

If I were to break my re-seqs.qza file (I imported the 145 previously demultiplexed samples using the ManifestPhred33 method) into chunks and process, say, 30 samples or so at a time in QIIME 2, this would likely fix the memory problems I am having, but would there be any processing consequences? What I mean is, would the resulting OTU table differ if I processed chunks of samples and combined the OTU tables in R downstream, as opposed to processing all of the samples in one go like I am trying to do here? Do you think there would be any difference in taxonomic classification or anything else, given that I use the same parameters throughout the pipeline (trimming to the same lengths, etc.)?

Thank you!

thermokarst · September 2, 2017, 12:21am

Hi @Jenna_Shelton!

Please take a peek at this thread! Long story short: if you are using DADA2, splitting your samples up can cause issues with the DADA2 error model and, from what I understand, doesn't really reduce the computational resources needed (memory, CPU, etc.).
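For reference, the mechanical part of combining per-batch tables downstream could look roughly like the sketch below. It is illustrative only: the in-memory layout (`{sample_id: {feature_id: count}}`), the function name `merge_feature_tables`, and the toy feature IDs are all assumptions, not anything from QIIME 2 or DADA2. It assumes feature IDs are directly comparable across batches, and it does nothing about the error-model concern above.

```python
from typing import Dict, List

# Hypothetical in-memory layout: {sample_id: {feature_id: count}}.
Table = Dict[str, Dict[str, int]]


def merge_feature_tables(tables: List[Table]) -> Table:
    """Combine per-batch tables, filling features a sample lacks with 0."""
    # Collect every feature ID seen in any batch.
    all_features = sorted(
        {f for table in tables for counts in table.values() for f in counts}
    )
    merged: Table = {}
    for table in tables:
        for sample_id, counts in table.items():
            # A sample should belong to exactly one batch.
            if sample_id in merged:
                raise ValueError(f"sample {sample_id!r} appears in more than one batch")
            merged[sample_id] = {f: counts.get(f, 0) for f in all_features}
    return merged


# Toy data: two batches with one sample each (made-up IDs and counts).
batch_a = {"S1": {"ACGT": 10, "TTGA": 3}}
batch_b = {"S2": {"ACGT": 4, "GGCC": 7}}
merged = merge_feature_tables([batch_a, batch_b])
```

The merge itself is simple bookkeeping. Whether the per-batch counts are comparable in the first place is the real question.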

Thanks!