Merging outputs from Q2-DADA2

I have split my HiSeq data into smaller chunks to run as a batch/array job to save time.
Before I proceed: can I merge the repseq.qza and table.qza files generated for each chunk into a single repseq.qza and a single table.qza, respectively? Alternatively, can I extract and merge the contents of each generated .qza file?
Thanks,
Nsa

Hi @nerdynella!

Yes!! Check out the "FMT Tutorial" in the docs, particularly this section. One thing to note: feature-table merge takes two tables at a time, so if you split your data into, say, 4 groups, you would need to run the command 3 (4 - 1) times:

$ qiime feature-table merge \
  --i-table1 table-1.qza \
  --i-table2 table-2.qza \
  --o-merged-table merged-table.qza
$ qiime feature-table merge \
  --i-table1 merged-table.qza \
  --i-table2 table-3.qza \
  --o-merged-table merged-table.qza
$ qiime feature-table merge \
  --i-table1 merged-table.qza \
  --i-table2 table-4.qza \
  --o-merged-table merged-table.qza
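
For more than a handful of chunks, a loop can do the N - 1 pairwise merges for you. This is only a minimal sketch, assuming the two-table `--i-table1`/`--i-table2` interface shown above; `merge_pairwise` is a hypothetical helper name, and like the commands above it overwrites the running merged table in place.

```shell
# Hypothetical helper (not part of QIIME 2): merge N tables with N - 1
# pairwise calls to `qiime feature-table merge`.
# Usage: merge_pairwise <output.qza> <table-1.qza> <table-2.qza> [...]
merge_pairwise() {
  out="$1"
  shift
  # Seed the running result with a copy of the first table.
  cp "$1" "$out" || return 1
  shift
  # Fold each remaining table into the running result, one at a time.
  for t in "$@"; do
    qiime feature-table merge \
      --i-table1 "$out" \
      --i-table2 "$t" \
      --o-merged-table "$out" || return 1
  done
}
```

Calling `merge_pairwise merged-table.qza table-1.qza table-2.qza table-3.qza table-4.qza` would issue the same three merges as above. The representative sequences need the same treatment; as far as I recall, releases of this era provided `qiime feature-table merge-seq-data` for that, but I have not re-checked its flags, so consult its `--help` output before adapting the loop.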

Hope that helps!

EDIT: There is an open issue on the bug tracker to support variadic inputs, which would theoretically allow a method like feature-table merge to merge multiple tables at the same time.


There might be some implications to splitting these data when it comes to denoising. I will ping @benjjneb (DADA2) and @wasade (deblur) to see if they have anything to say on the matter. Thanks!

When using exact sequence variant methods it is fine to process subsets of the samples independently.

On the DADA2 side, you want each subset to have enough reads to get the error rates right, but a HiSeq run has far more than enough reads for that purpose, so splitting is A-OK.

Excellent! Thank you @thermokarst and @benjjneb for your quick response.
Cheers,
Nsa

Hey @nerdynella, you do not need to split HiSeq data for q2-deblur. It splits internally and processes each sample using a static error model that is not subject to run-to-run variation. For context, Deblur on the American Gut dataset, which spans 15,000 samples from around 50 MiSeq runs, takes 8 hours using 10 cores.

Thank you @wasade. How do I handle PE reads with Deblur?

Deblur is agnostic to whether reads are single-end or joined; if you want to use paired-end reads, join them upstream of Deblur.

It is not clear if joining reads is a benefit or detriment in amplicon studies, and I’m not aware of an independent benchmarking study which has explored this. Recall that genus-level differentiation using naive Bayes is not great even with longer reads (Wang et al. 2007). In addition, joining greatly increases the number of errors because the reverse read is lower quality, reduces the number of reads per sample due to quality filtering, and makes misassembly possible.

Thank you @wasade. I’ll explore Deblur using my forward reads and compare the results to those from DADA2 PE.
Cheers,
Nsa

@wasade, how do I specify the number of threads in q2-deblur? Or does it automatically use all available threads?
Thanks,
Nsa

You can specify the number of threads/jobs with --p-jobs-to-start.
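
As a rough sketch of where that flag goes, the wrapper below passes a job count through to `qiime deblur denoise-16S`. The function name `run_deblur`, the output file names, and any trim length you pass in are placeholders rather than values from this thread, and the remaining flags are as I recall them for q2-deblur, so check `qiime deblur denoise-16S --help` for your release.

```shell
# Hypothetical wrapper: run Deblur on a demultiplexed artifact with N parallel jobs.
# Usage: run_deblur <demux.qza> <trim-length> <jobs>
run_deblur() {
  demux="$1"
  trim_length="$2"
  n_jobs="$3"
  # Output file names below are placeholders; rename to suit your project.
  qiime deblur denoise-16S \
    --i-demultiplexed-seqs "$demux" \
    --p-trim-length "$trim_length" \
    --p-jobs-to-start "$n_jobs" \
    --o-representative-sequences rep-seqs-deblur.qza \
    --o-table table-deblur.qza \
    --o-stats deblur-stats.qza
}
```

For example, `run_deblur demux.qza 150 10` would ask for 10 jobs; the trim length of 150 is purely illustrative and should be chosen from your own quality profile.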

Best,
Daniel

In the new QIIME 2 2017.12 release, feature-table merge can now accept arbitrarily many tables to merge!
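
With the variadic interface, the whole merge becomes a single call. The sketch below assumes the input flag is `--i-tables`, repeated once per table, which is how I understand the 2017.12 and later interface; `merge_tables` is a hypothetical helper name, so confirm the flag with `qiime feature-table merge --help`.

```shell
# Hypothetical helper (not part of QIIME 2): merge any number of tables in one call.
# Usage: merge_tables <output.qza> <table-1.qza> [<table-2.qza> ...]
merge_tables() {
  out="$1"
  shift
  # Rewrite the argument list so each table is preceded by --i-tables.
  n=$#
  while [ "$n" -gt 0 ]; do
    t="$1"
    shift
    set -- "$@" --i-tables "$t"
    n=$((n - 1))
  done
  qiime feature-table merge "$@" --o-merged-table "$out"
}
```

For example, `merge_tables merged-table.qza table-*.qza` would merge every matching table in the current directory.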
