Combining raw data from multiple Illumina runs with overlapping barcodes to be used for QIIME2 implementation of DADA2

Hoomandv · January 23, 2018, 5:01pm

Hi,
I have raw FASTQ files (forward, reverse, barcodes) from five sequencing runs with overlapping barcodes (EMP protocol). After importing the data into QIIME2 and demultiplexing by sample ID, is there any way to combine the demultiplexed files and then run DADA2 on the combined file (following the Atacama soil microbiome tutorial)? Could we convert the demultiplexed files to FASTQ format, concatenate them with cat, and continue from there? With the newer approaches (DADA2 and Deblur), is it possible or preferred to pick the sequence variants for each run separately and then merge the final tables? Since we plan to perform some analyses at the OTU (sequence) level down the road, merging separate tables seems impractical.

Appreciate your advice on this matter!

jairideout · January 23, 2018, 10:42pm

Hi @Hoomandv! DADA2 works best when each sequencing run is denoised separately. Since you have five sequencing runs, I'd recommend denoising each run on its own and then merging the results into a single feature table that can be used for downstream analyses. Check out the FMT tutorial for an example of processing two sequencing runs with DADA2 and merging the results (the same workflow can be applied to your five sequencing runs). Read through the tutorial carefully, as it highlights some important considerations when denoising and merging multiple sequencing runs (e.g. you'll want to use the same DADA2 truncation and trimming parameters across sequencing runs).
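
For illustration, here is a minimal Python sketch of that workflow: it builds one `qiime dada2 denoise-paired` command per run with identical trimming and truncation parameters, followed by the `qiime feature-table merge` and `qiime feature-table merge-seqs` commands. The run directory names, file names, and parameter values are hypothetical placeholders, not values from this thread, and the option names follow recent QIIME 2 releases, so check them against `--help` for your installed version. The script only prints the commands; it does not run them.

```python
import shlex

# Hypothetical run directories, each assumed to contain its own demux.qza.
RUNS = ["run1", "run2", "run3", "run4", "run5"]

# Placeholder values; choose real ones from your quality plots.
# The important part is that every run uses the same values.
PARAMS = {
    "trim_left_f": 0,
    "trim_left_r": 0,
    "trunc_len_f": 150,
    "trunc_len_r": 150,
}


def denoise_cmd(run, params):
    """Build the DADA2 denoising command for a single sequencing run."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{run}/demux.qza",
        "--p-trim-left-f", str(params["trim_left_f"]),
        "--p-trim-left-r", str(params["trim_left_r"]),
        "--p-trunc-len-f", str(params["trunc_len_f"]),
        "--p-trunc-len-r", str(params["trunc_len_r"]),
        "--o-table", f"{run}/table.qza",
        "--o-representative-sequences", f"{run}/rep-seqs.qza",
        "--o-denoising-stats", f"{run}/denoising-stats.qza",
    ]


def merge_cmds(runs):
    """Build the commands that merge per-run tables and sequences."""
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        merge_tables += ["--i-tables", f"{run}/table.qza"]
        merge_seqs += ["--i-data", f"{run}/rep-seqs.qza"]
    merge_tables += ["--o-merged-table", "table.qza"]
    merge_seqs += ["--o-merged-data", "rep-seqs.qza"]
    return [merge_tables, merge_seqs]


def build_plan(runs=RUNS, params=PARAMS):
    """Per-run denoising commands first, then the two merge commands."""
    return [denoise_cmd(run, params) for run in runs] + merge_cmds(runs)


if __name__ == "__main__":
    for cmd in build_plan():
        print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the parameters in one shared dictionary is a simple way to make sure no run is denoised with different settings before the merge.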
