Hi all. I'm working with multiple datasets, each from an independent MiSeq run (let's call them run1, run2, run3, etc.), all generated with the same primer pair targeting the 16S rRNA gene. I regularly receive a new dataset, which has to be compared with the previous ones to evaluate temporal changes in composition and diversity. I have standardized a workflow in which each dataset is "cleaned" independently using dada2 (with settings based on that run's quality), and the resulting table and rep-seqs artifacts are then combined with those of previous runs, giving a merged table and a merged rep-seqs artifact that cover all my datasets. For this I use the qiime feature-table merge and merge-seqs commands:
qiime feature-table merge \
  --i-tables run1-table-dada2.qza \
  --i-tables run2-table-dada2.qza \
  --i-tables run3-table-dada2.qza \
  ... \
  --i-tables runi-table-dada2.qza \
  --o-merged-table merged-table.qza
qiime feature-table merge-seqs \
  --i-data run1-rep-seqs-dada2.qza \
  --i-data run2-rep-seqs-dada2.qza \
  --i-data run3-rep-seqs-dada2.qza \
  ... \
  --i-data runi-rep-seqs-dada2.qza \
  --o-merged-data merged-rep-seqs.qza
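Since the list of inputs grows with every run, the two commands can be generated rather than typed out. Below is a minimal sketch of that idea, not a tested pipeline. It assumes all per-run artifacts sit in the current directory, follow the runN-table-dada2.qza / runN-rep-seqs-dada2.qza naming shown above, and have no spaces in their names. The helper build_merge_cmd is a hypothetical name of my own, not part of QIIME 2. It only prints the commands (a dry run), so nothing is executed until the output is reviewed.

```shell
#!/bin/sh
# Sketch only: build_merge_cmd is a hypothetical helper, not a QIIME 2 command.
# It prints one "qiime feature-table <action>" command that repeats the input
# flag once per file. Assumes file names contain no whitespace.
#   $1 action (merge | merge-seqs)   $2 input flag
#   $3 output flag                   $4 output file
#   remaining arguments: input artifacts
build_merge_cmd() {
  action=$1
  in_flag=$2
  out_flag=$3
  out_file=$4
  shift 4
  cmd="qiime feature-table $action"
  for f in "$@"; do
    cmd="$cmd $in_flag $f"
  done
  printf '%s %s %s\n' "$cmd" "$out_flag" "$out_file"
}

# Dry run: print the two merge commands for every run found by the glob.
# If no file matches, sh leaves the pattern unexpanded in the printed command.
build_merge_cmd merge --i-tables --o-merged-table merged-table.qza run*-table-dada2.qza
build_merge_cmd merge-seqs --i-data --o-merged-data merged-rep-seqs.qza run*-rep-seqs-dada2.qza
```

Once the printed commands look right, they can be pasted into the terminal or piped to sh. This only saves typing; it does not change what the merge does or how long the downstream steps take.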
The problem is that as datasets accumulate, the time required for the downstream analysis increases disproportionately, particularly for the alignment with mafft and the construction of the phylogenetic tree with fasttree. I would like to know whether there is a faster or more efficient way to combine the data and obtain the merged tree and table artifacts. I also have some general questions about the process:
- Does the merge-seqs step eliminate duplicate sequences that are present in more than one dataset?
- Can the qiime alignment mafft module take two different rep-seqs artifacts as input? The first would already be mafft-aligned/masked, and the second would be freshly obtained from a single-run dada2 cleaning step.
- Would it be valid to analyze and compare diversity using a separate tree for each run? Is there another way to merge trees coming from independent qiime2 analyses?
Apologies for the length, and thanks in advance.