Mafft alignment of multiple datasets

ju4n_dc · April 10, 2019, 7:21pm

Hi all. I'm working on multiple datasets, each from an independent MiSeq run (let's call them run1, run2, run3, etc.), all sequenced with the same primer pair targeting the 16S rRNA gene. I receive a new dataset regularly, and it has to be compared with the previous ones to evaluate temporal changes in composition and diversity. I have standardized a workflow in which each dataset is "cleaned" (denoised) independently with dada2, using parameters based on that run's quality. The resulting table and rep-seqs artifacts are then combined with those of the previous runs, yielding one merged table artifact and one merged rep-seqs artifact that cover all my datasets. For this I use the qiime feature-table merge and merge-seqs commands:

qiime feature-table merge \
  --i-tables run1-table-dada2.qza \
  --i-tables run2-table-dada2.qza \
  --i-tables run3-table-dada2.qza \
  ... \
  --i-tables runi-table-dada2.qza \
  --o-merged-table merged-table.qza

qiime feature-table merge-seqs \
  --i-data run1-rep-seqs-dada2.qza \
  --i-data run2-rep-seqs-dada2.qza \
  --i-data run3-rep-seqs-dada2.qza \
  ... \
  --i-data runi-rep-seqs-dada2.qza \
  --o-merged-data merged-rep-seqs.qza
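Rather than adding one --i-tables / --i-data line by hand for every new run, the argument lists can be built from a glob. This is a minimal bash sketch, assuming all per-run outputs sit in the current directory and follow the runN-table-dada2.qza / runN-rep-seqs-dada2.qza naming used above. The function name merge_all_runs and the DRY_RUN switch are hypothetical; the qiime flags are only the ones in the commands above. It saves typing, not compute time.

```bash
#!/usr/bin/env bash
# Sketch: build the merge / merge-seqs argument lists from whatever run files exist.
# ASSUMPTION: files are named run<N>-table-dada2.qza and run<N>-rep-seqs-dada2.qza.
# merge_all_runs is a hypothetical helper. DRY_RUN=1 (default) only prints the
# two commands; DRY_RUN=0 actually runs them.
merge_all_runs() {
  local -a tables seqs table_cmd seqs_cmd
  local f

  tables=(run*-table-dada2.qza)
  seqs=(run*-rep-seqs-dada2.qza)
  # Without a match, bash leaves the pattern itself in the array, so test existence.
  if [[ ! -e ${tables[0]} || ! -e ${seqs[0]} ]]; then
    echo "no run*-table-dada2.qza / run*-rep-seqs-dada2.qza files found" >&2
    return 1
  fi

  # One --i-tables per table artifact, then the output name.
  table_cmd=(qiime feature-table merge)
  for f in "${tables[@]}"; do table_cmd+=(--i-tables "$f"); done
  table_cmd+=(--o-merged-table merged-table.qza)

  # One --i-data per rep-seqs artifact, then the output name.
  seqs_cmd=(qiime feature-table merge-seqs)
  for f in "${seqs[@]}"; do seqs_cmd+=(--i-data "$f"); done
  seqs_cmd+=(--o-merged-data merged-rep-seqs.qza)

  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "${table_cmd[*]}"
    echo "${seqs_cmd[*]}"
  else
    "${table_cmd[@]}"
    "${seqs_cmd[@]}"
  fi
}
```

Typical use: run merge_all_runs once to inspect the generated commands, then DRY_RUN=0 merge_all_runs to execute them. Glob order is lexical (run10 before run2), which does not matter for merging.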

The problem is that, as datasets accumulate, the run time of the downstream analysis grows disproportionately, particularly the alignment with mafft and the construction of the phylogenetic tree with fasttree. Is there a faster or more efficient way to combine the data and obtain the merged tree and table artifacts? I also have some general questions about the process:

Does the merge-seqs step eliminate duplicate sequences that are present in more than one dataset?
Can the qiime alignment mafft action take two different rep-seqs artifacts as input: a first one already aligned with mafft and masked, and a second one freshly obtained from a single-run dada2 cleaning step?
Would it be valid to analyze and compare diversity using a separate tree for each run? Is there another way to merge trees coming from independent qiime2 analyses?

Apologies for the length, and thanks in advance.

Nicholas_Bokulich · April 11, 2019, 12:30pm

Indeed, the time grows because the alignment and tree are being built de novo each time.

I do not know if it is necessarily faster, but q2-fragment-insertion would provide a way to insert your sequences into an existing tree rather than building one de novo each time.

On whether merge-seqs removes duplicates: yes, as long as the sequence IDs are identical (they will be if they are ASVs, not if they are OTUs).
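Identical ASVs end up with identical IDs because, by default, q2-dada2 names each feature after a hash of its sequence. The sketch below assumes that hash is the MD5 hex digest of the exact sequence string (the default as I understand it; check the hashed-feature-ids option of your q2-dada2 version). The helpers asv_id and merged_ids are hypothetical, not QIIME 2 commands, and the sequences are toy strings. md5sum comes from GNU coreutils; macOS ships md5 -q instead.

```bash
#!/usr/bin/env bash
# Sketch: why an ASV seen in two runs collapses to one feature after merge-seqs.
# ASSUMPTION: the feature ID is the MD5 hex digest of the exact ASV sequence.
# asv_id and merged_ids are hypothetical helpers for illustration only.

# Feature ID an ASV would receive under the assumption above.
asv_id() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Distinct feature IDs remaining when ASVs from several runs are merged by ID,
# i.e. one entry per ID, as a merge keyed on feature ID keeps.
merged_ids() {
  local s
  for s in "$@"; do
    asv_id "$s"
  done | sort -u
}

# Toy example: ACGTACGT occurs in both runs, so 4 ASVs give 3 features.
run1_asvs=(ACGTACGT TTGGCCAA)
run2_asvs=(ACGTACGT GGGGCCCC)
echo "features after merge: $(merged_ids "${run1_asvs[@]}" "${run2_asvs[@]}" | wc -l)"
```

The same logic explains the OTU caveat: de novo OTU IDs are not derived from the sequence, so the same biological sequence can carry different IDs in different runs and would not collapse.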

On whether qiime alignment mafft can take an already aligned artifact plus a new one: no.

On separate trees per run, or merging trees from independent analyses: no, though you could perform non-phylogenetic analyses and avoid the need for a tree.
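The non-phylogenetic route needs only the merged table, so adding a run never triggers a new alignment or tree. A hedged sketch using q2-diversity's core-metrics action (the non-phylogenetic counterpart of core-metrics-phylogenetic, producing e.g. Bray-Curtis and Jaccard distances). The wrapper name nonphylo_core_metrics is hypothetical, and the sampling depth and metadata file are inputs you must choose yourself.

```bash
#!/usr/bin/env bash
# Sketch: alpha/beta diversity from merged-table.qza alone (no tree required).
# nonphylo_core_metrics is a hypothetical wrapper. DRY_RUN=1 (default) only
# prints the command; DRY_RUN=0 runs it.
# Usage: nonphylo_core_metrics SAMPLING_DEPTH METADATA_FILE
nonphylo_core_metrics() {
  if (( $# != 2 )); then
    echo "usage: nonphylo_core_metrics SAMPLING_DEPTH METADATA_FILE" >&2
    return 1
  fi
  local depth=$1     # rarefaction depth: choose it from your per-sample read counts
  local metadata=$2  # sample metadata file (e.g. run / time point per sample)
  local -a cmd
  cmd=(qiime diversity core-metrics
    --i-table merged-table.qza
    --p-sampling-depth "$depth"
    --m-metadata-file "$metadata"
    --output-dir core-metrics-results)

  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=0 nonphylo_core_metrics 5000 sample-metadata.tsv would run it; both 5000 and sample-metadata.tsv are placeholders, not recommended values.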

I hope that helps!

system · May 12, 2019, 6:31pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.