When we are creating a phylogenetic tree for use in phylogenetic diversity analysis, there is a step, qiime alignment mask, where we mask unconserved or highly gapped columns from our multiple sequence alignment before building the tree. This is the only step in the tree creation workflow that has parameters the user can change.
I am curious whether anyone can illuminate how the default parameters were arrived at, and in what situations we might want to change them. I don’t have much experience with alignments or phylogeny. I tried looking for publications on MAFFT but didn’t find anything that helped me understand how to choose these parameters.
The default parameters used for the alignment mask method were derived to match the “Lanemask”, which was used in QIIME 1 after running PyNAST (as well as other tools) to filter very high entropy positions from the alignment before phylogenetic reconstruction. The Lanemask was a static filter specific to one 16S alignment, so we derived these parameters so that masking would not be tied to that alignment.
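To make the idea concrete, here is a minimal sketch of column masking in plain Python. It is not the plugin's code: the conservation score (one minus the Shannon entropy of the non-gap characters, with the log taken in base 4 for DNA) is an assumption chosen for illustration, and the function names are hypothetical. The two thresholds are named after the plugin's --p-max-gap-frequency and --p-min-conservation options, but the default values shown here are only illustrative, so check qiime alignment mask --help for the real ones.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "."}


def gap_frequency(column):
    """Fraction of characters in an alignment column that are gaps."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def conservation(column, alphabet_size=4):
    """Illustrative conservation score in [0, 1]; gaps are ignored.

    Assumption for this sketch: 1 - Shannon entropy (log base = alphabet size).
    A column with a single residue scores 1.0; a column with all four
    nucleotides equally represented scores 0.0.
    """
    residues = [ch for ch in column if ch not in GAP_CHARS]
    if not residues:
        return 0.0  # a gap-only column is treated as unconserved
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log(n / total, alphabet_size)
        for n in Counter(residues).values()
    )
    return max(0.0, 1.0 - entropy)  # clamp tiny negative rounding errors


def mask_alignment(seqs, max_gap_frequency=1.0, min_conservation=0.4):
    """Keep only the columns that pass both thresholds (illustrative defaults)."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*seqs))
    keep = [
        i for i, col in enumerate(columns)
        if gap_frequency(col) <= max_gap_frequency
        and conservation(col) >= min_conservation
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


example = ["ACGT-", "ACGA-", "ACCC-", "ATGG-"]
print(mask_alignment(example))  # ['ACG', 'ACG', 'ACC', 'ATG']
```

In this toy alignment the fourth column is dropped because every sequence has a different base there (maximum entropy), and the fifth is dropped because it contains only gaps. Raising the conservation threshold removes more of the variable columns, while lowering the gap threshold removes columns in which many sequences have gaps.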
The approach that we take in the Moving Pictures tutorial is known to be a crude approach for developing this tree. We’re working on some approaches that are likely to be better, but we started here in QIIME 2 because this approach is not reference-based, and thus doesn’t require that we have a reference alignment or tree to work with. You might also be interested in trying out the q2-fragment-insertion plugin, which is currently available as a Community Plugin (i.e., one that is installed independently of the QIIME 2 core distribution). At some point we’ll likely transition to using this approach for phylogenetic reconstruction in cases where there is a reference tree available (as is the case for 16S). We will also likely add support for structure-based alignment of 16S using ssu-align to the alignment package at some point.
Hope this helps!