Import amino acid sequences?

gregcaporaso · January 24, 2018, 12:07pm

Hi @steff1088,
Good question; thanks for bringing this up. You and @colinbrislawn are correct: at the moment we don't support sequences of amino acids, but it is something that we could support. Out of curiosity, would you mind describing your use case? In this message I'm going to assume that all of your amino acid sequences are homologous with one another (i.e., they are sequences of the same protein); if that is not the case, let me know.

The reason we currently don't allow sequences of amino acids is that some actions in QIIME 2 assume that the sequence is nucleotide. These include methods in the q2-vsearch, q2-alignment, q2-phylogeny, and q2-feature-classifier plugins, as well as the feature-table tabulate-seqs visualizer (which creates links for BLASTing the sequences against a nucleotide database). To support amino acid sequences, we would need a new semantic type indicating that the sequences are amino acids (probably something like FeatureData[ProteinSequence]), so that actions that assume a nucleotide sequence don't mistakenly operate on an amino acid sequence. We would then, of course, also need methods that operate on that type.
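To illustrate why an explicit semantic type is safer than guessing the sequence type from its content, here is a small Python sketch. It is not part of QIIME 2; the alphabet sets and the function name `plausible_alphabets` are hypothetical and chosen for illustration only. The key observation is that every IUPAC nucleotide code letter is also a valid amino acid code letter, so a sequence that looks like DNA can never be ruled out as protein on content alone.

```python
# Hypothetical illustration (not QIIME 2 code): content alone cannot
# reliably tell a nucleotide sequence from an amino acid sequence.

# IUPAC nucleotide codes (DNA and RNA) plus gap characters.
NUCLEOTIDE_CHARS = set("ACGTURYSWKMBDHVN-.")

# 20 standard amino acids, extra codes (B, Z, X, U, O), stop (*), and gaps.
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYBZXUO*-.")


def plausible_alphabets(seq: str) -> set:
    """Return the set of alphabets whose characters could spell `seq`."""
    chars = set(seq.upper())
    result = set()
    if chars <= NUCLEOTIDE_CHARS:
        result.add("nucleotide")
    if chars <= PROTEIN_CHARS:
        result.add("protein")
    return result


if __name__ == "__main__":
    # Looks like DNA, but is also a valid peptide string: ambiguous.
    print(sorted(plausible_alphabets("GATTACA")))
    # Contains letters such as I, Q, and F that are not nucleotide codes.
    print(sorted(plausible_alphabets("MKTAYIAKQRQISFVKSHFSRQ")))
```

Because a nucleotide-looking sequence is always ambiguous, a nucleotide-only action cannot safely infer its input type from the characters; declaring the type up front (e.g., something like FeatureData[ProteinSequence]) is what would let QIIME 2 reject the mismatch.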

I think you and @colinbrislawn have mostly made it to this point, but for now I'd recommend building your tree and feature table outside of QIIME 2 and then importing both of them. You can import your tree as illustrated here (this is for an unrooted tree; you can use qiime phylogeny midpoint-root to root that tree, or, if your tree is already rooted, you can import it with --type 'Phylogeny[Rooted]'). Any of the methods downstream of this point should work fine with your data.
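If you are scripting this workflow, the Python sketch below builds (but does not run) the import commands described above. It assumes a simple Newick string without quoted labels or bracketed comments, and it uses a heuristic that is an assumption, not a QIIME 2 rule: a root with exactly two children is treated as rooted, anything else as unrooted. The output file names (rooted-tree.qza, unrooted-tree.qza, table.qza) and the helper names are hypothetical, and the feature table is assumed to be a BIOM v2.1 (HDF5) file.

```python
import shlex


def top_level_children(newick: str) -> int:
    """Count children of the outermost node in a simple Newick string.

    Assumes no quoted labels or [comments] containing '(', ')' or ','.
    """
    s = newick.strip().rstrip(";").strip()
    if not s.startswith("("):
        return 0  # a single leaf: there is no internal root node
    depth, children = 0, 1
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1  # a comma directly under the root separates children
    return children


def tree_import_commands(newick: str, tree_path: str) -> list:
    """Return the qiime commands (as argument lists) to import a tree."""
    if top_level_children(newick) == 2:
        # Heuristic: a bifurcating root usually means the tree is rooted.
        return [["qiime", "tools", "import",
                 "--type", "Phylogeny[Rooted]",
                 "--input-path", tree_path,
                 "--output-path", "rooted-tree.qza"]]
    # Otherwise import as unrooted, then root it at the midpoint.
    return [
        ["qiime", "tools", "import",
         "--type", "Phylogeny[Unrooted]",
         "--input-path", tree_path,
         "--output-path", "unrooted-tree.qza"],
        ["qiime", "phylogeny", "midpoint-root",
         "--i-tree", "unrooted-tree.qza",
         "--o-rooted-tree", "rooted-tree.qza"],
    ]


def table_import_command(biom_path: str) -> list:
    """Return the command to import a BIOM v2.1 feature table."""
    return ["qiime", "tools", "import",
            "--type", "FeatureTable[Frequency]",
            "--input-path", biom_path,
            "--input-format", "BIOMV210Format",
            "--output-path", "table.qza"]


if __name__ == "__main__":
    unrooted = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"
    for cmd in tree_import_commands(unrooted, "tree.nwk"):
        print(shlex.join(cmd))  # shlex.join quotes 'Phylogeny[Unrooted]'
    print(shlex.join(table_import_command("feature-table.biom")))
```

A rooted tree whose root is a polytomy would be misclassified by this heuristic, so if you already know whether your tree is rooted, it is simpler to pick the --type value by hand.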