I would be very interested in this! I have per-feature rep seqs files I can use for any alpha testing. I'm sure I can also coerce DNA amplicons "off the sequencer" to amino acid sequences if needed for testing clustering pipelines, etc. Feel free to send me a private message and I can provide my email address. I will try the q2-protein-pca plugin in the meantime.
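For anyone wondering what that coercion could look like, here is a minimal Python sketch, assuming the standard genetic code and forward-strand amplicons. The function names (`translate`, `best_forward_frame`) are hypothetical and not part of any plugin. It simply picks the forward reading frame with the fewest stop codons and does not attempt frameshift correction.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1, or 2).

    Codons containing ambiguous bases are written as "X";
    a trailing partial codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_forward_frame(dna):
    """Return (frame, protein) for the forward frame with the fewest stops.

    Assumption: a real coding amplicon should contain no internal stop
    codons in its true frame. Ties go to the lowest frame number.
    """
    candidates = [(frame, translate(dna, frame)) for frame in range(3)]
    return min(candidates, key=lambda item: item[1].count("*"))


if __name__ == "__main__":
    frame, protein = best_forward_frame("TAAATGGCC")
    print(frame, protein)
```

A dedicated tool is still the better choice for real data, since indels from sequencing errors shift the frame partway through a read and this sketch would not catch that.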
Sure! Yes, I have a database of amino acid sequences for a subunit of the enzyme involved in archaeal methane generation (mcrA). Similar to @ahfitzpa, I am following literature suggesting that DNA coding for a non-universal protein should be classified as amino acids, since the amino acid sequences evolve faster than methanogenic 16S genes, for example. I am still working on a thorough comparison of studying this gene as DNA versus amino acid sequence (@Nicholas_Bokulich, another user, and I had a good chat about this previously on a qiime2 forum post - Will qiime2 support functional gene analysis in the future?).
I currently use the Fungene Pipeline, which has a step to convert DNA amplicons to amino acids, correcting for frameshifts in the reading frame, and provides nearest-neighbor classification. This seems robust... but I'd be interested in q2-feature-classifier for increased confidence when training a classifier on the amplicon region the primers cover. Another reason is that nearest-neighbor classification always classifies every rep seq to the full depth of the database (to genus level). I like the naive Bayes classifier because it seems to stop at a distinct rank when it does not have enough confidence to go deeper taxonomically.
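To illustrate the difference, here is a simplified Python sketch of confidence-based truncation. It is not the plugin's actual implementation: the function name `truncate_by_confidence`, the per-rank confidence values, and the 0.7 threshold are all assumptions chosen for the example.

```python
def truncate_by_confidence(ranks, confidences, threshold=0.7):
    """Keep taxonomic ranks until confidence first drops below `threshold`.

    `ranks` and `confidences` are parallel lists ordered from the
    shallowest rank (domain) to the deepest (genus).
    """
    kept = []
    for rank, confidence in zip(ranks, confidences):
        if confidence < threshold:
            break  # stop here instead of reporting a weakly supported rank
        kept.append(rank)
    return kept


if __name__ == "__main__":
    # Hypothetical assignment for one mcrA rep seq (confidences are made up).
    ranks = [
        "Archaea", "Euryarchaeota", "Methanomicrobia",
        "Methanosarcinales", "Methanosarcinaceae", "Methanosarcina",
    ]
    confidences = [0.99, 0.98, 0.95, 0.80, 0.55, 0.30]
    print("; ".join(truncate_by_confidence(ranks, confidences)))
```

A nearest-neighbor approach, by contrast, would report all six ranks of the closest database hit regardless of how well each rank is supported.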
Other q2-plugins in which accepting amino acid sequences could be advantageous might be q2-diversity, longitudinal (and associated plugins), vsearch, ANCOM, and q2-phylogeny. Of course, support across all aspects of qiime2 would be great, but these are a few examples rather than a laundry list.
Thanks, q2-team, for your assistance. And welcome, @ahfitzpa, to the conversation and to the qiime2 forum.