Hey there @Jennifer_Fouquier!
Cool, I think you have a few options here. I will lay them out based on the answer to the following question:
Is FeatureData[AlignedSequence] the right semantic type for this data?
(Why am I asking this question?)
I ask because, as a programmer-turned-pseudo-biologist, I am not sure what the considerations of using this semantic type on RNA seqs would be. Pinging @gregcaporaso & @Nicholas_Bokulich for their input.
If the answer is yes, you could define a new format,
AlignedRNAFASTAFormat (and a corresponding directory format
AlignedRNASequencesDirectoryFormat, examples of this below). Since the semantic type
FeatureData[AlignedSequence] already exists, you don’t need to register that. Then, you can define a transformer that converts the RNA format into the existing DNA format:
def _my_great_transformer(ff: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
    # convert RNA to DNA, output is a new instance of AlignedDNAFASTAFormat
    ...
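To make the "convert RNA to DNA" step a bit more concrete, here is a rough sketch of what the body of such a transformer might do, written with the standard library only. The function names (rna_to_dna, rna_fasta_to_dna_fasta) are made up for illustration and are not part of q2-types or QIIME 2; inside a real transformer you would read from and write to the format objects rather than bare paths.

```python
# Hypothetical helpers, not QIIME 2 API: rewrite U as T in an aligned FASTA file.

# Only U/u change; gap characters and every other symbol pass through untouched.
_RNA_TO_DNA = str.maketrans({"U": "T", "u": "t"})


def rna_to_dna(seq):
    """Return the DNA spelling of an (aligned) RNA sequence string."""
    return seq.translate(_RNA_TO_DNA)


def rna_fasta_to_dna_fasta(rna_path, dna_path):
    """Copy a FASTA file, converting sequence lines and leaving headers alone."""
    with open(rna_path) as src, open(dna_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)  # header lines may legitimately contain a 'U'
            else:
                dst.write(rna_to_dna(line))
```

Headers are skipped on purpose, so a sequence ID that happens to contain a "U" is not renamed during the conversion.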
The way a user would use this new format with q2-ghosttree is to import RNA seqs as AlignedRNAFASTAFormat:
qiime tools import \
--input-path my-ghosttree-RNA-seqs.fasta \
--type 'FeatureData[AlignedSequence]' \
--source-format AlignedRNAFASTAFormat \
The transformer will be invoked while importing, so the data will be converted from RNA to DNA as it is loaded. The user will not have access to the RNA sequences in this artifact, which may or may not be a problem. I just want you to be aware that by the time the data makes it into an Artifact, it will be DNA.
As promised, some examples of how the different types, file formats, and directory formats fit together in the case of aligned DNA seqs:
AlignedDNASequencesDirectoryFormat, which in turn is the directory format representation of
Also, what I am proposing above (defining a new transformer) is super similar to how the two BIOM formats work in QIIME 2 (V1.0.0 & V2.1.0). Any
FeatureTable[Frequency | RelativeFrequency | PresenceAbsence | Balance | Composition] created in QIIME 2 will be saved as
BIOMV210DirFmt, because the
artifact_format is specified as such. Then, there is a transformer defined from
BIOMV210Format. When you import specifying the source format, that transformer is invoked, which allows users to import other formats of feature tables.
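If it helps to picture the mechanism, here is a toy model of "look up a transformer by source and target format, then run it on import". This is only a sketch of the idea and not how QIIME 2 is actually implemented; every name in it (ToyV100Format, ToyV210Format, register_transformer, import_as) is invented for the example.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class ToyV100Format:  # hypothetical stand-in for an older file format
    payload: str


@dataclass
class ToyV210Format:  # hypothetical stand-in for the format artifacts are stored in
    payload: str


# Maps (source format, target format) to the function that converts between them.
_TRANSFORMERS = {}


def register_transformer(func):
    """Register func under the types named in its annotations."""
    hints = get_type_hints(func)
    to_type = hints.pop("return")
    (from_type,) = hints.values()  # exactly one annotated input is expected
    _TRANSFORMERS[(from_type, to_type)] = func
    return func


def import_as(data, target_type):
    """Return data in target_type, running a registered transformer if needed."""
    if type(data) is target_type:
        return data
    try:
        transformer = _TRANSFORMERS[(type(data), target_type)]
    except KeyError:
        raise TypeError(
            f"no transformer from {type(data).__name__} to {target_type.__name__}"
        ) from None
    return transformer(data)


@register_transformer
def _v100_to_v210(ff: ToyV100Format) -> ToyV210Format:
    # A real transformer would rewrite the file contents; here we just re-wrap them.
    return ToyV210Format(payload=ff.payload)
```

Calling import_as(ToyV100Format("table"), ToyV210Format) finds the registered function from the two types alone, which is why the annotations on a transformer matter as much as its body.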
If the answer is no, it probably makes more sense to create a new
Method on your plugin that will accept a
FeatureData[AlignedRNASequence] as input and will return a new
FeatureData[AlignedSequence] as output (the input would be an all-new semantic type). An example of that would be creating a relative frequency feature table: the method accepts
FeatureTable[Frequency] as input and produces
FeatureTable[RelativeFrequency] as output. The method’s signature annotation looks like this:
def relative_frequency(table: biom.Table) -> biom.Table:
In your case, this would look something like this:
def convert_rna_to_dna(data: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
Yowza, that was a mouthful! Okay, I probably missed something, so if you have any questions, please don’t hesitate to ask them!