Hey there @Jennifer_Fouquier!
Cool, I think you have a few options here. I will lay them out below, organized around the answer to the following question:
Is FeatureData[AlignedSequence] the right semantic type for this data?
(Why am I asking this question?)
I ask because, as a programmer-turned-pseudo-biologist, I am not sure what the considerations of using this semantic type for RNA seqs would be. Pinging @gregcaporaso & @Nicholas_Bokulich for their input.
If "Yes"...
You could define a new format, AlignedRNAFASTAFormat (and a corresponding directory format, AlignedRNASequencesDirectoryFormat; examples of this below). Since the semantic type FeatureData[AlignedSequence] already exists, you don't need to register it. Then you can define a transformer that converts AlignedRNAFASTAFormat to AlignedDNAFASTAFormat:
def _my_great_transformer(ff: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
    # convert RNA to DNA, output is a new instance of AlignedDNAFASTAFormat
    ...
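To give a feel for what the body of that transformer has to do, here is a minimal sketch of the RNA-to-DNA conversion itself. It is only an illustration under some assumptions: it works on plain FASTA lines rather than the real format classes, the helper name rna_fasta_to_dna is made up, and it assumes the only change needed is U -> T (gap characters and header lines are left alone).

```python
from typing import Iterable, Iterator

# Hypothetical helper, not part of QIIME 2 or q2-types: it stands in for the
# conversion step inside the transformer and works on plain FASTA lines.
_RNA_TO_DNA = str.maketrans({"U": "T", "u": "t"})


def rna_fasta_to_dna(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through untouched, so a 'U' in a
            # sequence ID or description is never rewritten.
            yield line
        else:
            # Gap characters such as '-' and '.' are not in the translation
            # table, so the alignment columns are preserved.
            yield line.translate(_RNA_TO_DNA)


if __name__ == "__main__":
    rna = [">seq1 Uncultured fungus", "AUG-CU", ">seq2", "A-GGCU"]
    for out in rna_fasta_to_dna(rna):
        print(out)
```

In a real transformer you would read from the incoming format object and write the converted lines into a new AlignedDNAFASTAFormat instance, but that part depends on the format classes you end up defining.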
With q2-ghosttree, a user would then import their RNA seqs as AlignedRNAFASTAFormat:
qiime tools import \
  --input-path my-ghosttree-RNA-seqs.fasta \
  --type 'FeatureData[AlignedSequence]' \
  --source-format AlignedRNAFASTAFormat \
  --output-path my-ghosttree-DNA-seqs.qza
The transformer is invoked during import, so the data is converted from RNA to DNA as it is loaded. The user will not have access to the RNA sequences in the resulting artifact, which may or may not be a problem. I just want you to be aware that by the time the data makes it into an Artifact, it will be DNA.
As promised, some examples of how the different types, file formats, and directory formats fit together in the case of aligned DNA seqs:
The artifact_format for FeatureData[AlignedSequence] is AlignedDNASequencesDirectoryFormat, which in turn is the directory format representation of AlignedDNAFASTAFormat.
Also, what I am proposing above (defining a new transformer) is very similar to how the two BIOM formats work in QIIME 2 (V1.0.0 & V2.1.0). Any FeatureTable[Frequency | RelativeFrequency | PresenceAbsence | Balance | Composition] created in QIIME 2 will be saved as BIOMV210DirFmt, because the artifact_format is specified as such. There is also a transformer defined from BIOMV100Format to BIOMV210Format. When you import and specify the source format, that transformer is invoked, which is what lets users import feature tables stored in other formats.
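If the dispatch idea is a bit abstract, here is a toy sketch of a transformer registry keyed on the (source, target) type annotations. To be clear, this is not how QIIME 2 is actually implemented. The class names ToyBIOMV100Format and ToyBIOMV210Format and the functions register_transformer and import_as are all invented for illustration.

```python
from typing import Callable, Dict, Tuple, get_type_hints


# Toy stand-ins for file formats; these are NOT the real q2-types classes.
class ToyBIOMV100Format:
    def __init__(self, payload: str):
        self.payload = payload


class ToyBIOMV210Format:
    def __init__(self, payload: str):
        self.payload = payload


# Maps (source format, target format) to the function that converts between them.
_TRANSFORMERS: Dict[Tuple[type, type], Callable] = {}


def register_transformer(func: Callable) -> Callable:
    """Register func under the types named in its annotations (hypothetical API)."""
    hints = get_type_hints(func)
    target = hints.pop("return")
    (source,) = hints.values()  # exactly one annotated input is expected
    _TRANSFORMERS[(source, target)] = func
    return func


@register_transformer
def _v100_to_v210(ff: ToyBIOMV100Format) -> ToyBIOMV210Format:
    # A real transformer would rewrite the file contents; here we just copy.
    return ToyBIOMV210Format(ff.payload)


def import_as(data, target: type):
    """Return data in the target format, using a registered transformer if needed."""
    source = type(data)
    if source is target:
        return data
    try:
        transformer = _TRANSFORMERS[(source, target)]
    except KeyError:
        raise TypeError(
            f"no transformer from {source.__name__} to {target.__name__}"
        ) from None
    return transformer(data)
```

The point is just that the annotations on the transformer function are what connect a source format to a target format, so adding support for a new import format amounts to registering one more function.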
If "No"...
In this case, it probably makes more sense to create a new Method on your plugin that accepts a FeatureData[AlignedRNASequence] as input (this would be an all-new semantic type) and returns a new FeatureData[AlignedSequence] as output. A comparable example is creating a relative frequency feature table: the method accepts FeatureTable[Frequency] as input and produces FeatureTable[RelativeFrequency] as output. The method's signature annotation looks like this:
def relative_frequency(table: biom.Table) -> biom.Table:
    # etc
    ...
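For a rough idea of what a method like that does, here is a small sketch that uses plain nested dicts instead of biom.Table. The Table alias, the dict layout, and the choice to raise on an empty sample are my assumptions for the example, not what the real method in QIIME 2 does.

```python
from typing import Dict

# Assumed layout for this sketch only: sample ID -> {feature ID -> count}.
Table = Dict[str, Dict[str, float]]


def relative_frequency(table: Table) -> Table:
    """Return a new table where each sample's counts sum to 1."""
    result: Table = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            # Assumption: an all-zero sample is treated as an error here.
            raise ValueError(f"sample {sample_id!r} has no observations")
        result[sample_id] = {
            feature_id: count / total for feature_id, count in counts.items()
        }
    return result
```

Note that the input is left untouched and a new object is returned, which mirrors the idea of one artifact type going in and a different one coming out.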
In your case, this would look something like this:
def convert_rna_to_dna(data: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
    # etc
    ...
Conclusion
Yowza, that was a mouthful! Okay, I probably missed something, so if you have any questions, please don't hesitate to ask them!