Creating a variable directory format that only collects a file if it exists

Hello,

I am trying to change a directory format so that it only includes a file if that file exists.

Here is the current class:

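```python
from qiime2.plugin import model

# PeptideIDListFmt and EnrichmentFailureFmt are file formats
# defined elsewhere in this plugin.


class EnrichedPeptideDirFmt(model.DirectoryFormat):
    # One file per pairwise comparison, e.g. "rep1~rep2_<suffix>.txt"
    pairwise = model.FileCollection(
        r".+_+.+\.txt",
        format=PeptideIDListFmt
    )

    @pairwise.set_path_maker
    def pairwise_pathmaker(self, comparisons, suffix):
        return f'{"~".join(comparisons)}_{suffix}.txt'

    # Replicates with no enriched peptides
    failures = model.File(
        "failedEnrichment.txt",
        format=EnrichmentFailureFmt
    )
```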
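The pairwise pattern requires at least one underscore, so a file named failedEnrichment.txt is never picked up by the FileCollection; only its own model.File field can claim it. Below is a small standalone check of that, using only Python's re module. pairwise_name and is_pairwise_file are hypothetical helpers that copy the pattern and path maker from the class, and the code assumes the pattern is matched against each file's path relative to the directory.

```python
import re

# Same pattern as the `pairwise` FileCollection above
PAIRWISE_PATTERN = re.compile(r".+_+.+\.txt")


def pairwise_name(comparisons, suffix):
    """Standalone copy of pairwise_pathmaker (hypothetical helper)."""
    return f'{"~".join(comparisons)}_{suffix}.txt'


def is_pairwise_file(relative_path):
    # Assumption: the whole relative path must match the pattern
    return PAIRWISE_PATTERN.fullmatch(relative_path) is not None


name = pairwise_name(("rep1", "rep2"), "enriched")
print(name)                                      # rep1~rep2_enriched.txt
print(is_pairwise_file(name))                    # True
print(is_pairwise_file("failedEnrichment.txt"))  # False
```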

In the plugin itself, when I create this Artifact:

```python
enriched_dir = ctx.make_artifact(
    type="PairwiseEnrichment",
    view=enriched_dir_filepath,
    view_type=EnrichedPeptideDirFmt
)
```

An error occurs when failedEnrichment.txt is not in the directory, but I would like this file to be optional.

Hi @Sean_Golez,

It sounds like you're asking for this to be an optional output. This isn't currently supported in QIIME 2, but we are actively discussing adding optional outputs in the very near future! In the meantime, you could write an empty failedEnrichment.txt file to the output directory if one doesn't already exist.
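Here is a minimal sketch of that workaround, meant to run on the directory before it is passed to ctx.make_artifact. ensure_failures_file is a hypothetical helper that uses only pathlib. It assumes EnrichmentFailureFmt accepts an empty file, which depends on how that format's validation is written.

```python
from pathlib import Path


def ensure_failures_file(dir_path, filename="failedEnrichment.txt"):
    """Create an empty `filename` inside dir_path if it is missing.

    Hypothetical helper: assumes the file's format accepts an empty file.
    """
    path = Path(dir_path) / filename
    if not path.exists():
        path.touch()  # zero-byte file; existing files are never modified
    return path
```

Because the helper only creates the file when it is absent, it is safe to call on directories that already contain a real failedEnrichment.txt.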

Hope this helps! Cheers :lizard:

@lizgehret Sorry, I was actually referring to an optional file in the input directory. The input directory may or may not contain a 'failedEnrichment.txt' file, and it is not used either way. However, when creating the directory format, this file has to be collected; otherwise, this error occurs if the file is present:

Unrecognized file (7Z-HDI90_2CS_75000raw/failedEnrichment.txt) for EnrichedPeptideDirFmt

So, is there a way for this file to just be ignored when collecting all of the other files?

Hi @Sean_Golez,

Ah, I see - thanks for providing that clarification!

So there isn't a way to create an optional file (i.e. one that may or may not exist) within a directory format - the nature of our type/format system is to ensure consistent structure for specified inputs and outputs. Could you provide me with more information on what this input file is, and what you're using it for? It may make more sense to just include it as a separate input type entirely.

Cheers :lizard:

The software used to generate the enriched peptide directory includes this file in the directory; it contains information about replicates with no enriched peptides. This file is not used at all when processing the data.
The main reason it needs to be optional is that sometimes a different enriched peptide directory that does not contain this file needs to be used as input (and then an empty failedEnrichment.txt file would be needed).
So, it would be best for this file to just be ignored. Is that currently possible?

Hi @Sean_Golez,

Thanks for providing that context! Since this file isn't needed for any data analysis or processing, it might be best to just remove the file if it's present. It may be possible to build this into the directory format, as long as the file can be found and removed in a predictable way. Does this file always have the same filename and extension?

Yes. How would I go about removing the file? I noticed that I am not able to access the file path from the directory format subclass.
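One way around needing the path inside the DirectoryFormat subclass is to drop the file in the action itself, where the path is already known (enriched_dir_filepath in the make_artifact call earlier in this thread). Below is a sketch using shutil.copytree with shutil.ignore_patterns, so the user's original input directory is never modified. copy_without_failures is a hypothetical helper, and the sketch assumes the failures field has been removed from EnrichedPeptideDirFmt.

```python
import shutil
import tempfile
from pathlib import Path


def copy_without_failures(src_dir, filename="failedEnrichment.txt"):
    """Copy src_dir into a new temporary directory, skipping `filename`.

    Hypothetical helper. The caller is responsible for deleting the
    returned temporary directory once it is no longer needed.
    """
    dest_root = Path(tempfile.mkdtemp())
    dest = dest_root / Path(src_dir).name
    # ignore_patterns skips matching names at every level of the tree
    shutil.copytree(src_dir, dest, ignore=shutil.ignore_patterns(filename))
    return dest
```

The returned path would then be passed as view= to ctx.make_artifact. Copying rather than calling Path.unlink on the original keeps the input data intact, at the cost of duplicating the directory on disk.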

Hey @Sean_Golez,

Do you have an example of your input directory and your source code available (if you're willing to share it)? I think it would be more helpful to see an example of how you're trying to use this format, and we can hopefully figure something out from there. Feel free to DM me the details as well if you'd rather not share them publicly.

Cheers :lizard: