I am trying to import metagenomic sequencing results from a single sample. I received a MAGs FASTA file from the sequencing center and moved it into a new folder. I tried importing the MAGs using the following command:
mosh tools cache-import \
  --cache ./cache \
  --key mags \
  --type "FeatureData[MAGs]" \
  --input-path data/mags
and got the following error message:
Semantic type FeatureData[MAGs] is invalid, either because it doesn't have a compatible directory format, or because it's not registered.
What is the correct way to import MAGs from a fasta file into the MOSHPIT pipeline?
I think the issue is that you want the type SampleData[MAGs] (with the s), which does not yet presume to know what your features are, rather than FeatureData[MAG] (without the s), which presumes each record is an interesting feature.
My assumption here is that you have a fasta file for each sample in your data/mags directory.
It is also likely that you will want to use:
--input-format 'MultiFASTADirectoryFormat'
That is, unless your sequencing provider happened to include a MANIFEST file, which is unlikely.
I have a single sample, so initially I had just a single fasta file within my data/mags directory. I also tried creating a subdirectory called sample1 and moving the fasta file there, but I get the same error. I searched through the help files but could not find an example of the directory structure expected by the 'MultiFASTADirectoryFormat' input format.
I took a quick peek at the code which raises that error, and it does seem like it would work with the sub-directory you tried, but did you import sample1/ or data/mags (which has sample1 inside)? I think it's the latter that will get past the error (it wants a directory with directories inside it).
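To make that expected layout concrete, here is a small Python sketch that lists anything at the top level of the input directory that is not a per-sample directory. It is only an approximation built on the assumption described above (a directory containing nothing but directories); it is not the actual validator, and find_stray_entries is a hypothetical name.

```python
from pathlib import Path


def find_stray_entries(input_dir):
    """Return the names of top-level entries that are not directories.

    Hypothetical helper, not part of MOSHPIT or QIIME 2. It assumes the
    importer expects only per-sample sub-directories at the top level.
    Hidden files are included, because iterdir() does not skip them.
    """
    root = Path(input_dir)
    return sorted(p.name for p in root.iterdir() if not p.is_dir())
```

Calling find_stray_entries("data/mags") on a clean layout should return an empty list; anything it does return is worth a closer look before importing.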
I am attempting to import a directory. The directory has one directory in it named sample1, which contains a single fasta file (I tried both the .fasta and .fa extensions). The error reads:
Files should be organised in per-sample directories
@ebolyen @misialq
Problem solved!
Found a hidden .DS_Store file in the directory. Once I deleted it all was fine. It's a bit tricky since macOS keeps making these hidden files every time you open the folder.
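In case it helps anyone else on macOS, here is a rough Python sketch for clearing these files out right before an import. It assumes the input directory is something like data/mags, and remove_ds_store is just a name I made up; it deletes files, so try it on a copy first.

```python
from pathlib import Path


def remove_ds_store(input_dir):
    """Delete every .DS_Store file under input_dir and return the removed paths.

    Hypothetical helper, not part of MOSHPIT or QIIME 2.
    """
    removed = []
    # Collect the matches first so the tree is not modified while it is walked.
    for path in list(Path(input_dir).rglob(".DS_Store")):
        if path.is_file():
            path.unlink()
            removed.append(str(path))
    return sorted(removed)
```

Running remove_ds_store("data/mags") just before the import avoids the problem, at least until Finder recreates the file.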
I can't believe .DS_Store reared its head. A lot of our internal code ignores that (having learned our lesson long ago), but I guess the validate mixin above isn't aware.
We should probably provide an "iter" API internally that skips these. (We definitely ignore them when we move the data into a zip file.)
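Something along these lines is what I have in mind, as a very rough sketch. The names iter_visible and IGNORED_NAMES are placeholders, not an existing internal API, and the set of ignored names is an assumption that would need more thought.

```python
from pathlib import Path

# Assumption: only .DS_Store is ignored here; a real list would be longer.
IGNORED_NAMES = frozenset({".DS_Store"})


def iter_visible(directory, ignored=IGNORED_NAMES):
    """Yield the entries of a directory in sorted order, skipping ignored names.

    Placeholder sketch, not an existing QIIME 2 function.
    """
    for path in sorted(Path(directory).iterdir()):
        if path.name in ignored:
            continue
        yield path
```

A validator that walks the directory through iter_visible instead of listing it directly would then only ever see the per-sample directories.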