Error while importing MAGs into MOSHPIT

I am trying to import metagenomic sequencing results from a single sample. I received a MAGs FASTA file from the sequencing center and moved it into a new folder. I tried importing the MAGs using the following command:

mosh tools cache-import \
  --cache ./cache \
  --key mags \
  --type "FeatureData[MAGs]" \
  --input-path data/mags

and got the following error message:

Semantic type FeatureData[MAGs] is invalid, either because it doesn't have a compatible directory format, or because it's not registered.

What is the correct way to import MAGs from a fasta file into the MOSHPIT pipeline?

Thanks!

Hi @shira,

I think the issue is that you want the type SampleData[MAGs], which has the s and does not yet presume to know what your features are, rather than FeatureData[MAG], which doesn't have the s and does presume each record is an interesting feature.

My assumption here is that you have a FASTA file for each sample in your data/mags directory.
It is also likely that you will want to use:

--input-format 'MultiFASTADirectoryFormat'

unless your sequencing provider happened to include a MANIFEST file, which is unlikely.

Thanks @ebolyen,
I tried your suggestion and got the following error:

Files should be organised in per-sample directories

This is the code I ran:

qiime tools cache-import \
  --type 'SampleData[MAGs]' \
  --input-path data/mags \
  --cache ./cache \
  --input-format 'MultiFASTADirectoryFormat' \
  --key mags

I have a single sample, so initially I had just a single FASTA file in my data/mags directory. I also tried creating a subdirectory called sample1 and moving the FASTA file there, but I get the same error. I searched through the help files but could not find an example of the directory structure that the 'MultiFASTADirectoryFormat' input format expects.

Hi @shira,

I took a quick peek at the code which raises that error, and it does seem like it would work with the sub-directory you tried. Did you import sample1/ or data/mags (which has sample1 inside)? I think it's the latter which will get past the error (it wants a directory with directories inside it).
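
For reference, here is a minimal sketch of the layout described above, built in a scratch directory so it is safe to run anywhere. The names sample1 and bin1.fasta are placeholders chosen for illustration, not anything the importer is known to require; the only point taken from the thread is that the directory passed to --input-path should contain one sub-directory per sample, with the FASTA file(s) inside. The final check lists anything sitting directly under data/mags that is not a directory, hidden files included.

```shell
# Build a throwaway example of the expected layout (all names are hypothetical).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/data/mags/sample1"
printf '>contig_1\nACGT\n' > "$demo_dir/data/mags/sample1/bin1.fasta"

# List every top-level entry that is NOT a per-sample directory.
# find does not skip dotfiles, so hidden files show up here too.
stray="$(find "$demo_dir/data/mags" -mindepth 1 -maxdepth 1 ! -type d)"
echo "non-directory entries at top level: ${stray:-none}"
```

On real data, the same find command can be pointed at data/mags instead of the scratch directory; an empty result means everything at the top level is a directory.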

Hi @ebolyen,

I am attempting to import a directory. The directory has one directory in it, named sample1, which contains a single FASTA file (I tried both .fasta and .fa extensions). The error reads

Files should be organised in per-sample directories

Thanks for confirming @shira!

@misialq, I can't really work out why this is happening. I think the above process should avoid this error:

But clearly that isn't happening. Any ideas?

@ebolyen @misialq
Problem solved! :grinning_face:
Found a hidden .DS_Store file in the directory. Once I deleted it all was fine. It's a bit tricky since macOS keeps making these hidden files every time you open the folder.
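
For anyone hitting the same thing, a rough sketch of how these files could be found and removed from the command line. It runs against a scratch directory with made-up file names so it can be tried safely; on real data the path would be data/mags (or wherever the import directory lives). This is only one way to do the cleanup, not an official MOSHPIT step.

```shell
# Recreate the situation in a scratch directory (all names are hypothetical).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/data/mags/sample1"
touch "$demo_dir/data/mags/sample1/bin1.fasta"
touch "$demo_dir/data/mags/.DS_Store" "$demo_dir/data/mags/sample1/.DS_Store"

# Preview which .DS_Store files exist anywhere under the import directory.
find "$demo_dir/data/mags" -name '.DS_Store' -type f -print

# Delete them, then confirm nothing is left.
find "$demo_dir/data/mags" -name '.DS_Store' -type f -delete
remaining="$(find "$demo_dir/data/mags" -name '.DS_Store')"
echo "remaining .DS_Store files: ${remaining:-none}"
```

Since Finder may recreate the file whenever the folder is opened, it is probably safest to run the delete step right before the import.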

Holy moly, great job @shira!

I can't believe .DS_Store reared its head. A lot of our internal code ignores that (having learned our lesson long ago), but I guess the validate mixin above isn't aware.

We should probably provide an "iter" API internally that skips these. (We definitely ignore them when we move the data into a zip file.)

I've made an issue to fix this: Some validators use `self.path.iterdir()` which trips over `.DS_store` · Issue #362 · qiime2/q2-types · GitHub

Thanks again @shira !
