Moshpit import error

Hi,

I have a problem importing the reference FASTA file in this step:

mosh tools cache-import \
    --cache ./cache \
    --key reference_seqs \
    --type "FeatureData[Sequence]" \
    --input-path ./reference_seqs.fasta

I work with mouse data, so I used https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M38/GRCm39.primary_assembly.genome.fa.gz as my input. However, I consistently get the error below (I’m not using a cache, so the command just writes to a .qza file):

qiime tools import \
    --input-path GRCm39.primary_assembly.genome.fa.gz \
    --output-path reference_seqs.qza \
    --type 'FeatureData[Sequence]'

There was a problem importing GRCm39.primary_assembly.genome.fa.gz:

  GRCm39.primary_assembly.genome.fa.gz is not a(n) DNAFASTAFormat file:

  First line of file is not a valid description. Descriptions must start with '>'

Also, I’m using the primary assembly version of the reference genome, and I’m wondering if there is any recommendation on which version I should use.

Welcome to the forum!

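Did you try decompressing the FASTA file first, so that the input is an uncompressed “.fasta” file instead of “.fa.gz”? The error suggests the importer is reading the gzip-compressed bytes directly, so the first line it sees is not a description starting with '>'.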
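
The decompression step suggested here can be sketched roughly as below. This is only a minimal sketch: `decompress_fasta` is a hypothetical helper name (it is not part of QIIME 2 or MOSHPIT), and it assumes `gzip`, `head`, and `cut` are available on your system. It decompresses the archive to a new file and checks that the first line starts with '>', which is what the importer complained about.

```shell
# Hypothetical helper: decompress a gzipped FASTA and sanity-check its first line.
# Usage: decompress_fasta INPUT.fa.gz OUTPUT.fasta
decompress_fasta() {
    in_gz="$1"
    out_fasta="$2"

    # gzip -dc writes the decompressed data to stdout and keeps the original archive.
    gzip -dc "$in_gz" > "$out_fasta" || return 1

    # A FASTA description line must start with '>'.
    first_char=$(head -n 1 "$out_fasta" | cut -c1)
    if [ "$first_char" != ">" ]; then
        echo "First line of $out_fasta does not start with '>'" >&2
        return 1
    fi
}
```

With the file from the original post, this would be called as `decompress_fasta GRCm39.primary_assembly.genome.fa.gz reference_seqs.fasta`, and `reference_seqs.fasta` would then be passed to `--input-path` in the import command above.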

Hey, I just gave it another shot after unzipping the fasta file, and it actually worked this time! For my second question, in moshpit or shotgun metagenomics, do you have any advice on whether to use the primary assembly or all the regions?

Hi @Brycealong

I am not sure that I can answer this question with 100% confidence, and since the topic of your question has now changed, I advise you to create a new post so that other mods/users pay more attention to it.

Best,
