Data retrieval step through the get-all action using the fondue plugin in the MOSHPIT tutorial is taking a huge amount of time

NewUser · July 15, 2025, 6:02pm

I am following the MOSHPIT tutorials and have run into a problem in the data retrieval step. After downloading the files and importing the data into a QIIME 2 artifact, I am running the get-all action from the fondue plugin. I am using the exact code provided in the tutorial (I only edited the email address that we are supposed to provide).

mosh fondue get-all \
  --i-accession-ids ./cache:ids \
  --p-email YOUR.EMAIL@domain.com \
  --p-threads 5 \
  --p-retries 5 \
  --o-paired-reads ./cache:reads_paired \
  --o-metadata ./cache:metadata \
  --o-single-reads ./cache:reads_single \
  --o-failed-runs ./cache:failed_runs \
  --verbose

The code is running, but it has been going for a long time. How long does this step usually take? I am using my personal computer (Windows 11, 64-bit).

lizgehret · July 15, 2025, 8:34pm

Hi @NewUser,

Welcome to the forum!

A few questions:

You mention that you're using a Windows machine, but how are you running these commands? Via WSL command line? How much RAM are you allocating? And what length of time is this command running for? Is it completing, or are you receiving a non-zero exit code?

This information will help us to better understand the length of this runtime and whether or not it's to be expected. Cheers

NewUser · July 16, 2025, 8:15am

Hi,

Yes, I am running these commands using WSL. I am not pre-allocating RAM for these programs. So far, the command has been running for 4+ hours. The individual actions are being executed (e.g., downloading sequences, checking for single-end and paired-end reads, and writing sequences to the directory), but the command has not finished yet.

I am attaching a screenshot of the output generated after the command had been running for 2+ hours. Note the timestamps in the output. Thanks!

lizgehret · July 16, 2025, 8:29pm

Hey @NewUser,

Thanks for sharing this context! This is honestly a super reasonable runtime range. Depending on the size and type of data you are retrieving (i.e. amplicon vs. metagenomics), this command can take up to several days.

That being said, let us know if you run into any failures and we can take a closer look!

Cheers

NewUser · July 25, 2025, 4:59pm

Hi,

Thanks for letting me know. I ran the code and it worked! However, while running the code for the quality overview (shown below), I encountered an error message:

Code:
mosh demux summarize \
  --i-data ./cache:reads \
  --o-visualization demux.qzv

Error:

I guess this is because there is no cache key named reads, since the reads can be single-end or paired-end. I changed the input to point at the single-end reads (./cache:reads_single), and a qzv file was then generated for the single-end reads.

However, I got the same error message when I ran the code for paired-end reads. Does this mean my dataset does not contain any paired reads?

Note: I ran the code for single-end reads again, but I got the same error message this time. Could there be a specific reason? Please let me know.
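
One way to confirm which keys actually exist in the cache before passing one to --i-data is to look inside the cache directory itself. The following is only a minimal sketch, not an official MOSHPIT or QIIME 2 command: it assumes that the cache keeps one entry per key under a keys/ subdirectory, and the helper names list_cache_keys and cache_has_key are hypothetical.

```shell
# Hypothetical helpers for inspecting a QIIME 2 cache directory.
# Assumption: each key is stored as an entry named after the key
# under <cache>/keys/. Verify this against your own cache first.

# Print one key name per line; print nothing if keys/ is missing.
list_cache_keys() {
    if [ -d "$1/keys" ]; then
        ls -1 "$1/keys"
    fi
}

# Succeed (exit status 0) only if the given key exists in the cache.
cache_has_key() {
    [ -e "$1/keys/$2" ]
}
```

For example, `list_cache_keys ./cache` should show names such as reads_single and reads_paired if the get-all outputs above were written, and `cache_has_key ./cache reads || echo "no such key"` would flag a key that is not there.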