Importing is taking days, what's going on?

msport469 · November 8, 2019, 1:14pm

I am at the import stage using FASTQ files (692 GB total), and the import has been running for 3 days with 1000 GB of memory and 5 cores (I couldn't find out whether allocating more than one core makes a difference).

Is this a normal amount of time for it to take to import?

dwt · November 8, 2019, 2:23pm

That's a huge amount of data! You may want to import it in batches, then merge. That way at least you could track the progress.
Is that all from the same sequencing run? Because if you're planning on running DADA2, you should run each sequencing lane independently anyway, then merge the feature tables and representative sequences (rep-seqs).
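
As a rough illustration of the batching idea, here is a minimal Python sketch that groups a sample list by sequencing run, so each group can be imported and denoised on its own before merging. The `run` column, the sample IDs, the file paths, and the `group_by_run` helper are all made up for the example; they are not part of any tool's API, and your manifest columns may differ.

```python
from collections import defaultdict


def group_by_run(rows, run_key="run"):
    """Group sample rows (dicts) by sequencing run, keeping input order."""
    batches = defaultdict(list)
    for row in rows:
        batches[row[run_key]].append(row)
    return dict(batches)


# Hypothetical sample table: column names and values are illustrative only.
rows = [
    {"sample-id": "S1", "absolute-filepath": "/data/S1.fastq.gz", "run": "runA"},
    {"sample-id": "S2", "absolute-filepath": "/data/S2.fastq.gz", "run": "runB"},
    {"sample-id": "S3", "absolute-filepath": "/data/S3.fastq.gz", "run": "runA"},
]

batches = group_by_run(rows)

# Each batch would then be written to its own manifest and imported separately.
for run, members in batches.items():
    print(run, [m["sample-id"] for m in members])
```

Importing one batch per run also means a failure only costs you that batch, not the whole 3 days.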

msport469 · November 8, 2019, 2:46pm

No, several sequencing runs. I searched the NCBI database for sequences that meet my criteria and wound up with 190 samples of interest, but these samples are spread across several different FASTQ files. Importing in batches is a good idea. I was planning on using DADA2, so that's an excellent point. Thank you.