qiime import takes way too long

wei_wei · August 8, 2022, 5:46pm

I am trying to import 80 pairs of paired-end reads (160 FASTQ files in total) on a server that is usually fast. The import has been running for 4 days and still has not finished, with no error message whatsoever.
Here's the command. Is the input format correct? Thanks.

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path data/colorectal_neoplasms_16s/manifest.txt --output-path data/colorectal_neoplasms_16s/paired-end-demux.qza --input-format PairedEndFastqManifestPhred33

wei_wei · August 8, 2022, 10:15pm

I believe it could be because I omitted the "V2" at the end of "PairedEndFastqManifestPhred33". However, adding the V2 gave me many errors that I have no idea how to fix. They all seem to concern the format of the manifest file.
Below is an excerpt from my manifest.csv file:

sample-id,absolute-filepath,direction
DRR127476,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127476/DRR127476_1.fastq,forward
DRR127476,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127476/DRR127476_2.fastq,reverse
DRR127478,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127478/DRR127478_1.fastq,forward
DRR127478,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127478/DRR127478_2.fastq,reverse
DRR127481,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127481/DRR127481_1.fastq,forward
DRR127481,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127481/DRR127481_2.fastq,reverse
DRR127485,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127485/DRR127485_1.fastq,forward

But it didn't work: the error says the IDs are duplicated (each ID appears once for forward and once for reverse). When I made the IDs unique by adding _1 and _2 suffixes, it told me that the reads are not paired. So how exactly should I format the manifest file? Should it be comma-separated or tab-separated? I have adjusted it many times and still can't get it to work.

Keegan-Evans · August 8, 2022, 11:51pm

@wei_wei,
I am not sure, but I think you are on the right track. The documentation for importing with the manifest format covers this. For the V2 format, your manifest needs the columns sample-id, forward-absolute-filepath, and reverse-absolute-filepath. If you store the manifest inside the folder with your data, you can shorten the file paths using the $PWD environment variable, as shown in the tutorial. Hope this helps!
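
For readers starting from an old-style manifest like the one posted above: the V2 formats expect one tab-separated row per sample with the three columns named in this reply, rather than one comma-separated row per file. Here is a minimal Python sketch of that conversion. The function name convert_manifest is made up for illustration and is not part of QIIME 2, and the sketch assumes every sample has exactly one forward and one reverse row (a sample with only one of the two raises an error).

```python
import csv
import io


def convert_manifest(old_csv_text: str) -> str:
    """Convert an old-style CSV manifest (one row per file) into the
    tab-separated V2 layout (one row per sample)."""
    pairs = {}  # sample-id -> {"forward": path, "reverse": path}
    for row in csv.DictReader(io.StringIO(old_csv_text)):
        reads = pairs.setdefault(row["sample-id"], {})
        reads[row["direction"]] = row["absolute-filepath"]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
    )
    for sample_id, reads in pairs.items():
        # Refuse to write a row for a sample that lacks one direction.
        missing = {"forward", "reverse"} - set(reads)
        if missing:
            raise ValueError(f"{sample_id} is missing: {sorted(missing)}")
        writer.writerow([sample_id, reads["forward"], reads["reverse"]])
    return out.getvalue()
```

The returned text can be saved to a file (for example manifest.tsv, a placeholder name) and passed to qiime tools import with --input-format PairedEndFastqManifestPhred33V2.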

wei_wei · August 10, 2022, 9:15pm

Thanks for the reply, but it is still very slow. I modified the manifest file and restarted; 24 hours have passed and it is still running.

Keegan-Evans · August 15, 2022, 10:22pm

@wei_wei,

Unfortunately, importing is sometimes just slow. In the directory you are importing from, can you run ls -alh and post the result here, so that we can get a better idea of whether your import is unreasonably slow?
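
As a complement to ls -alh, the total size of the input files can be tallied straight from the manifest, which gives a rough sense of how much data the import has to process. This is only a sketch: manifest_input_size and human_readable are hypothetical helper names, and it assumes a tab-separated V2 manifest with the forward-absolute-filepath and reverse-absolute-filepath columns.

```python
import csv
import os


def manifest_input_size(manifest_path: str) -> int:
    """Return the total size in bytes of every FASTQ file listed in a
    tab-separated V2 paired-end manifest."""
    total = 0
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += os.path.getsize(row["forward-absolute-filepath"])
            total += os.path.getsize(row["reverse-absolute-filepath"])
    return total


def human_readable(num_bytes: float) -> str:
    """Format a byte count with binary units, in the spirit of ls -h."""
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit that fits, or at the largest unit.
        if num_bytes < 1024 or unit == "T":
            return f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024
```

Calling human_readable(manifest_input_size("manifest.tsv")), where manifest.tsv is a placeholder path, prints one figure that can be posted alongside the directory listing.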

wei_wei · August 16, 2022, 8:51pm

Hi,

-rw-r--r--  1 wjw5274 e5-cse-compbiodmk 305G Aug 13 23:26 paired-end-demux.qza

Keegan-Evans · August 18, 2022, 4:38pm

@wei_wei,
Wow, alright, it is unsurprising that it is taking that long; that is a lot of data! Just to check, are there multiple MiSeq runs or multiple HiSeq lanes in your dataset? If so, each run should be imported and denoised (or put through other QC steps) separately, and the results merged once you have your feature tables.
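
If the dataset does span several runs, one way to prepare the separate imports is to split the manifest into one manifest per run. The sketch below assumes you already know which run each sample belongs to and supply that as a dictionary; split_manifest_by_run and run_of_sample are illustrative names, not QIIME 2 functionality, and the run labels are whatever you choose.

```python
import csv
import io
from collections import defaultdict


def split_manifest_by_run(manifest_text: str, run_of_sample: dict) -> dict:
    """Split a tab-separated V2 manifest into one manifest per sequencing run.

    run_of_sample maps each sample-id to a user-chosen run label.
    Returns a dict mapping run label -> manifest text for that run.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    header = reader.fieldnames
    rows_by_run = defaultdict(list)
    for row in reader:
        # Raises KeyError if a sample has no run assigned.
        rows_by_run[run_of_sample[row["sample-id"]]].append(row)

    manifests = {}
    for run, rows in rows_by_run.items():
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=header, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        manifests[run] = out.getvalue()
    return manifests
```

Each returned manifest can then be written to its own file and imported on its own, so that every run is denoised separately before the feature tables are merged.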
