qiime import takes way too long

I am trying to import 80 pairs of paired-end reads (i.e. 160 FASTQ files in total) on a server that is usually fast. The import has been running for 4 days and still has not finished. There is no error message whatsoever.
Here's the code. Is the input-format correct? Thanks.

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path data/colorectal_neoplasms_16s/manifest.txt --output-path data/colorectal_neoplasms_16s/paired-end-demux.qza --input-format PairedEndFastqManifestPhred33

I suspect the cause is that I omitted the "V2" at the end of "PairedEndFastqManifestPhred33". However, adding the V2 produced so many errors that I have no idea how to fix them. They all seem to concern the format of the manifest file.
Below is an example of my manifest.csv file:

sample-id,absolute-filepath,direction
DRR127476,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127476/DRR127476_1.fastq,forward
DRR127476,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127476/DRR127476_2.fastq,reverse
DRR127478,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127478/DRR127478_1.fastq,forward
DRR127478,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127478/DRR127478_2.fastq,reverse
DRR127481,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127481/DRR127481_1.fastq,forward
DRR127481,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127481/DRR127481_2.fastq,reverse
DRR127485,/data/wjw5274/L2-UniFrac-Paper/data/colorectal_neoplasms_16s/DRR127485/DRR127485_1.fastq,forward

But it didn't work: the error says the IDs are duplicated (forward and reverse). When I made the IDs unique by adding _1 and _2 suffixes, it told me that the reads are not paired. So how exactly should I format the manifest file? Should it be comma-separated or tab-separated? I have tweaked it many times and still can't get it to work.

@wei_wei,
I am not sure, but I think you are on the right track. The documentation on importing with the manifest format covers this. You will want to have the columns sample-id, forward-absolute-filepath, and reverse-absolute-filepath in your manifest. If you store the manifest inside the folder with your data, you can shorten the file paths a bit using the $PWD environment variable, as shown in the tutorial. Hope this helps!
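In case it helps, here is a minimal sketch of how the old-style manifest (sample-id, absolute-filepath, direction) could be rewritten into the three-column layout. It assumes the V2 manifest is tab-separated with one row per sample. The function name convert_manifest and the /path/to/ file paths are illustrative only, not part of QIIME 2.

```python
import csv
import io


def convert_manifest(old_csv_text):
    """Convert an old-style CSV manifest into the tab-separated V2 layout."""
    paths = {}  # sample-id -> {"forward": path, "reverse": path}
    for row in csv.DictReader(io.StringIO(old_csv_text)):
        paths.setdefault(row["sample-id"], {})[row["direction"]] = row["absolute-filepath"]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"])
    for sample_id, reads in paths.items():
        # Every sample needs both a forward and a reverse file to be paired.
        missing = {"forward", "reverse"} - set(reads)
        if missing:
            raise ValueError(f"{sample_id} is missing reads: {sorted(missing)}")
        writer.writerow([sample_id, reads["forward"], reads["reverse"]])
    return out.getvalue()


# Placeholder paths for illustration; substitute your real absolute paths.
example = (
    "sample-id,absolute-filepath,direction\n"
    "DRR127476,/path/to/DRR127476_1.fastq,forward\n"
    "DRR127476,/path/to/DRR127476_2.fastq,reverse\n"
)
print(convert_manifest(example), end="")
```

Each sample ID then appears only once, which avoids both the "duplicated IDs" complaint and the "not paired" complaint. The sketch raises an error for any sample that lacks one of the two read files rather than writing an incomplete row.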


Thanks for the reply, but it is still very slow. I modified the manifest file and restarted; 24 hours have passed and the import is still running.

@wei_wei,

Unfortunately, importing is sometimes just slow. From the directory you are importing from, can you run ls -alh and post the result here, so that we can get a better idea of whether your import is unreasonably slow?
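Another rough way to judge the expected import time is to add up the sizes of the input files themselves. Below is a minimal sketch, assuming a tab-separated manifest with the forward-absolute-filepath and reverse-absolute-filepath columns mentioned earlier in this thread. The function names total_input_bytes and human_readable are illustrative, not QIIME 2 functions.

```python
import csv
import os


def total_input_bytes(manifest_path):
    """Sum the on-disk size of every FASTQ file listed in the manifest."""
    total = 0
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in ("forward-absolute-filepath", "reverse-absolute-filepath"):
                # expandvars resolves entries written with $PWD.
                total += os.path.getsize(os.path.expandvars(row[column]))
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, similar to `ls -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

Calling human_readable(total_input_bytes("manifest.txt")) would give a single figure to compare against the size of the growing .qza file.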


Hi,

-rw-r--r--  1 wjw5274 e5-cse-compbiodmk 305G Aug 13 23:26 paired-end-demux.qza

@wei_wei,
Wow, alright, it is unsurprising that it is taking that long: that is a lot of data! Just to check, are there multiple MiSeq runs or multiple HiSeq lanes in your dataset? If so, each run should be imported and denoised (or put through other QC steps) separately, and the results merged once you have your feature tables.
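If the data do come from several runs, the manifest can be split so that each run gets its own import. Here is a minimal sketch, assuming you already know which run each sample belongs to. The run_of mapping and the run labels are hypothetical and would have to come from your own sequencing records, and split_rows_by_run is an illustrative name, not a QIIME 2 function.

```python
from collections import defaultdict


def split_rows_by_run(rows, run_of):
    """Group manifest rows by sequencing run.

    rows   -- list of dicts, one per sample, each with a "sample-id" key
    run_of -- dict mapping sample-id to a run label (hypothetical; supply
              this from your own sequencing records)
    """
    groups = defaultdict(list)
    for row in rows:
        groups[run_of[row["sample-id"]]].append(row)
    return dict(groups)


# Illustrative sample IDs from this thread with made-up run labels.
rows = [{"sample-id": "DRR127476"}, {"sample-id": "DRR127478"}, {"sample-id": "DRR127481"}]
run_of = {"DRR127476": "run1", "DRR127478": "run2", "DRR127481": "run1"}
by_run = split_rows_by_run(rows, run_of)
```

Each group can then be written out as its own manifest, imported and denoised on its own, and the resulting feature tables combined afterwards, for example with qiime feature-table merge.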
