Demux and subsampling

AdventureDavid · February 22, 2024, 7:57pm

Hello,

I am working with paired-end sequences, so I ran qiime tools import with the type EMPPairedEndSequences to generate my .qza file.

I then tried to demux, but my command timed out (I had allotted 10 hours on the university cluster with 10 cores for the job). This does not seem right, so now I want to subsample to test whether the code is working at all.

Below is the code I used to attempt to demux.

----------------Load Modules--------------------

module purge
module load QIIME2/2023.7

----------------Commands------------------------

qiime demux emp-paired \
  --m-barcodes-file metadata.txt \
  --m-barcodes-column BarcodeSequence \
  --i-seqs raw-sequences.qza \
  --p-no-golay-error-correction \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

I need help figuring out what is going wrong with this code, and I would also like to know how to demux a subsample of raw-sequences.qza.

I appreciate any suggestions and feedback.

Oddant1 · February 22, 2024, 9:08pm

Hello @AdventureDavid, allocating multiple cores to qiime demux emp-paired is unfortunately not going to speed things up, because it is a single-threaded action.

How large is your data? Since it sounds like you timed out and didn't otherwise see any errors, it is possible that this is just going to take a long time.
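
A rough way to gauge the size before importing is to count the reads in one of the raw files. This is only a sketch: it assumes a gzipped FASTQ with four lines per read, and the function name and the file name in the example are placeholders.

```shell
count_reads() {
    # $1 = gzipped FASTQ file; prints the number of reads (line count / 4)
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo "$((lines / 4))"
}

# Example (placeholder file name):
#   count_reads barcodes.fastq.gz
```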

AdventureDavid · February 22, 2024, 9:12pm

I agree, the dataset might need more time (approximately 220 samples), but I'm also cautious about running it again unless I can confirm the code is running correctly. Based on your comment, the code looks fine though?

Is there any way I can tell the code to demux only specific barcodes, as a subsample, to get a faster output and test whether the code is running correctly?

Oddant1 · February 22, 2024, 11:53pm

I believe you would have to subsample your raw data before importing it into QIIME 2. That is awkward to do, but it is possible.
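
A minimal sketch of that kind of subsampling, assuming the raw data is an EMP paired-end directory containing forward.fastq.gz, reverse.fastq.gz and barcodes.fastq.gz, with reads listed in the same order in all three files. The function names and paths are hypothetical. Taking the first N reads is not a random sample and does not select specific barcodes; it is only a quick way to produce a small test input.

```shell
subsample_fastq() {
    # $1 = input .fastq.gz, $2 = output .fastq.gz, $3 = number of reads to keep
    local infile="$1" outfile="$2" n="$3"
    # A FASTQ record is 4 lines, so N reads correspond to 4*N lines.
    gzip -dc "$infile" | head -n "$((n * 4))" | gzip > "$outfile"
}

subsample_emp_dir() {
    # $1 = directory with the raw files, $2 = output directory, $3 = reads to keep
    local src="$1" dst="$2" n="$3" name
    mkdir -p "$dst"
    # Take the same number of records from each file so that forward,
    # reverse and barcode reads stay in sync.
    for name in forward reverse barcodes; do
        subsample_fastq "$src/$name.fastq.gz" "$dst/$name.fastq.gz" "$n"
    done
}

# Example (hypothetical paths and read count):
#   subsample_emp_dir raw-data raw-data-sub 100000
#   qiime tools import \
#     --type EMPPairedEndSequences \
#     --input-path raw-data-sub \
#     --output-path raw-sequences-sub.qza
```

The resulting raw-sequences-sub.qza could then be passed to the same qiime demux emp-paired command as a short test run.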

I talked to some other members of the QIIME 2 team, and the general consensus is that it is not unusual for the action to take this long on a substantial amount of data. Probably give it 24 hours or so, and if it times out after that there may be a problem, but currently there doesn't seem to be any indication that anything has actually gone wrong.

AdventureDavid · February 23, 2024, 5:58am

Alrighty, thanks for looking into it. I will let it run for longer and see if that resolves it!

Update: Looks like that was it, the job just needed extra time. Thank you!
