Hi there, I would like some help and advice on analysing whole genome shotgun sequencing data with QIIME 2. At the moment I am trying to run my data through DADA2, but it is taking a seemingly infinite amount of time. The dataset is about 35 GB, with about 700 paired-end sequences that I imported via qiime tools import. I am testing with a smaller dataset of about 700 MB, which still times out at 3 hours (24 threads, 12 CPUs, 15 GB RAM). These data have already been demultiplexed.
Should I have imported the sequence data in much smaller batches, run DADA2 on each batch, and then merged the results? After that I would like to use SHOGUN to assign the reads.
My concern is that many different reads can be attributed to a single OTU, so the per-batch feature tables would have different dimensions and may not merge.
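For reference, my understanding is that merging the per-batch outputs would look something like the following sketch, using qiime feature-table merge and merge-seqs (the batch file names here are hypothetical placeholders):

```shell
# Merge per-batch DADA2 feature tables into one table
# (batch file names are hypothetical)
qiime feature-table merge \
  --i-tables table-batch1.qza \
  --i-tables table-batch2.qza \
  --o-merged-table table-merged.qza

# Merge the corresponding representative sequences
qiime feature-table merge-seqs \
  --i-data rep-seqs-batch1.qza \
  --i-data rep-seqs-batch2.qza \
  --o-merged-data rep-seqs-merged.qza
```

Is this the right approach, or will the differing dimensions of the batch tables cause problems here?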
Any advice would be greatly appreciated, and I am (of course) up against a clock...
Script :-
#!/bin/bash -e
#SBATCH --account massey03345
#SBATCH -J dada2
#SBATCH --time 3:00:00
#SBATCH --mem 15GB
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 12
#SBATCH -e dada2_3.err
#SBATCH -o dada2_3.out
#SBATCH --export NONE
#SBATCH --profile task
module purge
module load QIIME2/2021.11
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs import2.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza \
  --p-n-threads 24
Error :- slurmstepd: error: *** JOB 24972355 ON wbn132 CANCELLED AT 2022-02-23T06:54:19 DUE TO TIME LIMIT ***
Many thanks
Matt Adlam