An error was encountered while running DADA2 in R (return code -9) even with 125GB of RAM

Hi everyone,

I am using QIIME 2 to analyze microbiome data, and I am stuck at the dada2 denoise-paired quality-control step. For this step I am running the command:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trunc-len-f 150 --p-trunc-len-r 150 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza --p-n-threads 10 --verbose

and the output error I'm getting is:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/tmpykr9z832/forward --input_directory_reverse /tmp/tmpykr9z832/reverse --output_path /tmp/tmpykr9z832/output.tsv.biom --output_track /tmp/tmpykr9z832/track.tsv --filtered_directory /tmp/tmpykr9z832/filt_f --filtered_directory_reverse /tmp/tmpykr9z832/filt_r --truncation_length 150 --truncation_length_reverse 150 --trim_left 0 --trim_left_reverse 0 --max_expected_errors 2.0 --max_expected_errors_reverse 2.0 --truncation_quality_score 2 --min_overlap 12 --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 10 --learn_min_reads 1000000

R version 4.2.3 (2023-03-15)
Loading required package: Rcpp
DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6
2) Filtering ..........
3) Learning Error Rates
4584282000 total bases in 30561880 reads from 1 samples will be used for learning the error rates.
Traceback (most recent call last):
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 326, in denoise_paired
run_commands([cmd])
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_dada.R', '--input_directory', '/tmp/tmpykr9z832/forward', '--input_directory_reverse', '/tmp/tmpykr9z832/reverse', '--output_path', '/tmp/tmpykr9z832/output.tsv.biom', '--output_track', '/tmp/tmpykr9z832/track.tsv', '--filtered_directory', '/tmp/tmpykr9z832/filt_f', '--filtered_directory_reverse', '/tmp/tmpykr9z832/filt_r', '--truncation_length', '150', '--truncation_length_reverse', '150', '--trim_left', '0', '--trim_left_reverse', '0', '--max_expected_errors', '2.0', '--max_expected_errors_reverse', '2.0', '--truncation_quality_score', '2', '--min_overlap', '12', '--pooling_method', 'independent', '--chimera_method', 'consensus', '--min_parental_fold', '1.0', '--allow_one_off', 'False', '--num_threads', '10', '--learn_min_reads', '1000000']' died with <Signals.SIGKILL: 9>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2cli/commands.py", line 468, in call
results = action(**arguments)
File "", line 2, in denoise_paired
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 274, in bound_callable
outputs = self.callable_executor(
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/qiime2/sdk/action.py", line 509, in callable_executor
output_views = self._callable(**view_args)
File "/home/mohini/miniconda3/envs/qiime2-2023.5/lib/python3.8/site-packages/q2_dada2/_denoise.py", line 339, in denoise_paired
raise Exception("An error was encountered while running DADA2"
Exception: An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code -9), please inspect stdout and stderr to learn more.

Also, I am running this step in a conda environment on a system with 125 GB of RAM, and the size of the input file (demux-paired-end.qza) is 112 GB. Is this a memory issue? I feel that 125 GB of RAM should be quite sufficient for this file. If not, please point me in the right direction; I have been struggling with this step for the last 5 days.

Thanks everyone for the help!

Hi @mjais,

Welcome to the :qiime2: forum!

You are correct - this error is due to memory allocation: a return code of -9 means the process received SIGKILL, which typically happens when the system runs out of memory and kills the job.

125GB of RAM is a lot, but you also have a very large demux file. Keep in mind that the file size you're mentioning (112 GB) is the compressed size, so the data will be even larger once uncompressed.
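One quick way to check, assuming the unzip utility is available: a .qza file is just a zip archive, so listing its contents shows the total uncompressed size on the last line:

unzip -l demux-paired-end.qza | tail -n 1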

While this is most likely just a lot of data, the size of your file could also be due to high variability in your data's quality - uniform data is much easier to compress than data with more variability in quality scores, etc. Something you might consider is splitting up your data by sequencing run (if your data was collected across multiple runs), denoising each run separately, and then merging the results afterwards (see the sketch below).
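As a rough sketch of that split-and-merge workflow (the metadata file, the run-id column, and the run names below are hypothetical placeholders; adjust them to your own metadata):

# keep only the samples from one sequencing run
qiime demux filter-samples \
  --i-demux demux-paired-end.qza \
  --m-metadata-file sample-metadata.tsv \
  --p-where "[run-id]='run1'" \
  --o-filtered-demux demux-run1.qza

# after denoising each run separately with dada2, merge the per-run outputs
qiime feature-table merge \
  --i-tables table-run1.qza table-run2.qza \
  --o-merged-table table-merged.qza
qiime feature-table merge-seqs \
  --i-data rep-seqs-run1.qza rep-seqs-run2.qza \
  --o-merged-data rep-seqs-merged.qza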

Cheers :lizard:

Hi @lizgehret ,

Thanks for the reply!
Could you explain a little more what splitting up the data by sequencing run means, i.e. how would I do that?
I have paired-end reads for 10 different samples, so are you suggesting that I prepare a separate demux file for each sample, run DADA2 on each of them, and then merge the files afterwards?
If so, my next concern is how to merge the .qza artifacts from the different runs.

Thanks again for the help!

Hi @mjais,

Thanks for your patience here! Apologies that I didn't get around to following up with you on this before heading out for the weekend. I will circle back on Monday! :qiime2:

Hello @mjais,

Just to qiime in here: it sounds like your entire demux has 10 samples, all of which were sequenced in the same run. If that's the case, it's less than ideal to split the demux apart, because DADA2 would then learn a separate error model for each piece when those sequences should contribute to a single error model. If your demux were made up of different sequencing runs, splitting by run wouldn't be a problem.

Is it possible for you to get access to more memory?

You could also try lowering the --p-n-threads parameter to see whether that reduces memory usage.
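For example, your original command with a single thread (DADA2's peak memory generally grows with the number of threads, since each worker processes its own chunk of data):

qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trunc-len-f 150 --p-trunc-len-r 150 --p-n-threads 1 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza --verbose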


Hello @colinvwood,

Yes, the entire demux file has 10 samples. At the moment I am not sure whether they were all sequenced in the same run or in different runs. Still, I tried importing and running DADA2 separately on each sample's file (which reduced the .qza size to around ~12 GB per sample), but it didn't work and showed the same SIGKILL error. I am wondering how much memory DADA2 actually requires: I now have 125 GB of memory for a 12 GB .qza file, is that still not enough?
I tried --p-n-threads with 0, 1, and 10, but none of them worked. I am now trying 2, though I don't know whether it will work. I also tried running it with Slurm and mpirun, but it showed the same error.

It would be really helpful if you could suggest some ways to solve this issue!

Thanks.

Hello @mjais,

No, that should definitely not be happening. How confident are you that you actually have that much memory allocated to these runs? Running with 2 threads is not necessary, since it already failed with 1.

You could share your demux.qzv and we could see if there is something strange going on there.

Hi @colinvwood,

I am saying that DADA2 is using all 125 GB of memory because I allocated --mem=MaxMemPerNode in the Slurm script, I checked the RAM usage with the htop command, and I was not running any other commands on that system. The script I was using is as follows:
#!/bin/bash -l
#SBATCH --job-name="DADA2"
#SBATCH --partition=LocalQ
#SBATCH --time 00-48:00:00
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=MaxMemPerNode
#SBATCH --export=ALL
time qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-n-threads 0 --p-trunc-len-f 0 --p-trunc-len-r 0 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza --verbose

demux.qzv (310.9 KB) = for all 10 samples
demux2.qzv (313.2 KB) = for 1 sample

Please find the attachments, and help me out if I am missing something in running the dada2 plugin.

Thanks!

Hello @mjais,

Since you said you watched the process in htop while it was running, can you confirm that you were allocated 125 GB and that it is a memory issue? In other words, do you see the memory usage approach the 125 GB limit before it crashes?

Can you run scontrol show config and post the output here?
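Another way to pin down the peak usage, assuming GNU time is installed at /usr/bin/time: wrap the command in time -v, which reports the maximum resident set size (in kB) when the process exits:

/usr/bin/time -v qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trunc-len-f 150 --p-trunc-len-r 150 --p-n-threads 1 --o-representative-sequences rep-seqs-dada2.qza --o-table table-dada2.qza --o-denoising-stats stats-dada2.qza 2> time-report.txt
grep "Maximum resident set size" time-report.txt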


Hi @colinvwood,

Yes, the memory usage went up to 124 GB before the crash. I am attaching the output of scontrol show config below; please take a look.
Configuration data as of 2023-07-24T13:50:23
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost = localhost
AccountingStorageLoc = /var/log/slurm_jobacct.log
AccountingStoragePort = 0
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages
AccountingStorageType = accounting_storage/none
AccountingStorageUser = root
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInterconnectType = acct_gather_interconnect/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 0
AuthAltTypes = (null)
AuthInfo = (null)
AuthType = auth/munge
BatchStartTimeout = 10 sec
BOOT_TIME = 2023-07-21T17:49:25
BurstBufferType = (null)
CheckpointType = checkpoint/none
CliFilterPlugins = (null)
ClusterName = localcluster
CommunicationParameters = (null)
CompleteWait = 0 sec
CoreSpecPlugin = core_spec/none
CpuFreqDef = Unknown
CpuFreqGovernors = Performance,OnDemand,UserSpace
CredType = cred/munge
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DisableRootJobs = No
EioTimeout = 60
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FastSchedule = 1
FederationParameters = (null)
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = (null)
GpuFreqDef = high,memory=high
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/none
JobAcctGatherParams = (null)
JobCheckpointDir = /var/slurm/checkpoint
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 0
KillWait = 30 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Layouts =
Licenses = (null)
LicensesUsed = (null)
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxJobCount = 10000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
MsgAggregationParams = (null)
MULTIPLE_SLURMD = Yes
NEXT_JOB_ID = 9
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
PlugStackConfig = /etc/slurm-llnl/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = OFF
PreemptType = preempt/none
PreemptExemptTime = 00:00:00
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityType = priority/basic
PrivateData = none
ProctrackType = proctrack/linuxproc
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
RoutePlugin = route/default
SallocDefaultCommand = (null)
SbcastParameters = (null)
SchedulerParameters = (null)
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cons_tres
SelectTypeParameters = CR_CORE
SlurmUser = slurm(64030)
SlurmctldAddr = (null)
SlurmctldDebug = info
SlurmctldHost[0] = mohini-ubuntu(localhost)
SlurmctldLogFile = /var/log/slurm-llnl/slurmctld.log
SlurmctldPort = 6817
SlurmctldSyslogDebug = unknown
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 120 sec
SlurmctldParameters = (null)
SlurmdDebug = info
SlurmdLogFile = /var/log/slurm-llnl/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/lib/slurm-llnl/slurmd
SlurmdSyslogDebug = unknown
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm-llnl/slurm.conf
SLURM_VERSION = 19.05.5
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /var/lib/slurm-llnl/slurmctld
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = NONE
SuspendTimeout = 30 sec
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = task/none
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/none
TrackWCKey = No
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
X11Parameters = (null)

Slurmctld(primary) at mohini-ubuntu is UP

Also, all 10 of my samples were derived from the same sequencing run. So I guess I have to run dada2 or deblur on the demux file for all 10 samples in one go, right?

Thanks!

Hello @mjais,

It sounds like you have essentially proven that it's a memory issue. It's very strange that this happens for an input file as small as ~12 GB.

Also, all 10 of my samples were derived from the same sequencing run. So I guess I have to run dada2 or deblur on the demux file for all 10 samples in one go, right?

For dada2, this is correct. For deblur it would not matter; however, deblur is not a drop-in option here because it only supports single-end (or already-joined) reads, not paired-end input directly.
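That said, if you do want to try deblur, one possible workaround is to merge the read pairs first with q2-vsearch and then denoise the joined reads. A sketch, assuming a recent release where the action is called merge-pairs (formerly join-pairs), and with an illustrative trim length that you would pick from your own quality plots:

# join forward and reverse reads (output artifact names can vary by release;
# check qiime vsearch merge-pairs --help on your version)
qiime vsearch merge-pairs \
  --i-demultiplexed-seqs demux.qza \
  --output-dir merged

qiime deblur denoise-16S \
  --i-demultiplexed-seqs merged/merged_sequences.qza \
  --p-trim-length 150 \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza \
  --o-stats stats-deblur.qza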

You could try opening an issue on DADA2's GitHub page; the developers may have some insight into why your memory usage is ballooning.
