Import memory issues, I think? I can do smaller sets of data... but not when I try to do all of it...

Kia Ora All,
I have been trying to build a large .qza from 1400 paired-end fastqs. I have managed to do this in batches of 100 samples, but I get bus errors when I try to do more (>150 PE samples, i.e. >300 fastq files).

I have tried with and without setting the temp directory (see script); this was not required for the smaller data sets.
Alternatively: can I merge a lot of small artifacts instead?

Data = Illumina 250 bp PE fastqs
QIIME 2 version = QIIME2/2022.2

The errors look like this:
/var/spool/slurm/job30423207/slurm_script: line 21: 5310 Bus error
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path all_eo_1.qza \
  --input-format PairedEndFastqManifestPhred33V2

The number reported before "Bus error" (the PID) is different each run.

Job log

Job ID: 30423207
Cluster: mahuika
User/Group: matt.adlam/matt.adlam
State: FAILED (exit code 135)
Nodes: 1
Cores per node: 8
CPU Utilized: 04:34:18
CPU Efficiency: 12.43% of 1-12:46:56 core-walltime
Job Wall-clock time: 04:35:52
Memory Utilized: 220.64 MB
Memory Efficiency: 2.69% of 8.00 GB

Script
#!/bin/bash -e
#SBATCH --account massey03345
#SBATCH -J qiime2_import
#SBATCH --time 65:00:00
#SBATCH --mem 16GB
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH -e q2_in_2022_09_13.err
#SBATCH -o q2_in_2022_09_13.out
#SBATCH --export=TMPDIR=/nesi/nobackup/massey03345/tmp_QIIME2

module purge
module load QIIME2/2022.2

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path all_eo_2022_09_13.qza \
  --input-format PairedEndFastqManifestPhred33V2 \
  --verbose

Hi @madlam,

That's honestly a new error for me. Exit code 135 usually means SIGBUS (128 + signal 7), an unusual relative of SIGSEGV (segmentation fault). As far as I can tell, that's indeed what happened here.

This error usually means the process touched a memory address that should have been valid but wasn't. This particular QIIME 2 import is entirely Python, and pure Python code generally doesn't perform the kind of raw memory access that is the first step toward that error. So it suggests either a bug in CPython (possible, but unlikely) or a transient/systemic hardware failure. There's not really a way to confirm that short of a sysadmin running something like a memtest on the node (which may be worth suggesting; a Python script killed by SIGBUS should raise some eyebrows on their end as well).


Regarding the actual problem at hand: you can absolutely merge downstream tables. In fact, depending on your denoising algorithm, this is the most correct (and at least the more efficient) approach. For example, DADA2 constructs its error profile under the assumption that all of your samples come from a single sequencing run (Deblur won't care, but you could still merge at the end).

So unless you were doing old-school de novo XX% OTU picking, you would want to work in smaller batches here anyway.
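One way to set up those batches is to split the manifest itself. A minimal sketch (file names are hypothetical; assumes a standard V2 manifest with a single header row, which must be repeated in every batch file):

```shell
# Build a demo manifest - in practice this would be your real manifest.tsv.
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.tsv
for i in $(seq 1 250); do
    printf 's%03d\t/path/f%03d.fastq.gz\t/path/r%03d.fastq.gz\n' "$i" "$i" "$i" >> manifest.tsv
done

# Split the data rows into chunks of 100 samples (GNU split), then
# prepend the header to each chunk so every batch is a valid manifest.
header=$(head -n 1 manifest.tsv)
tail -n +2 manifest.tsv | split -l 100 -d - batch_
for f in batch_*; do
    { echo "$header"; cat "$f"; } > "manifest_${f#batch_}.tsv"
    rm "$f"
done
# Produces manifest_00.tsv, manifest_01.tsv, manifest_02.tsv
```

Each `manifest_NN.tsv` can then go to its own `qiime tools import` (and denoising) job.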

Once you have denoised/picked your OTUs/features (ideally with multiple jobs so it all happens "quickly"), you will be able to merge the resulting tables and representative sequences using the qiime feature-table plugin. From there everything looks the same, and you've avoided feeding all of the raw sequence data to a single denoiser/OTU-picker job.
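The merge step looks roughly like this (batch suffixes and artifact names are assumptions; `feature-table merge` and `merge-seqs` both accept their input option multiple times). The qiime commands are echoed so the sketch is safe to paste; drop the `echo` to actually run them:

```shell
# Hypothetical per-batch outputs: table_00.qza ... and rep-seqs_00.qza ...
# Collect the repeated --i-tables / --i-data arguments, one per batch.
table_args=""
seq_args=""
for i in 00 01 02; do
    table_args="$table_args --i-tables table_${i}.qza"
    seq_args="$seq_args --i-data rep-seqs_${i}.qza"
done

# Merge the feature tables and the representative sequences.
echo qiime feature-table merge $table_args --o-merged-table merged_table.qza
echo qiime feature-table merge-seqs $seq_args --o-merged-data merged_rep-seqs.qza
```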

Hi Evan, thank you so much for the fast reply.
I neglected to mention that this is WGS shotgun microbiome data; I will cross that bridge soon.
This is all new to me, especially QIIME 2, so I'm really excited to progress.

Again, thank you.
Ngā mihi nui
Matt Adlam