Hi everybody,
I'm trying to process some metatranscriptomic data. I first extracted the 18S rRNA reads using SortMeRNA and the PR2 database, and then unmerged the paired-end reads. Importing works fine, but denoising with DADA2 runs for 3+ days and eventually times out on our HPC. I've run the same dataset using DADA2 in R and it works, but it takes a similarly long time, so I am not sure whether what I am doing is the most efficient approach. Read quality also looks fine when I check it.
Below are the current steps I am taking in QIIME 2. If anybody has any recommendations on things to change, please let me know! I have read that it is possible to skip denoising altogether and proceed via vsearch, but that eventually gives me issues down the line as well.
Thanks in advance!
(HPC configuration for reference)
#SBATCH --time=03-00:00:00 ## time format is DD-HH:MM:SS
#SBATCH --nodes=1
#SBATCH --cpus-per-task=36
#SBATCH --mem=100G ## max amount of memory per node you require
#import reads
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path PE \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path PE.qza
#view
qiime demux summarize \
  --i-data PE.qza \
  --o-visualization SSU-single-demux.qzv
##Denoise
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs PE.qza \
  --o-table PE-demuxtable.qza \
  --o-representative-sequences PE-rep-seqs.qza \
  --p-trunc-len-f 120 \
  --p-trunc-len-r 120 \
  --o-denoising-stats PE-DADA2-stats.qza \
  --p-n-threads 36
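
In case the input size matters for the runtime, here is a minimal sketch of how the number of reads per file could be checked before import. The helper name count_fastq_reads is hypothetical (it is not part of QIIME 2 or DADA2), and it assumes gzipped FASTQ files with standard 4-line records, as in the Casava-style PE directory above.

```shell
# Hypothetical helper: print the read count for each gzipped FASTQ file given.
# Assumes standard 4-line FASTQ records (no wrapped sequence lines).
count_fastq_reads() {
    for fq in "$@"; do
        # Decompress to stdout and divide the total line count by 4.
        n=$(gzip -dc "$fq" | awk 'END { print NR / 4 }')
        printf '%s\t%s\n' "$fq" "$n"
    done
}

# Example call on the forward reads in the PE directory (assumed file naming):
# count_fastq_reads PE/*_R1_001.fastq.gz
```

The output is one tab-separated line per file (path, then read count), which should make it easy to spot unusually large samples.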