Hi everyone,
I’m currently training a naive Bayes classifier on the SILVA 138.2 SSU NR99 database, and I’ve noticed that it takes significantly longer compared to previous versions. For instance:
Training with SILVA 138.1 finished in about 2 hours.
Training with SILVA 138.2 has already taken over 10 hours and the job still hasn’t completed, so I have had to extend the allocated runtime.
Here’s my job script:
#!/bin/bash --login
########## SBATCH Lines for Resource Request ##########
#SBATCH --time=10:00:00 # limit of wall clock time - how long the job runs
#SBATCH --nodes=1 # number of different nodes
#SBATCH --ntasks=1 # number of tasks
#SBATCH --cpus-per-task=6 # number of CPUs (or cores) per task
#SBATCH --mem-per-cpu=16G # memory required per allocated CPU (or core)
#SBATCH --mail-user= # email for notifications
#SBATCH --mail-type=ALL # type of emails: BEGIN, END, FAIL
#SBATCH --job-name qi2-silva # name of the job
########## Command Lines for Job Running ##########
# Load the required environment
module purge
conda activate qiime2
# Import your reference sequence and taxonomy files into QIIME 2 artifacts
It should not be taking that long. I just constructed my own SILVA 138.2 classifier a few days ago w/o issue. There really is no difference between 138.1 and 138.2 other than updates to the taxonomy. So, there should be no differences in the time it takes to make the classifier.
You can simply run qiime rescript get-silva-data ... to fetch and import the taxonomy for you. See the tutorial which starts with this step here.
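As a rough sketch of that step, the command might look like the following. The output file names are hypothetical placeholders, and the `run_or_print` helper is only an illustrative wrapper (not part of QIIME 2) that prints the command when `qiime` is not on the PATH. It also assumes a RESCRIPt release that accepts '138.2' as a version.

```shell
#!/bin/bash
# Sketch only: output file names are hypothetical placeholders.
VERSION="138.2"
TARGET="SSURef_NR99"
SEQS="silva-${VERSION}-ssu-nr99-rna-seqs.qza"
TAX="silva-${VERSION}-ssu-nr99-tax.qza"

# Illustrative helper: run the command if qiime is available,
# otherwise just print it (dry run).
run_or_print() {
    if command -v qiime >/dev/null 2>&1; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

# Download and import the SILVA sequences and taxonomy in one step.
run_or_print qiime rescript get-silva-data \
    --p-version "$VERSION" \
    --p-target "$TARGET" \
    --o-silva-sequences "$SEQS" \
    --o-silva-taxonomy "$TAX"
```

The sequences come back as RNA, so the tutorial follows this with `qiime rescript reverse-transcribe` before any classifier training.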
Thanks for the quick response. I tried using RESCRIPt several times, but I kept getting this error message:
Plugin error from rescript:
Parameter 'version' received '138.2' as an argument, which is incompatible with parameter type: Str % Choices('128', '132')¹ | Str % Choices('138')² | Str % Choices('138.1')³
I reran the job with the time limit extended to 15 hours, and it completed successfully in 12 hours. I'm not sure why it took that long, and I wasn't able to use rescript. For reference, I am using QIIME 2 version 2024.5.0.
You'll need to install the latest version of QIIME 2 (2024.10) to make use of get-silva-data for SILVA v138.2.
Or simply follow the tutorial, using your current version of QIIME 2, to download all the files required. There are several... click on the 'The gritty details' menu to reveal the detailed instructions... It should work if you simply replace '138.1' with '138.2'.
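To check whether an installed release is new enough for the '138.2' option, a small version comparison can help. This is a minimal sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` (version sort) being available, and it assumes the first line of `qiime --version` has the form `q2cli version X.Y.Z`. A plain string comparison would wrongly rank 2024.10 below 2024.5.0, which is why version sort is used.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Uses version sort so that 2024.10 ranks above 2024.5.0.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="2024.10"

if command -v qiime >/dev/null 2>&1; then
    # Assumption: the first line looks like "q2cli version 2024.5.0".
    INSTALLED="$(qiime --version | awk 'NR==1 {print $3}')"
else
    # Fallback for illustration: the version mentioned in this thread.
    INSTALLED="2024.5.0"
fi

if version_ge "$INSTALLED" "$REQUIRED"; then
    echo "QIIME 2 $INSTALLED: get-silva-data should accept version 138.2"
else
    echo "QIIME 2 $INSTALLED: upgrade to $REQUIRED or later, or download the files manually"
fi
```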
Thanks for the information. I will consider using QIIME 2 (2024.10) in future analyses. I have another question related to this topic: how can I decide the minimum length when training the classifier on only the V4 region? The expected amplicon size is ~390 bp, so I set the maximum length to 400, but I am not sure how to choose the minimum length.
I would simply follow the instructions for making an amplicon specific classifier. Basically, trim out the amplicon region of interest using your PCR primer sequences. Then you should not need to worry about length trimming.
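A sketch of that primer-based approach is below. All file names are hypothetical placeholders, and the primer sequences are the commonly used 515F/806R V4 pair, included only as an assumption; substitute the primers from your own PCR protocol. The `run_or_print` helper is illustrative and not part of QIIME 2.

```shell
#!/bin/bash
# Sketch only: file names are hypothetical placeholders.
# Assumption: 515F/806R V4 primers; replace with your own primer pair.
F_PRIMER="GTGYCAGCMGCCGCGGTAA"
R_PRIMER="GGACTACNVGGGTWTCTAAT"
REF_SEQS="silva-138.2-ssu-nr99-seqs.qza"
REF_TAX="silva-138.2-ssu-nr99-tax.qza"
V4_SEQS="silva-138.2-ssu-nr99-seqs-v4.qza"
CLASSIFIER="silva-138.2-ssu-nr99-v4-classifier.qza"

# Illustrative helper: run the command if qiime is available,
# otherwise just print it (dry run).
run_or_print() {
    if command -v qiime >/dev/null 2>&1; then
        "$@"
    else
        echo "dry run: $*"
    fi
}

# Cut the amplicon region out of the full-length references using the primers.
run_or_print qiime feature-classifier extract-reads \
    --i-sequences "$REF_SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --p-read-orientation 'forward' \
    --o-reads "$V4_SEQS"

# Train the naive Bayes classifier on the extracted region only.
run_or_print qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$V4_SEQS" \
    --i-reference-taxonomy "$REF_TAX" \
    --o-classifier "$CLASSIFIER"
```

Because the primers define the boundaries of the extracted region, no separate minimum or maximum length needs to be chosen by hand.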