SILVA or Greengenes

farhad1990 · July 21, 2020, 2:04pm

Hello everyone,

I am about to train my own classifier for the primer set 341F (CCTACGGGNGGCWGCAG) and 805R (GACTACHVGGGTATCTAATCC), and I have two questions. First, which database do you recommend, SILVA or Greengenes? Second, I used different values to truncate my forward and reverse reads during denoising. According to the tutorial, "For classification of paired-end reads and untrimmed single-end reads, we recommend training a classifier on sequences that have been extracted at the appropriate primer sites, but are not trimmed". I am a bit confused about what "…but are not trimmed" means. Shouldn't I use the trimmed rep-seqs for this?

I appreciate your help in advance.

ben · July 21, 2020, 2:12pm

SILVA vs. Greengenes: it depends, and sometimes it depends on the reviewer. Training a classifier with either database is straightforward, and it is not much extra work to do both, so I would recommend putting both into your pipeline. SILVA is updated regularly and is generally more accepted; Greengenes is more conservative and hasn't been updated since … 2013(?), I am not sure.
Where to truncate depends on the quality at the ends of your forward and reverse reads. As for the tutorial quote, I believe it is saying NOT to trim the reference sequences used to train the classifier.

Ben

PS welcome to the forums Farhad!
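
To make the "extracted at the primer sites, but not trimmed" advice concrete, here is a minimal sketch of the training steps using the primers from the question. The file names are placeholders (assumptions, not files from this thread), and the small `run` helper is hypothetical: it prints each command and only executes it if `qiime` is on the PATH. The point to notice is that no `--p-trunc-len` is passed to `extract-reads`, so the reference reads keep their full length between the primers.

```shell
#!/bin/bash
# Sketch: train a Naive Bayes classifier on reference reads extracted at the
# 341F/805R primer sites, without truncating them to a fixed length.
# REF_SEQS / REF_TAX and the output names are placeholder file names.
REF_SEQS="silva-138-99-seqs.qza"
REF_TAX="silva-138-99-tax.qza"
F_PRIMER="CCTACGGGNGGCWGCAG"      # 341F, as given in the question
R_PRIMER="GACTACHVGGGTATCTAATCC"  # 805R, as given in the question

# Hypothetical helper: print the command, and run it only if QIIME 2 is available.
run() {
  echo "+ $*"
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  fi
}

# Step 1: extract the amplicon region between the primers.
# No --p-trunc-len is given, so the extracted reads are not trimmed.
# Step 2 runs only if step 1 succeeded.
run qiime feature-classifier extract-reads \
  --i-sequences "$REF_SEQS" \
  --p-f-primer "$F_PRIMER" \
  --p-r-primer "$R_PRIMER" \
  --o-reads ref-seqs-341F-805R.qza \
&& run qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs-341F-805R.qza \
  --i-reference-taxonomy "$REF_TAX" \
  --o-classifier classifier-341F-805R.qza
```

The resulting classifier artifact would then be passed to `classify-sklearn` through `--i-classifier`, together with the representative sequences from denoising.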

SoilRotifer · July 21, 2020, 7:43pm

Hi @farhad1990,

I just wanted to add that we've recently made it much easier for you to make your own SILVA classifier. Check out the RESCRIPt tutorial:

-Best wishes!
-Mike

farhad1990 · July 22, 2020, 1:39pm

Thanks @ben,

I will go with SILVA 138 and it was also my wild guess not to truncate the classifier.

Kinds,
Farhad

farhad1990 · July 22, 2020, 1:39pm

Thanks @SoilRotifer,

I will go through it and hopefully make my own classifier.
Thanks for this amazing pipeline <3

Kinds,
Farhad

farhad1990 · July 30, 2020, 12:44pm

Hello again

Per my previous comment on using the pretrained classifier silva-138-99: I managed to get rid of the sklearn version error and the job ran for 5 h. However, at the end it failed with an "[Errno 17] File exists" error:
Plugin error from feature-classifier:

[Errno 17] File exists: '/home/farhad1990/faststorage/data/tmp/q2-TSVTaxonomyDirectoryFormat-k7syus2a' -> '/home/farhad1990/faststorage/data/tmp/qiime2-archive-nbtqovkz/eaac4c52-85bb-45b9-8c67-1e59c3b7a297/data'

Debug info has been saved to /home/farhad1990/faststorage/data/tmp/qiime2-q2cli-err-8_iva1pd.log

I am using my university’s cluster and I have submitted the job through this .sh script:

#!/bin/bash
#SBATCH --partition normal
#SBATCH --mem-per-cpu 64G
#SBATCH -c 1
#SBATCH -t 1200

mkdir /home/farhad1990/faststorage/data/temp/
export TMPDIR=/home/farhad1990/faststorage/data/temp/

source activate qiime2.1
qiime feature-classifier classify-sklearn \
  --i-classifier /home/farhad1990/faststorage/data/classifier-consensus.qza \
  --i-reads /home/farhad1990/faststorage/data/repseqs.qza \
  --o-classification /home/farhad1990/faststorage/data/temp/taxonomy-pre.qza

It seems that a file already exists at that path, even though I created a custom tmp directory to be safe!

I appreciate the help in advance.

Kinds,
Farhad

SoilRotifer · July 31, 2020, 2:16pm

Hi @farhad1990

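The error says:

[Errno 17] File exists: '/home/farhad1990/faststorage/data/tmp/q2-TSVTaxonomyDirectoryFormat-k7syus2a'

but your batch submission script says:

export TMPDIR=/home/farhad1990/faststorage/data/temp/

Note the tmp vs temp.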

You may have to set your TMPDIR path in your .bashrc / .bash_profile. Different HPC systems allow or disallow certain setups, so you should check with your system admins about how to set up your temporary paths dynamically.
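
As a rough sketch of how one might check whether a TMPDIR setting actually takes effect inside a job, the snippet below creates the directory, exports the variable, and then asks `mktemp` (and Python, if present) where temporary files would go. `SCRATCH_DIR` and its default location are hypothetical placeholders; whether a given cluster honors this is an assumption to confirm with the admins.

```shell
#!/bin/bash
# Sketch: point temporary files at a scratch directory and verify the result.
# SCRATCH_DIR is a hypothetical variable; replace the default with your own path.
SCRATCH_DIR="${SCRATCH_DIR:-$PWD/qiime2-tmp}"

mkdir -p "$SCRATCH_DIR"        # -p: no error if the directory already exists
export TMPDIR="$SCRATCH_DIR"   # exported so child processes inherit it

# Create and remove a probe file; GNU mktemp places it under $TMPDIR when set.
probe="$(mktemp)"
probe_dir="$(dirname "$probe")"
rm -f "$probe"
echo "mktemp wrote to: $probe_dir"

# Python's tempfile module also reads TMPDIR, so this shows what a Python
# program started from this shell would use.
if command -v python3 >/dev/null 2>&1; then
  python3 -c 'import tempfile; print("python tempdir:", tempfile.gettempdir())'
fi
```

If the printed directory is not the one you exported, something else in the job environment is overriding the setting, which is the kind of detail the system admins can clarify.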

-Mike

farhad1990 · July 31, 2020, 2:22pm

Hi @SoilRotifer,

Thanks, I thought that export TMPDIR=/home/farhad1990/faststorage/data/temp/ would direct the analysis to my own temp directory, but now I can see that for some reason it is still using the default one. Thanks anyway, I will contact our system admin. Have a good weekend!

Kinds,
Farhad