Silva 132 Database, which taxonomy & reference sequence files to select for classifier training? - part 2

AstroBioJack · April 27, 2023, 10:07am

Dear all,

I feel the need to re-open this topic because I was not able to find a satisfactory solution. The earlier discussion was between @DannyBoi97 and @SoilRotifer, who I hope will be able to help me once more.

I am trying to train a classifier myself because I cannot find a compatible classifier for Silva 132. More precisely, I have one, but QIIME 2 (I am using version 2023.2 in a Singularity container) returns this error message:

Plugin error from feature-classifier:

The scikit-learn version (0.20.2) used to generate this artifact does not match the current version of scikit-learn installed (0.24.1). Please retrain your classifier for your current deployment to prevent data-corruption errors.

For this reason I decided to train my own classifier and found very useful instructions here: Training feature classifiers with q2-feature-classifier — QIIME 2 2022.2.0 documentation

The point is: among all the folders that come with the Silva database download, which files should I use when running the following commands?

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path INSERT_REF_SEQ_FILE.fasta \
  --output-path ref_seq.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path INSERT_TAXONOMY_FILE.txt \
  --output-path ref-taxonomy.qza

Is it rep_set or rep_set_aligned?

Unfortunately, when I open the RESCRIPt link suggested by Mike, I can only see guidelines for Silva 138, which I don't need. Could you please help?
Thanks in advance for your answer!

EDIT: while waiting for an answer, I went ahead and ran the commands as follows.

qiime tools import --type 'FeatureData[Sequence]' --input-path Databases/Silva_132/rep_set/rep_set_all/99/silva132_99.fna --output-path Databases/Silva_132/99_otus_all.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path Databases/Silva_132/taxonomy/taxonomy_all/99/raw_taxonomy.txt --output-path Databases/Silva_132/ref-taxonomy.qza

Both ended successfully. I will give further updates about the results of classifier training and its use.
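
For anyone following along, the training step that comes after these two imports can be sketched roughly as below. This is a minimal sketch based on the q2-feature-classifier tutorial linked above, not a record of what was actually run in this thread. The input paths are the ones from the import commands; the classifier output name is a hypothetical placeholder. The guard only skips the step when qiime or the artifacts are missing.

```shell
# Artifacts produced by the two import commands above.
REF_SEQS="Databases/Silva_132/99_otus_all.qza"
REF_TAX="Databases/Silva_132/ref-taxonomy.qza"
# Hypothetical output name, chosen only for illustration.
CLASSIFIER="Databases/Silva_132/silva132_99_classifier.qza"

if command -v qiime >/dev/null 2>&1 && [ -f "$REF_SEQS" ] && [ -f "$REF_TAX" ]; then
    # Train a naive Bayes classifier on the imported reference data.
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "$REF_SEQS" \
        --i-reference-taxonomy "$REF_TAX" \
        --o-classifier "$CLASSIFIER" \
        && TRAIN_STATUS="trained" || TRAIN_STATUS="failed"
else
    echo "qiime or the imported artifacts were not found; training skipped" >&2
    TRAIN_STATUS="skipped"
fi
echo "classifier training: $TRAIN_STATUS"
```

Inside a Singularity container, the qiime call would presumably be prefixed with the same singularity run invocation used elsewhere in this topic.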

Nicholas_Bokulich · April 27, 2023, 10:31am

Hi @AstroBioJack ,

RESCRIPt's get-silva-data action has a --p-version parameter that you can change to 132 to download and format version 132.
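
A minimal sketch of such a call, assuming RESCRIPt is installed in the active QIIME 2 environment. The --p-target and --p-include-species-labels settings are the ones quoted elsewhere in this topic; the output file names are hypothetical placeholders.

```shell
# Only attempt the download when the rescript plugin is actually available.
if qiime rescript --help >/dev/null 2>&1; then
    qiime rescript get-silva-data \
        --p-version '132' \
        --p-target 'SSURef_NR99' \
        --p-include-species-labels \
        --o-silva-sequences silva-132-ssu-nr99-rna-seqs.qza \
        --o-silva-taxonomy silva-132-ssu-nr99-tax.qza \
        && RESCRIPT_STATUS="downloaded" || RESCRIPT_STATUS="failed"
else
    echo "the rescript plugin is not available in this environment" >&2
    RESCRIPT_STATUS="unavailable"
fi
echo "get-silva-data: $RESCRIPT_STATUS"
```

Note that get-silva-data returns RNA sequences, so a conversion to DNA is normally needed before classifier training; see the RESCRIPt tutorial for that step.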

You want the unaligned sequences (rep_set, not rep_set_aligned).

Yeah, 132 is rather old at this point. The pre-trained classifiers on the QIIME 2 website are always for the latest available version of SILVA, so we have not released pre-trained classifiers for version 132 for a few years now I think.

Good luck!

AstroBioJack · May 2, 2023, 8:21am

Hi @Nicholas_Bokulich ,

thanks for your feedback!
I am now running into some trouble. I tried to build my own classifier with the commands mentioned above, but it runs for 19 hours and then fails with an error like "no space left on device", which seems odd to me, as it is running in a 1 TB environment.

As an alternative, I then tried to follow the RESCRIPt instructions given in the past to @DannyBoi97, but the first command mentioned there (the one whose parameters you also specified) gives me an error. Here it is:

singularity run singularity_containers/qiime2_2023.2 qiime rescript get-silva-data --p-version '132' --p-target 'SSURef_NR99' --p-include-species-labels --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza
Error: QIIME 2 has no plugin/command named 'rescript'.

I was quite astonished by that.
Would you be able to share a Silva 132 classifier that works with QIIME 2 2023.2, if possible, or do you think I should drop this?

Thanks in advance

PS: as mentioned above, I am using QIIME 2 version 2023.2 in a Singularity container, downloaded from the official website.

Nicholas_Bokulich · May 2, 2023, 8:35am

Hi @AstroBioJack ,

That's bizarre. The SILVA database should not take up all that much space. But it sounds like you may be running this on a cluster, which is probably configured to use a temp directory with much more limited space. You can look into changing the temp directory; see other topics on the forum for instructions.
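
As a rough illustration of the temp directory change, here is a minimal sketch that assumes the tools involved honor the standard TMPDIR environment variable. The scratch path is a hypothetical placeholder and should point at a filesystem with plenty of free space.

```shell
# Hypothetical scratch location; replace with a directory on a large filesystem.
SCRATCH_TMP="${SCRATCH_TMP:-$PWD/qiime2_tmp}"
mkdir -p "$SCRATCH_TMP"
export TMPDIR="$SCRATCH_TMP"

# Confirm that new temporary files land in the chosen directory.
probe="$(mktemp)"
echo "temporary files are created under: $(dirname "$probe")"
rm -f "$probe"

# Show how much space is available there.
df -h "$TMPDIR"
```

When running inside a container, the chosen directory also has to be visible from within the container (for example through a bind mount), which depends on how the cluster is set up.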

You are using a Singularity container, but is RESCRIPt installed in it? If it is not, that would explain the error you are seeing.
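
One way to check, sketched under the assumption that the container is invoked the same way as in the command quoted in this topic: qiime info lists the installed plugins, so searching its output for "rescript" shows whether the plugin is present.

```shell
# Container path as used earlier in this thread.
CONTAINER="singularity_containers/qiime2_2023.2"

if command -v singularity >/dev/null 2>&1 && [ -e "$CONTAINER" ]; then
    # `qiime info` prints the installed plugins and their versions.
    if singularity run "$CONTAINER" qiime info | grep -qi 'rescript'; then
        HAS_RESCRIPT="yes"
    else
        HAS_RESCRIPT="no"
    fi
else
    # Singularity or the container image is not available on this machine.
    HAS_RESCRIPT="unknown"
fi
echo "RESCRIPt available in container: $HAS_RESCRIPT"
```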

The pre-trained SILVA 132 classifiers were trained using an older version of QIIME 2 (or more importantly an older version of scikit-learn), so would not be compatible with QIIME 2023.2. You would need to use an older version of QIIME 2 to run the SILVA 132 pre-trained classifiers.

Why do you need 132 specifically? You could get a SILVA 138 classifier that is compatible with QIIME 2023.2, otherwise get RESCRIPt working to build a SILVA 132 classifier.

Good luck!

AstroBioJack · May 2, 2023, 8:54am

Hi @Nicholas_Bokulich

Yes, I am actually working on a cluster and I will contact technical assistance for that, thanks for the suggestion.
About the alternatives: I don't know whether RESCRIPt is in the container, but if the container was built following the instructions here (Installing QIIME 2 using Docker — QIIME 2 2023.2.0 documentation), then it is not included, is that correct?
That is quite a pity, as modifying the container is a mess in my experience, and running it locally doesn't seem very convenient...

As for why I am doing this: I am trying to compare some results between Silva 132 and Silva 138 to see what differs. Reading your last lines, I unfortunately conclude that it would be better to downgrade to an older version of QIIME 2 in order to use Silva 132 at this point, am I correct?

Thanks for the great support

Nicholas_Bokulich · May 2, 2023, 9:11am

Hi @AstroBioJack

Right. RESCRIPt is not part of the "core distribution" at the moment, so it needs to be installed separately.

RESCRIPt/get-silva-data should not be too space or RAM hungry, so it should be possible to get this running on your cluster once you sort out the tempdir issue OR run this locally to build the database and then upload to the cluster for the other steps (though the tempdir size issue will most likely constrain you even more with downstream steps). So I would recommend getting this sorted out to create a 132 database instead of giving up or downgrading to an older version of QIIME 2 (which would be a workable plan B).

Good luck!

AstroBioJack · May 4, 2023, 9:07am

Hi @Nicholas_Bokulich, in the end I succeeded (working with the technical assistance of my server) in making my own classifier without using RESCRIPt.
I was wondering whether I could share it somewhere, so that other researchers could find it if they need it. Of course it is "made by me", but it could be better than nothing. Let me know; I would be glad to cooperate with the QIIME 2 community.

Cheers

Nicholas_Bokulich · May 4, 2023, 10:25am

Sure, you could post it on the forum — we have a community contributions section where several users have uploaded databases/pre-trained classifiers, etc.
