Reference data_taxonomy

Hi all,

I am preparing reference data for taxonomic assignment of bacterial 16S reads that were obtained by PCR with primers 357f/805r (~450 bp in length).

I was able to download the RESCRIPt-formatted SILVA data from the QIIME 2 website [i.e., Silva 138 SSURef NR99 full-length sequences and Silva 138 SSURef NR99 full-length taxonomy].

But I don’t know how to handle the downloaded sequences in the next steps. Could someone tell me which processing steps are necessary (e.g., trimming the database sequences to the exact amplified region)?

thanks,

Hee-Sung

Welcome to the forum!
Here are the links to two very nice tutorials that may be useful for you to train a classifier:
link1
link2
I hope this helps you
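
For the region-trimming part of the question, one common step before training is to cut the reference sequences down to the amplified region with `qiime feature-classifier extract-reads`. A minimal sketch, assuming the full-length file name used later in this thread; the `extract_region` helper, the output file name, and the primer variables are hypothetical, and the 357f/805r primer sequences are deliberately left for you to fill in from your own protocol:

```shell
# Hypothetical helper: trim reference sequences to the region between two primers.
# Usage: extract_region <input-seqs.qza> <forward-primer> <reverse-primer> <output.qza>
extract_region() {
    qiime feature-classifier extract-reads \
        --i-sequences "$1" \
        --p-f-primer "$2" \
        --p-r-primer "$3" \
        --o-reads "$4"
}

# Example call (commented out; set the primers to your own 357f/805r sequences first):
# FWD_PRIMER="..."   # forward primer sequence, 5'->3'
# REV_PRIMER="..."   # reverse primer sequence, 5'->3'
# extract_region silva-138-ssu-nr99-seqs.qza "$FWD_PRIMER" "$REV_PRIMER" \
#     silva-138-ssu-nr99-seqs-357f-805r.qza
```

The trimmed sequences would then take the place of the full-length ones as the reference reads when the classifier is trained.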

Thanks Timur,

To get the SILVA data, I ran the command below, following link2, and got this error: qiime2 has no plugin command named 'rescript'. What can I do to solve this problem?

qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy silva-138-ssu-nr99-tax.qza

Hi, @baehsung
RESCRIPt is currently not included in the basic QIIME 2 installation (I hope it will be soon), so if you want to use it, you need to install it first.
You can install it inside your QIIME 2 environment as instructed here

Thanks Timur,

I tried to install RESCRIPt according to the instructions on the linked website.

I ran the three commands below and got the following output.

(qiime2-2021.4) [email protected]:~$ conda activate qiime2-2021.4
(qiime2-2021.4) [email protected]:~$ conda install -c conda-forge -c bioconda -c qiime2 -c defaults xmltodict

Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /home/qiime2/miniconda/envs/qiime2-2021.4
added / updated specs:

  • xmltodict

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
certifi-2021.5.30          |   py38h578d9bd_0         141 KB  conda-forge
xmltodict-0.12.0           |             py_0          11 KB  conda-forge
------------------------------------------------------------
                                       Total:         288 KB

The following NEW packages will be INSTALLED:

xmltodict conda-forge/noarch::xmltodict-0.12.0-py_0

The following packages will be UPDATED:

ca-certificates pkgs/main::ca-certificates-2021.4.13-~ → conda-forge::ca-certificates-2021.5.30-ha878542_0
certifi pkgs/main::certifi-2020.12.5-py38h06a~ → conda-forge::certifi-2021.5.30-py38h578d9bd_0

The following packages will be SUPERSEDED by a higher-priority channel:

openssl pkgs/main::openssl-1.1.1k-h27cfd23_0 → conda-forge::openssl-1.1.1k-h7f98852_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
ca-certificates-2021 | 136 KB | ##################################### | 100%
certifi-2021.5.30 | 141 KB | ##################################### | 100%
xmltodict-0.12.0 | 11 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(qiime2-2021.4) [email protected]:~$ pip install git+https://github.com/bokulich-lab/RESCRIPt.git

Collecting git+https://github.com/bokulich-lab/RESCRIPt.git
Cloning https://github.com/bokulich-lab/RESCRIPt.git to /tmp/pip-req-build-gz8j8q3b
Running command git clone -q https://github.com/bokulich-lab/RESCRIPt.git /tmp/pip-req-build-gz8j8q3b
Building wheels for collected packages: rescript
Building wheel for rescript (setup.py) … done
Created wheel for rescript: filename=rescript-2021.8.0.dev0+2.g5e0c872-py3-none-any.whl size=226039 sha256=ab3cff1c288205c52eb0b077be091cb7adab8868244dc0077c4f3351b6f82d8e
Stored in directory: /tmp/pip-ephem-wheel-cache-el817amb/wheels/a7/2f/ca/a4cfe2ac81c54ea686727a464c1029266233e0df67566e3523
Successfully built rescript
Installing collected packages: rescript
Successfully installed rescript-2021.8.0.dev0+2.g5e0c872

It seemed to install successfully, but when I ran the command below to get the SILVA data, I got the same error message: "Error: qiime2 has no plugin command named 'rescript'."
Could you figure out what is wrong?

qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences /media/sf_Bac16S/ref_seqa/silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy /media/sf_Bac16S/ref_seqs/silva-138-ssu-nr99-tax.qza

Thanks,

Hee-Sung

Hi, @baehsung
Is it possible that you forgot to run:

qiime dev refresh-cache

after installation?

If yes, please run it and try again. Let us know if it is still giving you an error.
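
The check can be sketched as below: refresh the cache, then confirm that the plugin shows up in the output of `qiime info`, which lists the installed plugins. The `has_qiime_plugin` helper name is an assumption for illustration and is not part of QIIME 2.

```shell
# Hypothetical helper: succeed if the named plugin is listed by `qiime info`.
has_qiime_plugin() {
    qiime info 2>/dev/null | grep -qw -- "$1"
}

# Only run the check where QIIME 2 is available (activate the environment first).
if command -v qiime >/dev/null 2>&1; then
    qiime dev refresh-cache          # re-scan the installed plugins
    if has_qiime_plugin rescript; then
        echo "rescript is registered"
    else
        echo "rescript is still missing; check that pip installed into this environment"
    fi
fi
```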

Thanks for answering.
I successfully installed RESCRIPt and am now training a classifier on the full-length SSU sequences using the command below.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138-ssu-nr99-classifier.qza

However, I ran into another plugin error, this time from feature-classifier: unable to allocate 8 GB for an array with shape (1073741824,) and data type float64.

Could you figure out what is wrong?

Hee-Sung

It looks like your machine does not have enough memory for this step; you need more RAM. If you are using a virtual machine, you can try allocating more memory to it; otherwise, run the job on a more powerful machine.
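
The numbers in the error message fit this explanation: a float64 value takes 8 bytes, so an array of 1073741824 elements needs 8 GiB in a single allocation. A quick sketch of that arithmetic, followed by a look at how much RAM the machine has (reading `/proc/meminfo`, which assumes a Linux system):

```shell
elements=1073741824        # array shape from the error message
bytes_per_element=8        # float64 = 64 bits = 8 bytes
total_gib=$(( elements * bytes_per_element / 1024 / 1024 / 1024 ))
echo "Array size: ${total_gib} GiB"

# Total physical memory of this machine in GiB (rounded down), Linux only.
if [ -r /proc/meminfo ]; then
    mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "Machine RAM: $(( ${mem_kib:-0} / 1024 / 1024 )) GiB"
fi
```

If the machine, or the share of it given to a virtual machine, has about that much RAM or less in total, the allocation is unlikely to succeed no matter what else is running.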

Thanks again for your kind reply.

I am now planning to use the UF campus cluster (HiPerGator) to solve the memory problem, and I am looking through Price Sheets – Research Computing to see which capacity would suit us. How much RAM and how many CPUs do you think are enough to run QIIME 2?

Best regards,

Hee-Sung

Usually 32-64 GB of RAM is enough to process most datasets. You also don’t need a lot of CPUs. My old laptop with 32 GB of RAM and 8 threads can handle most of my datasets. For really big datasets, it is better to have at least 128 GB of RAM.
Usually only a few steps need a lot of RAM, such as taxonomy annotation, denoising, and classifier training.

The price sheets you shared were interesting, @baehsung. Do you have to buy exclusive access to dedicated resources? Unless you have a huge data stream and are planning to run these jobs constantly, you might get better bang for your buck from a shared cluster environment where you aren’t paying upfront for months of exclusivity. Does UF offer something like that?

Thanks for the reply.

I got access to a shared cluster on UF HiPerGator to continue my work, and I set up MobaXterm on my office computer to connect to it. I am now installing Miniconda through MobaXterm, so that I can install QIIME 2, using the following commands.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

However, something seems to be wrong (e.g., error messages such as "Not found" and "No file and directory"), as shown in the picture below. Do you know what is wrong with the installation procedure?

thanks,

Hello @baehsung!
A couple of days ago I successfully installed Miniconda on a cluster using the following commands:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

I am not sure whether it will work on your cluster, but please give it a try.