Reference data_taxonomy

baehsung · June 4, 2021, 12:52am

Hi all,

I am preparing reference data for taxonomic assignment of bacterial 16S reads that were obtained by PCR using primers 357f/805r (~450 bp in length).

I was able to download the RESCRIPt SILVA data from the QIIME 2 website (i.e., Silva 138 SSURef NR99 full-length sequences and Silva 138 SSURef NR99 full-length taxonomy).

But I don't know how to process the downloaded sequences for the next steps. Could someone let me know which processing steps are necessary (e.g., trimming the database sequences to the exact amplified region)?

thanks,

Hee-Sung

timanix · June 4, 2021, 7:12am

Welcome to the forum!
Here are the links to two very nice tutorials that may be useful for you to train a classifier:
link1
link2
I hope this helps you.
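
For the region-trimming step asked about above, here is a minimal sketch, assuming the full-length SILVA sequences are saved as silva-138-ssu-nr99-seqs.qza. It uses qiime feature-classifier extract-reads to cut the reference down to the amplified region. The primer sequences are not given in this thread, so F_PRIMER and R_PRIMER are placeholders you must set yourself, and the output file name is only an illustration.

```shell
# Placeholders: set these to your actual 357f / 805r primer sequences.
F_PRIMER="${F_PRIMER:-}"
R_PRIMER="${R_PRIMER:-}"

# Assumed input name; the output name is derived from it (hypothetical).
SEQS="silva-138-ssu-nr99-seqs.qza"
REGION_SEQS="${SEQS%.qza}-357f-805r.qza"

if [ -z "$F_PRIMER" ] || [ -z "$R_PRIMER" ] || [ ! -f "$SEQS" ]; then
    # Nothing is run until both primers and the input file are available.
    echo "Set F_PRIMER and R_PRIMER and make sure $SEQS exists" >&2
else
    qiime feature-classifier extract-reads \
        --i-sequences "$SEQS" \
        --p-f-primer "$F_PRIMER" \
        --p-r-primer "$R_PRIMER" \
        --o-reads "$REGION_SEQS"
fi
```

The extracted reads can then be used in place of the full-length sequences when training the classifier, which may also reduce the memory needed for training.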

baehsung · June 7, 2021, 4:48am

Thanks Timur,

To get the SILVA data, I ran the command below, as described in link2, and got this error: "qiime2 has no plugin command named 'rescript'". What can I do to solve this problem?

qiime rescript get-silva-data \
  --p-version '138' \
  --p-target 'SSURef_NR99' \
  --p-include-species-labels \
  --o-silva-sequences silva-138-ssu-nr99-seqs.qza \
  --o-silva-taxonomy silva-138-ssu-nr99-tax.qza

timanix · June 7, 2021, 7:00am

Hi, @baehsung
Currently RESCRIPt is not included in the basic QIIME 2 installation (I hope it will be soon), so if you are going to use it, you need to install it first.
You can install it inside your QIIME 2 environment as instructed here.

baehsung · June 7, 2021, 6:05pm

Thanks Timur,

I tried to install RESCRIPt following the instructions on the linked website.

I used the three commands below and got the following output.

(qiime2-2021.4) qiime2@qiime2core2021-4:~$ conda activate qiime2-2021.4
(qiime2-2021.4) qiime2@qiime2core2021-4:~$ conda install -c conda-forge -c bioconda -c qiime2 -c defaults xmltodict

Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /home/qiime2/miniconda/envs/qiime2-2021.4
added / updated specs:

xmltodict

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
certifi-2021.5.30          |   py38h578d9bd_0         141 KB  conda-forge
xmltodict-0.12.0           |             py_0          11 KB  conda-forge
------------------------------------------------------------
                                       Total:         288 KB

The following NEW packages will be INSTALLED:

xmltodict conda-forge/noarch::xmltodict-0.12.0-py_0

The following packages will be UPDATED:

ca-certificates pkgs/main::ca-certificates-2021.4.13-~ --> conda-forge::ca-certificates-2021.5.30-ha878542_0
certifi pkgs/main::certifi-2020.12.5-py38h06a~ --> conda-forge::certifi-2021.5.30-py38h578d9bd_0

The following packages will be SUPERSEDED by a higher-priority channel:

openssl pkgs/main::openssl-1.1.1k-h27cfd23_0 --> conda-forge::openssl-1.1.1k-h7f98852_0

Proceed ([y]/n)? y

Downloading and Extracting Packages
ca-certificates-2021 | 136 KB | ##################################### | 100%
certifi-2021.5.30 | 141 KB | ##################################### | 100%
xmltodict-0.12.0 | 11 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(qiime2-2021.4) qiime2@qiime2core2021-4:~$ pip install git+https://github.com/bokulich-lab/RESCRIPt.git

Collecting git+https://github.com/bokulich-lab/RESCRIPt.git
Cloning https://github.com/bokulich-lab/RESCRIPt.git to /tmp/pip-req-build-gz8j8q3b
Running command git clone -q https://github.com/bokulich-lab/RESCRIPt.git /tmp/pip-req-build-gz8j8q3b
Building wheels for collected packages: rescript
Building wheel for rescript (setup.py) ... done
Created wheel for rescript: filename=rescript-2021.8.0.dev0+2.g5e0c872-py3-none-any.whl size=226039 sha256=ab3cff1c288205c52eb0b077be091cb7adab8868244dc0077c4f3351b6f82d8e
Stored in directory: /tmp/pip-ephem-wheel-cache-el817amb/wheels/a7/2f/ca/a4cfe2ac81c54ea686727a464c1029266233e0df67566e3523
Successfully built rescript
Installing collected packages: rescript
Successfully installed rescript-2021.8.0.dev0+2.g5e0c872

It seemed to install successfully, but when I ran the command below to get the SILVA data, I got the same error message: "Error: qiime2 has no plugin command named 'rescript'".
Could you figure out what is wrong?

qiime rescript get-silva-data --p-version '138' --p-target 'SSURef_NR99' --p-include-species-labels --o-silva-sequences /media/sf_Bac16S/ref_seqa/silva-138-ssu-nr99-seqs.qza --o-silva-taxonomy /media/sf_Bac16S/ref_seqs/silva-138-ssu-nr99-tax.qza

Thanks,

Hee-Sung

timanix · June 8, 2021, 6:29am

Hi, @baehsung
Is it possible that you forgot to run:

qiime dev refresh-cache

after installation?

If yes, please run it and try again. Let us know if it is still giving you an error.
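
A minimal sketch of that check, assuming it is run inside the activated QIIME 2 environment. It refreshes the plugin cache and then tests whether the rescript plugin is visible by asking for its help text. The status variable name is only for illustration, and the check relies on qiime returning a non-zero exit code for an unknown plugin.

```shell
# Rebuild QIIME 2's plugin cache, then check whether the rescript plugin is visible.
if command -v qiime >/dev/null 2>&1; then
    qiime dev refresh-cache
    if qiime rescript --help >/dev/null 2>&1; then
        rescript_status="available"
    else
        rescript_status="missing"
    fi
else
    # qiime is not on PATH, so the conda environment is probably not activated.
    rescript_status="qiime-not-found"
fi
echo "rescript plugin: $rescript_status"
```

If the status is still "missing" after the refresh, the plugin was most likely installed into a different environment than the one that is currently active.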

baehsung · June 8, 2021, 4:43pm

Thanks for answering.
I successfully installed RESCRIPt, and I am now training a classifier on the full-length SSU sequences using the command below.
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138-ssu-nr99-classifier.qza

However, I ran into another plugin error, this time from feature-classifier: "Unable to allocate 8 GB for an array with shape (1073741824,) and data type float64".

Could you figure out what is wrong?

Hee-Sung

timanix · June 8, 2021, 5:25pm

It looks like your machine does not have enough memory for this step. You need more RAM. If you are using a virtual machine, you can try allocating more memory to it; otherwise, run the job on a more powerful machine.
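
To see where the 8 GB in the error comes from, here is a small sketch of the arithmetic: an array of 1073741824 float64 values at 8 bytes each. The second part, which assumes a Linux machine with /proc/meminfo, prints the total RAM for comparison. Note that this is the size of one array only; training as a whole may need more than that.

```shell
# Size of the array from the error message: 1073741824 float64 values.
elements=1073741824
bytes_per_float64=8
needed_bytes=$((elements * bytes_per_float64))
needed_gib=$((needed_bytes / 1024 / 1024 / 1024))
echo "array needs ${needed_gib} GiB"

# Compare with the machine's total RAM (Linux only; /proc/meminfo reports kB).
if [ -r /proc/meminfo ]; then
    total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$total_kib" ]; then
        total_gib=$((total_kib / 1024 / 1024))
        echo "machine has about ${total_gib} GiB of RAM in total"
    fi
fi
```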

baehsung · June 10, 2021, 3:42pm

Thanks again for your kind reply.

I am now planning to use the UF campus cluster (HiperGator) to solve the memory problem, and I am looking through https://www.rc.ufl.edu/services/rates/ to decide which capacity would suit us. How much RAM and how many CPUs do you think are enough to run QIIME 2?

Best regards,

Hee-Sung

timanix · June 10, 2021, 6:40pm

Usually 32-64 GB of RAM is enough to process most datasets. You also don't need a lot of CPUs. My old laptop with 32 GB of RAM and 8 threads can handle most of my datasets. For really big datasets, it is better to have at least 128 GB of RAM.
Usually only a few steps need a lot of RAM, such as taxonomy annotation, denoising, and classifier training.

ChrisKeefe · June 11, 2021, 4:17pm

The price sheets you shared were interesting, @baehsung. Do you have to buy exclusive access to dedicated resources? Unless you have a huge data stream and are planning to run these jobs constantly, you might get better bang for your buck from a shared cluster environment where you aren't paying upfront for months of exclusivity. Does UF offer something like that?

baehsung · June 20, 2021, 12:55am

Thanks for the reply.

I got access to a shared cluster on UF HiperGator to continue my work, and I set up MobaXterm on my office computer to connect to it. Now I am installing Miniconda through MobaXterm, so that I can install QIIME 2 in it, using the following commands.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

However, something seems to be wrong (e.g., error messages such as "Not found" and "No such file or directory"), as shown in the picture below. Do you know what is wrong with the installation procedure?

thanks,

timanix · June 20, 2021, 6:14am

Hello @baehsung!
A couple of days ago I successfully installed Miniconda on a cluster using the following commands:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

Not sure if it will work for you on your cluster, but please check it out.

baehsung · June 23, 2021, 8:13pm

Thanks timanix,

I successfully installed Miniconda and QIIME 2 in a conda environment. What do I have to do as a first step to analyze my 16S sequences? Do I need to get the commands required for that analysis? If so, would you mind pointing me to the steps and/or the docs that describe them? Best regards.

ChrisKeefe · June 24, 2021, 12:32am

@baehsung, please spend some time with the tutorials. There's a lot of good information in there. "Moving pictures" and "parkinson's mouse" are good examples of some of the basic things you can do with 16S in QIIME 2.

baehsung · June 26, 2021, 1:27am

Thanks for all your help.

Now I am successfully running QIIME 2 within the conda environment. I can activate it with "source activate qiime2-2021.4", but I don't know how to leave the environment. I have tried "deactivate", "bash deactivate", and "conda deactivate qiime2-2021.4", but none of them worked. Is there a command for this?

timanix · June 27, 2021, 7:23pm

Hi, you almost guessed it!
The right one is conda deactivate, with no environment name after it. The same command works for other conda environments as well.

ChrisKeefe · June 28, 2021, 4:10pm

Consider bookmarking this page from the conda documentation, @baehsung. I find it useful very often.

baehsung · June 29, 2021, 7:21pm

Thanks, guys, for your answers and for the pointer to the conda docs.

I successfully trained the classifier and have made further progress.
Now I am about to run the alpha and beta diversity analyses, which require a sampling depth.

Would you mind letting me know how to convert stats.qza to stats.qzv? I want to check the depth through QIIME 2 View.

ChrisKeefe · June 30, 2021, 3:51pm

If you can't find what you're looking for there, please open a new topic for your new question. We try to keep topics on this forum focused on a single question.
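
For readers who land here with the same question, here is a minimal sketch, assuming stats.qza holds tabular data such as denoising statistics (this is an assumption; the thread does not say what produced the file). Artifacts of that kind can usually be turned into a visualization with qiime metadata tabulate. The file names are taken from the question and are otherwise hypothetical.

```shell
# Input name taken from the question; adjust to your own file.
STATS="stats.qza"
# Derive the visualization name by swapping the extension.
VIZ="${STATS%.qza}.qzv"

if [ -f "$STATS" ]; then
    qiime metadata tabulate \
        --m-input-file "$STATS" \
        --o-visualization "$VIZ" \
        && echo "wrote $VIZ"
else
    echo "$STATS not found in $(pwd)" >&2
fi
```

The resulting .qzv file can be opened in QIIME 2 View. For choosing a sampling depth, the per-sample counts in a feature table summary (qiime feature-table summarize) are typically more informative than denoising statistics.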