Import reference sequence database and train classifier for mcrA sequences

Hi everybody,

I am analyzing sequences of mcrA, a functional gene involved in methanogenesis.
The mcrA seqs have already been processed with DADA2, so I have a feature table and rep-seqs.

Now I want to assign taxonomy to them. I would appreciate any tips; more specifically, how do I get a reference seq database and build a classifier for mcrA?

Thanks,

Hee-Sung

Hi @baehsung,

I'd recommend trying out our super awesome RESCRIPt plugin. This plugin will enable you to construct and curate your own marker gene reference set. You can start by reading through our get-ncbi-data tutorial. For another example, check out the notebook for making a 12S rRNA reference set.

-Cheers!
-Mike

Thanks Mike!

I read the tutorial that you suggested. I could not follow all of it, but I will try to download mcrA sequences from NCBI first. I recently updated QIIME 2 to version 2022.2. Do I need to reinstall RESCRIPt in the updated QIIME 2? I had installed RESCRIPt in the old version of QIIME 2 (2021.11).

Hee-Sung

I'd start with the 12S notebook and simply replace the gene name terms (there may be a few different terms to search for..). If you are able to get that to work we can help you refine your reference sequence database preparation. :hammer_and_wrench:

Yes, plugins must be installed separately in each QIIME 2 environment in which you want to use them, so RESCRIPt needs to be installed again in the 2022.2 environment.

Keep us posted! :slight_smile:

Hi Mike,

I tried to retrieve mcrA seqs from NCBI Entrez with the keywords "methyl coenzyme m reductase alpha subunit mcrA" and "euryarchaeotes", which returned 1637 seqs.
If I want to retrieve those seqs, how should I write --p-query [text]?

When I used --p-query ((methyl coenzyme m reductase alpha subunit mcrA) AND "euryarchaeotes"[porgn:__txid28890] as shown in the query box, the command failed with "Got unexpected extra argument".

You need to place everything within quotes like so:

--p-query ' ((methyl coenzyme m reductase alpha subunit mcrA) AND "euryarchaeotes" ... '

When you have quotes as part of your query search term you have to use a different quote type to encompass the entire search string. In this case I am using single-quotes: ' so that we can make use of the double-quotes " within the search string.
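To see why the quoting matters, here is a small shell sketch. The count_args function is a made-up helper for illustration only (it is not part of QIIME 2 or RESCRIPt); it simply reports how many separate arguments the shell passes along.

```shell
# Hypothetical helper: print how many arguments the shell hands to a command.
count_args() { echo "$#"; }

# Unquoted: the shell splits the search terms on spaces,
# so the flag is followed by several separate arguments.
count_args --p-query methyl coenzyme m reductase

# Single-quoted: the whole query, including its inner double quotes,
# arrives as one argument after the flag.
count_args --p-query '(methyl coenzyme m reductase alpha subunit OR mcrA) AND "euryarchaeotes"'
```

The first call prints 5 and the second prints 2. A command that expects a single value after --p-query would treat everything beyond that value as extra arguments.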

-Mike

Thanks Mike,

I entered the plugin command as below:

qiime rescript get-ncbi-data
--p-query ' "methyl coenzyme m reductase alpha subunit mcrA" AND "euryarchaeotes" '
--p-ranks domain .....species
--p-rank propagation
--o-sequences ....
--o-taxonomy ....
--verbose

and got a plugin error from rescript (see the attached picture).

Do you have any suggestions for solving this error?

Best regards,

Hee-Sung

Hi @baehsung,

The issue is occurring due to an incorrect query statement. I suggest you read NCBI's documentation on composing queries. I was able to run the command below successfully on my machine:

qiime rescript get-ncbi-data \
	--p-query '(methyl coenzyme m reductase alpha subunit OR mcrA) AND txid28890[ORGN]' \
	--p-ranks domain superkingdom kingdom phylum class order family genus species \
	--p-rank-propagation \
	--o-sequences mcrA-seqs.qza \
	--o-taxonomy mcrA-tax.qza \
	--verbose

If you are searching for a particular taxonomic group I suggest you always provide a txid statement, e.g. txid28890[ORGN] which basically means, "return Euryarchaeota". You can search the NCBI Taxonomy resource to determine the txid numbers associated with a given taxonomic group.

Also note the OR statement contained within the (), and the AND statement. Breaking down the query, we are basically saying that we'd like records that:

  • ( are annotated as either methyl coenzyme m reductase alpha subunit OR mcrA)
  • AND the record must be from txid28890[ORGN] # i.e. Euryarchaeota
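
If you want to reuse this pattern for other genes or taxa, the two parts of the query can be assembled in the shell first. This is only a sketch: build_query is a hypothetical helper name, not something provided by RESCRIPt, and it assumes the gene terms are already joined with OR.

```shell
# Hypothetical helper: combine gene-name terms and an NCBI taxonomy ID
# into a single Entrez query string.
#   $1: gene terms, already joined with OR
#   $2: numeric NCBI txid (28890 is the Euryarchaeota ID used above)
build_query() {
    printf '(%s) AND txid%s[ORGN]' "$1" "$2"
}

QUERY=$(build_query 'methyl coenzyme m reductase alpha subunit OR mcrA' 28890)

# Inspect the query before passing it on as --p-query "$QUERY"
echo "$QUERY"
```

Printing the string first makes it easy to paste into the NCBI search box and check the number of hits before running the download.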

Hi Mike,

I am so happy that I could get the mcrA seqs and tax with your advice, thank you very much for this.

The seqs that I retrieved include seqs from uncultured strains, which may interfere with the classification of my seqs. Could I exclude those by changing the --p-query string?

Cheers,

Hee-Sung

Yes, you can exclude items from a search using operators like NOT. See the 12S notebook example and the NCBI documents I referred to earlier in this thread.
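
As a rough sketch of what that could look like for the query used earlier in this thread: the choice to filter on "uncultured" in the record title, the function name fetch_mcra_cultured, and the output file names are all assumptions for illustration, so check the hit count on the NCBI site before relying on it.

```shell
# Query from earlier in the thread.
BASE_QUERY='(methyl coenzyme m reductase alpha subunit OR mcrA) AND txid28890[ORGN]'

# Assumption: unwanted records carry "uncultured" in their title.
# Further NOT terms can be appended in the same way.
QUERY="$BASE_QUERY NOT uncultured[Title]"
echo "$QUERY"

# Hypothetical wrapper; it is only defined here, not run.
# Call fetch_mcra_cultured inside an activated QIIME 2 environment
# that has RESCRIPt installed.
fetch_mcra_cultured() {
    qiime rescript get-ncbi-data \
        --p-query "$QUERY" \
        --p-ranks domain superkingdom kingdom phylum class order family genus species \
        --p-rank-propagation \
        --o-sequences mcrA-cultured-seqs.qza \
        --o-taxonomy mcrA-cultured-tax.qza \
        --verbose
}
```

Double quotes around "$QUERY" keep the whole string as a single argument, the same way the single quotes do when the query is typed directly on the command line.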

-Mike

Thanks Mike,

I did it as you advised!!

Hee-Sung