Taxonomy file creation query

Hello, All,

I want to create a customized database for analysing bacterial 16S rRNA sequence data. I have downloaded 21,789 16S rRNA sequences from NCBI RefSeq. Could you please guide me on how to create a taxonomy text file for these 21,789 sequences?

Is there a method or script to create the taxonomy file for 21,789 sequences rather than doing it manually?

Is it mandatory to give all levels of taxonomy (kingdom to species), or are the genus and species names enough for running in QIIME?

We have sequenced only the V3-V4 region for our bacterial samples. I have now downloaded full-length 16S rRNA sequences. Is it mandatory to extract only the V3-V4 target region from these sequences, or can I use the entire 16S rRNA sequences for analysis in QIIME? Does using the entire 16S rRNA sequence affect the analysis?

Hi @Asha1,

Yes, there is a new QIIME 2 plugin that will do this for you. We have a tutorial for it on this forum:

Nothing is mandatory, but genus and species alone are probably not enough for good classifications.
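To make the "all levels" point concrete, here is a minimal Python sketch of what a full-lineage taxonomy table can look like: a two-column, tab-separated file with a `Feature ID` and a `Taxon` column. The rank prefixes (`k__` through `s__`) follow the Greengenes-style convention, which is an assumption here, and the sequence IDs and lineages are hypothetical. The plugin mentioned above generates the taxonomy for you, so this only illustrates the layout.

```python
import csv
import io

# Rank prefixes in the Greengenes-style convention (an assumption; other
# reference databases use different prefixes).
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def lineage_string(ranks):
    """Join a dict of rank -> name into one semicolon-delimited lineage.

    Missing ranks are written with an empty name so that every record
    carries the same seven levels.
    """
    return "; ".join(f"{rank}__{ranks.get(rank, '')}" for rank in RANKS)


def write_taxonomy_tsv(records, handle):
    """Write (feature_id, ranks) pairs as a two-column taxonomy table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature ID", "Taxon"])
    for feature_id, ranks in records:
        writer.writerow([feature_id, lineage_string(ranks)])


# Hypothetical records: IDs and lineages are for illustration only.
example = [
    ("SEQ_0001", {"k": "Bacteria", "p": "Proteobacteria",
                  "c": "Gammaproteobacteria", "o": "Enterobacterales",
                  "f": "Enterobacteriaceae", "g": "Escherichia",
                  "s": "Escherichia coli"}),
    # Genus and species only: the higher ranks stay empty.
    ("SEQ_0002", {"g": "Escherichia", "s": "Escherichia coli"}),
]

buffer = io.StringIO()
write_taxonomy_tsv(example, buffer)
print(buffer.getvalue())
```

The second record shows what a genus-and-species-only entry looks like: it is still a valid row, but the higher ranks carry no information.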

See the classifier training tutorial at docs.qiime2.org for more discussion of this. The short answer is that you do not need to trim to the target region, but it is usually beneficial. And why not trim? It is a single command in q2-feature-classifier and a little more time to wait :stopwatch:
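For readers wondering what "trimming to the target region" means in practice, the idea can be sketched in plain Python: find the forward primer and the reverse complement of the reverse primer in a full-length sequence, and keep what lies between them. This is a simplified illustration and not how q2-feature-classifier is implemented. It only handles exact matches with IUPAC degenerate bases, and the function names and the short primers in the usage lines are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement, keeping IUPAC codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Expand degenerate bases in a primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_region(sequence, fwd_primer, rev_primer):
    """Return the segment between the primers (primers excluded).

    Returns None when either primer site is not found.
    """
    pattern = re.compile(
        primer_to_regex(fwd_primer)
        + "(.+?)"
        + primer_to_regex(reverse_complement(rev_primer))
    )
    match = pattern.search(sequence.upper())
    return match.group(1) if match else None


# Toy sequence and toy primers, for illustration only.
full_length = "TTTT" + "ACGGA" + "CCCCGGGG" + "TTAGC" + "AAAA"
region = extract_region(full_length, "ACGNA", "GCTAA")
print(region)
```

Sequences in which a primer site is missing return `None` and would be dropped from the trimmed reference set.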

Thank you so much for your reply, sir. I owe you a lot.

As per your suggestion, I have installed RESCRIPt and ran the following command: “qiime rescript get-ncbi-data --p-query singleline2.fasta --o-sequences ncbi-refseqs-unfiltered.qza --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza”

For the --p-query option I gave my NCBI RefSeq FASTA file, after converting every sequence to a single line.

I am getting the following error:

Plugin error from rescript:

Taxonomy format requires at least one row of data.

Debug info has been saved to /tmp/qiime2-q2cli-err-x9a_zcsl.log

I have one doubt: do I have to create ncbi-refseqs-unfiltered.qza and ncbi-refseqs-taxonomy-unfiltered.qza separately, or will this plugin generate them?

If I have to create them myself, I only have the FASTA sequences. How can I download the taxonomy file for 33175[BioProject] OR 33317[BioProject]? I could not find a taxonomy file for these IDs in the NCBI Taxonomy database.

Kindly help me.

Hi @Asha1,
The command you are using does not really follow the tutorial I sent you before. That tutorial gives the exact instructions you need to build a classifier on NCBI RefSeq 16S sequences.

The error you are getting is precisely because you are inputting an improperly formatted sequence file as metadata, which will not work.

Follow the instructions and you will get both. If you don't follow the instructions I cannot help...
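As a hedged illustration of the difference, the sketch below assembles the same command with a search query in place of the FASTA file name. The flags and output names are the ones from the command posted above, and the query string is the one quoted in the question. Treating that query as the intended value for `--p-query` is an assumption, so check it against the tutorial. The script only prints the command and does not run QIIME 2.

```python
import shlex

# Query string quoted earlier in the thread; using it as the value for
# --p-query is an assumption to be checked against the tutorial.
query = "33175[BioProject] OR 33317[BioProject]"

command = [
    "qiime", "rescript", "get-ncbi-data",
    "--p-query", query,
    "--o-sequences", "ncbi-refseqs-unfiltered.qza",
    "--o-taxonomy", "ncbi-refseqs-taxonomy-unfiltered.qza",
]

# Print a copy-pasteable shell command; shlex.join adds the quoting that
# the brackets and spaces in the query need.
shell_line = shlex.join(command)
print(shell_line)
```

Both output artifacts are named in the one command, which matches the reply above that following the instructions yields both.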

Thank you so much for your reply, sir. I have finished creating the customised database. data.tsv (476 Bytes)

Please tell me, can I proceed with my further data analysis using this classifier? Is the classifier's performance good or not, sir?

Quick, wasn't it :smile:

That classifier can now be used to classify other sequences.

You should read the paper cited in that tutorial to learn how to assess database quality (this is really subjective and depends on the input data and domain knowledge), and also for a test of the full RefSeq classifier.
