I want to create a customized database for analysing bacterial 16S rRNA sequence data. I have downloaded 21,789 16S rRNA sequences from NCBI RefSeq. Could you please guide me on how to create a taxonomy .txt file for these 21,789 sequences?
Is there any method or code to create the taxonomy file for all 21,789 sequences rather than doing it manually?
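If the downloaded FASTA file keeps NCBI's usual header layout (an accession followed by the organism name), a short script can build a two-column taxonomy table from the headers alone. This is a minimal sketch under that assumption. The function names are hypothetical, and headers that do not start with a plain binomial name (for example "Candidatus" or "uncultured" entries) would need extra handling. It can only recover genus and species; higher ranks would have to come from the NCBI Taxonomy database. The output uses QIIME 2's "Feature ID" / "Taxon" tab-separated layout.

```python
import csv
import io


def headers_to_taxonomy(fasta_lines):
    """Yield (accession, taxon) pairs parsed from FASTA header lines.

    Assumes NCBI-style headers such as
    '>NR_000001.1 Genus species strain X 16S ribosomal RNA, partial sequence'
    (hypothetical example): the first word is the accession and the next two
    words are taken as genus and species. Sequence lines are skipped.
    """
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        accession, _, description = line[1:].strip().partition(" ")
        words = description.split()
        genus = words[0] if words else ""
        species = words[1] if len(words) > 1 else ""
        yield accession, f"g__{genus}; s__{species}"


def taxonomy_tsv_text(pairs):
    """Render (Feature ID, Taxon) pairs as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature ID", "Taxon"])
    writer.writerows(pairs)
    return buf.getvalue()
```

For example, passing an open FASTA file handle to headers_to_taxonomy and writing the result of taxonomy_tsv_text to taxonomy.tsv would give a file that could then be imported as FeatureData[Taxonomy] with qiime tools import.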
Is it mandatory to give all levels of taxonomy (kingdom to species), or are genus and species names enough for running in QIIME?
We have sequenced only the V3-V4 region of our bacterial samples, but I have downloaded full-length 16S rRNA sequences. Is it mandatory to extract only the V3-V4 target region from these sequences, or can I use the entire 16S rRNA sequences for analysis in QIIME? Does using the entire 16S rRNA sequence affect the analysis?
Yes, there is a new QIIME 2 plugin, RESCRIPt, that will do this for you. We have a tutorial for it on this forum:
Nothing is mandatory, but genus and species alone are probably not enough for good classifications.
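To show what a taxonomy entry with every rank filled in looks like, here is a small helper that always emits all seven ranks from kingdom to species. The "k__", "p__", ... prefixes follow the Greengenes-style convention and are an assumption here; the main point is that each entry is a single semicolon-separated string with the same number of ranks, and unknown ranks are kept as empty placeholders rather than dropped.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")
PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def lineage_string(names: dict) -> str:
    """Build a seven-rank, semicolon-separated taxonomy string.

    `names` maps rank -> name; missing ranks are kept as empty
    placeholders (e.g. 'p__') so every entry has the same depth.
    """
    return "; ".join(prefix + names.get(rank, "") for rank, prefix in zip(RANKS, PREFIXES))
```

With only genus and species known, most of the string stays empty, which is why a classifier trained on such labels has little to fall back on when a sequence cannot be resolved to species.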
See the classifier training tutorial at docs.qiime2.org for more discussion of this. The short answer is that you do not need to trim to the target region, but it is usually beneficial. And why not trim? It is a single command in q2-feature-classifier and only a little more time to wait.
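The idea behind trimming full-length references to the amplified region with a primer pair can be illustrated in plain Python. This is a simplified sketch, not q2-feature-classifier's implementation: it expands IUPAC degenerate bases into regular-expression character classes and keeps only exact matches, whereas the real command tolerates some mismatches. The primers used in the usage note (CCTACGGGNGGCWGCAG and GACTACHVGGGTATCTAATCC, a widely used V3-V4 pair) are an assumption; use the primers from your own sequencing run.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(seq, f_primer, r_primer):
    """Return the region between the forward primer and the reverse
    complement of the reverse primer (primers excluded), or None if
    either primer has no exact, degeneracy-aware match."""
    seq = seq.upper()
    fwd = re.search(primer_regex(f_primer), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(r_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]
```

For example, extract_amplicon(full_length_seq, "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC") would return the segment between the two primer sites, or None for a reference that lacks an exact match. References returning None are one reason trimmed databases can be smaller than the full download.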
Thank you so much for your reply, sir. I owe you a lot.
As per your suggestion, I installed RESCRIPt and ran the following command: "qiime rescript get-ncbi-data --p-query singleline2.fasta --o-sequences ncbi-refseqs-unfiltered.qza --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza"
For the --p-query option, I gave my NCBI RefSeq FASTA file after converting all the sequences into single-line format.
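Converting a wrapped FASTA file to single-line records, as described above, can be sketched as follows; the function name is hypothetical. Note, though, that RESCRIPt's get-ncbi-data documents --p-query as a search on the NCBI Nucleotide database (a query string such as the BioProject query quoted later in this thread), not as a path to a local file, which may explain why no taxonomy rows came back.

```python
def to_single_line_fasta(lines):
    """Join wrapped sequence lines so each record becomes one header line
    followed by one sequence line. Blank lines are ignored, and any sequence
    text appearing before the first header is dropped."""
    out = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                out.extend([header, "".join(chunks)])
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        out.extend([header, "".join(chunks)])
    return out
```

Writing "\n".join(to_single_line_fasta(fh)) to a new file would give the single-line version of an open FASTA handle fh.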
I am getting the following error:
Plugin error from rescript:
Taxonomy format requires at least one row of data.
Debug info has been saved to /tmp/qiime2-q2cli-err-x9a_zcsl.log
I have one doubt: do I have to create ncbi-refseqs-unfiltered.qza and ncbi-refseqs-taxonomy-unfiltered.qza separately, or will this plugin generate them?
If I do have to create them, I only have the FASTA sequences. How can I download the taxonomy file for 33175[BioProject] OR 33317[BioProject]? I couldn't find a taxonomy file for these IDs in the NCBI Taxonomy database.
That classifier can now be used to classify other sequences.
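To illustrate what training a classifier on reference sequences and then using it on new sequences means, here is a toy multinomial naive Bayes classifier over k-mer counts with Laplace smoothing. It is loosely modelled on the naive Bayes idea used by q2-feature-classifier but is not its implementation (no confidence scores, no rank-by-rank truncation); the class name and the default k=7 are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts (hypothetical helper)."""

    def __init__(self, k=7):
        self.k = k
        self.kmer_counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.totals = Counter()                  # taxon -> total k-mers seen
        self.priors = Counter()                  # taxon -> number of references
        self.vocab = set()

    def fit(self, sequences, taxa):
        for seq, taxon in zip(sequences, taxa):
            counts = Counter(kmers(seq, self.k))
            self.kmer_counts[taxon].update(counts)
            self.totals[taxon] += sum(counts.values())
            self.priors[taxon] += 1
            self.vocab.update(counts)
        return self

    def predict(self, seq):
        """Return the taxon with the highest log posterior for `seq`."""
        n_refs = sum(self.priors.values())
        v = len(self.vocab)
        query = Counter(kmers(seq, self.k))
        best_taxon, best_score = None, -math.inf
        for taxon in self.priors:
            score = math.log(self.priors[taxon] / n_refs)
            denom = self.totals[taxon] + v  # Laplace smoothing denominator
            for kmer, count in query.items():
                score += count * math.log((self.kmer_counts[taxon][kmer] + 1) / denom)
            if score > best_score:
                best_taxon, best_score = taxon, score
        return best_taxon
```

Here fit plays the role of training on the reference database (sequences plus their taxonomy strings), and predict is the later step of classifying query sequences.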
You should read the paper cited in that tutorial to learn how to assess database quality (this is really subjective and depends on the input data and domain knowledge), and also for a test of the full RefSeq classifier.
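As one simple check among many (and not the evaluation method from the cited paper), you could count how many distinct labels each rank contains and how often a rank is left empty. The hypothetical helper below computes both from a list of semicolon-separated taxonomy strings, treating a two-letter prefix such as "g__" as part of the format rather than the name.

```python
def rank_summary(taxon_strings):
    """For each rank position, return (distinct non-empty labels,
    fraction of entries where that rank is empty or missing)."""
    if not taxon_strings:
        return []
    rows = [[part.strip() for part in t.split(";")] for t in taxon_strings]
    depth = max(len(row) for row in rows)
    summary = []
    for i in range(depth):
        names = []
        for row in rows:
            name = row[i] if i < len(row) else ""
            # Strip a rank prefix such as 'g__' if present.
            if len(name) >= 3 and name[1:3] == "__":
                name = name[3:]
            names.append(name)
        resolved = [n for n in names if n]
        summary.append((len(set(resolved)), 1 - len(resolved) / len(names)))
    return summary
```

A large empty fraction at genus or species level, for instance, would suggest that many references cannot support classification that deep.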