I am looking for a way to create taxonomy and reference-sequence files for a list of about 1900 plant species (for the moment I only have the species names). I first tried the PLANiTS and UNITE reference databases, but fewer than 50% of the species on my list are present in them.
These files will be used for the taxonomic classification of my sample with a Naive Bayes classifier. The primers target the ITS region (more precisely, ITS1-u1 / ITS1-u2), and my sample contains at least fungal and plant species.
I found the RESCRIPt pipeline for retrieving data from NCBI, but I am not sure how to specify what I am looking for in the --p-query parameter.
The command I used is the following (only a short part of the query I wrote is shown here):
qiime rescript get-ncbi-data \
--p-query 'Blepharis ciliaris [ORGN] Nucleotide OR
Acorus calamus [ORGN] Nucleotide OR
Sambucus nigra [ORGN] Nucleotide OR
Viburnum lantana [ORGN] Nucleotide OR
Scilla siberica [ORGN] Nucleotide'
However, the resulting taxonomy is completely wrong: for the same feature IDs, it does not match the taxonomy I obtained with the UNITE and PLANiTS databases.
I am using QIIME 2 version 2021.2, installed in a Conda environment.
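Since typing about 1900 names by hand is error-prone, here is a minimal sketch of how the query string could be assembled from a plain-text list of species names, one per line. The function name build_query and the file name species.txt are hypothetical, and the term shape (each name followed by the [ORGN] organism tag, joined with OR, without the extra word "Nucleotide" from my command) is only an assumption about what the query should look like, not a confirmed fix.

```shell
# Build an Entrez-style query from species names read on standard input.
# Assumption: one species name per line; each term becomes "Genus species[ORGN]"
# and the terms are joined with " OR ". build_query is a hypothetical helper.
build_query() {
  query=""
  while IFS= read -r name || [ -n "$name" ]; do
    # Drop carriage returns and leading/trailing whitespace.
    name=$(printf '%s' "$name" | tr -d '\r' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    # Skip blank lines.
    [ -z "$name" ] && continue
    # Add the separator before every term except the first.
    if [ -n "$query" ]; then
      query="$query OR "
    fi
    query="${query}${name}[ORGN]"
  done
  printf '%s\n' "$query"
}

# Example call (species.txt is a hypothetical file name):
#   query=$(build_query < species.txt)
```

The resulting string could then be passed as the value of --p-query. I have not checked whether a query covering all 1900 species is accepted in a single request, so it may need to be split into smaller batches.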
Thank you for reading,