Hello
I am trying to follow [Using RESCRIPt to compile sequence databases and taxonomy classifiers from NCBI Genbank] for STIRRUPS. There are indeed 973 accessions, some of which have neither an "id" nor a "GenBank_Accession_Number". I converted the download to .txt and named it "stirrups-accessions.txt" accordingly.
When I ran
```shell
qiime rescript get-ncbi-data \
  --p-query '33175[BioProject] OR 33317[BioProject]' \
  --m-accession-ids-file stirrups-accessions.txt \
  --o-sequences ncbi-refseqs-unfiltered.qza \
  --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza
```
I (understandably) got the following error:
There was an issue with loading the file stirrups-accessions.txt as metadata:
There was an issue with loading the metadata file:
Metadata IDs must be unique. The following IDs are duplicated: '-'
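To see which IDs trigger this check before rerunning the command, a small helper like the one below can list every ID that appears more than once. This is a minimal sketch, assuming stirrups-accessions.txt is a tab-separated QIIME 2 metadata file with one header row and the IDs in the first column; the function name `list_duplicate_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every ID (first tab-separated column)
# that occurs more than once, skipping the header row.
# Assumes a tab-separated file with a single header line.
list_duplicate_ids() {
  tail -n +2 "$1" | cut -f1 | sort | uniq -d
}
```

Running `list_duplicate_ids stirrups-accessions.txt` on a file with the problem above would be expected to print the placeholder `-`.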
So I deleted the rows with a missing "id" and "GenBank_Accession_Number" and reran the command above. This time, I got a different error:
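The manual deletion described above can also be scripted, which makes it easy to repeat on a fresh download. This is a minimal sketch, assuming the same tab-separated layout (header row first, IDs in the first column) and that rows without an ID carry the literal placeholder `-`, as the error message suggests; `drop_placeholder_ids` and the output file name are hypothetical. It only reproduces the cleanup step and does not by itself address the partial-download error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a metadata file ($1) to $2, keeping the
# header row and dropping every data row whose ID (first column) is '-'.
drop_placeholder_ids() {
  awk -F'\t' 'NR == 1 || $1 != "-"' "$1" > "$2"
}
```

For example, `drop_placeholder_ids stirrups-accessions.txt stirrups-accessions-clean.txt` would write a copy without the placeholder rows, which could then be passed to `--m-accession-ids-file`.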
Plugin error from rescript:
Partial download. Expected 939 records, but got 938.
More than 10 ids were missing. Ten were: 219857437, 125487083, 265678780, 265679029, 343200178, 285162865, 307816494, 301072779, 116054477, 10862897.
How can I solve this issue? And what is the correct way to handle the rows with a missing "id" and "GenBank_Accession_Number"?
Thank you.
Best Regards
Stephanie