Sequences not retrieved via Rescript

Hi! I was downloading CO1 sequences from NCBI over the weekend, and my output contained some messages saying that some sequences were not downloaded.

Warnings in the output file (repeated multiple times):
WARNING:2024-09-07 13:45:57,826:LokyProcess-4:Expected 5000 sequences in this chunk, but got 4995. I do not know why, or which sequences are missing.

Warnings at the end of the output file:
WARNING:2024-09-07 14:41:30,989:MainProcess:The following accessions were deleted from the sequence database because there was a problem with their taxonomies: LC277241.1, LC277240.1, LC277239.1, LC277238.1, LC277237.1, LC277236.1, LC277235.1, AP011270.1, AB626856.1, GU987838.1, LC735809.1, MW291683.1, MT491941.1, MW991407.1, MW991406.1, MW830102.1, MW830101.1, MW830100.1, MW830099.1, MW830076.1, MW830075.1, MW830074.1, MW830073.1, MW830072.1, MW830071.1, MW830070.1, MW830069.1, MW830068.1, LC613154.1.
The problematic taxids were: 2821972, 2821967, 2821979, 2821966, 2821965, 2821971, 2821977, 2821978, 2821973, 2821970, 2821969, 2791187, 2821968, 2821976, 0.

My code:
qiime rescript get-ncbi-data \
  --p-query '(cytochrome c oxidase subunit I[gene] OR cytochrome oxidase subunit 1[gene] OR cytochrome oxidase subunit I[gene] OR COX1[gene] OR CO1[gene] OR COI[gene] OR COXI[gene] NOT environmental sample[Title] NOT environmental samples[Title] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title] NOT txid2[ORGN] NOT txid2157[ORGN] NOT txid10239[ORGN])' \
  --verbose --p-logging-level INFO \
  --p-n-jobs 5 \
  --o-sequences CO1_sequences.qza \
  --o-taxonomy CO1_taxonomy.qza

My error file was empty so I just want to make sure that this is fine and I don't need to troubleshoot anything. Thank you!!

Hi @paperwolf,

I do not see anything abnormal here. RESCRIPt is just reporting that some accessions were not downloaded because they had errors. But it sounds like the bulk of your request was downloaded properly.

Evidently the accession numbers listed in your final warning were retrieved by some of your keywords, but they do not have valid taxids (the "problematic taxids" reported in that same warning), and hence they were deleted.

So it sounds like the download worked as intended, and you still downloaded most of your sequences, but some were excluded due to errors with their annotations.
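
If you want to quantify how much was dropped, the warnings can be tallied from a saved copy of the log. Below is a minimal Python sketch, assuming the warning wording matches the lines you posted. The function name `summarize_log` and the abbreviated sample text are hypothetical and only for illustration; they are not part of RESCRIPt.

```python
import re

# Patterns are based on the warning text quoted in this thread; other
# RESCRIPt versions may word these messages differently (assumption).
CHUNK_RE = re.compile(r"Expected (\d+) sequences in this chunk, but got (\d+)")
DELETED_RE = re.compile(
    r"The following accessions were deleted from the sequence database "
    r"because there was a problem with their taxonomies: (.+?)\.\s*$"
)
TAXID_RE = re.compile(r"The problematic taxids were: (.+?)\.\s*$")


def summarize_log(text):
    """Return (missing_from_chunks, deleted_accessions, problematic_taxids)."""
    missing = 0
    accessions, taxids = [], []
    for line in text.splitlines():
        m = CHUNK_RE.search(line)
        if m:
            # Shortfall for this chunk: expected minus received.
            missing += int(m.group(1)) - int(m.group(2))
            continue
        m = DELETED_RE.search(line)
        if m:
            accessions += [a.strip() for a in m.group(1).split(",")]
            continue
        m = TAXID_RE.search(line)
        if m:
            taxids += [t.strip() for t in m.group(1).split(",")]
    return missing, accessions, taxids


# Abbreviated sample built from the warnings above (lists shortened).
sample = """\
WARNING:2024-09-07 13:45:57,826:LokyProcess-4:Expected 5000 sequences in this chunk, but got 4995. I do not know why, or which sequences are missing.
WARNING:2024-09-07 14:41:30,989:MainProcess:The following accessions were deleted from the sequence database because there was a problem with their taxonomies: LC277241.1, LC277240.1.
The problematic taxids were: 2821972, 0.
"""

missing, accessions, taxids = summarize_log(sample)
print(f"{missing} sequences missing from chunks")
print(f"{len(accessions)} accessions deleted, {len(taxids)} problematic taxids")
```

To run it on a real log, read the file into a string and pass that to `summarize_log` instead of `sample`. Comparing the totals against the number of sequences in your output should show whether the losses are as small as they appear.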

Good luck!

Thank you for your help! I just wanted to confirm I was understanding everything correctly! Thank you!!