Hi! I was downloading CO1 sequences from NCBI over the weekend, and my output included some messages saying that some sequences were not downloaded.
Warnings in the output file (repeated multiple times):
WARNING:2024-09-07 13:45:57,826:LokyProcess-4:Expected 5000 sequences in this chunk, but got 4995. I do not know why, or which sequences are missing.
Warnings at the end of the output file:
WARNING:2024-09-07 14:41:30,989:MainProcess:The following accessions were deleted from the sequence database because there was a problem with their taxonomies: LC277241.1, LC277240.1, LC277239.1, LC277238.1, LC277237.1, LC277236.1, LC277235.1, AP011270.1, AB626856.1, GU987838.1, LC735809.1, MW291683.1, MT491941.1, MW991407.1, MW991406.1, MW830102.1, MW830101.1, MW830100.1, MW830099.1, MW830076.1, MW830075.1, MW830074.1, MW830073.1, MW830072.1, MW830071.1, MW830070.1, MW830069.1, MW830068.1, LC613154.1.
The problematic taxids were: 2821972, 2821967, 2821979, 2821966, 2821965, 2821971, 2821977, 2821978, 2821973, 2821970, 2821969, 2791187, 2821968, 2821976, 0.
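To gauge how many sequences these warnings affect, a rough sketch like the one below could tally them from the saved output. It assumes the two warning formats shown above; the function name `summarize_warnings` and the shortened sample text are hypothetical and not part of RESCRIPt.

```python
import re

# Assumes the chunk warning wording shown in the log above.
CHUNK_RE = re.compile(r"Expected (\d+) sequences in this chunk, but got (\d+)")
# Assumes deleted accessions follow this marker as a comma-separated list.
TAXONOMY_MARKER = "problem with their taxonomies:"


def summarize_warnings(log_text):
    """Return (number of sequences missing from chunks, list of deleted accessions)."""
    missing = 0
    deleted = []
    for line in log_text.splitlines():
        match = CHUNK_RE.search(line)
        if match:
            expected, got = int(match.group(1)), int(match.group(2))
            missing += expected - got
            continue
        if TAXONOMY_MARKER in line:
            # Take the text after the marker and drop the sentence-ending period.
            listing = line.split(TAXONOMY_MARKER, 1)[1].strip().rstrip(".")
            deleted.extend(acc.strip() for acc in listing.split(",") if acc.strip())
    return missing, deleted


# Sample drawn from the log above; the accession list is shortened for illustration.
sample = (
    "WARNING:2024-09-07 13:45:57,826:LokyProcess-4:Expected 5000 sequences "
    "in this chunk, but got 4995. I do not know why, or which sequences are missing.\n"
    "WARNING:2024-09-07 14:41:30,989:MainProcess:The following accessions were "
    "deleted from the sequence database because there was a problem with their "
    "taxonomies: LC277241.1, AP011270.1.\n"
)

missing, deleted = summarize_warnings(sample)
print(f"Sequences missing from chunks: {missing}")
print(f"Accessions deleted for taxonomy problems: {len(deleted)}")
```

Running it on the full output file (read with `open(...).read()`) would give the totals across every repeated chunk warning.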
My code:
qiime rescript get-ncbi-data \
  --p-query '(cytochrome c oxidase subunit I[gene] OR cytochrome oxidase subunit 1[gene] OR cytochrome oxidase subunit I[gene] OR COX1[gene] OR CO1[gene] OR COI[gene] OR COXI[gene] NOT environmental sample[Title] NOT environmental samples[Title] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title] NOT txid2[ORGN] NOT txid2157[ORGN] NOT txid10239[ORGN])' \
  --verbose --p-logging-level INFO \
  --p-n-jobs 5 \
  --o-sequences CO1_sequences.qza \
  --o-taxonomy CO1_taxonomy.qza
My error file was empty, so I just want to make sure that this is fine and that I don't need to troubleshoot anything. Thank you!!