rescript get-ncbi-data ‘dict’ object has no attribute ‘add’

Hello,

Thank you for this tool; it is very useful. I worked through the 16S tutorial and it works, but then I tried to get ITS data with the command below.

qiime rescript get-ncbi-data \
    --p-query '177353[BioProject]' \
    --o-sequences ncbi-its-seqs.qza \
    --o-taxonomy ncbi-its-taxa.qza \
    --verbose

This is the error I got.

Traceback (most recent call last):
  File "/home/bmlab/anaconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in get_ncbi_data
  File "/home/bmlab/anaconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/bmlab/anaconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/bmlab/anaconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/rescript/ncbi.py", line 82, in get_ncbi_data
    taxids, ranks, rank_propagation, entrez_delay)
  File "/home/bmlab/anaconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/rescript/ncbi.py", line 255, in get_taxonomies
    missing_taxids.add(taxid)
AttributeError: 'dict' object has no attribute 'add'
Plugin error from rescript:
'dict' object has no attribute 'add'
See above for debug info.

Isn't the tool supposed to fetch data from any BioProject, just as it does for 16S? I want to build a database for the ITS region. Could you point me to the cause of this failure?
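
For readers unfamiliar with this Python error: the last frame of the traceback calls `.add()` on an object that turned out to be a dict. The sketch below is a hypothetical illustration, not rescript's actual code. It only assumes that a container meant to behave like a set was created as a dict somewhere along the way, which is one common way this message arises.

```python
# Hypothetical illustration only; this is NOT the code in rescript/ncbi.py.
missing_as_dict = {}      # empty braces create a dict, not a set
missing_as_set = set()    # an empty set has to be created with set()


def record_missing(container, taxid):
    """Try to record a taxid; return the error text instead of raising."""
    try:
        container.add(taxid)
        return None
    except AttributeError as err:
        return str(err)


dict_error = record_missing(missing_as_dict, "12345")
set_error = record_missing(missing_as_set, "12345")

print(dict_error)  # the same AttributeError text as in the traceback
print(set_error)   # None, because sets do have .add()
```

If that assumption holds, the message points to a code path in the plugin rather than to anything wrong with the query itself.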

Edit: Well... The command below is not working for me either.

Edit2: I might be complicating things, but I think I need to share this one, too. Switching ' with " triggers a different error.

Hi @the_dummy,

I’ve tried running these commands, and initially I ran into the same errors too. However, I adjusted --p-entrez-delay by doubling the default value, and it worked. This leads me to think that the issue is on the NCBI end. :man_shrugging:

The following worked (with and without the --p-n-jobs argument):

qiime rescript get-ncbi-data \
    --p-query '33175[BioProject] OR 33317[BioProject]' \
    --o-sequences ncbi-refseqs-unfiltered.qza \
    --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza \
    --p-entrez-delay 0.668 \
    --p-n-jobs 4

Saved FeatureData[Sequence] to: ncbi-refseqs-unfiltered.qza
Saved FeatureData[Taxonomy] to: ncbi-refseqs-taxonomy-unfiltered.qza

Give that a try and let us know if it works out.
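
If you would rather not adjust the delay by hand, the idea can be sketched as a small wrapper that reruns the command and doubles --p-entrez-delay after each failure. This is only a sketch: the function names, the 60-second pause, and the cap of four attempts are assumptions of mine, not anything rescript provides, and it starts from the 0.668 value used above.

```python
import subprocess
import time


def build_command(query, seqs_out, taxa_out, entrez_delay):
    """Assemble the get-ncbi-data call using only flags shown in this thread."""
    return [
        "qiime", "rescript", "get-ncbi-data",
        "--p-query", query,
        "--o-sequences", seqs_out,
        "--o-taxonomy", taxa_out,
        "--p-entrez-delay", str(entrez_delay),
    ]


def fetch_with_backoff(query, seqs_out, taxa_out, start_delay=0.668,
                       max_attempts=4, pause=60.0,
                       run=subprocess.run, sleep=time.sleep):
    """Rerun the command, doubling the Entrez delay after each failure.

    Returns the delay that succeeded. `run` and `sleep` are injectable so
    the logic can be exercised without calling qiime or actually waiting.
    """
    delay = start_delay
    for attempt in range(1, max_attempts + 1):
        result = run(build_command(query, seqs_out, taxa_out, delay))
        if result.returncode == 0:
            return delay
        if attempt < max_attempts:
            sleep(pause)   # give the NCBI servers a moment (assumed value)
            delay *= 2     # then be gentler on the next attempt
    raise RuntimeError(
        "get-ncbi-data still failing after %d attempts" % max_attempts)


# Example call (commented out so nothing runs on import):
# fetch_with_backoff("33175[BioProject] OR 33317[BioProject]",
#                    "ncbi-refseqs-unfiltered.qza",
#                    "ncbi-refseqs-taxonomy-unfiltered.qza")
```

A longer delay does not help if the servers are simply overloaded, so running at a quieter time of day may matter more than the exact value.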

-Cheers!
-Mike

Hey @the_dummy,
Just to follow up on @SoilRotifer’s comments and clarify: we see various server-side issues with NCBI, so using get-ncbi-data can sometimes be a bit bumpy depending on:

  1. the time of day
  2. the size/content of your query
  3. which way the wind is blowing (just kidding :wink:)

Usually trying again later works (e.g., if you are trying to run the job during peak hours) and we have some pending changes to rescript to make these failures more graceful in the future.

Good luck!

There is no --p-n-jobs argument in the version I am using, shown below. Is there another version that has the --p-n-jobs argument?

QIIME 2 Plugin 'rescript' version 2020.11.0.dev0+6.g836422d (from package 'rescript' version 2020.11.0.dev0+6.g836422d)
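
One way to check whether an installed build exposes a given flag is to search the command's --help output for it. The sketch below assumes that `qiime rescript get-ncbi-data --help` lists every option by name; the helper names `supports_option` and `installed_help_text` are hypothetical and exist only for this illustration.

```python
import re
import subprocess


def supports_option(help_text, option):
    """Return True if `option` appears as a whole flag name in the help text."""
    # Reject matches embedded in longer flags such as --p-n-jobs-extra.
    pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def installed_help_text():
    """Capture the help output of the installed get-ncbi-data command."""
    result = subprocess.run(
        ["qiime", "rescript", "get-ncbi-data", "--help"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True)
    return result.stdout


# Example call (commented out so nothing runs on import):
# print(supports_option(installed_help_text(), "--p-n-jobs"))
```

If this reports False, the flag is only available in a different build than the one installed.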

The time difference between the USA and my country is 10 hours, so my active hours should be fairly quiet, yet it still fails. The weird thing is that the error message differs on each attempt. I can send the error logs if that helps.

I will build the database from this BioProject some other way and try rescript again after the updates. I will let you know how that goes if anyone is interested.

Thank you.

Oops.. sorry @the_dummy, I was running in my test environment and not the live release. Have you tried altering the delay? Did that work?

-Mike

Yes, I tried altering the delay with many different values between 0.1 and 300. Unfortunately, it did not work.

The updates to the original tutorial page solved the problem. I tried using --p-n-jobs 5 in the suggested time window and it worked.

Thank you all.
