RESCRIPt error: Download did not finish. Reason unknown

Hi @avtober,

When this happens, I usually suspect that a few specific txid queries are failing. So, I used the simple bash script below to run the qiime rescript get-ncbi-data command for each txid separately.

Save this to a file, say get-txids.sh, then run it by typing: bash get-txids.sh

# List of txids to query one at a time.
txid_list="554915 2686027 554296 1401294 2608240 3027 2611352 38254 2608109 2611341 2763 2698737 2683617"

for taxa in $txid_list
do
    # Build and run one get-ncbi-data command per txid, writing per-txid outputs.
    cmd="qiime rescript get-ncbi-data --p-query 'txid$taxa[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])' --p-n-jobs 4 --o-sequences ncbi-LSU-seqs-txid$taxa.qza --o-taxonomy ncbi-LSU-taxonomy-txid$taxa.qza"
    echo "Processing: $taxa"
    echo "$cmd"
    eval "$cmd"
done
	

From here I found that two txids failed:

  • 2611352
  • 2698737

I am not sure why, but when I paste the search strings (below) into NCBI directly, I do get a few results. I’ve not dug deeply into this… but at least we could narrow it down.

  • txid2611352[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])
  • txid2698737[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])
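
As a quick sanity check outside of RESCRIPt, you can also ask the Entrez esearch endpoint directly how many records a query matches and look at the <Count> field in the XML it returns. A minimal sketch, using one of the failing queries (curl handles the URL encoding; the output filename and exact approach here are just a suggestion):

# Ask NCBI esearch how many nucleotide records match the query.
# -G turns the --data-urlencode fields into a GET query string.
curl -sG "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi" \
    --data-urlencode "db=nuccore" \
    --data-urlencode "retmax=0" \
    --data-urlencode "term=txid2611352[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])"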

Any ideas @BenKaehler?

Thanks for looking into that. I took out those two IDs and the download worked. I have also been downloading the rest of Eukaryota in small batches of fewer than 120,000 sequences; some have worked and some still fail with the error message ‘Download did not finish. Reason unknown’. Whether a batch downloads or not seems very random and doesn’t appear to depend on size. Hopefully, if I keep trying, they will all work eventually.
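
In case it is useful to anyone else, a batched query can be put together roughly like the sketch below; the txids and output names are just placeholders, not my actual groupings:

# Sketch: join a batch of txids into one OR query, then run get-ncbi-data once for the batch.
batch="189478 147549 205932"
query=""
for taxa in $batch
do
    if [ -n "$query" ]; then
        query="$query OR "
    fi
    query="${query}txid${taxa}[ORGN]"
done
query="$query AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])"

qiime rescript get-ncbi-data \
    --p-query "$query" \
    --o-sequences ncbi-LSU-seqs-batch1.qza \
    --o-taxonomy ncbi-LSU-taxonomy-batch1.qza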

Best
Anya

Thanks @avtober, @Nicholas_Bokulich, and @SoilRotifer for already boiling this problem down to a relatively small query.

From those errors (particularly the KeyError) it looks like we’re getting a record back that has a format we haven’t seen before.

I’ll have to debug, so it might take a day or two for me to get back to you.

Thanks @BenKaehler and @SoilRotifer for your help with this. I just have a couple more downloads that are still not working. The first is under 100,000 sequences, and I used the following command:

qiime rescript get-ncbi-data \
    --p-query 'txid189478[ORGN] OR txid147549[ORGN] OR txid205932[ORGN] OR txid147537[ORGN] OR txid451866[ORGN] OR txid129384[ORGN] OR txid136265[ORGN] OR txid2283618[ORGN] OR txid112252[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])' \
    --o-sequences ncbi-LSU-seqs-fungi4.qza \
    --o-taxonomy ncbi-LSU-taxonomy-fungi4.qza \
    --verbose

When I try to run this I get the following error message:

Traceback (most recent call last):
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in get_ncbi_data
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
    output_views = self._callable(**view_args)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/rescript/ncbi.py", line 89, in get_ncbi_data
    query, logging_level, n_jobs, request_lock, _entrez_delay)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/rescript/ncbi.py", line 365, in get_nuc_for_query
    for chunk in range(0, expected_num_records, 5000))
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 1044, in __call__
    while self.dispatch_one_batch(iterator):
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/rescript/ncbi.py", line 342, in _get_query_chunk
    raise RuntimeError('Download did not finish. Reason unknown.')
RuntimeError: Download did not finish. Reason unknown.

Plugin error from rescript:

Download did not finish. Reason unknown.

See above for debug info.

The second download is larger, at 123,000 sequences, and I used the following command:

qiime rescript get-ncbi-data \
    --p-query 'txid6960[ORGN] AND (LSU[TITLE] OR 28S[TITLE] OR large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE] NOT unverified[TITLE])' \
    --p-n-jobs 5 \
    --o-sequences ncbi-LSU-seqs-metazoa1.qza \
    --o-taxonomy ncbi-LSU-taxonomy-metazoa1.qza \
    --verbose

For this one it looks like I get a slightly different error message:

Traceback (most recent call last):
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "", line 2, in get_ncbi_data
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
    output_views = self._callable(**view_args)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/rescript/ncbi.py", line 89, in get_ncbi_data
    query, logging_level, n_jobs, request_lock, _entrez_delay)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/rescript/ncbi.py", line 365, in get_nuc_for_query
    for chunk in range(0, expected_num_records, 5000))
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/mnt/scratch/nodelete/sbiat4/qiime_conda/envs/rescript/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: Download did not finish. Reason unknown.

Plugin error from rescript:

Download did not finish. Reason unknown.

Any thoughts on this would be greatly appreciated; I am so close to having the whole database downloaded!

Anya

Hi @BenKaehler, did you manage to get anywhere with the KeyError formatting issue?

Many thanks
Anya

Hi @avtober, I spoke with @BenKaehler about this earlier this week and he is still investigating. Thanks for your patience!

I believe NCBI RefSeqs has an LSU reference set… it would be smaller and pre-curated, so it might be a good starting point, and it should work with a simpler query (basically just the project ID, which you can get from the NCBI RefSeqs website; see the RESCRIPt tutorial for the 16S SSU RefSeqs example). Just an idea that might get you moving while Ben investigates this issue…
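
For reference, the 16S SSU RefSeqs query from the RESCRIPt tutorial looks roughly like the sketch below; for an LSU set you would swap in the corresponding BioProject ID from the NCBI Targeted Loci pages (the output names here are just placeholders):

# Sketch based on the tutorial's 16S SSU RefSeqs example:
# 33175 = bacterial 16S Targeted Loci project, 33317 = archaeal 16S.
# Replace these with the LSU project ID for an LSU reference set.
qiime rescript get-ncbi-data \
    --p-query '33175[BioProject] OR 33317[BioProject]' \
    --o-sequences ncbi-refseqs-unfiltered.qza \
    --o-taxonomy ncbi-refseqs-taxonomy-unfiltered.qza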

Hi @Nicholas_Bokulich, thanks for that suggestion. I did have a look, but it seems the LSU RefSeqs are only for fungi, and I am really looking for eukaryotes (parasites in snails), though I would also like to include fungi and bacteria just in case. Thank you both for all your help with this; I am happy to wait and work on other things for now.

Best
Anya

In that case, why not use the LSU from SILVA? See here:

You can use qiime rescript get-silva-data to download the latest v138.1 LSU database. Just set --p-version 138.1 and --p-target LSURef_NR99 or --p-target LSURef. You can follow along in the above-linked tutorial to perform further filtering and curation of the reference database.
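
The download step would look something like this (output names are just suggestions, and the parameters mirror the SSU example in the tutorial):

# Sketch: pull the SILVA 138.1 LSU NR99 reference set with RESCRIPt.
qiime rescript get-silva-data \
    --p-version '138.1' \
    --p-target 'LSURef_NR99' \
    --p-include-species-labels \
    --o-silva-sequences silva-138.1-lsu-nr99-rna-seqs.qza \
    --o-silva-taxonomy silva-138.1-lsu-nr99-tax.qza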

Hi @SoilRotifer, I have already tried the SILVA LSU NR99 and full databases, which did work; however, neither database has all of the parasite species that I am looking for. The NCBI database has a lot more parasite sequences for 28S. If I cannot get the whole NCBI database, my other thought was to just supplement the SILVA databases with some of the parasite sequences from NCBI.

I am not sure exactly how to do this yet; I guess I would just download the individual sequences and copy them into the SILVA database. Is there a tutorial on this somewhere?

Best
Anya

Add all of your sequences to a single FASTA file and the associated taxonomy to another file. Then you can simply import your sequence and taxonomy data into QIIME 2 and merge these with the SILVA LSU .qza files, as you would have done when downloading separate chunks of data from GenBank:
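
In sketch form it would look something like this; the filenames are placeholders, and it assumes the SILVA sequences have already been reverse-transcribed to DNA as in the SILVA tutorial:

# Import the extra NCBI sequences and their taxonomy (headerless two-column TSV).
qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path extra-parasite-seqs.fasta \
    --output-path extra-parasite-seqs.qza

qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path extra-parasite-taxonomy.tsv \
    --output-path extra-parasite-taxonomy.qza

# Merge the imported data with the SILVA LSU reference files.
qiime feature-table merge-seqs \
    --i-data silva-lsu-seqs.qza \
    --i-data extra-parasite-seqs.qza \
    --o-merged-data merged-lsu-seqs.qza

qiime feature-table merge-taxa \
    --i-data silva-lsu-tax.qza \
    --i-data extra-parasite-taxonomy.qza \
    --o-merged-data merged-lsu-tax.qza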

-Mike

It’s a good thought, but you would need to make sure that the taxonomies align (e.g., use the same lineage names and conventions). If it’s a matter of adding a few species, it is probably not a big deal to manually do this. But it will be challenging if you have a large number… if so, it will probably be better (less time/effort!) to wait than to manually stitch these together.

Hopefully @BenKaehler will be able to track down why NCBI is hanging up in this case.

Hi everyone, sorry for the slow turnaround on this one.

I have tweaked get-ncbi-data so that it now accommodates the NCBI weirdness that you found.

Once this PR is merged you should be good to go.

Hi @BenKaehler, thank you so much for that, I will give it another try next week and let you know how it goes.

Best
Anya

Thanks @avtober!

We are having some minor versioning issues merging that PR.

I will let you know when that’s resolved, but in the meantime the version in my fork works with qiime2-2021.2.

You can grab it by running the standard installation instructions until you get to

pip install git+https://github.com/bokulich-lab/RESCRIPt.git

at which point you should run

pip install git+https://github.com/BenKaehler/RESCRIPt.git

instead.

Hi @BenKaehler,

Thanks for that. I am currently using version 2020.11. Will I need to install the latest version, or should it work on all versions?

Thanks
Anya

The current version of RESCRIPt only works with 2021.2 and later.

Hi @SoilRotifer and @BenKaehler, just letting you know that I tried the updated RESCRIPt and I have now managed to download the whole LSU NCBI database (in chunks), so I just have to merge it all back together now. Thank you both for all your help with this!

Best
Anya

That’s great news @avtober! Thank you for letting us know :pray:. We should have all of this incorporated into the next RESCRIPt release once QIIME 2 (2021.4) goes live.
