RESCRIPT error: Download did not finish. Reason unknown

SoilRotifer · March 23, 2021, 2:18pm

@avtober, I forgot to mention that you should be aware that the help text also has other helpful information, e.g.:

Please be aware of the NCBI Disclaimer and Copyright notice
(Policies and Disclaimers - NCBI), particularly "run
retrieval scripts on weekends or between 9 pm and 5 am Eastern Time
weekdays for any series of more than 100 requests". As a rough guide, if
you are downloading more than 125,000 sequences, only run this method at
those times...

Which could also impact your ability to download data.

One other thought, download your data in chunks as outlined here:

by querying separate taxonomic groups. As an example, I ran the following command to download only Rotifera sequences and it worked:

$ qiime rescript get-ncbi-data \
	--p-query 'txid10190[ORGN] AND (LSU[TITLE] OR 28S[TITLE] or large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE])' \
	--o-sequences ncbi-LSU-rotifera-seqs-unfiltered.qza \
	--o-taxonomy ncbi-LSU-rotifera-taxonomy-unfiltered.qza \
	--verbose

Saved FeatureData[Sequence] to: ncbi-LSU-rotifera-seqs-unfiltered.qza
Saved FeatureData[Taxonomy] to: ncbi-LSU-rotifera-taxonomy-unfiltered.qza

Since you've listed 28S as your LSU of interest, I assumed you only wanted to download data from within the Eukaryota, that is 23S is the LSU for Bacteria / Archaea, which you did not list. Below would be the command for downloading Eukaryote LSU sequences. Note, this may still be too large of a query, and I'd suggest downloading in chunks as mentioned above.

qiime rescript get-ncbi-data \
	--p-query 'txid2759[ORGN] AND (LSU[TITLE] OR 28S[TITLE] or large ribosomal subunit[TITLE] NOT uncultured[TITLE] NOT unidentified[TITLE] NOT unclassified[TITLE])' \
	--o-sequences ncbi-LSU-eukaryota-seqs-unfiltered.qza \
	--o-taxonomy ncbi-LSU-eukaryota-taxonomy-unfiltered.qza \
	--verbose

You can search the NCBI Taxonomy page to figure out what the txid for a given group is.

Finally, you can simply use RESCRIPt to download the LSU data from SILVA (an update was recently pushed to the GitHub code to do this for SILVA ver 138.1). Then you can run the following:

qiime rescript get-silva-data \
	--p-version  '138.1' \
	--p-target 'LSURef_NR99' \
	--p-include-species-labels \
	--p-ranks domain domain superkingdom kingdom subkingdom superphylum phylum subphylum infraphylum superclass class subclass infraclass superorder order suborder superfamily family subfamily genus  \
	--p-rank-propagation \
	--output-dir silva-138.1-LSU

Note, I listed all available taxonomic ranks to be parsed, as I am not sure which would be most helpful for you in this case. You can remove the ranks that you do not need. Any empty ranks will be filled in by the nearest upper-level rank.

-Cheers!