Hi @lastewart & @Micro_Biologist,
You can adapt the approach outlined here by replacing its initial step with:
! qiime rescript get-ncbi-data \
--p-query "txid33208[ORGN] AND (cytochrome c oxidase subunit 1[Title] OR cytochrome c oxidase subunit I[Title] OR cytochrome oxidase subunit 1[Title] OR cytochrome oxidase subunit I[Title] OR COX1[Title] OR CO1[Title] OR COI[Title]) AND mitochondrion[Filter] NOT environmental sample[Title] NOT environmental samples[Title] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title]" \
--p-ranks kingdom phylum class order family genus species \
--p-rank-propagation \
--p-n-jobs 1 \
--o-sequences COI-ref-seqs.qza \
--o-taxonomy COI-ref-tax.qza \
--verbose
Note: this example downloads Metazoa. As that is a lot of data, I'd recommend downloading in chunks, one taxonomic group at a time. To do so, replace the txid number in the query with that of each taxonomic group you think is relevant for your reference database, e.g.:
- fungi: txid4751[ORGN]
- rhodophyta: txid2763[ORGN]
- alveolata: txid33630[ORGN]
- viridiplantae: txid33090[ORGN]
- stramenopiles: txid33634[ORGN]
- rhizaria: txid543769[ORGN]
- etc.
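If it helps, the per-group downloads could be looped in the shell along these lines. This is only a rough sketch: the `build_query` helper, the `DRY_RUN` switch, and the `COI-ref-seqs-<group>.qza` / `COI-ref-tax-<group>.qza` file names are my own placeholders, not part of RESCRIPt. The query text and the other options are copied from the command above. By default it only prints what it would do; set `DRY_RUN=0` to actually download.

```shell
#!/usr/bin/env bash
# Sketch: one `qiime rescript get-ncbi-data` call per taxonomic group.
# Helper name, DRY_RUN switch and output file names are illustrative placeholders.

# Build the COI query for a single NCBI taxon ID (same filters as the Metazoa example).
build_query() {
  local txid="$1"
  printf '%s' "txid${txid}[ORGN] AND (cytochrome c oxidase subunit 1[Title] OR cytochrome c oxidase subunit I[Title] OR cytochrome oxidase subunit 1[Title] OR cytochrome oxidase subunit I[Title] OR COX1[Title] OR CO1[Title] OR COI[Title]) AND mitochondrion[Filter] NOT environmental sample[Title] NOT environmental samples[Title] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title]"
}

# name:txid pairs; edit this list to match the groups you need.
taxon_groups="metazoa:33208 fungi:4751 rhodophyta:2763 alveolata:33630 viridiplantae:33090 stramenopiles:33634 rhizaria:543769"

for entry in $taxon_groups; do
  name="${entry%%:*}"   # text before the colon
  txid="${entry##*:}"   # text after the colon
  query="$(build_query "$txid")"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    qiime rescript get-ncbi-data \
      --p-query "$query" \
      --p-ranks kingdom phylum class order family genus species \
      --p-rank-propagation \
      --p-n-jobs 1 \
      --o-sequences "COI-ref-seqs-${name}.qza" \
      --o-taxonomy "COI-ref-tax-${name}.qza" \
      --verbose
  else
    # Dry run: show which group would be fetched and where it would be written.
    echo "would download ${name} (txid${txid}) -> COI-ref-seqs-${name}.qza, COI-ref-tax-${name}.qza"
  fi
done
```

Running the groups one after another keeps each request to NCBI smaller, and if one group fails you only need to repeat that one.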
Then run qiime feature-table merge-seqs ... and qiime feature-table merge-taxa ... on the outputs to combine them into a single set of sequences and a single taxonomy.
After that, proceed with the rest of the tutorial linked above, adjusting the downstream commands as appropriate.
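As a rough illustration of the merge step, something like the following should work. It assumes you saved each group as `COI-ref-seqs-<group>.qza` and `COI-ref-tax-<group>.qza`; those names, the `merge_inputs` helper, the `DRY_RUN` switch, and the merged output names are placeholders of mine. Both merge commands accept `--i-data` repeated once per input file. By default the sketch only prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/usr/bin/env bash
# Sketch: merge per-group downloads into one sequence artifact and one taxonomy artifact.
# File names and the helper below are illustrative placeholders.

# Print one "--i-data <prefix>-<group>.qza" pair per group name.
merge_inputs() {
  local prefix="$1"
  shift
  for name in "$@"; do
    printf '%s %s-%s.qza ' '--i-data' "$prefix" "$name"
  done
}

# List the groups you actually downloaded.
seq_inputs="$(merge_inputs COI-ref-seqs metazoa fungi rhodophyta)"
tax_inputs="$(merge_inputs COI-ref-tax metazoa fungi rhodophyta)"

if [ "${DRY_RUN:-1}" = "0" ]; then
  # Left unquoted on purpose so each flag and file name becomes its own argument
  # (this relies on the file names containing no spaces).
  qiime feature-table merge-seqs $seq_inputs --o-merged-data COI-ref-seqs-merged.qza
  qiime feature-table merge-taxa $tax_inputs --o-merged-data COI-ref-tax-merged.qza
else
  # Dry run: print the commands instead of executing them.
  echo "qiime feature-table merge-seqs ${seq_inputs}--o-merged-data COI-ref-seqs-merged.qza"
  echo "qiime feature-table merge-taxa ${tax_inputs}--o-merged-data COI-ref-tax-merged.qza"
fi
```

The merged artifacts would then take the place of COI-ref-seqs.qza and COI-ref-tax.qza in the later steps.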