Hi @Tessa_M,
Great question!
Maybe yes, but probably not in a simple way. The first block of code specifies which sequences to download, using an Entrez query. The `txid33208` part is the key: it specifies that all metazoan sequences fitting the remaining criteria (12S gene, etc.) should be downloaded. `txid33208` is the NCBI taxonomy ID (taxid) for Metazoa.
I am not sure whether it is possible to use a location keyword with Entrez (probably not), and even if you could, it would rely on the accessions having that metadata entered... and species with a broader distribution (e.g., invasive species in Australia!) might not be included.
So I think it would be quite complicated to modify this code block to get only sequences from species found in Australia. You would need to make an exhaustive list of all species (and/or clades) found there, look up the NCBI taxid for each, and use those taxids instead of `txid33208`.
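To give a rough idea of what swapping in your own taxids could look like, here is a minimal sketch that joins a list of taxids into one Entrez query string. The helper name `build_entrez_query`, the placeholder taxids, and the `12S[Title]` gene filter are all assumptions for illustration; reuse whatever gene criteria the tutorial's query already contains.

```python
def build_entrez_query(taxids, gene_terms):
    """Combine several NCBI taxids and a gene filter into one Entrez query string."""
    if not taxids:
        raise ValueError("need at least one taxid")
    # Each taxid becomes txid<N>[ORGN]; OR-ing them matches any of the listed taxa.
    organisms = " OR ".join(f"txid{int(t)}[ORGN]" for t in taxids)
    return f"({organisms}) AND ({gene_terms})"


# Placeholder taxids (NOT real lookups): replace with the taxids of the
# species and/or clades on your Australian list.
australian_taxids = [111111, 222222]

# Hypothetical gene filter; copy the criteria from the tutorial's query instead.
query = build_entrez_query(australian_taxids, "12S[Title]")
print(query)
```

The resulting string would then take the place of the original query in the first code block. With a long species list the query gets very long, so using clade-level taxids where possible should keep it manageable.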
Something else you could try is to follow the instructions in that tutorial to download a 12S sequence database, and then use taxonomic weights to tell the classifier which species are more likely to be found in Australia (and downweight the others). This would also be very complicated... you would need to create a feature table with all taxa found in your 12S database and the probability of finding each of them. Here is a tutorial describing how to do this with 16S (for which the process can be automated, because the class weights are based on existing observations), but for your use case you would need to assemble the class weights manually (as I assume that observation frequency data are lacking for 12S in Australia):
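Separately from that tutorial, here is a very rough sketch of what manually assembling weights could look like. It is not the tutorial's method and does not produce the file format the classifier expects. The function name, the taxon labels, and the 10:1 weighting are all made-up assumptions; you would choose the relative weights yourself.

```python
def build_class_weights(all_taxa, expected_taxa, expected_weight=10.0, other_weight=1.0):
    """Give taxa expected in the study region a higher weight, then normalize to sum to 1."""
    if not all_taxa:
        raise ValueError("the reference database has no taxa")
    expected = set(expected_taxa)
    # Raw weights: upweight expected taxa, downweight (but do not zero out) the rest.
    raw = {t: (expected_weight if t in expected else other_weight) for t in all_taxa}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


# Hypothetical labels standing in for the taxonomy strings in your 12S database.
database_taxa = ["Species A", "Species B", "Species C"]
australian_taxa = ["Species A"]

weights = build_class_weights(database_taxa, australian_taxa)
for taxon, weight in weights.items():
    print(taxon, round(weight, 3))
```

Keeping a small nonzero weight for the other taxa means the classifier can still report an unexpected species rather than forcing every read onto the Australian list.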
So unfortunately creating an Australia-only database will be heaps of work one way or another, as it will require manual curation of either the sequences or the class weights. If you do not have the time for this, just use the complete 12S database and then manually curate the assignments if you see any hits to species that should not be detected in Australia. This would be a faster but less specific approach.
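For the faster approach, the manual check could be partly scripted. This is only a sketch: it assumes you have already exported the assignments into a plain mapping of feature ID to species name and that you have some list of species expected in Australia. The names `flag_unexpected`, `assignments`, and `expected_species` are hypothetical.

```python
def flag_unexpected(assignments, expected_species):
    """Return the assignments whose species is not on the expected list."""
    expected = set(expected_species)
    return {fid: sp for fid, sp in assignments.items() if sp not in expected}


# Made-up example data; substitute your own exported assignments and species list.
assignments = {
    "feature1": "Species A",
    "feature2": "Species X",
}
expected_species = ["Species A", "Species B"]

# Anything printed here deserves a manual look before you trust the assignment.
for feature_id, species in flag_unexpected(assignments, expected_species).items():
    print(feature_id, species)
```

A flagged hit is not necessarily wrong (it could be a close relative or a newly arrived species), so treat the output as a list to review rather than a list to delete.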
Good luck!