Hi @TLBR007,
I guess my reply was sent out before I could delete it. Within a few minutes of posting, I realized that you would run into these issues.
The problem, as you've noted, is that the assemblies are not accessible within the nucleotide database. I was going to update my reply to say that you'd have to extract the accessions from the headers of the downloaded FASTA files. Here is an updated version of my reply, in case others come across this post:
RESCRIPt cannot be used directly for this, though it should be possible to prepare the data for use in QIIME 2 and RESCRIPt.
I'd recommend pulling microbial genomes from RefSeq. Basically, if you look here you can click on the links that lead to the various clades of Bacterial and Archaeal Assemblies. That is, if you click on the "Assemblies" link for "Acidobacteria", it'll take you here.
Then simply download the FASTA files of all the assemblies for each clade and merge them into a single FASTA. Alternatively, you can use the following search term to download all RefSeq prokaryotic genomes:
(txid2[orgn] OR txid2157[orgn]) AND "latest_refseq"[Properties]
But this is about 127 GB of data! So, I'd not recommend doing this, for fear of a failing internet connection! If you do, I'd highly recommend doing this during NIH "off-hours". Better yet, I'd simply stick with the approach of downloading each clade of genome assemblies individually and then merging them.
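The merge step can be sketched in Python as follows. This is only a minimal sketch, assuming the per-clade downloads are plain or gzip-compressed FASTA files; the function names (`open_maybe_gzip`, `merge_fasta`) are hypothetical helpers, not part of QIIME 2 or RESCRIPt.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed file in binary mode.

    Assumption: compressed downloads carry a ".gz" suffix.
    """
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rb")
    return open(path, "rb")


def merge_fasta(input_paths, output_path):
    """Concatenate several FASTA files into one uncompressed FASTA file.

    Data is streamed in 1 MiB chunks so large genome files never have to
    fit in memory. A newline is added after any file that lacks a trailing
    one, so the next header always starts on its own line.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open_maybe_gzip(path) as handle:
                last_byte = b"\n"
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
                if last_byte != b"\n":
                    out.write(b"\n")
```

For example, with the clade files saved in a (hypothetical) directory named clades, something like merge_fasta(sorted(Path("clades").glob("*.fna.gz")), "refseq_prokaryotes.fna") would write them all to a single file.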
After these are in one big FASTA file, you should be able to run qiime feature-classifier extract-reads ... with your primer sequences to extract the operon region from these genomes. Actually, it might be better to run this command separately on each clade's FASTA file, as it can take a very long time for the extract-reads command to process a single large file. Then again, you can use multiple CPU threads for the extraction... Anyway, depending on how the genomes are linearized, this may or may not work.
The main issue is fetching the taxonomy. That is, you'd need to pull the accessions from the FASTA headers of the downloaded assemblies and then map them to taxonomy, which would mean downloading and parsing NCBI's taxonomy on your own somehow...
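Pulling the accessions out of the headers could look roughly like the sketch below. It assumes an uncompressed FASTA file in which the accession is the first whitespace-delimited token after the ">" (as in a header such as ">NZ_XXXXXXXX.1 Some organism chromosome, complete genome"); check this against your own downloads. The function name `iter_accessions` is a hypothetical helper, and mapping the accessions to taxonomy is not covered here.

```python
def iter_accessions(fasta_path):
    """Yield (accession, description) pairs from the headers of a FASTA file.

    Assumption: the accession is the first whitespace-delimited token after
    ">", and the rest of the header line is a free-text description.
    """
    with open(fasta_path, "r") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            fields = line[1:].strip().split(None, 1)
            if not fields:
                continue  # skip empty headers
            accession = fields[0]
            description = fields[1] if len(fields) > 1 else ""
            yield accession, description
```

The resulting accession list is what you would then need to match against NCBI's taxonomy to build the taxonomy file.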
Perhaps you can use one of the following tools to extract a fixed-rank taxonomy based on the accessions within the assembly FASTA files?
-Mike