I would like to subsample a large set of genomes with genome-sampler.
For the subsampling, currently I use the following pipeline:
qiime genome-sampler sample-longitudinal
qiime genome-sampler sample-diversity \
  --i-context-seqs filtered-context-seqs.qza \
  --p-percent-id 0.9995 \
  --o-selection diversity-selection.qza

qiime genome-sampler sample-neighbors \
  --i-focal-seqs filtered-focal-seqs.qza \
  --i-context-seqs filtered-context-seqs.qza \
  --m-locale-file context-metadata.tsv \
  --m-locale-column location \
  --p-percent-id 0.9999 \
  --p-samples-per-cluster 3 \
  --o-selection neighbor-selection.qza
What I can do so far is change the percent-id for sample-diversity and check how many sequences remain after subsampling. My goal is to tell genome-sampler to select a specific number of sequences (e.g. 700), or at least to set an upper bound on that number. Is this possible?
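To make the manual procedure concrete (change the percent-id, count the result, repeat), here is a minimal Python sketch of how I imagine automating it as a bisection. It rests on two assumptions: that the number of selected sequences does not decrease as the percent-id increases, and that a callback can be written to rerun sample-diversity and count the result. `count_selected` and `find_percent_id` are hypothetical names of my own and are not part of genome-sampler.

```python
from typing import Callable


def find_percent_id(count_selected: Callable[[float], int],
                    max_seqs: int,
                    lo: float = 0.99,
                    hi: float = 1.0,
                    steps: int = 20) -> float:
    """Search [lo, hi] for the largest percent-id that keeps <= max_seqs sequences.

    count_selected is a hypothetical callback: it should rerun
    sample-diversity with the given percent-id and return how many
    sequences were selected. The search assumes this count is
    non-decreasing in percent-id, which is an assumption, not a guarantee.
    """
    # If even the loosest threshold selects too many sequences, give up.
    if count_selected(lo) > max_seqs:
        raise ValueError("lowest percent-id already exceeds max_seqs")
    # If the strictest threshold is within the bound, no search is needed.
    if count_selected(hi) <= max_seqs:
        return hi
    # Invariant: count_selected(lo) <= max_seqs < count_selected(hi).
    for _ in range(steps):
        mid = (lo + hi) / 2
        if count_selected(mid) <= max_seqs:
            lo = mid
        else:
            hi = mid
    return lo
```

Each call to `count_selected` would mean a full sample-diversity run, so `steps` bounds the total runtime. This only bounds the sample-diversity selection; the sequences added by sample-longitudinal and sample-neighbors would still come on top of it.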