I would like to subsample a large set of genomes with genome-sampler.
For the subsampling, I currently use the following pipeline:
qiime genome-sampler sample-longitudinal \
--i-context-seqs filtered-context-seqs.qza \
--m-dates-file context-metadata.tsv \
--m-dates-column date \
--o-selection date-selection.qza
qiime genome-sampler sample-diversity \
--i-context-seqs filtered-context-seqs.qza \
--p-percent-id 0.9995 \
--o-selection diversity-selection.qza
qiime genome-sampler sample-neighbors \
--i-focal-seqs filtered-focal-seqs.qza \
--i-context-seqs filtered-context-seqs.qza \
--m-locale-file context-metadata.tsv \
--m-locale-column location \
--p-percent-id 0.9999 \
--p-samples-per-cluster 3 \
--o-selection neighbor-selection.qza
What I can do is change the percent-id for sample-diversity and check how many sequences remain after subsampling. My goal is to tell genome-sampler to select a certain number of sequences (e.g. 700), or at least to set an upper bound on that number. Is this possible?
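
For reference, the manual approach described above (changing percent-id and checking the result) could be scripted roughly as follows. This is only a sketch: the function name `sweep_percent_id`, the `QIIME_CMD` variable, the list of thresholds and the output file naming are my own illustrative choices, not part of genome-sampler. Only the sample-diversity options already shown above are used. By default it is a dry run that just prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run sample-diversity at several --p-percent-id values so that the
# resulting selections can be compared afterwards.
# Assumptions (illustrative, not part of genome-sampler):
#   - QIIME_CMD defaults to "echo qiime", i.e. a dry run that only prints
#     each command; set QIIME_CMD=qiime to actually execute them.
#   - each threshold gets its own output file, diversity-selection-<value>.qza
sweep_percent_id() {
  local pid
  for pid in "$@"; do
    # QIIME_CMD is intentionally unquoted so "echo qiime" splits into two words
    ${QIIME_CMD:-echo qiime} genome-sampler sample-diversity \
      --i-context-seqs filtered-context-seqs.qza \
      --p-percent-id "$pid" \
      --o-selection "diversity-selection-${pid}.qza"
  done
}

# Example thresholds around the value currently in use (0.9995)
sweep_percent_id 0.9990 0.9995 0.9999
```

This still leaves the counting step to be done by hand for each output, and it only searches for a threshold that happens to give roughly 700 sequences; it does not guarantee an upper bound.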