Hi @zsggq2006,
Thanks for your interest in genome-sampler!
First, just for your information: last week we published the first new genome-sampler release in a few years, so it may be worth updating if you're using an older version. More information on this here.
I would like to know the minimum information needed for the metadata.
You can find the format described here. Technically speaking, there is no metadata required beyond the identifier column.
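To make that concrete, here is a minimal sketch of a metadata file containing only an ID column plus one date column that holds just the year. It assumes the usual QIIME 2 tab-separated metadata layout, where `id` is one of the accepted headers for the first column. The column name `date` and the IDs `seq1`, `seq2`, and `seq3` are hypothetical. In practice the IDs must match the sequence IDs in your FASTA file.

```python
import csv
import io


def write_minimal_metadata(ids, years, handle):
    """Write a tab-separated metadata table with an 'id' column and a
    hypothetical 'date' column that holds only a four-digit year."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "date"])  # first column header identifies the IDs
    for seq_id, year in zip(ids, years):
        writer.writerow([seq_id, f"{year:04d}"])  # e.g. 2019 -> "2019"


buf = io.StringIO()
write_minimal_metadata(["seq1", "seq2", "seq3"], [2019, 2019, 2020], buf)
print(buf.getvalue())
```

To write this to disk instead, open a file with `open("metadata.tsv", "w", newline="")` and pass that handle. The file name is only an example.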
Additionally, my sequence sampling only includes the year information.
That should technically still work, as long as the column you provide as the dates column to sample-longitudinal includes the four-digit year. Try providing something like the following to your sample-longitudinal command:
--p-samples-per-interval 10 --p-days-per-interval 365
This should select 10 samples per year, and you can change the 10 to whatever value you'd like.
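The rough idea behind `--p-days-per-interval 365` can be illustrated with a small sketch. It is not genome-sampler's actual code. It assumes a year-only value such as `2020` is read as January 1 of that year, and it bins sequences into 365-day windows counted from the earliest date. The function name `sample_by_interval` is hypothetical. A leap year makes one gap 366 days long, but each year still falls into its own window.

```python
import random
from collections import defaultdict
from datetime import date


def sample_by_interval(dates, samples_per_interval, days_per_interval, seed=0):
    """Illustrative only: group IDs into fixed-length day windows measured
    from the earliest date, then draw up to `samples_per_interval` IDs from
    each window. This mimics the idea of sample-longitudinal, not its code."""
    start = min(dates.values())
    bins = defaultdict(list)
    for seq_id, d in dates.items():
        bins[(d - start).days // days_per_interval].append(seq_id)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = []
    for index in sorted(bins):
        members = sorted(bins[index])
        k = min(samples_per_interval, len(members))  # windows may hold fewer IDs
        selected.extend(rng.sample(members, k))
    return bins, selected


# Assumption: year-only dates map to January 1 of that year.
dates = {f"s{i}": date(2019 + i % 3, 1, 1) for i in range(30)}
bins, selected = sample_by_interval(dates, samples_per_interval=5, days_per_interval=365)
print({k: len(v) for k, v in sorted(bins.items())})  # one window per year
print(len(selected))
```

With 10 sequences in each of 2019, 2020, and 2021, this yields three windows and 5 picks from each, i.e. 15 selected IDs.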
As far as I know, no one has used genome-sampler this way, but I did some testing and it looks like it will work. Let me know if you run into problems.
Also, note that genome-sampler was designed to work on viral genome sequences (SARS-CoV-2 specifically). If you're working with genomes that are longer than about 23k bases, the sample-diversity command almost certainly won't work for you.
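Before running sample-diversity, you could check whether any of your genomes exceed that length. Here is a minimal sketch that reads FASTA text and reports records longer than a threshold. The function name `long_records` is hypothetical, and the default `max_length` simply reuses the approximate figure above. Adjust it if the actual limit for your setup differs.

```python
def long_records(fasta_text, max_length=23_000):
    """Return (id, length) pairs for FASTA records longer than max_length.
    The record ID is the first whitespace-separated token of the header."""
    too_long = []
    seq_id, length = None, 0
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if seq_id is not None and length > max_length:
                too_long.append((seq_id, length))
            seq_id, length = (line[1:].split() or [""])[0], 0
        elif line:
            length += len(line)  # sequences may wrap over several lines
    if seq_id is not None and length > max_length:
        too_long.append((seq_id, length))
    return too_long


example = ">short\nACGT\n>long some description\n" + "A" * 25_000 + "\n"
print(long_records(example))
```

To check a real file, pass it in with something like `long_records(open("sequences.fasta").read())`, where the file name is only an example.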