Hi Rob,
You're very right about it not being exactly defined - it is biology, after all!
The nucleotide numbering commonly used to name primers comes from the Escherichia coli 16S rRNA gene. Both the sequences and the lengths of the hypervariable regions vary, so the absolute positions of those regions differ between taxa.
To answer your last question, I don't know of a source that already has all that information for all the taxa you're interested in. Because of the length diversity, I think your best bet for handling these sequences accurately is to use the relevant (degenerate) primer sequences with RESCRIPt. You could also build a multiple sequence alignment and extract the relevant regions that way, but I haven't tried that myself.
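For a rough idea of what primer-based extraction does, here is a minimal Python sketch. It is not how RESCRIPt or any QIIME 2 plugin is implemented: it only accepts exact matches to the degenerate primers (no mismatches), and the function names (`primer_regex`, `extract_region`) and the toy sequence are made up for illustration.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def normalize(seq):
    """Upper-case a sequence and write RNA 'U' as 'T'."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC degenerate codes consistent."""
    return normalize(seq).translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    return "".join(IUPAC[base] for base in normalize(primer))


def extract_region(seq, fwd_primer, rev_primer):
    """Return the sequence between the two primers (primers excluded).

    Both primers are given 5'->3' as ordered; the reverse primer is
    reverse-complemented before searching. Returns None when either
    primer has no exact match.
    """
    pattern = re.compile(
        primer_regex(fwd_primer)
        + "(.*?)"  # shortest stretch between the two primer sites
        + primer_regex(reverse_complement(rev_primer))
    )
    match = pattern.search(normalize(seq))
    return match.group(1) if match else None


# Toy example. The primer pair below is an assumption (a commonly used
# V3-V4 pair), not necessarily the one used in this thread.
FWD = "CCTACGGGNGGCWGCAG"
REV = "GACTACHVGGGTATCTAATCC"
toy = "TT" + "CCTACGGGAGGCAGCAG" + "ACGTACGTAC" + "GGATTAGATACCCTGGTAGTC" + "AA"
region = extract_region(toy, FWD, REV)
print(region, len(region))
```

Because the extracted region is defined by where the primers bind rather than by fixed coordinates, its length comes out different for different taxa, which is the point of using primers instead of E. coli positions.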
As an illustration:
When I was considering truncation lengths to use for DADA2, I wanted to know how long the V3-V4 region is, so I could ensure I had enough overlap between forward and reverse reads. Using the most recent SILVA database, RESCRIPt, and the primers for the V3-V4 region, I extracted the V3-V4 regions from the 16S sequences. From there I was able to estimate how the lengths vary between taxonomic groups. For example, the V3-V4 regions from the class Clostridia were around 404 nt long (mode of the length distribution), whereas for the class Bacilli, the mode length was around 429 nt.
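The length summary described above could be sketched roughly like this in Python. The assumptions: the extracted regions are available as (taxonomy, sequence) pairs, the taxonomy strings use SILVA-style rank prefixes such as `c__` for class, and the helper names (`class_of`, `mode_length_by_class`, `expected_overlap`) and the truncation lengths in the example are hypothetical.

```python
from collections import Counter, defaultdict


def class_of(taxonomy):
    """Pull the class label out of a 'd__...; p__...; c__...' style string."""
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if rank.startswith("c__"):
            return rank[3:]
    return "Unassigned"


def mode_length_by_class(records):
    """Map each class to the most common length of its extracted regions."""
    lengths = defaultdict(Counter)
    for taxonomy, seq in records:
        lengths[class_of(taxonomy)][len(seq)] += 1
    # On ties, Counter.most_common returns the length that was seen first
    return {cls: counts.most_common(1)[0][0] for cls, counts in lengths.items()}


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by forward and reverse reads after truncation.

    amplicon_len must be measured the same way as the reads: include the
    primers if the reads still contain them, exclude them otherwise.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Made-up records standing in for real extracted regions
records = [
    ("d__Bacteria; p__Firmicutes; c__Clostridia", "A" * 404),
    ("d__Bacteria; p__Firmicutes; c__Clostridia", "A" * 404),
    ("d__Bacteria; p__Firmicutes; c__Clostridia", "A" * 406),
    ("d__Bacteria; p__Firmicutes; c__Bacilli", "A" * 429),
]
modes = mode_length_by_class(records)

# Hypothetical truncation lengths, checked against the longest mode
overlap = expected_overlap(260, 200, max(modes.values()))
print(modes, overlap)
```

Checking the overlap against the longest group you care about, rather than an overall average, helps avoid losing the taxa with longer regions at the merging step.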
I hope this helps!
All the best,
Marko