An approximate nearest-neighbour index (HNSW, a hierarchical navigable small-world graph) finds each genome's 10 nearest neighbours. The average distance to those neighbours gives an isolation score: high values flag rare lineages and contamination candidates. Kruskal's algorithm then computes the minimum spanning tree of this k-NN graph. Its longest edge is the distance at which the whole population first becomes connected, which makes it the natural inter-strain scale, and it sets the diversity threshold automatically.
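A minimal sketch of the isolation score follows. It assumes a precomputed symmetric pairwise distance matrix (for example 1 - ANI) and uses exact brute-force neighbour search in place of HNSW, which only approximates the same neighbour lists. The function names and the toy matrix are hypothetical. It also picks the most isolated genome as the first representative.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def knn(dist: Matrix, k: int) -> List[List[int]]:
    """Indices of the k nearest neighbours of each genome, excluding itself.

    Exact brute-force search stands in for the HNSW index (an assumption);
    HNSW returns an approximation of the same lists much faster at scale.
    """
    n = len(dist)
    result = []
    for i in range(n):
        # Sort every other genome by its distance to genome i.
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        result.append(others[:k])
    return result


def isolation_scores(dist: Matrix, k: int = 10) -> List[float]:
    """Mean distance from each genome to its k nearest neighbours (needs >= 2 genomes)."""
    return [
        sum(dist[i][j] for j in nbrs) / len(nbrs)
        for i, nbrs in enumerate(knn(dist, k))
    ]


if __name__ == "__main__":
    # Hypothetical distances: genomes 0-2 are close relatives, genome 3 is an outlier.
    dist = [
        [0.00, 0.01, 0.02, 0.20],
        [0.01, 0.00, 0.015, 0.21],
        [0.02, 0.015, 0.00, 0.19],
        [0.20, 0.21, 0.19, 0.00],
    ]
    scores = isolation_scores(dist, k=2)
    # The most isolated genome becomes the first representative.
    first_rep = max(range(len(scores)), key=scores.__getitem__)
    print(scores, first_rep)  # genome 3 has the highest score
```

With k=2 on the toy matrix, the three close genomes score between 0.0125 and 0.0175, while the outlier scores 0.195.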
No fixed ANI threshold to tune: the threshold is inferred from the shape of the population itself.
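The threshold inference can be sketched as follows, again assuming a precomputed distance matrix and brute-force neighbour search instead of HNSW; all helper names are hypothetical. The sketch builds the undirected k-NN graph, runs Kruskal's algorithm with a union-find structure, and returns the longest edge of the resulting tree. If the k-NN graph happens to be disconnected, Kruskal yields a spanning forest, and the longest edge then only describes the connected components.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Edge = Tuple[float, int, int]  # (distance, genome a, genome b)


def knn_edges(dist: Matrix, k: int) -> List[Edge]:
    """Undirected edges of the k-NN graph, each genome pair listed once."""
    n = len(dist)
    edges = {}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            a, b = min(i, j), max(i, j)
            edges[(a, b)] = dist[a][b]
    return [(d, a, b) for (a, b), d in edges.items()]


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_mst(n: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal: take edges shortest first, skipping any that would close a cycle.

    On a disconnected k-NN graph this returns a minimum spanning forest.
    """
    parent = list(range(n))
    tree = []
    for d, a, b in sorted(edges):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:  # a and b are in different components, so merge them
            parent[ra] = rb
            tree.append((d, a, b))
    return tree


def diversity_threshold(dist: Matrix, k: int = 10) -> float:
    """Longest MST edge: the distance at which the population becomes connected."""
    tree = kruskal_mst(len(dist), knn_edges(dist, k))
    return max(d for d, _, _ in tree)


if __name__ == "__main__":
    # Hypothetical distances: genomes 0-2 are close relatives, genome 3 is an outlier.
    dist = [
        [0.00, 0.01, 0.02, 0.20],
        [0.01, 0.00, 0.015, 0.21],
        [0.02, 0.015, 0.00, 0.19],
        [0.20, 0.21, 0.19, 0.00],
    ]
    print(diversity_threshold(dist, k=2))
```

On the toy matrix with k=2, the three close genomes join at 0.01 and 0.015, the outlier joins last at 0.19, and so the inferred threshold is 0.19.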
Lines show nearest-neighbour connections; the most isolated genome becomes the first representative.