Changelog

Unreleased

  • repack command: two-pass taxonomy-grouped reshard for fast per-taxon NFS access
  • Phase 1 reads only GenomeDirEntry arrays (~300 MB) instead of full blobs; builds complete routing table in minutes
  • Phase 2 sorts all genome records by (taxonomy, oph_fingerprint) in memory
  • Phase 3 decompresses and routes records in a single pass with OpenMP parallelism; when the memory cap is reached, only the largest writer is evicted, which minimises shard fragmentation
  • --taxonomy-rank g|f flag (genus default, family fallback)
  • -m / --max-memory flag to control eviction threshold
  • OpenMP parallel decompression in repack Phase 3
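
The sort and eviction behaviour described for repack above can be illustrated with a small single-threaded sketch. It is not the project's code: `GenomeRecord`, `sort_for_repack`, `route_with_eviction` and the `max_memory` parameter are hypothetical names chosen for illustration, shards are modelled as in-memory lists rather than files, and the OpenMP decompression step is omitted.

```python
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    """Hypothetical stand-in for one genome entry in the routing table."""
    genome_id: int
    taxonomy: str          # e.g. a genus name
    oph_fingerprint: int   # assumed to be an orderable integer
    size: int              # decompressed size in bytes


def sort_for_repack(records):
    """Order records by (taxonomy, oph_fingerprint), as in Phase 2."""
    return sorted(records, key=lambda r: (r.taxonomy, r.oph_fingerprint))


def route_with_eviction(records, max_memory):
    """Buffer records per taxon; when the buffered total exceeds
    max_memory, flush only the single largest buffer (assumed policy)."""
    buffers = {}   # taxon -> list of buffered records
    sizes = {}     # taxon -> buffered bytes
    total = 0
    shards = []    # flushed output: (taxon, [genome_id, ...])

    for rec in records:
        buffers.setdefault(rec.taxonomy, []).append(rec)
        sizes[rec.taxonomy] = sizes.get(rec.taxonomy, 0) + rec.size
        total += rec.size
        while total > max_memory:
            # Evict the largest writer only, leaving smaller ones open.
            victim = max(sizes, key=sizes.get)
            flushed = buffers.pop(victim)
            shards.append((victim, [r.genome_id for r in flushed]))
            total -= sizes.pop(victim)

    # Flush whatever is still buffered at the end of the pass.
    for taxon in sorted(buffers):
        shards.append((taxon, [r.genome_id for r in buffers[taxon]]))
    return shards


records = [
    GenomeRecord(1, "Escherichia", 5, 40),
    GenomeRecord(2, "Bacillus", 9, 30),
    GenomeRecord(3, "Escherichia", 2, 40),
    GenomeRecord(4, "Bacillus", 1, 30),
]
ordered = sort_for_repack(records)
shards = route_with_eviction(ordered, max_memory=100)
```

Evicting only the largest buffer frees the most memory per flush, so smaller taxa stay buffered longer and are more likely to end up in a single contiguous shard.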

0.1.0

  • Initial release: single-file .gpk format
  • build, merge, stat, extract, slice, add, rm, taxonomy, taxdump, similar, cidx, reindex, dedup commands
  • SHRD / CATL / GIDX / ACCX / CIDX / TAXN / TXDB / KMRX / HNSW / TOMB sections
  • Columnar catalog with row-group predicate pushdown
  • HNSW approximate nearest-neighbour index on KMRX profiles
  • NCBI taxdump export (names.dmp / nodes.dmp / acc2taxid.dmp)
  • Columnar binary taxonomy export (acc2taxid.bin / taxnodes.bin)
  • Distributed build scripts
  • ScanEngine with I/O / worker thread separation (NFS and NVMe modes)
  • MEM-delta codec (k=31 seed + zstd verbatim residue)