Changelog

All notable changes to PyHDC are documented here. The project follows Semantic Versioning and Keep a Changelog conventions.

The source is CHANGELOG.md on GitHub.


v2.1.0: 2026-06-18

Added

  • Multi-dimensional (D, N, M) batches. enc.generate(size=(D, N, M)) returns one Hypervector wrapping a (D, N, M) array. Axis 0 is the dimension D, and every 1-D slice along axis 0, such as arr[:, i, j], is a hypervector.

  • axis= keyword on bundle(): reduce a chosen batch axis (an int or a tuple of ints) and return a single Hypervector. axis=None reduces the last axis, so (D, N) collapses to (D,) and (D, N, M) collapses to (D, N). Axis 0 is the dimension and cannot be reduced; passing axis=0 raises ValueError.

  • axis= keyword (keyword-only) on similarity(): for a single (D, N, M, ...) batch, selects the batch axis along which index 0 is compared against every remaining index.

  • bind() and unbind() batch automatically. The element-wise binders (MAP multiply, BSC xor, FHRR angle add/sub) broadcast a batch natively: a (D,) key binds against every column, and operands of mixed rank are aligned by padding the lower-rank operand with trailing size-1 axes. Every other binder (circular convolution/correlation, shifting, segment shifting, matrix binding, VTB, context-dependent thinning) is applied per column internally, so a batched bind(A, B) returns one (D, N) Hypervector without needing batch_dim.

  • Two-input similarity() broadcasting over trailing axes: the result shape is the broadcast of the two operands’ non-dimension axes. Two (D,) vectors return a Python float; every other pairing returns an array.

  • First-class permute(), inverse(), negative(), and normalize() on Encoding, mirrored as Hypervector methods of the same names. permute is defined for every encoding (a cyclic shift along axis 0); inverse, negative, and normalize are wired per encoding family and raise NotImplementedError where a family does not define them.

  • Operator overloading on Hypervector: + (bundle), * (bind), / (unbind), ~ (inverse), >> (permute by +k), << (permute by -k). For +, *, and /, a non-Hypervector operand makes the operator return NotImplemented, so Python raises TypeError; a bool shift amount for >> or << is rejected.

  • Module-level permute(), inverse(), negative(), normalize(), and unbind(), joining the existing generate(), zeros(), bundle(), bind(), and stack().

  • BSDC_THIN is now reachable directly from the top level (previously only via pyhdc.encodings); all 15 encodings are exported at the top level.
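
The trailing-axis padding rule for the element-wise binders can be shown at the level of shapes. This is a plain-Python sketch and does not call PyHDC. The helper names `pad_trailing` and `bind_result_shape` are hypothetical. The sketch assumes that padding appends size-1 axes after the existing ones, so axis 0 (the dimension D) always lines up. NumPy's own broadcasting works the other way: it prepends size-1 axes, so it would align a (D,) key with the N axis of a (D, N) batch.

```python
def pad_trailing(shape, ndim):
    """Append size-1 axes so `shape` has `ndim` axes; axis 0 (D) is untouched."""
    if len(shape) > ndim:
        raise ValueError("shape already has more than ndim axes")
    return tuple(shape) + (1,) * (ndim - len(shape))


def bind_result_shape(shape_a, shape_b):
    """Result shape of an element-wise bind under trailing-axis padding (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    a = pad_trailing(shape_a, ndim)
    b = pad_trailing(shape_b, ndim)
    if a[0] != b[0]:
        raise ValueError(f"dimension mismatch: D={a[0]} vs D={b[0]}")
    out = [a[0]]
    for axis in range(1, ndim):
        x, y = a[axis], b[axis]
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"axis {axis}: cannot broadcast {x} against {y}")
    return tuple(out)


# A (D,) key binds against every column of a (D, N) batch.
print(bind_result_shape((10_000,), (10_000, 50)))        # (10000, 50)
# A (D, N) batch pairs with each M-slice of a (D, N, M) batch.
print(bind_result_shape((10_000, 50), (10_000, 50, 3)))  # (10000, 50, 3)
```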

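The operator rules above follow Python's binary-operator protocol. When `__add__` returns NotImplemented and the other operand has no matching reflected method, the interpreter raises TypeError. Because bool is a subclass of int, a shift method has to reject it explicitly. The toy class below is a hypothetical stand-in, not PyHDC's Hypervector. It uses MAP-style arithmetic on plain lists (an unnormalized element-wise sum for bundling, an element-wise product for binding, and a cyclic shift for permutation) only to show how the operators dispatch.

```python
class ToyHV:
    """Hypothetical stand-in for a hypervector; NOT PyHDC's Hypervector class."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):          # bundle: unnormalized element-wise sum
        if not isinstance(other, ToyHV):
            return NotImplemented      # Python then raises TypeError
        return ToyHV(x + y for x, y in zip(self.values, other.values))

    def __mul__(self, other):          # bind: element-wise product (MAP-style)
        if not isinstance(other, ToyHV):
            return NotImplemented
        return ToyHV(x * y for x, y in zip(self.values, other.values))

    def __rshift__(self, k):           # hv >> k permutes by +k
        return self._permute(k, +1)

    def __lshift__(self, k):           # hv << k permutes by -k
        return self._permute(k, -1)

    def _permute(self, k, sign):
        if isinstance(k, bool):        # bool subclasses int, so reject it explicitly
            raise TypeError("shift amount must be an int, not bool")
        if not isinstance(k, int):
            return NotImplemented
        shift = (sign * k) % len(self.values)
        # Cyclic shift, equivalent to numpy.roll(values, sign * k)
        return ToyHV(self.values[-shift:] + self.values[:-shift])


a, b = ToyHV([1, -1, 1]), ToyHV([1, 1, -1])
print((a + b).values)   # [2, 0, 0]
print((a * b).values)   # [1, -1, -1]
print((a >> 1).values)  # [1, 1, -1]
```

With this class, `a + 1` raises TypeError because both `ToyHV.__add__` and `int.__radd__` return NotImplemented, and `a >> True` is refused even though `True == 1`.
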
Changed (breaking)

  • The misspelled BernoulliBiploar element generator is renamed to BernoulliBipolar; the old name is removed. Any direct import of the old name from pyhdc.components.elements must be updated. The MAP_I, MAP_I_Bits, and MAP_B encodings that use it are unchanged in behavior.

Migration guide:

# The element generator was misspelled; import the corrected name.
from pyhdc.components.elements import BernoulliBipolar   # was BernoulliBiploar

Changed

  • Vectorized fast path for batched i.i.d. generation: with the default i.i.d. element generators (Bernoulli bipolar/binary, uniform bipolar/angles, normal real, Bernoulli sparse), generate(size=(D, N)) draws the whole batch in one (D, *batch) call. The result is reproducible under a fixed seed for a given batch shape, but it is no longer value-identical to generating the vectors one at a time. Dropping that cross-consistency removes a full-array transpose copy and makes generation about 10-24% faster than the previous order-preserving draw. Ordered and custom generators (and SparseSegmented for BSDC_SEG) keep the per-vector loop and still match N successive single-vector generate() calls.

  • Non-batch-safe binders (circular convolution/correlation, shifting/segment-shifting for BSDC_S / BSDC_SEG / BSDC_THIN, matrix binding for MBAT, VTB, and context-dependent thinning for BSDC_CDT) are now applied per column when bind() / unbind() receives a batched (ndim > 1) input, and they return one batched result. Previously they silently produced a wrong result for batched inputs; single-vector inputs are unchanged.

  • random_zone_count returns an int for a single (D,) result and an array for a batched result.

  • ElementAdditionBits (MAP_I_Bits bundling) sums in a wide (int64) accumulator and clips the total once, saturating at the bounds. This replaces the previous per-addition clip, so results change whenever the running sum would have saturated partway through accumulation. The bundler is now vectorized and accepts a tuple of axes.

  • DisjunctionThinned (BSDC_THIN bundling) thins a batched result without a per-column Python loop: each surviving column keeps a uniformly random subset of ceil(D * density) of its set bits, chosen by a vectorized random-key selection.

  • bundle(array, batch_dim=k) on a 3-D array reduces the other batch axis in one vectorized op instead of Python-looping the split slices (about 8x faster on a 1000 x 20 x 500 array). Ragged nested-list inputs, batch_dim=0, and 4-D-or-larger arrays keep the per-group path. For tie-randomizing bundlers the random values at tie coordinates now differ from the previous per-group draws (still random; batch_dim has no fixed-seed guarantee). axis= remains the preferred vectorized form, returning a single tensor instead of a list.
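
Comparing the two clipping orders on a single coordinate shows why sum-then-clip can change results. This is a plain-Python sketch of the arithmetic only, not PyHDC's ElementAdditionBits. The bounds of -2 and 2 are an arbitrary choice for illustration; the bounds MAP_I_Bits actually uses are not shown here.

```python
def clip(x, lo, hi):
    """Saturate x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def per_addition_clip(values, lo, hi):
    """Previous behaviour, as described: clip after every addition."""
    total = 0
    for v in values:
        total = clip(total + v, lo, hi)
    return total


def clip_once(values, lo, hi):
    """New behaviour, as described: accumulate wide, then clip the final total once."""
    return clip(sum(values), lo, hi)  # Python ints never overflow, standing in for int64


contributions = [1, 1, 1, -1]  # the running sum reaches 3 before falling back to 2
print(per_addition_clip(contributions, -2, 2))  # 1
print(clip_once(contributions, -2, 2))          # 2
```

The two orders agree whenever the running sum stays inside the bounds. They diverge only when the sum saturates partway through, which is the case the entry above calls out.
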

Deprecated

  • batch_dim on bundle() / bind() / unbind() is deprecated and will be removed in a future release. Pass a batched array directly (operations batch automatically) or use axis= on bundle(). Passing batch_dim now emits a DeprecationWarning.
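
One way to find remaining batch_dim call sites before the argument is removed is to turn DeprecationWarning into an error, for example with `python -W error::DeprecationWarning` or pytest's `-W error::DeprecationWarning` option. The sketch below shows the same idea with the standard warnings module. `bundle_stub` is a hypothetical stand-in that only mimics the warning described above; it does not call PyHDC and its signature is assumed.

```python
import warnings


def bundle_stub(hvs, *, axis=None, batch_dim=None):
    """Hypothetical stand-in for bundle(): warns when batch_dim is passed."""
    if batch_dim is not None:
        warnings.warn(
            "batch_dim is deprecated; pass a batched array or use axis=",
            DeprecationWarning,
            stacklevel=2,
        )
    return hvs  # no real bundling happens in this stub


# Escalate the warning to an exception inside this block only.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    bundle_stub([1, 2, 3], axis=1)  # fine: no batch_dim
    try:
        bundle_stub([1, 2, 3], batch_dim=1)
    except DeprecationWarning as exc:
        print("deprecated call site:", exc)
```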


v2.0.0: 2026-06-12

Added

  • Dimension-first (D, N) batched hypervectors. enc.generate(size=(D, N)) returns one Hypervector wrapping a (D, N) array whose columns are hypervectors. bundle() collapses a (D, N) batch to a single (D,) prototype; bind() / unbind() operate per column.

  • select(): select hypervectors (columns) from a (D, N) batch by index, on both the NumPy and PyTorch backends.

  • stack(): backend-agnostic function that combines hypervectors and batches into one (D, N) batch along the batch axis; a (D,) vector becomes a single column.

  • Global backend/device defaults: prefer_torch(), prefer_cuda(), prefer_numpy(), prefer_cpu(), get_default_backend(), get_default_device(). Encodings created without an explicit backend / device inherit these.

  • Multi-mode similarity: a single (D, N) batch returns column 0 against each remaining column; two (D, N) batches return per-column pairs; a (D,) vector against a (D, N) batch broadcasts.

  • BSDC_THIN is now exported at the top level.
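
The three similarity calling conventions can be sketched in plain Python. The function below is hypothetical, not PyHDC's similarity(). It stores a (D, N) batch as D rows of N entries, uses cosine similarity for every case regardless of encoding, and tells a vector from a batch by checking whether its entries are lists.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def columns(batch):
    """Split a (D, N) batch, stored as D rows of N entries, into its N column vectors."""
    return [list(col) for col in zip(*batch)]


def similarity_modes(a, b=None):
    """Hypothetical sketch of the v2.0.0 calling conventions (cosine only)."""
    a_is_batch = isinstance(a[0], list)
    if b is None:                          # one (D, N) batch: column 0 vs each other column
        cols = columns(a)
        return [cosine(cols[0], c) for c in cols[1:]]
    b_is_batch = isinstance(b[0], list)
    if a_is_batch and b_is_batch:          # two (D, N) batches: per-column pairs
        return [cosine(x, y) for x, y in zip(columns(a), columns(b))]
    if a_is_batch:                         # (D, N) batch vs (D,) vector: broadcast the vector
        return [cosine(c, b) for c in columns(a)]
    if b_is_batch:                         # (D,) vector vs (D, N) batch: broadcast the vector
        return [cosine(a, c) for c in columns(b)]
    return cosine(a, b)                    # two (D,) vectors: a single float


# D = 4, N = 3; columns are [1, 1, 1, 1], [1, -1, 1, -1], [1, 1, 1, -1].
batch = [
    [1, 1, 1],
    [1, -1, 1],
    [1, 1, 1],
    [1, -1, -1],
]
print(similarity_modes(batch))                # [0.0, 0.5]
print(similarity_modes([1, 1, 1, 1], batch))  # [1.0, 0.0, 0.5]
```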

Changed (breaking)

  • Hypervector batches are now dimension-first (D, N) (each column is a hypervector), not batch-first (N, D). enc.generate(size=N) with an integer now returns a single N-dimensional vector; use enc.generate(size=(D, N)) for a batch of N vectors.

  • Batched similarity() is column-wise over (D, N) instead of per-row over (N, D): similarity(A, B) returns per-column pairs, and similarity(batch) returns column 0 vs each remaining column.

Migration guide:

# A batch of N vectors was (N, D) in 1.1.0; make or transpose it to (D, N).
batch = enc.generate(size=(10_000, 50))   # was enc.generate(size=50)

# Batched similarity now indexes columns, not rows.
sims   = enc.similarity(batch_a, batch_b)   # sims[i] = sim(batch_a[:, i], batch_b[:, i])
member = batch[:, i]                         # was batch[i]

Fixed

  • Batched generation is order-reproducible: generate(size=(D, N)) yields the same vectors as N successive generate() calls under a fixed seed, and works for every generator (a 2-D size previously mis-ordered the columns or failed).
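
Whether one batched draw matches N single-vector draws depends on how a flat random stream is laid out in the (D, N) array. The sketch below uses Python's random.Random in place of PyHDC's generators. It assumes each element consumes exactly one value from the stream, in order. Filling the batch one column at a time (the same as drawing an (N, D) block and transposing it) reproduces N successive single-vector draws. Filling the (D, N) array row by row does not. Assuming the batched draw fills in row-major (C) order, the row-by-row layout corresponds to the single (D, *batch) draw that v2.1.0 adopts for its i.i.d. fast path. That path is therefore reproducible per batch shape but not value-identical to one-at-a-time generation.

```python
import random

D, N, SEED = 4, 3, 7


def sequential_vectors(seed):
    """N successive single-vector draws: vector n uses stream values n*D .. n*D + D - 1."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(D)] for _ in range(N)]


def batch_column_major(seed):
    """(D, N) batch filled one column at a time: draw (N, D), then transpose."""
    rng = random.Random(seed)
    block = [[rng.random() for _ in range(D)] for _ in range(N)]  # shape (N, D)
    return [[block[n][d] for n in range(N)] for d in range(D)]    # transpose to (D, N)


def batch_row_major(seed):
    """(D, N) batch filled row by row, i.e. one flat draw in C order."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(N)] for _ in range(D)]


def column(batch, n):
    """Column n of a (D, N) batch stored as D rows."""
    return [row[n] for row in batch]


seq = sequential_vectors(SEED)
cm = batch_column_major(SEED)
rm = batch_row_major(SEED)
print(all(column(cm, n) == seq[n] for n in range(N)))  # True
print(all(column(rm, n) == seq[n] for n in range(N)))  # False
```

Both layouts consume the same D * N stream values. Only their placement differs, and the transpose in the column-major version is the copy that the v2.1.0 fast path drops.
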


v1.1.0: 2026-05-24

Added

  • BSDC_THIN encoding: sparse binary with post-bundling random thinning to enforce a density constraint. Uses Shifting / InverseShifting for binding.

  • DisjunctionThinned bundling function in pyhdc.components.bundling: bitwise OR followed by random thinning to a target density.

  • similarity_remap parameter on all encoding classes: optional callable applied to every similarity result before returning.

  • remap_to_unit in pyhdc.components.similarity: maps [-1, 1] → [0, 1]. Works on scalars, NumPy arrays, and PyTorch tensors.

  • PyTorch support for all four similarity functions (CosineSimilarity, HammingDistance, Overlap, AngleDistance).

  • Batched similarity calling conventions: two 2-D arrays (a, b) return per-row similarities; a single 2-D array returns row 0 against each of rows 1 onward.
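
Disjunction followed by thinning can be sketched in plain Python. This illustrates the idea under stated assumptions and is not PyHDC's DisjunctionThinned. Vectors are lists of 0/1 bits, and the target count is ceil(D * density), as in the v2.1.0 entry for this bundler. A result that already has no more set bits than the target is assumed to be left unchanged. random.Random.sample picks the uniformly random subset of set bits to keep.

```python
import math
import random


def disjunction_thinned(vectors, density, rng):
    """OR sparse binary vectors together, then thin to at most ceil(D * density) ones."""
    dim = len(vectors[0])
    ored = [int(any(v[i] for v in vectors)) for i in range(dim)]  # bitwise OR
    target = math.ceil(dim * density)
    ones = [i for i, bit in enumerate(ored) if bit]
    if len(ones) <= target:
        return ored                       # already sparse enough (assumption)
    keep = set(rng.sample(ones, target))  # uniformly random subset of the set bits
    return [1 if i in keep else 0 for i in range(dim)]


rng = random.Random(0)
a = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
b = [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
c = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
out = disjunction_thinned([a, b, c], density=0.25, rng=rng)
print(sum(out))  # 3: the OR has 7 set bits, thinned to ceil(12 * 0.25)
```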

Changed (breaking)

  • HammingDistance now returns [-1, 1] instead of [0, 1].

  • Overlap now returns [-1, 1] instead of [0, 1].

Migration guide: any code comparing HammingDistance or Overlap output against thresholds in [0, 1] must be updated. The easiest fix:

import pyhdc
from pyhdc.components.similarity import remap_to_unit

# Option A: remap manually
sim = hv1.similarity(hv2)
sim_01 = remap_to_unit(sim)

# Option B: remap automatically at the encoding level
enc = pyhdc.BSC(dimension=10_000, similarity_remap=remap_to_unit)
# assumes hv1 and hv2 were generated by enc
sim_01 = hv1.similarity(hv2)   # always in [0, 1]
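
Instead of remapping every score, an existing threshold can be moved to the new scale. This sketch assumes remap_to_unit is the affine map (x + 1) / 2 and that it recovers the old [0, 1] values. Under that assumption, a threshold t on the old scale becomes 2t - 1 on the new [-1, 1] scale. Both helpers are hypothetical re-implementations for illustration, not imports from pyhdc.

```python
def remap_to_unit_sketch(x):
    """Assumed affine map from [-1, 1] to [0, 1]; hypothetical re-implementation."""
    return (x + 1) / 2


def threshold_to_signed(t):
    """Convert a threshold on the old [0, 1] scale to the new [-1, 1] scale."""
    return 2 * t - 1


old_threshold = 0.75
new_threshold = threshold_to_signed(old_threshold)  # 0.5
for score in (-1.0, 0.25, 0.5, 0.9):
    # Comparing on either scale gives the same decision.
    assert (remap_to_unit_sketch(score) >= old_threshold) == (score >= new_threshold)
```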

Fixed

  • MAP_I_Bits integer overflow on Python 3.9.

  • All similarity functions now handle PyTorch tensors without falling back to NumPy.


v1.0.1: 2026-05-23

Changed

  • Added README.md with badges, installation instructions, and a quickstart example (omitted from the v1.0.0 tag; this patch ensures it appears on the PyPI release page).


v1.0.0: 2026-05-23

Added

  • Unit test suite covering all 14 encoding types, all 7 generator families, all components, and the hypervector API.

  • Performance benchmark suite (pytest-benchmark).

  • mypy static type checking configuration.

  • Pre-commit hooks: autoflake, isort, black, pylint, mypy.

  • CONTRIBUTING.md with developer setup and PR process.

  • SECURITY.md with vulnerability reporting guidance.

  • Codecov integration.

  • TestPyPI and PyPI publish workflows with OIDC Trusted Publishing.

Fixed

  • All internal imports changed from hdc. to pyhdc. namespace.

  • DefaultGenerator._next_word integer overflow for word_size >= 32.

  • MBAT.bind incorrectly storing a tuple as the hypervector data.

  • MAP_I_Bits wrong keyword argument names in ElementAdditionBits.

  • FeistelCounterGenerator non-deterministic round key generation.


v0.0.1: 2024-01-01

Initial template release to PyPI.

Added

  • Core encoding types: MAP_C, MAP_I, MAP_I_Bits, MAP_B, HRR, HRR_NoNorm, HRR_ConstNorm, FHRR, VTB, MBAT, BSC, BSDC_CDT, BSDC_S, BSDC_SEG

  • Random number generator families: LCG, DLFSR, LFSR, LCA, PCG, Xorshift, ShiftedCounter

  • Recovery algorithm framework (not yet public API)

  • NumPy backend; PyTorch optional

  • GitHub Actions CI: lint, test, PyPI publish workflows