Changelog
All notable changes to PyHDC are documented here. The project follows Semantic Versioning and Keep a Changelog conventions.
The source is CHANGELOG.md on GitHub.
v2.1.0: 2026-06-18
Added
Multi-dimensional (D, N, M) batches. enc.generate(size=(D, N, M)) returns one Hypervector wrapping a (D, N, M) array; axis 0 is the dimension D and every trailing-axis slice is a hypervector.
axis= keyword on bundle(): reduce a chosen batch axis (an int or a tuple of ints) and return a single Hypervector. axis=None reduces the last axis, so (D, N) collapses to (D,) and (D, N, M) collapses to (D, N). Axis 0 is the dimension and cannot be reduced; passing axis=0 raises ValueError.
axis= keyword (keyword-only) on similarity(): for a single (D, N, M, ...) batch, selects which batch axis splits index 0 from the rest.
bind() and unbind() batch automatically. The element-wise binders (MAP multiply, BSC xor, FHRR angle add/sub) broadcast a batch natively: a (D,) key binds against every column, and operands of mixed rank align by trailing-axis padding. Every other binder (circular convolution/correlation, shifting, segment shifting, matrix binding, VTB, context-dependent thinning) is applied per column internally, so a batched bind(A, B) returns one (D, N) Hypervector without batch_dim.
Two-input similarity() broadcasting over trailing axes: the result shape is the broadcast of the two operands’ non-dimension axes. Two (D,) vectors return a Python float; every other pairing returns an array.
First-class permute(), inverse(), negative(), and normalize() on Encoding, mirrored as methods on Hypervector (permute(), inverse(), negative(), normalize()). permute is defined for every encoding (cyclic shift along axis 0); inverse/negative/normalize are wired per family and raise NotImplementedError where a family does not define them.
Operator overloading on Hypervector: + (bundle), * (bind), / (unbind), ~ (inverse), >> (permute by +k), << (permute by -k). A non-Hypervector operand to + * / returns NotImplemented and Python raises TypeError; a bool shift on >>/<< is rejected.
Module-level permute(), inverse(), negative(), normalize(), and unbind(), joining the existing generate(), zeros(), bundle(), bind(), and stack().
BSDC_THIN is now reachable directly from the top level (previously only via pyhdc.encodings); all 15 encodings are exported at the top level.
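The axis rules for bundle() described above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not PyHDC's implementation: bundle_axis_sketch is a hypothetical helper, and bundling is modeled as a sign-of-sum majority vote on bipolar values (ties mapped to +1), which is only one of several possible bundling rules.

```python
import numpy as np

def bundle_axis_sketch(data, axis=None):
    """Hypothetical helper: majority-vote bundling over one or more batch axes."""
    if axis is None:
        axis = data.ndim - 1  # default: reduce the last axis
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    # Axis 0 (also reachable as -ndim) holds the dimension D.
    if any(a in (0, -data.ndim) for a in axes):
        raise ValueError("axis 0 is the dimension D and cannot be reduced")
    total = data.sum(axis=axes)
    return np.where(total >= 0, 1, -1)

rng = np.random.default_rng(0)
batch = rng.choice([-1, 1], size=(16, 5, 3))  # (D, N, M)

print(bundle_axis_sketch(batch).shape)               # (16, 5)
print(bundle_axis_sketch(batch, axis=1).shape)       # (16, 3)
print(bundle_axis_sketch(batch, axis=(1, 2)).shape)  # (16,)
```

Reducing a tuple of axes in one call is what lets a (D, N, M) batch collapse straight to a single (D,) prototype.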
Changed (breaking)
The misspelled BernoulliBiploar element generator is renamed to BernoulliBipolar; the old name is removed. Any direct import of the old name in pyhdc.components.elements must be updated. The MAP_I, MAP_I_Bits, and MAP_B encodings that use it are unchanged in behavior.
Migration guide:
# The element generator was misspelled; import the corrected name.
from pyhdc.components.elements import BernoulliBipolar # was BernoulliBiploar
Changed
Vectorized fast path for batched i.i.d. generation: with the default i.i.d. element generators (Bernoulli bipolar/binary, uniform bipolar/angles, normal real, Bernoulli sparse), generate(size=(D, N)) draws the whole batch in one (D, *batch) call. It is reproducible under a fixed seed for a given batch shape, but no longer value-identical to generating the vectors one at a time. Dropping that cross-consistency removes a full-array transpose copy and makes generation about 10-24% faster than the prior order-preserving draw. Ordered and custom generators (and SparseSegmented for BSDC_SEG) keep the per-vector loop and still match N successive single-vector generate calls.
Non-batch-safe binders (circular convolution/correlation, shifting/segment-shifting for BSDC_S/BSDC_SEG/BSDC_THIN, matrix binding for MBAT, VTB, and context-dependent thinning for BSDC_CDT) are applied per column when bind()/unbind() receives a batched (ndim > 1) input, returning one batched result. They previously produced a wrong result silently; single-vector inputs are unchanged.
random_zone_count returns an int for a single (D,) result and an array for a batched result.
ElementAdditionBits (MAP_I_Bits bundling) sums in a wide (int64) accumulator and clips the total once, saturating at the bounds. This replaces the previous per-addition clip, so results change when the running sum would have saturated mid-accumulation; it is vectorized and accepts a tuple of axes.
DisjunctionThinned (BSDC_THIN bundling) thins a batched result without a per-column Python loop: each surviving column keeps a uniformly random ceil(D * density)-subset of its set bits through a vectorized random-key selection.
bundle(array, batch_dim=k) on a 3-D array reduces the other batch axis in one vectorized op instead of Python-looping the split slices (about 8x faster on a 1000 x 20 x 500 array). Ragged nested-list inputs, batch_dim=0, and 4-D-or-larger arrays keep the per-group path. For tie-randomizing bundlers the random values at tie coordinates now differ from the previous per-group draws (still random; batch_dim has no fixed-seed guarantee). axis= remains the preferred vectorized form, returning a single tensor instead of a list.
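The ElementAdditionBits change is easiest to see on a small example. The sketch below uses NumPy with a hypothetical saturation range of [-4, 4] and made-up helper names; it illustrates clip-once versus clip-per-addition in general and is not PyHDC's code.

```python
import numpy as np

LOW, HIGH = -4, 4  # assumed saturation bounds, for illustration only

def sum_clip_each_step(columns):
    """Previous behavior (sketch): clip after every addition."""
    acc = np.zeros(columns.shape[0], dtype=np.int64)
    for j in range(columns.shape[1]):
        acc = np.clip(acc + columns[:, j], LOW, HIGH)
    return acc

def sum_clip_once(columns):
    """New behavior (sketch): accumulate in int64, then clip the total once."""
    total = columns.astype(np.int64).sum(axis=1)
    return np.clip(total, LOW, HIGH)

# Three (D,) vectors stored as columns of a (D, N) array, with D = 2.
columns = np.array([[3, 3, -3],
                    [1, 1,  1]], dtype=np.int8)

print(sum_clip_each_step(columns))  # [1 3]
print(sum_clip_once(columns))       # [3 3]
```

The first coordinate differs because the running sum reaches 6 and is clipped to 4 before the final -3 is added; the second coordinate never saturates, so both approaches agree.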
Deprecated
v2.0.0: 2026-06-12
Added
Dimension-first (D, N) batched hypervectors. enc.generate(size=(D, N)) returns one Hypervector wrapping a (D, N) array whose columns are hypervectors. bundle() collapses a (D, N) batch to a single (D,) prototype; bind()/unbind() operate per column.
select(): select hypervectors (columns) from a (D, N) batch by index, on both the NumPy and PyTorch backends.
stack(): backend-agnostic combine of hypervectors/batches into one (D, N) batch along the batch axis (a (D,) vector becomes a column).
Global backend/device defaults: prefer_torch(), prefer_cuda(), prefer_numpy(), prefer_cpu(), get_default_backend(), get_default_device(). Encodings created without an explicit backend/device inherit these.
Multi-mode similarity: a single (D, N) batch returns column 0 against each remaining column; two (D, N) batches return per-column pairs; a (D,) vector against a (D, N) batch broadcasts.
BSDC_THIN is now exported at the top level.
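The three similarity modes listed above can be sketched with NumPy, using cosine similarity on dimension-first arrays. cosine_columns is a hypothetical helper written for this illustration; PyHDC's similarity() uses a measure that depends on the encoding and may differ in detail.

```python
import numpy as np

def cosine_columns(a, b):
    """Cosine similarity along axis 0; trailing axes broadcast."""
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return num / den

rng = np.random.default_rng(0)
D, N = 1000, 4
A = rng.standard_normal((D, N))
B = rng.standard_normal((D, N))
v = rng.standard_normal(D)

# Single batch: column 0 against each remaining column -> shape (N - 1,)
single = cosine_columns(A[:, :1], A[:, 1:])

# Two batches: per-column pairs -> shape (N,)
paired = cosine_columns(A, B)

# Vector against batch: the (D,) vector broadcasts over columns -> shape (N,)
broadcast = cosine_columns(v[:, None], B)

print(single.shape, paired.shape, broadcast.shape)  # (3,) (4,) (4,)
```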
Changed (breaking)
Hypervector batches are now dimension-first (D, N) (each column is a hypervector), not batch-first (N, D). enc.generate(size=N) with an integer now returns a single N-dimensional vector; use enc.generate(size=(D, N)) for a batch of N vectors.
Batched similarity() is column-wise over (D, N) instead of per-row over (N, D): similarity(A, B) returns per-column pairs, and similarity(batch) returns column 0 vs each remaining column.
Migration guide:
# A batch of N vectors was (N, D) in 1.1.0; make or transpose it to (D, N).
batch = enc.generate(size=(10_000, 50)) # was enc.generate(size=50)
# Batched similarity now indexes columns, not rows.
sims = enc.similarity(batch_a, batch_b) # sims[i] = sim(batch_a[:, i], batch_b[:, i])
member = batch[:, i] # was batch[i]
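For arrays that already exist in the 1.1.0 layout, the transpose mentioned in the migration comment can be done with NumPy before the data is handed to PyHDC. The array names below are placeholders, and how a raw array is wrapped in a Hypervector is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 10_000

legacy = rng.standard_normal((N, D))    # 1.1.0 layout: one vector per row
batch = np.ascontiguousarray(legacy.T)  # 2.0.0 layout: one vector per column

i = 7
# Row i of the old layout is column i of the new one.
assert np.array_equal(batch[:, i], legacy[i])
print(batch.shape)  # (10000, 50)
```

legacy.T alone is a view with swapped strides; np.ascontiguousarray makes a copy in C order, which is optional and only matters if later code expects contiguous memory.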
Fixed
Batched generation is order-reproducible: generate(size=(D, N)) yields the same vectors as N successive generate() calls under a fixed seed, and works for every generator (a 2-D size previously mis-ordered the columns or failed).
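What order-reproducible means here can be shown with NumPy's Generator standing in for a PyHDC element generator (an assumption for illustration; the library's generators are separate classes). Drawing the batch as (N, D) and transposing reproduces N successive single-vector draws, while drawing (D, N) directly places the same stream values in different positions.

```python
import numpy as np

D, N, SEED = 8, 3, 42

# Reference: N successive single-vector draws from one seeded stream.
rng = np.random.default_rng(SEED)
one_at_a_time = np.stack([rng.random(D) for _ in range(N)], axis=1)  # (D, N)

# Order-preserving batch: draw (N, D), then transpose to (D, N).
rng = np.random.default_rng(SEED)
order_preserving = rng.random((N, D)).T

# Direct (D, N) draw: same seed, but values land in different positions.
rng = np.random.default_rng(SEED)
direct = rng.random((D, N))

print(np.array_equal(order_preserving, one_at_a_time))  # True
print(np.array_equal(direct, one_at_a_time))            # False
```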
v1.1.0: 2026-05-24
Added
BSDC_THIN encoding: sparse binary with post-bundling random thinning to enforce a density constraint. Uses Shifting/InverseShifting for binding.
DisjunctionThinned bundling function in pyhdc.components.bundling: bitwise OR followed by random thinning to a target density.
similarity_remap parameter on all encoding classes: optional callable applied to every similarity result before returning.
remap_to_unit in pyhdc.components.similarity: maps [-1, 1] → [0, 1]. Works on scalars, NumPy arrays, and PyTorch tensors.
PyTorch support for all four similarity functions (CosineSimilarity, HammingDistance, Overlap, AngleDistance).
Batched similarity calling conventions: (a, b) both 2-D returns per-row similarities; (arr,) single 2-D returns row 0 vs. rows 1+.
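A remap callable of the kind passed as similarity_remap might look like the sketch below. It assumes the [-1, 1] → [0, 1] mapping is the linear map (x + 1) / 2, which the entry above does not spell out, and the function name is hypothetical so that it is not confused with the library's remap_to_unit.

```python
import numpy as np

def remap_to_unit_sketch(sim):
    """Linearly map a similarity in [-1, 1] to [0, 1] (assumed formula)."""
    return (sim + 1) / 2

print(remap_to_unit_sketch(-1.0))  # 0.0
print(remap_to_unit_sketch(0.5))   # 0.75
print(remap_to_unit_sketch(np.array([-1.0, 0.0, 1.0])).tolist())  # [0.0, 0.5, 1.0]
```

Because the body uses only arithmetic operators, one function covers Python floats and NumPy arrays alike, and the same expression applies unchanged to PyTorch tensors.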
Changed (breaking)
HammingDistance now returns [-1, 1] instead of [0, 1].
Overlap now returns [-1, 1] instead of [0, 1].
Migration guide: any code comparing HammingDistance or Overlap output against thresholds in [0, 1] must be updated. The easiest fix:
import pyhdc
from pyhdc.components.similarity import remap_to_unit
# Option A: remap manually
sim = hv1.similarity(hv2)
sim_01 = remap_to_unit(sim)
# Option B: remap automatically at the encoding level
enc = pyhdc.BSC(dimension=10_000, similarity_remap=remap_to_unit)
sim_01 = hv1.similarity(hv2) # always in [0, 1]
Fixed
MAP_I_Bits integer overflow on Python 3.9.
All similarity functions now handle PyTorch tensors without falling back to NumPy.
v1.0.1: 2026-05-23
Changed
Added README.md with badges, installation instructions, and a quickstart example (omitted from the v1.0.0 tag; this patch ensures it appears on the PyPI release page).
v1.0.0: 2026-05-23
Added
Unit test suite covering all 14 encoding types, all 7 generator families, all components, and the hypervector API.
Performance benchmark suite (pytest-benchmark).
mypy static type checking configuration.
Pre-commit hooks: autoflake, isort, black, pylint, mypy.
CONTRIBUTING.md with developer setup and PR process.
SECURITY.md with vulnerability reporting guidance.
Codecov integration.
TestPyPI and PyPI publish workflows with OIDC Trusted Publishing.
Fixed
All internal imports changed from hdc. to pyhdc. namespace.
DefaultGenerator._next_word integer overflow for word_size >= 32.
MBAT.bind incorrectly storing tuple as hypervector data.
MAP_I_Bits wrong keyword argument names in ElementAdditionBits.
FeistelCounterGenerator non-deterministic round key generation.
v0.0.1: 2024-01-01
Initial template release to PyPI.
Added
Core encoding types: MAP_C, MAP_I, MAP_I_Bits, MAP_B, HRR, HRR_NoNorm, HRR_ConstNorm, FHRR, VTB, MBAT, BSC, BSDC_CDT, BSDC_S, BSDC_SEG
Random number generator families: LCG, DLFSR, LFSR, LCA, PCG, Xorshift, ShiftedCounter
Recovery algorithm framework (not yet public API)
NumPy backend; PyTorch optional
GitHub Actions CI: lint, test, PyPI publish workflows