Changelog
All notable changes to PyHDC are documented here. The project follows Semantic Versioning and Keep a Changelog conventions.
The source is CHANGELOG.md on GitHub.
v2.2.0: 2026-06-27
Added
Data encoders in the new pyhdc.encoders package. Each Encoder wraps an Encoding instance and maps a value, feature vector, or batch to a dimension-first (D, B) Hypervector via encode (or by calling the encoder directly). Codebook encoders: Empty, Identity, Random, Level, Thermometer, Circular. Functional encoders: Projection, Sinusoid, Density, FractionalPower. A family-specific encoder raises NotImplementedError where the family has no definition (Identity on VTB/MBAT/BSDC, Thermometer/Density on continuous or phase families, Projection on BSC/BSDC, FractionalPower outside FHRR and the HRR family). Identity returns the binding-identity element (the e where bind(x, e) == x): all-ones for MAP, all-zeros for BSC, the impulse for the HRR family, zero phase for FHRR.
Family-aware basis builders in the new pyhdc.components.basis package: empty(), identity(), random(), level(), circular(), thermometer(), plus family_endpoints(). Each returns a (D, count) codebook in the encoding’s value domain and backend.
Cross similarity. similarity() with mode="cross", A=(D, P), and B=(D, M) returns the full (P, M) matrix of every column of A against every column of B, backed by a single matmul with no (D, P, M) intermediate. Available on the Encoding and Hypervector similarity() methods and the new module-level similarity(). Implemented for Cosine, Hamming, Overlap, and Angle; an encoding whose metric is outside that set raises NotImplementedError so the caller can fall back to a per-pair loop. Binary metrics cast to float64 for a BLAS matmul, and cosine guards a zero-norm column (scores 0, not nan).
Module-level convenience function similarity(), joining the existing generate(), zeros(), bundle(), bind(), and unbind().
Composable component helpers, each in an operation-named module: random-selection bundling randsel/multirandsel and additive multiset/multibundle in pyhdc.components.bundling, multiplicative multibind in pyhdc.components.binding, and hard_quantize/soft_quantize in pyhdc.components.quantization.
MAP_I_Bits gains a bit_width parameter to set the signed saturation width explicitly (overrides mask).
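The cross mode described above comes down to one matrix product over normalized columns. The following is a minimal NumPy sketch of cosine cross similarity, written independently of PyHDC's internals: the function name cross_cosine is hypothetical, and the zero-norm handling is an assumption chosen to reproduce the documented behavior (a score of 0 rather than nan).

```python
import numpy as np

def cross_cosine(A, B):
    """Cosine similarity of every column of A (D, P) against every column of B (D, M)."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    norm_a = np.linalg.norm(A, axis=0)  # shape (P,)
    norm_b = np.linalg.norm(B, axis=0)  # shape (M,)
    # Swap zero norms for 1 so the division is safe. A zero column has
    # all-zero dot products, so its scores stay exactly 0.
    safe_a = np.where(norm_a == 0, 1.0, norm_a)
    safe_b = np.where(norm_b == 0, 1.0, norm_b)
    # One (P, D) @ (D, M) matmul; no (D, P, M) intermediate is built.
    return (A / safe_a).T @ (B / safe_b)

rng = np.random.default_rng(0)
A = rng.choice([-1.0, 1.0], size=(1000, 3))
B = rng.choice([-1.0, 1.0], size=(1000, 5))
B[:, 0] = 0.0  # a zero-norm column
scores = cross_cosine(A, B)  # shape (3, 5); column 0 is all zeros
```

Normalizing the columns before the product keeps memory at the size of the two inputs plus the (P, M) result, which is the point of avoiding a broadcast over (D, P, M).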
Changed (breaking)
Narrow: MAP_I_Bits rejects a mask that is not of the form 2**k - 1 (contiguous low bits). Such a value was previously accepted and silently ignored (always clipping at int32); it now raises ValueError. Pass bit_width=k for an explicit k-bit limit. Default construction is unaffected.
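To check ahead of time whether an existing mask value will still be accepted, the 2**k - 1 test can be written with a standard bit trick. This is a sketch only: bit_width_from_mask is a hypothetical helper, not part of PyHDC, and the error message is illustrative.

```python
def bit_width_from_mask(mask):
    """Return k if mask == 2**k - 1 for some k >= 1, otherwise raise ValueError."""
    # A value with only contiguous low bits set shares no bits with mask + 1.
    if mask <= 0 or (mask & (mask + 1)) != 0:
        raise ValueError(f"mask {mask:#x} is not of the form 2**k - 1")
    return mask.bit_length()

width_8 = bit_width_from_mask(0xFF)            # 8
width_32 = bit_width_from_mask((2**32) - 1)    # 32, the default mask
```

A value such as 0b1010 has a gap in its low bits and fails the check, which is the case the new ValueError covers.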
Fixed
zeros() now works on the torch backend. It previously passed the encoding’s numpy dtype straight to torch.zeros, which raised a TypeError; it now builds in numpy and converts, preserving the dtype.
MAP_I_Bits now honors its bit width. The post-bundle saturation bounds and the storage dtype are derived from mask (which must be 2**k - 1) or the new bit_width, instead of being hard-coded to int32 with the mask ignored. The default mask=(2**32) - 1 is unchanged (int32 bounds, int32 storage). A narrow width now saturates correctly (an 8-bit mask clips to [-128, 127] and stores int8); a width wider than 32 widens the storage dtype (up to int64) so the sum no longer wraps on cast.
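The relationship between a bit width, its signed saturation bounds, and a storage dtype can be sketched in NumPy as follows. The names saturation_spec and saturate are hypothetical, and the dtype rule (the narrowest standard signed integer that holds the width) is an assumption; the entry above only confirms int8 for 8 bits, int32 for 32 bits, and int64 as the upper limit.

```python
import numpy as np

def saturation_spec(bit_width):
    """Return (low, high, dtype) for a signed bit_width-bit saturating store."""
    if not 1 <= bit_width <= 64:
        raise ValueError("bit_width must be between 1 and 64")
    low = -(1 << (bit_width - 1))
    high = (1 << (bit_width - 1)) - 1
    # Assumption: choose the narrowest standard signed dtype that fits.
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        if np.iinfo(dtype).bits >= bit_width:
            return low, high, dtype

def saturate(total, bit_width):
    """Clip an already-summed total to the signed range, then narrow the dtype."""
    low, high, dtype = saturation_spec(bit_width)
    return np.clip(np.asarray(total, dtype=np.int64), low, high).astype(dtype)

clipped = saturate([300, -300, 5], bit_width=8)  # values 127, -128, 5 stored as int8
```

Clipping before the cast is what prevents wraparound: casting 300 straight to int8 would wrap instead of saturating.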
v2.1.0: 2026-06-18
Added
Multi-dimensional (D, N, M) batches. enc.generate(size=(D, N, M)) returns one Hypervector wrapping a (D, N, M) array; axis 0 is the dimension D and every trailing-axis slice is a hypervector.
axis= keyword on bundle(): reduce a chosen batch axis (an int or a tuple of ints) and return a single Hypervector. axis=None reduces the last axis, so (D, N) collapses to (D,) and (D, N, M) collapses to (D, N). Axis 0 is the dimension and cannot be reduced; passing axis=0 raises ValueError.
axis= keyword (keyword-only) on similarity(): for a single (D, N, M, ...) batch, selects which batch axis splits index 0 from the rest.
bind() and unbind() batch automatically. The element-wise binders (MAP multiply, BSC xor, FHRR angle add/sub) broadcast a batch natively: a (D,) key binds against every column, and operands of mixed rank align by trailing-axis padding. Every other binder (circular convolution/correlation, shifting, segment shifting, matrix binding, VTB, context-dependent thinning) is applied per column internally, so a batched bind(A, B) returns one (D, N) Hypervector without batch_dim.
Two-input similarity() broadcasting over trailing axes: the result shape is the broadcast of the two operands’ non-dimension axes. Two (D,) vectors return a Python float; every other pairing returns an array.
First-class permute(), inverse(), negative(), and normalize() on Encoding, mirrored as methods on Hypervector (permute(), inverse(), negative(), normalize()). permute is defined for every encoding (cyclic shift along axis 0); inverse/negative/normalize are wired per family and raise NotImplementedError where a family does not define them.
Operator overloading on Hypervector: + (bundle), * (bind), / (unbind), ~ (inverse), >> (permute by +k), << (permute by -k). A non-Hypervector operand to + * / returns NotImplemented and Python raises TypeError; a bool shift on >>/<< is rejected.
Module-level permute(), inverse(), negative(), normalize(), and unbind(), joining the existing generate(), zeros(), bundle(), bind(), and stack().
BSDC_THIN is now reachable directly from the top level (previously only via pyhdc.encodings); all 15 encodings are exported at the top level.
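The shape rules for batched binding and axis-based bundling can be illustrated with plain NumPy arrays. This sketch does not call PyHDC: pad_trailing, bind_multiply, and bundle_sum are hypothetical stand-ins, MAP-style multiplication stands in for an element-wise binder, and a plain sum stands in for bundling.

```python
import numpy as np

def pad_trailing(x, ndim):
    """Append length-1 axes so x lines up with an ndim-dimensional batch."""
    return x.reshape(x.shape + (1,) * (ndim - x.ndim))

def bind_multiply(a, b):
    """Element-wise bind with trailing-axis alignment of mixed-rank operands."""
    ndim = max(a.ndim, b.ndim)
    return pad_trailing(a, ndim) * pad_trailing(b, ndim)

def bundle_sum(batch, axis=None):
    """Sum over one or more batch axes; axis 0 (the dimension) is off limits."""
    if batch.ndim < 2:
        raise ValueError("nothing to reduce: input has no batch axis")
    if axis is None:
        axis = batch.ndim - 1  # default: reduce the last axis
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    axes = tuple(a % batch.ndim for a in axes)
    if 0 in axes:
        raise ValueError("axis 0 is the hypervector dimension and cannot be reduced")
    return batch.sum(axis=axes)

rng = np.random.default_rng(1)
key = rng.choice([-1, 1], size=(64,))
batch = rng.choice([-1, 1], size=(64, 4, 3))
bound = bind_multiply(key, batch)           # (64, 4, 3): key applied to every slice
last = bundle_sum(batch)                    # (64, 4)
both = bundle_sum(batch, axis=(1, 2))       # (64,)
```

Padding on the trailing side is what keeps axis 0 aligned as the dimension; NumPy's default broadcasting pads on the leading side instead, which would misalign a (D,) key against a (D, N) batch.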
Changed (breaking)
The misspelled BernoulliBiploar element generator is renamed to BernoulliBipolar; the old name is removed. Any direct import of the old name from pyhdc.components.elements must be updated. The MAP_I, MAP_I_Bits, and MAP_B encodings that use it are unchanged in behavior.
Migration guide:
# The element generator was misspelled; import the corrected name.
from pyhdc.components.elements import BernoulliBipolar # was BernoulliBiploar
Changed
Vectorized fast path for batched i.i.d. generation: with the default i.i.d. element generators (Bernoulli bipolar/binary, uniform bipolar/angles, normal real, Bernoulli sparse), generate(size=(D, N)) draws the whole batch in one (D, *batch) call. It is reproducible under a fixed seed for a given batch shape, but no longer value-identical to generating the vectors one at a time. Dropping that cross-consistency removes a full-array transpose copy and makes generation about 10-24% faster than the prior order-preserving draw. Ordered and custom generators (and SparseSegmented for BSDC_SEG) keep the per-vector loop and still match N successive single-vector generate calls.
Non-batch-safe binders (circular convolution/correlation, shifting/segment-shifting for BSDC_S/BSDC_SEG/BSDC_THIN, matrix binding for MBAT, VTB, and context-dependent thinning for BSDC_CDT) are applied per column when bind()/unbind() receives a batched (ndim > 1) input, returning one batched result. They previously produced a wrong result silently; single-vector inputs are unchanged.
random_zone_count returns an int for a single (D,) result and an array for a batched result.
ElementAdditionBits (MAP_I_Bits bundling) sums in a wide (int64) accumulator and clips the total once, saturating at the bounds. This replaces the previous per-addition clip, so results change when the running sum would have saturated mid-accumulation; it is vectorized and accepts a tuple of axes.
DisjunctionThinned (BSDC_THIN bundling) thins a batched result without a per-column Python loop: each surviving column keeps a uniformly random ceil(D * density)-subset of its set bits through a vectorized random-key selection.
bundle(array, batch_dim=k) on a 3-D array reduces the other batch axis in one vectorized op instead of Python-looping the split slices (about 8x faster on a 1000 x 20 x 500 array). Ragged nested-list inputs, batch_dim=0, and 4-D-or-larger arrays keep the per-group path. For tie-randomizing bundlers the random values at tie coordinates now differ from the previous per-group draws (still random; batch_dim has no fixed-seed guarantee). axis= remains the preferred vectorized form, returning a single tensor instead of a list.
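The difference between clipping after every addition and clipping the total once shows up whenever the running sum crosses a bound and then comes back. The NumPy sketch below contrasts the two strategies on a single coordinate; it is illustrative only, the function names are hypothetical, and the 8-bit bounds are chosen to keep the numbers small.

```python
import numpy as np

LOW, HIGH = -128, 127  # signed 8-bit bounds, chosen for illustration

def bundle_clip_each_step(columns):
    """Per-addition clip: saturate the running sum after every column."""
    total = np.zeros(columns.shape[0], dtype=np.int64)
    for j in range(columns.shape[1]):
        total = np.clip(total + columns[:, j], LOW, HIGH)
    return total

def bundle_clip_once(columns):
    """Wide accumulator: sum in int64, then saturate the total a single time."""
    return np.clip(columns.sum(axis=1, dtype=np.int64), LOW, HIGH)

columns = np.array([[100, 100, -100]])       # one coordinate, three vectors
stepwise = bundle_clip_each_step(columns)    # 100 -> 127 (clipped) -> 27
once = bundle_clip_once(columns)             # true sum 100, within bounds
```

The per-step result depends on the order of the operands, while the clip-once result does not, which is also why the latter vectorizes cleanly over several axes.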
Deprecated
batch_dim on bundle()/bind()/unbind() is deprecated and will be removed in a future release. Pass a batched array directly (operations batch automatically) or use axis= on bundle. Passing batch_dim now emits a DeprecationWarning.
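One way to find remaining batch_dim call sites is to capture DeprecationWarning in a test. The sketch below uses only the standard warnings module; the bundle function here is a hypothetical stand-in that mimics only the warning, not PyHDC's real implementation or its exact message.

```python
import warnings

def bundle(data, axis=None, batch_dim=None):
    """Hypothetical stand-in: warns on batch_dim and returns its input unchanged."""
    if batch_dim is not None:
        warnings.warn(
            "batch_dim is deprecated; pass a batched array or use axis=",
            DeprecationWarning,
            stacklevel=2,
        )
    return data

def emits_deprecation(call):
    """Return True if calling call() emits at least one DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide it
        call()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)

old_style = emits_deprecation(lambda: bundle([1, 2], batch_dim=1))  # True
new_style = emits_deprecation(lambda: bundle([1, 2], axis=1))       # False
```

Running a test suite with the interpreter option -W error::DeprecationWarning turns every such warning into an exception, so leftover uses fail loudly before the keyword is removed.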
v2.0.0: 2026-06-12
Added
Dimension-first (D, N) batched hypervectors. enc.generate(size=(D, N)) returns one Hypervector wrapping a (D, N) array whose columns are hypervectors. bundle() collapses a (D, N) batch to a single (D,) prototype; bind()/unbind() operate per column.
select(): select hypervectors (columns) from a (D, N) batch by index, on both the NumPy and PyTorch backends.
stack(): backend-agnostic combine of hypervectors/batches into one (D, N) batch along the batch axis (a (D,) vector becomes a column).
Global backend/device defaults: prefer_torch(), prefer_cuda(), prefer_numpy(), prefer_cpu(), get_default_backend(), get_default_device(). Encodings created without an explicit backend/device inherit these.
Multi-mode similarity: a single (D, N) batch returns column 0 against each remaining column; two (D, N) batches return per-column pairs; a (D,) vector against a (D, N) batch broadcasts.
BSDC_THIN is now exported at the top level.
Changed (breaking)
Hypervector batches are now dimension-first (D, N) (each column is a hypervector), not batch-first (N, D). enc.generate(size=N) with an integer now returns a single N-dimensional vector; use enc.generate(size=(D, N)) for a batch of N vectors.
Batched similarity() is column-wise over (D, N) instead of per-row over (N, D): similarity(A, B) returns per-column pairs, and similarity(batch) returns column 0 vs each remaining column.
Migration guide:
# A batch of N vectors was (N, D) in 1.1.0; make or transpose it to (D, N).
batch = enc.generate(size=(10_000, 50)) # was enc.generate(size=50)
# Batched similarity now indexes columns, not rows.
sims = enc.similarity(batch_a, batch_b) # sims[i] = sim(batch_a[:, i], batch_b[:, i])
member = batch[:, i] # was batch[i]
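For arrays that already exist in the old batch-first layout, the conversion is a transpose. The NumPy sketch below shows the layout change and the two column-wise similarity conventions using cosine similarity; cosine_columns is a hypothetical helper and nothing here calls PyHDC.

```python
import numpy as np

def cosine_columns(a, b):
    """Cosine similarity of matching columns; a (D, 1) operand broadcasts."""
    num = np.sum(a * b, axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return num / den

rng = np.random.default_rng(2)
legacy = rng.choice([-1.0, 1.0], size=(50, 10_000))  # batch-first (N, D)
batch = legacy.T                                      # dimension-first (D, N)

member = batch[:, 7]                                  # the vector that was legacy[7]
pairwise = cosine_columns(batch, batch)               # per-column pairs, shape (N,)
first_vs_rest = cosine_columns(batch[:, :1], batch[:, 1:])  # column 0 vs the rest, shape (N - 1,)
```

Note that the transpose is a view, so no data is copied until a contiguous array is required.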
Fixed
Batched generation is order-reproducible: generate(size=(D, N)) yields the same vectors as N successive generate() calls under a fixed seed, and works for every generator (a 2-D size previously mis-ordered the columns or failed).
v1.1.0: 2026-05-24
Added
BSDC_THIN encoding: sparse binary with post-bundling random thinning to enforce a density constraint. Uses Shifting/InverseShifting for binding.
DisjunctionThinned bundling function in pyhdc.components.bundling: bitwise OR followed by random thinning to a target density.
similarity_remap parameter on all encoding classes: optional callable applied to every similarity result before returning.
remap_to_unit in pyhdc.components.similarity: maps [-1, 1] → [0, 1]. Works on scalars, NumPy arrays, and PyTorch tensors.
PyTorch support for all four similarity functions (CosineSimilarity, HammingDistance, Overlap, AngleDistance).
Batched similarity calling conventions: (a, b) both 2-D returns per-row similarities; (arr,) single 2-D returns row 0 vs. rows 1+.
Changed (breaking)
HammingDistance now returns [-1, 1] instead of [0, 1].
Overlap now returns [-1, 1] instead of [0, 1].
Migration guide: any code comparing HammingDistance or Overlap
output against thresholds in [0, 1] must be updated. The easiest fix:
from pyhdc.components.similarity import remap_to_unit
# Option A: remap manually
sim = hv1.similarity(hv2)
sim_01 = remap_to_unit(sim)
# Option B: remap automatically at the encoding level
enc = pyhdc.BSC(dimension=10_000, similarity_remap=remap_to_unit)
sim_01 = hv1.similarity(hv2) # always in [0, 1]
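If changing the threshold is preferable to remapping the score, the conversion can be done once on the threshold itself. The sketch below assumes the remap is the linear map (s + 1) / 2; that formula is an assumption, since the entry above states only the input and output ranges, and both function names are hypothetical.

```python
def remap_to_unit_sketch(sim):
    """Assumed linear map from [-1, 1] onto [0, 1]."""
    return (sim + 1) / 2

def threshold_to_signed(threshold_01):
    """Inverse of the assumed map: carry a [0, 1] threshold over to [-1, 1]."""
    return 2 * threshold_01 - 1

old_threshold = 0.75
new_threshold = threshold_to_signed(old_threshold)   # 0.5
round_trip = remap_to_unit_sketch(new_threshold)     # back to 0.75
```

Because the sketch uses only arithmetic operators, the same two functions work element-wise on NumPy arrays and PyTorch tensors as well as on Python floats.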
Fixed
MAP_I_Bits integer overflow on Python 3.9.
All similarity functions now handle PyTorch tensors without falling back to NumPy.
v1.0.1: 2026-05-23
Changed
Added README.md with badges, installation instructions, and a quickstart example (omitted from the v1.0.0 tag; this patch ensures it appears on the PyPI release page).
v1.0.0: 2026-05-23
Added
Unit test suite covering all 14 encoding types, all 7 generator families, all components, and the hypervector API.
Performance benchmark suite (pytest-benchmark).
mypy static type checking configuration.
Pre-commit hooks: autoflake, isort, black, pylint, mypy.
CONTRIBUTING.md with developer setup and PR process.
SECURITY.md with vulnerability reporting guidance.
Codecov integration.
TestPyPI and PyPI publish workflows with OIDC Trusted Publishing.
Fixed
All internal imports changed from hdc. to pyhdc. namespace.
DefaultGenerator._next_word integer overflow for word_size >= 32.
MBAT.bind incorrectly storing tuple as hypervector data.
MAP_I_Bits wrong keyword argument names in ElementAdditionBits.
FeistelCounterGenerator non-deterministic round key generation.
v0.0.1: 2024-01-01
Initial template release to PyPI.
Added
Core encoding types: MAP_C, MAP_I, MAP_I_Bits, MAP_B, HRR, HRR_NoNorm, HRR_ConstNorm, FHRR, VTB, MBAT, BSC, BSDC_CDT, BSDC_S, BSDC_SEG
Random number generator families: LCG, DLFSR, LFSR, LCA, PCG, Xorshift, ShiftedCounter
Recovery algorithm framework (not yet public API)
NumPy backend; PyTorch optional
GitHub Actions CI: lint, test, PyPI publish workflows