How to Work with (D, N, M) Batches
PyHDC is dimension-first: axis 0 is always the hypervector dimension D, and
every trailing axis is batch structure. A single vector is (D,), a flat
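batch is (D, N), and a two-level batch is (D, N, M). Fixing every trailing
index selects one hypervector: batch[:, j] is column j of a (D, N) batch, and
cube[:, j, k] is one vector of a (D, N, M) tensor. Fixing only the last index,
as in cube[:, :, k], leaves a (D, N) array.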
This guide covers building a (D, N, M) tensor, reducing a chosen batch axis
with bundle, computing similarity on a 3-D batch, and element-wise binding
that broadcasts a key against a batch. For the layout rules behind these
shapes, see Array Layout: Dimension-First (D, N, M).
Build a (D, N, M) tensor
generate takes a dimension-first size tuple. The first entry is D and
the remaining entries are batch axes. size=(D, N, M) returns a (D, N, M)
tensor holding N * M hypervectors:
import pyhdc
enc = pyhdc.MAP_C(dimension=10_000)
batch = enc.generate(size=(10_000, 4, 3)) # shape (10000, 4, 3)
print(batch.shape) # (10000, 4, 3)
print(batch[:, 0, 0].shape) # (10000,) one hypervector
print(batch[:, :, 0].shape) # (10000, 4) a (D, N) slab
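As a quick check on the shape arithmetic, the sketch below counts how many hypervectors a dimension-first shape holds. It uses only the standard library. count_hypervectors is a hypothetical helper written for this guide, not a PyHDC function.

```python
import math

def count_hypervectors(shape):
    """Number of hypervectors in a dimension-first shape (D, *batch_axes)."""
    if len(shape) < 1:
        raise ValueError("shape must include the dimension axis D")
    # Axis 0 is D; every trailing axis multiplies the number of vectors.
    return math.prod(shape[1:])

print(count_hypervectors((10_000,)))       # 1
print(count_hypervectors((10_000, 4)))     # 4
print(count_hypervectors((10_000, 4, 3)))  # 12
```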
Under a fixed seed, batched generation reproduces itself for a given shape. With
the i.i.d. element generators the whole batch is drawn in one call, so it is not
value-identical to N successive generate(size=D) calls. Ordered and
custom generators keep the per-vector loop and do match the sequential draws.
See How to Make Experiments Reproducible for the seeding details.
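One way a single batched draw can differ from successive per-vector draws is the order in which values are taken from the random stream. The sketch below illustrates that idea with Python's random module as a stand-in. It is an assumption about the mechanism, not a description of PyHDC's generators, and draw_batched and draw_sequential are hypothetical names.

```python
import random

D, N = 6, 3

def draw_batched(seed):
    """One pass over a (D, N) layout, filled row by row from a single stream."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(D)]

def draw_sequential(seed):
    """N successive (D,) draws from the same stream, one vector at a time."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(D)] for _ in range(N)]

batch = draw_batched(seed=0)        # batch[d][n]
vectors = draw_sequential(seed=0)   # vectors[n][d]

# Same seed, same shape: the batched draw reproduces itself.
assert batch == draw_batched(seed=0)

# Both consume the same stream of D * N values ...
stream = [x for row in batch for x in row]
assert stream == [x for vec in vectors for x in vec]

# ... but column j of the batch takes every N-th value, while the j-th
# sequential vector takes a contiguous run of D values.
column_0 = [row[0] for row in batch]
assert column_0 == stream[0::N]
assert vectors[0] == stream[0:D]
```

Because the two layouts slice the same stream differently, column 0 of the batch and the first sequential vector generally hold different values even though both runs are reproducible.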
Bundle with the axis keyword
bundle reduces one or more batch axes and returns a single
Hypervector. Axis 0 is the dimension and is never a legal reduce
axis; passing axis=0 raises ValueError. axis is the vectorized reduce
keyword; the older batch_dim keyword is deprecated (see Convention 4).
Convention 1: default axis reduces the last batch axis
With axis=None (the default), bundle collapses the last axis. A
(D, N) batch collapses to (D,), and a (D, N, M) tensor collapses to
(D, N):
flat = enc.generate(size=(10_000, 50)) # (D, N)
print(enc.bundle(flat).shape) # (10000,)
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
print(enc.bundle(cube).shape) # (10000, 4)
Convention 2: choose which batch axis to collapse
Pass an integer axis to fold a specific batch axis. Reducing axis 1 of a
(D, N, M) tensor leaves (D, M); reducing axis 2 leaves (D, N):
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
print(enc.bundle(cube, axis=1).shape) # (10000, 3)
print(enc.bundle(cube, axis=2).shape) # (10000, 4)
Negative indices work and are normalized against the input rank, so
axis=-1 is the same as axis=2 for a 3-D tensor.
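The single-axis rules above can be modelled as a small shape calculator. This is a minimal standard-library sketch of the rules as stated in this guide (default to the last axis, normalize negative indices, reject axis 0). bundle_result_shape is a hypothetical helper, and the real bundle may validate its arguments differently.

```python
def bundle_result_shape(shape, axis=None):
    """Shape left after reducing one batch axis of a dimension-first input."""
    ndim = len(shape)
    if axis is None:
        axis = ndim - 1                  # Convention 1: last batch axis
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for rank {ndim}")
    axis %= ndim                         # normalize negative indices
    if axis == 0:
        raise ValueError("axis 0 is the dimension and cannot be reduced")
    return shape[:axis] + shape[axis + 1:]

print(bundle_result_shape((10_000, 50)))             # (10000,)
print(bundle_result_shape((10_000, 4, 3)))           # (10000, 4)
print(bundle_result_shape((10_000, 4, 3), axis=1))   # (10000, 3)
print(bundle_result_shape((10_000, 4, 3), axis=-1))  # (10000, 4)
```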
Convention 3: collapse several axes with a tuple
The additive bundlers accept a tuple of axes and fold them together. Reducing
(1, 2) of a (D, N, M) tensor collapses both batch axes to a single
(D,) prototype:
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
print(enc.bundle(cube, axis=(1, 2)).shape) # (10000,)
A tuple of axes applies to a single batched tensor. Bundling multiple separate
operands requires (D,) or (D, N) inputs and rejects any operand with
three or more dimensions. The tuple path is supported by the element-wise
additive bundlers (the MAP addition variants, the normalized and threshold
addition variants, AnglesOfElementAddition, and the bitwise-OR disjunction
bundler, BSDC_S/SEG/CDT). BSDC_THIN (thinned OR) reduces a single axis only.
For the full per-operation list, see Bundling Operations
and How to Bundle Hypervectors.
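To see why plain additive bundling can fold several axes at once, the sketch below sums a tiny batch in one pass and in two passes and checks that the results agree. It uses unclipped integer addition on nested lists, stored vector-major (cube[n][m] is one vector) for readability rather than dimension-first. add_vectors is a hypothetical helper, and the library's bundlers may add clipping or normalization steps that this sketch does not model.

```python
def add_vectors(vectors):
    """Element-wise sum of equal-length vectors (plain additive bundling)."""
    return [sum(components) for components in zip(*vectors)]

# cube[n][m] is one hypervector of dimension D = 4; N = 2, M = 3.
cube = [
    [[1, -1, 1, 1], [1, 1, -1, 1], [-1, 1, 1, 1]],
    [[1, 1, 1, -1], [-1, -1, 1, 1], [1, -1, -1, -1]],
]

# One pass over both batch axes, as axis=(1, 2) would request.
both_axes = add_vectors([vec for group in cube for vec in group])

# Two passes: fold the M axis inside each group, then fold the N axis.
per_group = [add_vectors(group) for group in cube]
two_steps = add_vectors(per_group)

assert both_axes == two_steps
print(both_axes)  # [2, 0, 2, 2]
```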
Convention 4: per-group bundles with axis
To bundle each group of a 3-D batch on its own, reduce the other batch axis
with axis= and read the result columns. Reducing axis 1 of a (D, N, M)
tensor bundles the N vectors at each M index and returns a single
(D, M) tensor whose column j is the bundle of group j:
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
groups = enc.bundle(cube, axis=1) # (10000, 3): column j is group j
print(groups.shape) # (10000, 3)
print(groups[:, 0].shape) # (10000,)
The older batch_dim= keyword returned the same content as a Python list of
hypervectors. It is deprecated as of 2.1.0, emits a DeprecationWarning, and
will be removed. Pass a batched array or use axis= instead. axis= also
keeps the fixed-seed reproducibility that the tie-randomizing bundlers (majority
vote, thinned OR) lose under batch_dim.
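While migrating, it can help to turn DeprecationWarning into an error so that any remaining batch_dim= call fails loudly in your test suite. The sketch below shows the standard-library pattern. legacy_bundle is a hypothetical stand-in that only mimics the warning described above; it is not PyHDC code.

```python
import warnings

def legacy_bundle(groups, batch_dim=None):
    """Hypothetical stand-in for a call site that still passes batch_dim=."""
    if batch_dim is not None:
        warnings.warn("batch_dim is deprecated; use axis=",
                      DeprecationWarning, stacklevel=2)
    return [sum(group) for group in groups]

def uses_deprecated_keyword(call):
    """Return True if calling `call` emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate the warning to an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            call()
        except DeprecationWarning:
            return True
    return False

old_style = lambda: legacy_bundle([[1, -1], [1, 1]], batch_dim=1)
new_style = lambda: legacy_bundle([[1, -1], [1, 1]])
print(uses_deprecated_keyword(old_style))  # True
print(uses_deprecated_keyword(new_style))  # False
```

The same escalation is available for a whole run with python -W error::DeprecationWarning.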
Similarity on a 3-D batch needs an explicit axis
similarity reduces over axis 0 (the dimension). For a single input, the
axis keyword selects which batch axis separates the query from the
candidates; axis is keyword-only.
A (D, N) batch defaults to column 0 versus the rest
A single (D, N) batch with no axis compares column 0 against columns 1
through N - 1, returning N - 1 scores:
batch = enc.generate(size=(10_000, 101)) # (D, N)
sims = enc.similarity(batch) # shape (100,)
# sims[i] = similarity(column 0, column i + 1)
This matches the batched conventions in How to Compute Similarity.
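The head-versus-rest convention can be written out by hand on a tiny batch. The sketch below assumes cosine similarity as the measure, which may not be what your encoder uses, and works on a dimension-first nested list. similarity_head_vs_rest is a hypothetical helper, not the library's similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def similarity_head_vs_rest(batch):
    """batch[d][n] is dimension-first; compare column 0 with columns 1..N-1."""
    columns = list(zip(*batch))        # columns[n] is one (D,) hypervector
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    head, rest = columns[0], columns[1:]
    return [cosine(head, col) for col in rest]

# D = 4 rows, N = 4 columns.
batch = [
    [1, 1,  1, -1],
    [1, 1, -1, -1],
    [1, 1,  1, -1],
    [1, 1, -1, -1],
]
print(similarity_head_vs_rest(batch))  # [1.0, 0.0, -1.0]
```

Four columns yield three scores, matching the N - 1 rule above.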
A (D, N, M) batch requires you to name the axis
A single batch with three or more dimensions has no default split axis. Calling
similarity on it without axis raises
ValueError("single-input similarity on a (D, N, M, ...) batch requires an
explicit axis"). Name the batch axis to split:
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
# Wrong: no axis on a 3-D batch
# enc.similarity(cube) # ValueError
# Right: split along axis 1 (head row vs the remaining rows)
sims = enc.similarity(cube, axis=1)
The named axis is split into a length-1 head and a rest of length size - 1, so
the head broadcasts against the rest across the remaining batch axes. A single
1-D input is rejected, because single-input similarity needs at least a (D, N)
batch.
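The split rule can be expressed as shape arithmetic. The sketch below models the rules described in this section. The score shape it returns (the rest shape with axis 0 reduced away) is inferred from that description and is an assumption, not a documented return shape. similarity_split_shapes is a hypothetical helper.

```python
def similarity_split_shapes(shape, axis=None):
    """Head, rest, and assumed score shapes for single-input similarity."""
    ndim = len(shape)
    if ndim < 2:
        raise ValueError("single-input similarity needs at least a (D, N) batch")
    if axis is None:
        if ndim > 2:
            raise ValueError("a (D, N, M, ...) batch requires an explicit axis")
        axis = 1                          # (D, N): column 0 versus the rest
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for rank {ndim}")
    axis %= ndim
    if axis == 0:
        raise ValueError("axis 0 is the dimension, not a batch axis")
    head = shape[:axis] + (1,) + shape[axis + 1:]
    rest = shape[:axis] + (shape[axis] - 1,) + shape[axis + 1:]
    scores = rest[1:]                     # assumed: axis 0 is reduced away
    return head, rest, scores

print(similarity_split_shapes((10_000, 101)))
# ((10000, 1), (10000, 100), (100,))
print(similarity_split_shapes((10_000, 4, 3), axis=1))
# ((10000, 1, 3), (10000, 3, 3), (3, 3))
```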
Element-wise binding broadcasts
Element-wise binders (MAP multiply, BSC XOR, FHRR angle add and subtract) align
operands on axis 0 and broadcast across the trailing batch axes, stretching
length-1 axes as NumPy and PyTorch do. A (D,) key binds against every column of
a batch in one call:
enc = pyhdc.MAP_C(dimension=10_000)
key = enc.generate() # (D,)
batch = enc.generate(size=(10_000, 50)) # (D, N)
bound = enc.bind(key, batch) # (10000, 50): key bound to each column
Mixed ranks align by padding the lower-rank operand with trailing length-1
axes. A (D, N) operand binds against a (D, N, M) tensor by broadcasting
over the M axis:
keys = enc.generate(size=(10_000, 4)) # (D, N)
cube = enc.generate(size=(10_000, 4, 3)) # (D, N, M)
bound = enc.bind(keys, cube) # (10000, 4, 3): keys broadcast over axis 2
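The alignment rule can be checked with a small shape calculator. This is a minimal sketch of the rule as described here (pad the lower-rank shape with trailing length-1 axes, then stretch length-1 axes). bind_result_shape is a hypothetical helper, not part of PyHDC.

```python
def bind_result_shape(a, b):
    """Broadcast two dimension-first shapes, padding on the trailing side."""
    rank = max(len(a), len(b))
    a = a + (1,) * (rank - len(a))
    b = b + (1,) * (rank - len(b))
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {a} against {b}")
        out.append(y if x == 1 else x)   # a length-1 axis stretches to match
    return tuple(out)

print(bind_result_shape((10_000,), (10_000, 50)))        # (10000, 50)
print(bind_result_shape((10_000, 4), (10_000, 4, 3)))    # (10000, 4, 3)
```

In plain NumPy, shapes are padded on the leading side instead, so a (D,) key needs an explicit trailing axis, for example key[:, None], before it lines up with a (D, N) array.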
Not every binder is element-wise. The convolution and correlation binders (the
HRR family), shifting and segment-shifting (the sparse families), matrix
binding (MBAT), VTB, and context-dependent thinning (BSDC_CDT) cannot broadcast
a per-coordinate rule across a batch. If you pass a batch anyway, bind applies
the binder per column internally and returns one batched result. See
How to Bind and Unbind Key-Value Pairs for the per-family binding details.
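To make the per-column fallback concrete, the sketch below applies circular convolution (the HRR-style binding rule) to each column of a dimension-first nested list and stacks the results back into a batch. It is a plain-Python illustration of the idea, not the library's implementation. circular_convolve and bind_per_column are hypothetical names.

```python
def circular_convolve(a, b):
    """HRR-style binding: c[k] = sum_j a[j] * b[(k - j) mod D]."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]

def bind_per_column(key, batch):
    """Bind a (D,) key to each column of a dimension-first batch[d][n]."""
    columns = zip(*batch)                           # one (D,) vector per column
    bound = [circular_convolve(key, col) for col in columns]
    return [list(row) for row in zip(*bound)]       # back to batch[d][n]

key = [0, 1, 0, 0]   # under convolution this key is a one-step cyclic shift
batch = [
    [1, 5],
    [2, 6],
    [3, 7],
    [4, 8],
]
print(bind_per_column(key, batch))  # [[4, 8], [1, 5], [2, 6], [3, 7]]
```

Each output coordinate depends on every input coordinate, which is why this rule cannot be expressed as a per-coordinate broadcast.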
Putting it together
The four operations compose. The table below summarizes how each one moves between ranks:
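| Call | Input shape | Result shape |
|---|---|---|
| generate(size=(D, N, M)) | none | (D, N, M) |
| bundle(x) | (D, N, M) | (D, N) |
| bundle(x, axis=1) | (D, N, M) | (D, M) |
| bundle(x, axis=(1, 2)) | (D, N, M) | (D,) |
| similarity(x, axis=1) | (D, N, M) | broadcast over the kept axes |
| bind(key, batch) | (D,) and (D, N) | (D, N) |
| bind(keys, cube) | (D, N) and (D, N, M) | (D, N, M) |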