How to Work with (D, N, M) Batches

PyHDC is dimension-first: axis 0 is always the hypervector dimension D, and every trailing axis is batch structure. A single vector is (D,), a flat batch is (D, N), and a two-level batch is (D, N, M). Fixing every trailing index selects one hypervector: batch[:, j] is column j of a flat batch, and cube[:, j, k] is one vector of a two-level batch. Fixing only some of the trailing indices leaves a smaller batch, so cube[:, :, k] is a (D, N) array.

This guide covers building a (D, N, M) tensor, reducing a chosen batch axis with bundle, computing similarity on a 3-D batch, and element-wise binding that broadcasts a key against a batch. For the layout rules behind these shapes, see Array Layout: Dimension-First (D, N, M).

Build a (D, N, M) tensor

generate takes a dimension-first size tuple. The first entry is D and the remaining entries are batch axes. size=(D, N, M) returns a (D, N, M) tensor holding N * M hypervectors:

import pyhdc

enc   = pyhdc.MAP_C(dimension=10_000)
batch = enc.generate(size=(10_000, 4, 3))   # shape (10000, 4, 3)

print(batch.shape)       # (10000, 4, 3)
print(batch[:, 0, 0].shape)   # (10000,) one hypervector
print(batch[:, :, 0].shape)   # (10000, 4) a (D, N) slab

Under a fixed seed, batched generation reproduces itself for a given shape. With the i.i.d. element generators the whole batch is drawn in one call, so it is not value-identical to N successive generate(size=D) calls. Ordered and custom generators keep the per-vector loop and do match the sequential draws. See How to Make Experiments Reproducible for the seeding details.
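The difference between one batched draw and N sequential draws can be illustrated with plain NumPy. This is only a sketch: NumPy's default_rng stands in for an i.i.d. element generator, and nothing here is PyHDC's own sampler or API.

```python
import numpy as np

D, N = 8, 3  # tiny sizes so the arrays are easy to inspect

# Same seed, same shape: the batched draw reproduces itself.
a = np.random.default_rng(0).standard_normal((D, N))
b = np.random.default_rng(0).standard_normal((D, N))

# Same seed, but N successive (D,) draws stacked as columns.
rng = np.random.default_rng(0)
seq = np.stack([rng.standard_normal(D) for _ in range(N)], axis=1)

same_shape_repeats = np.array_equal(a, b)    # batched draw vs. itself
matches_sequential = np.array_equal(a, seq)  # batched draw vs. per-vector loop
print(same_shape_repeats, matches_sequential)
```

In this stand-in, the batched call fills the (D, N) array in one pass over the random stream, so its columns do not line up with the per-vector draws even though both start from the same seed.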

Bundle with the axis keyword

bundle reduces one or more batch axes and returns a single Hypervector. Axis 0 is the dimension and is never a legal reduce axis; passing axis=0 raises ValueError. axis is the vectorized reduce keyword. The older batch_dim keyword is deprecated (see Convention 4).

Convention 1: default axis reduces the last batch axis

With axis=None (the default), bundle collapses the last axis. A (D, N) batch collapses to (D,); a (D, N, M) tensor collapses to (D, N):

flat = enc.generate(size=(10_000, 50))   # (D, N)
print(enc.bundle(flat).shape)            # (10000,)

cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)
print(enc.bundle(cube).shape)              # (10000, 4)

Convention 2: choose which batch axis to collapse

Pass an integer axis to fold a specific batch axis. Reducing axis 1 of a (D, N, M) tensor leaves (D, M); reducing axis 2 leaves (D, N):

cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

print(enc.bundle(cube, axis=1).shape)   # (10000, 3)
print(enc.bundle(cube, axis=2).shape)   # (10000, 4)

Negative indices work and are normalized against the input rank, so axis=-1 is the same as axis=2 for a 3-D tensor.
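The axis rules in Conventions 1 and 2 can be mimicked in a few lines of NumPy. The helper below, bundle_sum, is a hypothetical stand-in that uses plain addition as the bundler; it is a sketch of the shape behavior described above, not PyHDC's implementation.

```python
import numpy as np

def bundle_sum(batch, axis=None):
    """Hypothetical additive bundle: sum over one batch axis of a (D, ...) array."""
    if batch.ndim < 2:
        raise ValueError("need at least a (D, N) batch")
    if axis is None:
        axis = batch.ndim - 1      # default: the last batch axis
    if axis < 0:
        axis += batch.ndim         # normalize against the input rank
    if not 1 <= axis < batch.ndim:
        raise ValueError("axis must name a batch axis; axis 0 is the dimension")
    return batch.sum(axis=axis)

rng = np.random.default_rng(0)
cube = rng.choice([-1, 1], size=(16, 4, 3))   # (D, N, M) bipolar values

print(bundle_sum(cube).shape)            # (16, 4)
print(bundle_sum(cube, axis=1).shape)    # (16, 3)
print(bundle_sum(cube, axis=-1).shape)   # (16, 4)
```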

Convention 3: collapse several axes with a tuple

The additive bundlers accept a tuple of axes and fold them together. Reducing (1, 2) of a (D, N, M) tensor collapses both batch axes to a single (D,) prototype:

cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)
print(enc.bundle(cube, axis=(1, 2)).shape)   # (10000,)

A tuple of axes applies to a single batched tensor. Bundling multiple separate operands requires (D,) or (D, N) inputs and rejects any operand with three or more dimensions. The tuple path is supported by the element-wise additive bundlers: the MAP addition variants, the normalized and threshold addition variants, AnglesOfElementAddition, and the bitwise-OR disjunction bundler used by BSDC_S/SEG/CDT. BSDC_THIN (thinned OR) reduces a single axis only. For the full per-operation list, see Bundling Operations and How to Bundle Hypervectors.
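For plain addition, folding a tuple of axes gives the same values as folding the axes one at a time. The NumPy sketch below shows this on bipolar data; it uses np.sum as an assumed stand-in for an additive bundler and does not call PyHDC.

```python
import numpy as np

rng = np.random.default_rng(1)
cube = rng.choice([-1, 1], size=(16, 4, 3))   # (D, N, M)

both = cube.sum(axis=(1, 2))             # fold both batch axes at once
staged = cube.sum(axis=2).sum(axis=1)    # fold axis 2, then axis 1

print(both.shape)                        # (16,)
print(np.array_equal(both, staged))      # True for plain addition
```

The equality relies on addition being associative. A bundler that applies a sign or threshold step after each reduction would not, in general, give the same result when staged.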

Convention 4: per-group bundles with axis

To bundle each group of a 3-D batch on its own, reduce the other batch axis with axis= and read the result columns. Reducing axis 1 of a (D, N, M) tensor bundles the N vectors at each M index and returns a single (D, M) tensor whose column j is the bundle of group j:

cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

groups = enc.bundle(cube, axis=1)   # (10000, 3): column j is group j
print(groups.shape)        # (10000, 3)
print(groups[:, 0].shape)  # (10000,)

The older batch_dim= keyword returned the same content as a Python list of hypervectors. It is deprecated as of 2.1.0, emits a DeprecationWarning, and will be removed. Pass a batched array or use axis= instead. axis= also keeps the fixed-seed reproducibility that the tie-randomizing bundlers (majority vote, thinned OR) lose under batch_dim.
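The equivalence between a single (D, M) result and a list of per-group bundles can be checked with NumPy. This sketch again assumes plain addition as the bundler and only mirrors the shapes described above.

```python
import numpy as np

rng = np.random.default_rng(2)
cube = rng.choice([-1, 1], size=(16, 4, 3))   # (D, N, M)

groups = cube.sum(axis=1)                     # (D, M): one column per group

# Bundle each (D, N) slab separately, the way a per-group list would be built.
per_group = [cube[:, :, j].sum(axis=1) for j in range(cube.shape[2])]
stacked = np.stack(per_group, axis=1)         # list of (D,) vectors -> (D, M)

print(groups.shape)                           # (16, 3)
print(np.array_equal(groups, stacked))        # True
```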

Similarity on a 3-D batch needs an explicit axis

similarity reduces over axis 0 (the dimension). For a single input, the axis keyword selects which batch axis separates the query from the candidates. axis is keyword-only.

A (D, N) batch defaults to column 0 versus the rest

A single (D, N) batch with no axis compares column 0 against columns 1 through N - 1, returning N - 1 scores:

batch = enc.generate(size=(10_000, 101))   # (D, N)
sims  = enc.similarity(batch)              # shape (100,)
# sims[i] = similarity(column 0, column i + 1)

This matches the batched conventions in How to Compute Similarity.
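The head-versus-rest convention can be written out in NumPy as follows. The helper cosine_head_vs_rest is hypothetical, and cosine is an assumed metric chosen for illustration; the metric PyHDC applies for a given encoding is not stated here.

```python
import numpy as np

def cosine_head_vs_rest(batch):
    """Cosine similarity of column 0 against columns 1..N-1 of a (D, N) array."""
    head, rest = batch[:, :1], batch[:, 1:]     # (D, 1) and (D, N - 1)
    dots = (head * rest).sum(axis=0)            # reduce over the dimension
    norms = np.linalg.norm(head, axis=0) * np.linalg.norm(rest, axis=0)
    return dots / norms

rng = np.random.default_rng(3)
batch = rng.choice([-1.0, 1.0], size=(1_000, 101))   # (D, N)
sims = cosine_head_vs_rest(batch)

print(sims.shape)   # (100,)
```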

A (D, N, M) batch requires you to name the axis

A single batch with three or more dimensions has no default split axis. Calling similarity on it without axis raises ValueError("single-input similarity on a (D, N, M, ...) batch requires an explicit axis"). Name the batch axis to split:

cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

# Wrong: no axis on a 3-D batch
# enc.similarity(cube)   # ValueError

# Right: split along axis 1 (head row vs the remaining rows)
sims = enc.similarity(cube, axis=1)

The named axis is kept as a length-1 head against the length-(size minus one) rest, so it broadcasts against the remaining batch axes. A single 1-D input is rejected, because single-input similarity needs at least a (D, N) batch.
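A NumPy sketch of the head-and-rest split on a 3-D array follows. The helper cosine_split is hypothetical, cosine is an assumed metric, and the (N - 1, M) result shape is what this sketch produces rather than a documented PyHDC return shape.

```python
import numpy as np

def cosine_split(batch, *, axis):
    """Split `axis` into a length-1 head and the rest, then reduce over axis 0."""
    if not 1 <= axis < batch.ndim:
        raise ValueError("axis must be a batch axis in 1..ndim-1")
    head = np.take(batch, [0], axis=axis)                        # length-1 head
    rest = np.take(batch, np.arange(1, batch.shape[axis]), axis=axis)
    dots = (head * rest).sum(axis=0)                             # head broadcasts
    norms = np.linalg.norm(head, axis=0) * np.linalg.norm(rest, axis=0)
    return dots / norms

rng = np.random.default_rng(4)
cube = rng.choice([-1.0, 1.0], size=(1_000, 4, 3))   # (D, N, M)
sims = cosine_split(cube, axis=1)

print(sims.shape)   # (3, 3), that is (N - 1, M) in this sketch
```

Here sims[i, k] compares cube[:, 0, k] with cube[:, i + 1, k], so each M index gets its own head and its own set of candidates.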

Element-wise binding broadcasts

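Element-wise binders (MAP multiply, BSC XOR, FHRR angle add and subtract) align operands on axis 0 and broadcast over the trailing batch axes. Length-1 axes stretch as they do in NumPy and PyTorch, but alignment is dimension-first, whereas those libraries align shapes from the last axis. A (D,) key binds against every column of a batch in one call: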

enc   = pyhdc.MAP_C(dimension=10_000)
key   = enc.generate()                    # (D,)
batch = enc.generate(size=(10_000, 50))   # (D, N)

bound = enc.bind(key, batch)   # (10000, 50): key bound to each column

Mixed ranks align by padding the lower-rank operand with trailing length-1 axes. A (D, N) operand binds against a (D, N, M) tensor by broadcasting over the M axis:

keys = enc.generate(size=(10_000, 4))      # (D, N)
cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

bound = enc.bind(keys, cube)   # (10000, 4, 3): keys broadcast over axis 2
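The same alignment can be reproduced in plain NumPy by adding the trailing length-1 axes by hand. This is a sketch with element-wise multiplication standing in for a MAP-style binder; it does not call PyHDC.

```python
import numpy as np

rng = np.random.default_rng(5)
key = rng.choice([-1, 1], size=16)             # (D,)
keys = rng.choice([-1, 1], size=(16, 4))       # (D, N)
cube = rng.choice([-1, 1], size=(16, 4, 3))    # (D, N, M)

# Plain NumPy aligns shapes from the last axis, so in a dimension-first
# layout the trailing length-1 axes have to be inserted explicitly.
bound_flat = key[:, None] * keys               # (D, 1) * (D, N)    -> (D, N)
bound_cube = keys[:, :, None] * cube           # (D, N, 1) * (D, N, M) -> (D, N, M)

print(bound_flat.shape)   # (16, 4)
print(bound_cube.shape)   # (16, 4, 3)
```

Without the explicit None indices, key * keys would try to match 16 against 4 on the last axis and raise a broadcasting error.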

Not every binder is element-wise. The convolution and correlation binders (the HRR family), shifting and segment-shifting (the sparse families), matrix binding (MBAT), VTB, and context-dependent thinning (BSDC_CDT) cannot broadcast a per-coordinate rule across a batch. If you pass a batch anyway, bind applies the binder per column internally and returns one batched result. See How to Bind and Unbind Key-Value Pairs for the per-family binding details.
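A per-column fallback of this kind might look like the NumPy sketch below, which uses circular convolution as an example of a binder that mixes coordinates. Both helper names are hypothetical, and the loop only illustrates the idea; how PyHDC applies such binders internally is not shown in this guide.

```python
import numpy as np

def circular_convolve(a, b):
    """Circular convolution of two (D,) vectors via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[0])

def bind_per_column(key, batch):
    """Apply the binder to each column of a (D, N) batch and restack."""
    cols = [circular_convolve(key, batch[:, j]) for j in range(batch.shape[1])]
    return np.stack(cols, axis=1)               # list of (D,) -> (D, N)

rng = np.random.default_rng(6)
D, N = 64, 5
key = rng.standard_normal(D) / np.sqrt(D)
batch = rng.standard_normal((D, N)) / np.sqrt(D)

bound = bind_per_column(key, batch)
print(bound.shape)   # (64, 5)
```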

Putting it together

The four operations compose. The table below summarizes how each call moves between ranks:

Call	Input shape	Result shape
`generate(size=(D, N, M))`	—	`(D, N, M)`
`bundle(cube)` (default axis)	`(D, N, M)`	`(D, N)`
`bundle(cube, axis=1)`	`(D, N, M)`	`(D, M)`
`bundle(cube, axis=(1, 2))`	`(D, N, M)`	`(D,)`
`similarity(cube, axis=1)`	`(D, N, M)`	broadcast over the kept axes
`bind(key, batch)`	`(D,)` and `(D, N)`	`(D, N)`
`bind(keys, cube)`	`(D, N)` and `(D, N, M)`	`(D, N, M)`