Array Layout: Dimension-First (D, N, M)

PyHDC stores every hypervector and every batch of hypervectors with the hypervector dimension on axis 0. A single vector is (D,), a batch of N vectors is (D, N), and a tensor of N x M vectors is (D, N, M). The leading axis is always D; every trailing axis is part of the batch. This page explains the PyHDC convention, how the three primitives read the axes, and why this ordering makes batched generation reproducible.

PyHDC Convention

Axis 0 is the dimension. For any array you pass to PyHDC, array.shape[0] must equal the encoding’s dimension D, so len(hv) == hv.shape[0] == D for every shape, whether the array holds one vector or a million.

Trailing axes are the batch. Axes 1 and beyond index which hypervector. Each trailing-axis slice (each column) is one complete hypervector laid out along axis 0. You index a batch the way you index any dimension-first array:

import pyhdc

enc = pyhdc.MAP_C(dimension=10_000)
batch = enc.generate(size=(10_000, 8))   # (D, N) = (10000, 8)

batch.shape          # (10000, 8)
len(batch)           # 10000  -> this is D, not N
batch[:, 0]          # (10000,) the first hypervector
batch[:, :5]         # (10000, 5) the first five hypervectors
batch[:, -1]         # (10000,) the last hypervector

Axis 0 is never a legal bundling axis. Reducing axis 0 would collapse the coordinates of a single hypervector into a scalar, which is meaningless for bundling. PyHDC enforces this: passing axis=0 to bundle() raises ValueError("axis 0 is the hypervector dimension and cannot be reduced"). Negative axes are normalized first, so axis=-3 on a 3D array resolves to axis 0 and is rejected the same way.
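The normalize-then-reject rule can be sketched in plain Python. normalize_bundle_axis is a hypothetical helper written for this page, not part of PyHDC's API; it only mirrors the behavior described above, and it assumes that out-of-range axes should be rejected too.

```python
def normalize_bundle_axis(axis, ndim):
    """Hypothetical helper: normalize an axis (or tuple of axes) for an
    ndim-dimensional (D, *batch) array and reject axis 0, the dimension."""
    axes = axis if isinstance(axis, tuple) else (axis,)
    normalized = []
    for ax in axes:
        if not -ndim <= ax < ndim:
            raise ValueError(f"axis {ax} is out of bounds for an array of ndim {ndim}")
        ax = ax % ndim  # negative axes wrap: -1 -> ndim - 1, -3 on 3D -> 0
        if ax == 0:
            raise ValueError("axis 0 is the hypervector dimension and cannot be reduced")
        normalized.append(ax)
    return tuple(normalized)

print(normalize_bundle_axis(-1, 3))      # (2,)
print(normalize_bundle_axis((1, 2), 3))  # (1, 2)
try:
    normalize_bundle_axis(-3, 3)         # resolves to axis 0
except ValueError as err:
    print(err)                           # axis 0 is the hypervector dimension and cannot be reduced
```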

Shape to meaning

Shape	Name	Meaning
`(D,)`	single	One hypervector. Axis 0 is the dimension, there is no batch axis.
`(D, N)`	batch	`N` hypervectors. Column `batch[:, j]` is the `j`-th vector.
`(D, N, M)`	tensor	`N x M` hypervectors. Slice `tensor[:, i, j]` is one vector; `tensor[:, i]` is the `M`-vector row at index `i`.
`(D, *batch)`	general	One hypervector per trailing-axis index, `prod(batch)` vectors total.
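Because a (D, *batch) array holds prod(batch) hypervectors, it can be flattened to a (D, prod(batch)) batch and restored without moving any vector's coordinates. The sketch below uses plain NumPy arrays as stand-ins for PyHDC hypervectors and assumes NumPy's default C-order (row-major) reshaping.

```python
import numpy as np

D, N, M = 1_000, 4, 6
rng = np.random.default_rng(0)
tensor = rng.choice([-1, 1], size=(D, N, M))   # (D, N, M): N x M hypervectors

flat = tensor.reshape(D, -1)                   # (D, N*M): one column per vector
assert flat.shape == (D, N * M)

# In C order, vector (i, j) lands in column i*M + j of the flat batch.
i, j = 2, 5
assert np.array_equal(flat[:, i * M + j], tensor[:, i, j])

restored = flat.reshape(D, N, M)               # back to the original batch shape
assert np.array_equal(restored, tensor)
```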

A lone (D,) input is promoted internally to (D, 1) when an operation needs a batch axis, so a single vector behaves as a batch of one without you reshaping it.
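In NumPy terms, the promotion amounts to adding a trailing length-1 axis. as_batch is a hypothetical helper illustrating the idea on plain arrays; PyHDC performs the promotion internally and may implement it differently.

```python
import numpy as np

def as_batch(hv):
    """Hypothetical helper: view a lone (D,) vector as a (D, 1) batch of one.
    Arrays that already have a batch axis are returned unchanged."""
    hv = np.asarray(hv)
    return hv[:, np.newaxis] if hv.ndim == 1 else hv

single = np.ones(10_000)
print(as_batch(single).shape)                 # (10000, 1)
print(as_batch(np.ones((10_000, 8))).shape)   # (10000, 8)
```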

How bundle reads the axes

Bundling combines several hypervectors into one. It reduces a batch axis (any trailing axis) and leaves axis 0 intact, because the output is itself a hypervector of dimension D.

With the default axis=None, bundle() reduces the last axis:

batch = enc.generate(size=(10_000, 50))     # (D, N)
summary = enc.bundle(batch)                 # reduces axis 1 -> (D,)

tensor = enc.generate(size=(10_000, 4, 6))  # (D, N, M)
rows = enc.bundle(tensor)                   # reduces axis 2 -> (D, 4)

Pass an explicit axis to choose a different batch axis, or a tuple of axes to fold several at once:

tensor = enc.generate(size=(10_000, 4, 6))  # (D, N, M)
enc.bundle(tensor, axis=1)                   # reduce N -> (D, 6)
enc.bundle(tensor, axis=(1, 2))              # reduce N and M -> (D,)
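A minimal sketch of this axis handling, assuming an additive bundler that is a plain element-wise sum. bundle_add is hypothetical and works on NumPy arrays in place of PyHDC hypervectors; PyHDC's addition bundler may also normalize or clip its output.

```python
import numpy as np

def bundle_add(hvs, axis=None):
    """Hypothetical additive bundler on a (D, *batch) array.
    axis=None reduces the last axis; axis 0 (the dimension) is refused."""
    hvs = np.asarray(hvs)
    if hvs.ndim == 1:
        hvs = hvs[:, np.newaxis]               # a lone (D,) vector is a batch of one
    if axis is None:
        axis = hvs.ndim - 1                    # default: the last batch axis
    axes = axis if isinstance(axis, tuple) else (axis,)
    normalized = tuple(ax + hvs.ndim if ax < 0 else ax for ax in axes)
    if 0 in normalized:
        raise ValueError("axis 0 is the hypervector dimension and cannot be reduced")
    return hvs.sum(axis=normalized)            # axis 0 survives: the result is (D, ...)

rng = np.random.default_rng(0)
tensor = rng.choice([-1, 1], size=(10_000, 4, 6))   # (D, N, M)
print(bundle_add(tensor).shape)                # (10000, 4)  default: last axis
print(bundle_add(tensor, axis=1).shape)        # (10000, 6)
print(bundle_add(tensor, axis=(1, 2)).shape)   # (10000,)
```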

A tuple of axes works on a single batched tensor with the additive, element-wise bundlers (the addition and OR variants). When you bundle several separate operands instead, each operand must be (D,) or (D, N); an operand with three or more axes raises ValueError. To bundle each group of a (D, N, M) array on its own, reduce only the axis that runs within each group (axis=2 here) and read group j from column j of the (D, N) result.
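Per-group bundling can be checked with plain NumPy, again treating the addition bundler as an element-wise sum (an assumption). Reducing axis 2 of a (D, N, M) array bundles all N groups at once, and the loop confirms that column j matches bundling group j by itself.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, M = 10_000, 4, 6
tensor = rng.choice([-1, 1], size=(D, N, M))   # N groups of M hypervectors

# Reduce the within-group axis: column j of the result bundles group j.
per_group = tensor.sum(axis=2)                 # (D, N)

# Same result as bundling each (D, M) group separately.
for j in range(N):
    assert np.array_equal(per_group[:, j], tensor[:, j].sum(axis=1))
```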

The older batch_dim= keyword, which returned a Python list of results, is deprecated as of 2.1.0 and emits a DeprecationWarning. See Bundling Operations for the per-bundler formulas.

How similarity reads the axes

Similarity compares hypervectors, so it reduces axis 0 (the dimension) and broadcasts the trailing axes. The result shape is the broadcast of the two operands’ batch axes. Axis 0 disappears because each comparison produces one scalar per vector pair.

key = enc.generate(size=10_000)             # (D,)
codebook = enc.generate(size=(10_000, 200)) # (D, N)

enc.similarity(key, codebook)               # (200,) one score per column
enc.similarity(key, key)                    # Python float (both 1D)

A (D,) key broadcasts against every column of a (D, N) codebook. When a (D, N) batch meets a (D, N, M) tensor, the batch is padded with a trailing axis to (D, N, 1) and broadcast along axis 2, so column i of the batch is compared with every vector in tensor[:, i]. Two 1D inputs are the only case that returns a Python float; every other case returns a NumPy array or torch tensor whose shape is the broadcast of the non-dimension axes.

A shape	B shape	Result
`(D,)`	`(D,)`	Python `float`
`(D,)`	`(D, N)`	`(N,)`
`(D, N)`	`(D, N)`	`(N,)`
`(D,)`	`(D, N, M)`	`(N, M)`
`(D, N)`	`(D, N, M)`	`(N, M)` (A broadcast over axis 2)
`(D, N, M)`	`(D, N, M)`	`(N, M)`
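The rules in the table can be reproduced with a small NumPy sketch. It assumes cosine similarity as the metric (the metrics PyHDC actually offers are covered in Similarity Metrics) and pads the lower-rank operand with trailing length-1 axes before broadcasting. cosine_similarity here is hypothetical, not PyHDC's implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Hypothetical sketch: reduce axis 0 (the dimension), broadcast the rest.

    The lower-rank operand is padded with trailing length-1 axes first,
    e.g. (D, N) -> (D, N, 1) against a (D, N, M) tensor.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("both operands need the same dimension on axis 0")
    ndim = max(a.ndim, b.ndim)
    a = a.reshape(a.shape + (1,) * (ndim - a.ndim))
    b = b.reshape(b.shape + (1,) * (ndim - b.ndim))
    dot = (a * b).sum(axis=0)                       # one scalar per vector pair
    norms = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    result = dot / norms
    return float(result) if ndim == 1 else result   # 1D vs 1D -> Python float

rng = np.random.default_rng(2)
D, N, M = 10_000, 5, 3
key = rng.choice([-1, 1], size=D)                   # (D,)
batch = rng.choice([-1, 1], size=(D, N))            # (D, N)
tensor = rng.choice([-1, 1], size=(D, N, M))        # (D, N, M)

print(cosine_similarity(key, key))                  # 1.0
print(cosine_similarity(key, batch).shape)          # (5,)
print(cosine_similarity(batch, tensor).shape)       # (5, 3)
```

The padding step is what separates this from NumPy's default broadcasting, which aligns shapes from the right and would pair the N axis of a (D, N) operand with the M axis of a (D, N, M) one.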

For the single-input form and the full set of modes, see batched calling conventions and Similarity Metrics.

How element-wise bind reads the axes

Element-wise binders (MAP multiply, BSC XOR, FHRR angle add/sub) operate per coordinate along axis 0 and broadcast over the trailing batch axes the same way similarity does. Unlike similarity, they keep axis 0, so the result is still a batch of hypervectors. A single (D,) key binds against every column of a batch:

key = enc.generate(size=10_000)             # (D,)
values = enc.generate(size=(10_000, 32))    # (D, N)

enc.bind(key, values)                       # (D, 32) key * each column
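For bipolar MAP vectors the element-wise bind is coordinate-wise multiplication. The sketch below is a hypothetical map_bind over NumPy arrays, not PyHDC's bind; it pads the lower-rank operand with trailing length-1 axes so that a (D,) key lines up with every column. A plain key * values would fail here, because NumPy would align the (10000,) shape with the trailing axis of length 32.

```python
import numpy as np

def map_bind(a, b):
    """Hypothetical element-wise MAP bind on dimension-first arrays.
    Pads the lower-rank operand with trailing length-1 axes, then multiplies."""
    a, b = np.asarray(a), np.asarray(b)
    ndim = max(a.ndim, b.ndim)
    a = a.reshape(a.shape + (1,) * (ndim - a.ndim))
    b = b.reshape(b.shape + (1,) * (ndim - b.ndim))
    return a * b   # axis 0 is kept: the result is still (D, *batch)

rng = np.random.default_rng(3)
key = rng.choice([-1, 1], size=10_000)            # (D,)
values = rng.choice([-1, 1], size=(10_000, 32))   # (D, N)
bound = map_bind(key, values)
print(bound.shape)                                # (10000, 32)
assert np.array_equal(bound[:, 7], key * values[:, 7])
```

Because bipolar vectors are their own inverse under multiplication, binding the key a second time recovers the original values.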

Mixed ranks align by trailing-axis padding, so column i of a (D, N) batch binds against every vector in tensor[:, i] of a (D, N, M) tensor. The non-element-wise binders (circular convolution and correlation, shifting and segment shifting, matrix binding, VTB, and CDT) have no per-coordinate rule to broadcast. If you pass a batch to one of them, bind applies the binder to each column internally and returns one batched result. The per-binder behavior is in Binding Operations.
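The per-column strategy can be sketched with circular convolution as the example binder, computed through the FFT. bind_per_column and circular_convolution are hypothetical helpers over NumPy arrays; the vectorized FFT along axis 0 is included only as an equivalent cross-check, not as a claim about how PyHDC computes it.

```python
import numpy as np

def circular_convolution(a, b):
    """Circular convolution of two (D,) vectors via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def bind_per_column(key, batch, binder):
    """Hypothetical helper: apply a (D,) x (D,) binder to each column of a
    (D, N) batch and stack the results back into one (D, N) array."""
    columns = [binder(key, batch[:, j]) for j in range(batch.shape[1])]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(4)
D, N = 1_024, 8
key = rng.normal(0.0, 1.0 / np.sqrt(D), size=D)         # (D,)
batch = rng.normal(0.0, 1.0 / np.sqrt(D), size=(D, N))  # (D, N)

looped = bind_per_column(key, batch, circular_convolution)  # (D, N)

# Cross-check: the same result from one FFT along axis 0 over all columns.
vectorized = np.real(
    np.fft.ifft(np.fft.fft(key)[:, np.newaxis] * np.fft.fft(batch, axis=0), axis=0)
)
assert looped.shape == (D, N)
assert np.allclose(looped, vectorized)
```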

Why this layout

Putting D on axis 0 makes a batch a stack of columns: batch[:, j] is a complete hypervector, bundling reduces a trailing axis, and axis 0 stays the dimension throughout. With a fixed seed and a fixed batch shape, generate returns the same array every time.

With the i.i.d. element generators (Bernoulli, uniform, normal, sparse) the whole (D, *batch) array is drawn in one vectorized call. That batch is reproducible for a given seed and shape, but it is not value-identical to generating the vectors one at a time: a single block draw and a per-vector loop consume the random stream in different orders. Ordered generators (LCG, LFSR, and the rest), any custom HDCGenerator, and SparseSegmented keep the per-vector loop, so for those a seeded batch equals N successive single-vector calls.
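The difference between a block draw and a per-vector loop can be seen with NumPy's Generator, using i.i.d. uniform values as a stand-in for PyHDC's generators (an assumption; block_draw and per_vector_draw are hypothetical names). Both read the same random stream, but the (D, N) block fills row by row in C order while the loop fills one column at a time.

```python
import numpy as np

D, N, SEED = 8, 3, 42

def block_draw(seed):
    """One vectorized draw of the whole (D, N) batch."""
    return np.random.default_rng(seed).random((D, N))

def per_vector_draw(seed):
    """N successive (D,) draws from one stream, stacked as columns."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.random(D) for _ in range(N)], axis=1)

a, b = block_draw(SEED), block_draw(SEED)
loop = per_vector_draw(SEED)

assert np.array_equal(a, b)             # same seed and shape: reproducible
assert not np.array_equal(a, loop)      # but not value-identical to the loop
# Same values in a different order: row-major block vs column-by-column loop.
assert np.array_equal(a.ravel(), loop.T.ravel())
```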

For reproducible bundling, prefer axis=. It reduces the batch axis directly, without the extra random draw that the tie-randomizing bundlers (majority vote, thinned OR) make at tie coordinates. Bundling with the deprecated batch_dim keyword has no fixed-seed guarantee.

The `from_array` convention

When you wrap an existing array with Encoding.from_array(), the same invariant applies and nothing is transposed or reshaped for you. The length of axis 0 must equal self.dimension (that is, D), and the trailing axes are the batch:

import numpy as np

data = np.random.choice([-1, 1], size=(10_000, 16))  # (D, N)
hv = enc.from_array(data)                            # axis 0 is D

If your array has the batch on axis 0 instead, every downstream operation reads the wrong axis as the dimension. Build arrays dimension-first before wrapping them, transposing batch-first (N, D) data with data.T if necessary.
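If your data arrives batch-first, a small conversion step before from_array() keeps the invariant. to_dimension_first is a hypothetical helper, not part of PyHDC; it uses np.moveaxis to move a trailing D axis to the front and leaves arrays whose axis 0 already equals D untouched, so a square (D, D) array is assumed to be dimension-first already.

```python
import numpy as np

def to_dimension_first(data, dimension):
    """Hypothetical helper: return a (D, *batch) view of the data.
    Axis 0 == D is kept as-is (this also covers the ambiguous square case);
    a trailing D axis, as in (N, D) or (N, M, D), is moved to the front."""
    data = np.asarray(data)
    if data.shape[0] == dimension:
        return data
    if data.shape[-1] == dimension:
        return np.moveaxis(data, -1, 0)
    raise ValueError(f"no leading or trailing axis of {data.shape} matches dimension {dimension}")

rng = np.random.default_rng(5)
rows = rng.choice([-1, 1], size=(16, 10_000))    # batch-first (N, D)
fixed = to_dimension_first(rows, 10_000)
print(fixed.shape)                               # (10000, 16)
assert np.array_equal(fixed[:, 3], rows[3])
```

The converted array can then be passed to enc.from_array() as in the example above.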