How to Work with (D, N, M) Batches
===================================

PyHDC is dimension-first: axis 0 is always the hypervector dimension ``D``, and
every trailing axis is batch structure. A single vector is ``(D,)``, a flat batch
is ``(D, N)``, and a two-level batch is ``(D, N, M)``. Fixing every trailing index
selects one hypervector, so ``batch[:, j]`` is column ``j`` of a ``(D, N)`` batch,
while ``batch[:, :, k]`` is a ``(D, N)`` slab of a ``(D, N, M)`` tensor.

This guide covers building a ``(D, N, M)`` tensor, reducing a chosen batch axis
with ``bundle``, computing similarity on a 3-D batch, and element-wise binding
that broadcasts a key against a batch. For the layout rules behind these shapes,
see :doc:`../user_manual/array_layout`.

Build a (D, N, M) tensor
------------------------

``generate`` takes a dimension-first ``size`` tuple. The first entry is ``D`` and
the remaining entries are batch axes. ``size=(D, N, M)`` returns a ``(D, N, M)``
tensor holding ``N * M`` hypervectors:

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_C(dimension=10_000)
   batch = enc.generate(size=(10_000, 4, 3))   # shape (10000, 4, 3)

   print(batch.shape)                          # (10000, 4, 3)
   print(batch[:, 0, 0].shape)                 # (10000,)   one hypervector
   print(batch[:, :, 0].shape)                 # (10000, 4) a (D, N) slab

Under a fixed seed, batched generation reproduces itself for a given shape. With
the i.i.d. element generators the whole batch is drawn in one call, so it is not
value-identical to ``N`` successive ``generate(size=D)`` calls. Ordered and custom
generators keep the per-vector loop and do match the sequential draws. See
:doc:`reproducibility` for the seeding details.

Bundle with the axis keyword
----------------------------

``bundle`` reduces one or more batch axes and returns a single
:class:`~pyhdc.Hypervector`. Axis 0 is the dimension and is never a legal reduce
axis; passing ``axis=0`` raises ``ValueError``. ``axis`` is the vectorized reduce
keyword; the older ``batch_dim`` keyword is deprecated (see Convention 4).

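
The layout and the axis-0 rule can be seen without PyHDC at all. The sketch
below uses a plain NumPy array of random bipolar values as a stand-in for a
``(D, N, M)`` batch; it is an illustration of the shapes only, not PyHDC output,
and the variable names are chosen for this example:

.. code-block:: python

   import numpy as np

   D, N, M = 1_000, 4, 3
   rng = np.random.default_rng(0)

   # Random +1/-1 values standing in for a (D, N, M) batch (assumption:
   # any dimension-first array shows the same shape behaviour).
   cube = rng.choice(np.array([-1, 1]), size=(D, N, M))

   one_vector = cube[:, 0, 0]       # every trailing index fixed: one hypervector
   slab = cube[:, :, 0]             # last index fixed: a (D, N) slab

   over_batch = cube.sum(axis=2)    # reduces a batch axis, D survives
   over_dim = cube.sum(axis=0)      # reduces the dimension, D is gone

   print(one_vector.shape)          # (1000,)
   print(slab.shape)                # (1000, 4)
   print(over_batch.shape)          # (1000, 4)
   print(over_dim.shape)            # (4, 3)

Reducing axis 0 leaves one number per hypervector instead of a hypervector,
which is a plausible reading of why ``bundle`` refuses ``axis=0``.
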
**Convention 1: default axis reduces the last batch axis**

With ``axis=None`` (the default), ``bundle`` collapses the last axis. A ``(D, N)``
batch collapses to ``(D,)``, and a ``(D, N, M)`` tensor collapses to ``(D, N)``:

.. code-block:: python

   flat = enc.generate(size=(10_000, 50))     # (D, N)
   print(enc.bundle(flat).shape)              # (10000,)

   cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)
   print(enc.bundle(cube).shape)              # (10000, 4)

**Convention 2: choose which batch axis to collapse**

Pass an integer ``axis`` to fold a specific batch axis. Reducing axis 1 of a
``(D, N, M)`` tensor leaves ``(D, M)``; reducing axis 2 leaves ``(D, N)``:

.. code-block:: python

   cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)
   print(enc.bundle(cube, axis=1).shape)      # (10000, 3)
   print(enc.bundle(cube, axis=2).shape)      # (10000, 4)

Negative indices work and are normalized against the input rank, so ``axis=-1``
is the same as ``axis=2`` for a 3-D tensor.

**Convention 3: collapse several axes with a tuple**

The additive bundlers accept a tuple of axes and fold them together. Reducing
``(1, 2)`` of a ``(D, N, M)`` tensor collapses both batch axes to a single ``(D,)``
prototype:

.. code-block:: python

   cube = enc.generate(size=(10_000, 4, 3))      # (D, N, M)
   print(enc.bundle(cube, axis=(1, 2)).shape)    # (10000,)

A tuple of axes applies to a single batched tensor. Bundling multiple separate
operands requires ``(D,)`` or ``(D, N)`` inputs and rejects any operand with three
or more dimensions.

The tuple path is supported by the element-wise additive bundlers (the MAP
addition variants, the normalized and threshold addition variants,
``AnglesOfElementAddition``, and the bitwise-OR disjunction bundler,
BSDC_S/SEG/CDT). BSDC_THIN (thinned OR) reduces a single axis only. For the full
per-operation list, see :doc:`../user_manual/bundling_operations` and
:doc:`bundle_hypervectors`.

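
The three conventions above amount to a small amount of axis bookkeeping. The
sketch below reproduces that bookkeeping in plain NumPy for a sign-of-sum
bundler. ``sum_bundle`` is a hypothetical helper written for this guide, and
sign-of-sum is an assumed stand-in for an additive bundling rule; neither is
PyHDC's implementation:

.. code-block:: python

   import numpy as np


   def sum_bundle(batch, axis=None):
       """Hypothetical additive bundler: sum over batch axes, then take the sign."""
       batch = np.asarray(batch)
       if batch.ndim < 2:
           raise ValueError("expected at least a (D, N) batch")

       if axis is None:
           axes = (batch.ndim - 1,)              # Convention 1: last batch axis
       elif np.isscalar(axis):
           axes = (int(axis),)                   # Convention 2: one chosen axis
       else:
           axes = tuple(int(a) for a in axis)    # Convention 3: several axes

       # Normalize negative indices against the input rank.
       axes = tuple(a + batch.ndim if a < 0 else a for a in axes)
       for a in axes:
           if a == 0:
               raise ValueError("axis 0 is the dimension and cannot be reduced")
           if not 0 < a < batch.ndim:
               raise ValueError("axis out of range for this batch")

       # Ties (a zero sum) stay 0 here; a real bundler needs a tie rule.
       return np.sign(batch.sum(axis=axes))


   rng = np.random.default_rng(0)
   flat = rng.choice(np.array([-1, 1]), size=(1_000, 50))    # (D, N)
   cube = rng.choice(np.array([-1, 1]), size=(1_000, 4, 3))  # (D, N, M)

   print(sum_bundle(flat).shape)               # (1000,)
   print(sum_bundle(cube).shape)               # (1000, 4)
   print(sum_bundle(cube, axis=1).shape)       # (1000, 3)
   print(sum_bundle(cube, axis=(1, 2)).shape)  # (1000,)

Because negative indices are normalized before the axis-0 check, ``axis=-3`` on a
3-D input is rejected in this sketch just like ``axis=0``.
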
**Convention 4: per-group bundles with axis**

To bundle each group of a 3-D batch on its own, reduce the *other* batch axis
with ``axis=`` and read the result columns. Reducing axis 1 of a ``(D, N, M)``
tensor bundles the ``N`` vectors at each ``M`` index and returns a single
``(D, M)`` tensor whose column ``j`` is the bundle of group ``j``:

.. code-block:: python

   cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)
   groups = enc.bundle(cube, axis=1)          # (10000, 3): column j is group j

   print(groups.shape)                        # (10000, 3)
   print(groups[:, 0].shape)                  # (10000,)

The older ``batch_dim=`` keyword returned the same content as a Python list of
hypervectors. It is deprecated as of 2.1.0, emits a ``DeprecationWarning``, and
will be removed. Pass a batched array or use ``axis=`` instead. ``axis=`` also
keeps the fixed-seed reproducibility that the tie-randomizing bundlers (majority
vote, thinned OR) lose under ``batch_dim``.

Similarity on a 3-D batch needs an explicit axis
------------------------------------------------

``similarity`` reduces over axis 0 (the dimension). For a single input, the
keyword-only ``axis`` argument selects which batch axis separates the query from
the candidates.

**A (D, N) batch defaults to column 0 versus the rest**

A single ``(D, N)`` batch with no ``axis`` compares column 0 against columns 1
through ``N - 1``, returning ``N - 1`` scores:

.. code-block:: python

   batch = enc.generate(size=(10_000, 101))   # (D, N)
   sims = enc.similarity(batch)               # shape (100,)
   # sims[i] = similarity(column 0, column i + 1)

This matches the batched conventions described in :doc:`compute_similarity`.

**A (D, N, M) batch requires you to name the axis**

A single batch with three or more dimensions has no default split axis. Calling
``similarity`` on it without ``axis`` raises
``ValueError("single-input similarity on a (D, N, M, ...) batch requires an explicit axis")``.
Name the batch axis to split:

.. code-block:: python

   cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

   # Wrong: no axis on a 3-D batch
   # enc.similarity(cube)                     # ValueError

   # Right: split along axis 1 (head row vs the remaining rows)
   sims = enc.similarity(cube, axis=1)

The named axis is split into a length-1 head and a rest of length ``size - 1``,
and the head broadcasts against the rest across the remaining batch axes. A
single 1-D input is rejected, because single-input similarity needs at least a
``(D, N)`` batch.

Element-wise binding broadcasts
-------------------------------

Element-wise binders (MAP multiply, BSC XOR, FHRR angle add and subtract) align
operands by trailing-axis broadcasting, the same way NumPy and PyTorch do. A
``(D,)`` key binds against every column of a batch in one call:

.. code-block:: python

   enc = pyhdc.MAP_C(dimension=10_000)
   key = enc.generate()                       # (D,)
   batch = enc.generate(size=(10_000, 50))    # (D, N)

   bound = enc.bind(key, batch)               # (10000, 50): key bound to each column

Mixed ranks align by padding the lower-rank operand with trailing length-1 axes.
A ``(D, N)`` operand binds against a ``(D, N, M)`` tensor by broadcasting over the
``M`` axis:

.. code-block:: python

   keys = enc.generate(size=(10_000, 4))      # (D, N)
   cube = enc.generate(size=(10_000, 4, 3))   # (D, N, M)

   bound = enc.bind(keys, cube)               # (10000, 4, 3): keys broadcast over axis 2

Not every binder is element-wise. The convolution and correlation binders (the
HRR family), shifting and segment-shifting (the sparse families), matrix binding
(MBAT), VTB, and context-dependent thinning (BSDC_CDT) cannot broadcast a
per-coordinate rule across a batch. Pass a batch anyway and ``bind`` applies the
binder per column internally, returning one batched result. See
:doc:`bind_unbind` for the per-family binding details.

Putting it together
-------------------

The four operations compose. The table below summarizes how each call moves
between ranks:

.. list-table::
   :header-rows: 1
   :widths: 35 30 35

   * - Call
     - Input shape
     - Result shape
   * - ``generate(size=(D, N, M))``
     - ---
     - ``(D, N, M)``
   * - ``bundle(cube)`` (default axis)
     - ``(D, N, M)``
     - ``(D, N)``
   * - ``bundle(cube, axis=1)``
     - ``(D, N, M)``
     - ``(D, M)``
   * - ``bundle(cube, axis=(1, 2))``
     - ``(D, N, M)``
     - ``(D,)``
   * - ``similarity(cube, axis=1)``
     - ``(D, N, M)``
     - broadcast over the kept axes
   * - ``bind(key, batch)``
     - ``(D,)`` and ``(D, N)``
     - ``(D, N)``
   * - ``bind(keys, cube)``
     - ``(D, N)`` and ``(D, N, M)``
     - ``(D, N, M)``

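
The binding and similarity shapes can be checked in plain NumPy. The sketch
below is an illustration under stated assumptions, not PyHDC code:
``pad_trailing``, ``multiply_bind``, and ``head_vs_rest_cosine`` are hypothetical
helpers, binding is assumed to be element-wise multiplication, and similarity is
assumed to be cosine similarity. NumPy on its own pads missing axes on the left,
so the trailing length-1 axes are added explicitly:

.. code-block:: python

   import numpy as np


   def pad_trailing(a, ndim):
       """Append length-1 axes until `a` has `ndim` axes, e.g. (D, N) -> (D, N, 1)."""
       a = np.asarray(a)
       return a.reshape(a.shape + (1,) * (ndim - a.ndim))


   def multiply_bind(x, y):
       """Element-wise multiply after aligning ranks with trailing length-1 axes."""
       ndim = max(np.ndim(x), np.ndim(y))
       return pad_trailing(x, ndim) * pad_trailing(y, ndim)


   def head_vs_rest_cosine(batch, axis=1):
       """Cosine similarity of the first slice along `axis` against the others."""
       batch = np.asarray(batch, dtype=float)
       head, rest = np.split(batch, [1], axis=axis)   # length 1 and length size - 1
       dots = (head * rest).sum(axis=0)               # reduce the dimension axis
       norms = np.linalg.norm(head, axis=0) * np.linalg.norm(rest, axis=0)
       return dots / norms


   rng = np.random.default_rng(0)
   bipolar = np.array([-1, 1])
   key = rng.choice(bipolar, size=1_000)             # (D,)
   batch = rng.choice(bipolar, size=(1_000, 101))    # (D, N)
   keys = rng.choice(bipolar, size=(1_000, 4))       # (D, N)
   cube = rng.choice(bipolar, size=(1_000, 4, 3))    # (D, N, M)

   print(multiply_bind(key, batch).shape)            # (1000, 101)
   print(multiply_bind(keys, cube).shape)            # (1000, 4, 3)
   print(head_vs_rest_cosine(batch).shape)           # (100,)
   print(head_vs_rest_cosine(cube, axis=1).shape)    # (3, 3)

In this sketch the 3-D similarity comes out as ``(3, 3)``: three remaining rows
along axis 1, each compared with the head at every index of axis 2. Whether
PyHDC orders its result axes the same way is not something this guide states, so
treat that shape as a property of the sketch only.
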