How to Bundle Hypervectors
===========================

Bundling combines multiple hypervectors into one that is *similar to all inputs*.
``bundle`` can be called in four ways: the first three are equivalent, and the
fourth bundles several groups in a single call.

Four bundling methods
---------------------

**1. Instance method**: bundle one hypervector with others:

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_C(dimension=10_000)
   a, b, c = enc.generate(), enc.generate(), enc.generate()

   result = a.bundle(b, c)
   # result is similar to a, b, and c

**2. Encoding method**: bundle via the encoding object:

.. code-block:: python

   result = enc.bundle(a, b, c)

To bundle a list of hypervectors, unpack it into positional arguments:

.. code-block:: python

   hvs = [enc.generate() for _ in range(10)]
   result = enc.bundle(*hvs)

**3. Convenience function**: module-level shortcut:

.. code-block:: python

   result = pyhdc.bundle(a, b, c)

**4. Batched bundling**: bundle multiple groups at once:

.. code-block:: python

   d = enc.generate()

   # Bundle [[a,b], [c,d]] -> returns [bundle(a,b), bundle(c,d)]
   results = enc.bundle([a, b], [c, d])  # list of two Hypervectors

Collapse a whole batch to one prototype
----------------------------------------

A batch of ``N`` hypervectors is a ``(D, N)`` array, where each column is one
hypervector. Passing such a batch to ``bundle`` folds over the ``N`` columns and
returns a single ``(D,)`` prototype:

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_C(dimension=10_000)
   batch = enc.generate(size=(10_000, 50))  # 50 vectors as columns (shape: (10000, 50))

   # collapse the 50 columns into one prototype
   result = enc.bundle(batch)
   print(result.shape)  # (10000,)

Choose which axis to collapse
------------------------------

A higher-rank batch is a ``(D, N, M)`` tensor, where axis 0 is the dimension
``D`` and each column along the trailing axes is one hypervector. The ``axis``
keyword selects which batch axis ``bundle`` folds over.

**Default is the last batch axis.** With ``axis=None`` (the default), ``bundle``
reduces the last axis.
A ``(D, N)`` batch still collapses to ``(D,)``, and a ``(D, N, M)`` tensor
collapses to ``(D, N)``:

.. code-block:: python

   import pyhdc

   enc = pyhdc.MAP_C(dimension=10_000)
   tensor = enc.generate(size=(10_000, 8, 5))  # shape: (10000, 8, 5)

   # default axis=None reduces the last axis (axis 2)
   result = enc.bundle(tensor)
   print(result.shape)  # (10000, 8)

**Pass an explicit axis** to collapse a different batch axis. Axis 0 is the
hypervector dimension and is never reducible; passing ``axis=0`` raises
``ValueError``. Negative indices are normalized the usual way, so ``axis=-1``
refers to the last axis:

.. code-block:: python

   # reduce axis 1, keeping axis 2 as the remaining batch
   result = enc.bundle(tensor, axis=1)
   print(result.shape)  # (10000, 5)

**A tuple of axes** folds several batch axes at once. This applies to the
additive, element-wise bundlers (the MAP, HRR, and FHRR addition variants and
the BSDC bitwise-OR disjunction, BSDC_S/SEG/CDT), and it operates on a single
batched tensor. BSDC_THIN (thinned OR) reduces a single axis only:

.. code-block:: python

   # reduce axes 1 and 2 together -> one prototype
   result = enc.bundle(tensor, axis=(1, 2))
   print(result.shape)  # (10000,)

Get per-group results with axis
--------------------------------

``axis`` reduces the selected batch axis in place and returns a **single**
``Hypervector``. To bundle each group of a 3-D batch separately, reduce the
*other* batch axis and read the columns of the result; column ``j`` is the
bundle of group ``j``:

.. code-block:: python

   single = enc.bundle(tensor, axis=2)  # one Hypervector, shape (10000, 8)
   groups = enc.bundle(tensor, axis=1)  # one Hypervector, shape (10000, 5)
   # groups[:, j] is the bundle of group j

The older ``batch_dim=`` keyword split a 3-D array along a dimension and
returned a Python list of hypervectors. It is deprecated as of 2.1.0, emits a
``DeprecationWarning``, and will be removed. ``axis=`` returns the same content
as one tensor. Passing both ``axis`` and ``batch_dim`` raises ``ValueError``.

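
The shape rules in this section can be checked without the library. The
following is a minimal NumPy sketch, assuming plain element-wise addition
stands in for the bundler (no clipping or normalisation). ``bundle_sum`` is a
hypothetical helper written for illustration and is not part of ``pyhdc``:

.. code-block:: python

   import numpy as np

   def bundle_sum(tensor, axis=None):
       """Hypothetical additive bundler: sum over the selected batch axes."""
       if axis is None:
           axis = tensor.ndim - 1  # default: last batch axis
       axes = (axis,) if isinstance(axis, int) else tuple(axis)
       axes = tuple(a % tensor.ndim for a in axes)  # normalise negative indices
       if 0 in axes:
           raise ValueError("axis 0 is the hypervector dimension")
       return tensor.sum(axis=axes)

   rng = np.random.default_rng(0)
   tensor = rng.choice([-1, 1], size=(1000, 8, 5))  # (D, N, M)

   groups = bundle_sum(tensor, axis=1)  # shape (1000, 5)

   # column j of the result equals the bundle of the 8 vectors in group j
   for j in range(tensor.shape[2]):
       assert np.array_equal(groups[:, j], tensor[:, :, j].sum(axis=1))

   print(bundle_sum(tensor).shape)               # (1000, 8)
   print(groups.shape)                           # (1000, 5)
   print(bundle_sum(tensor, axis=(1, 2)).shape)  # (1000,)

The loop confirms that reducing axis 1 yields one bundle per index of axis 2,
which is the per-group reading described above.
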

Zero vector as the bundle identity
------------------------------------

The zero hypervector acts as the additive identity for bundling; bundling
anything with zero leaves it unchanged:

.. code-block:: python

   zero = enc.zeros()
   result = enc.bundle(a, zero)
   print(result.similarity(a))  # ~= 1.0

This is useful when building bundles iteratively:

.. code-block:: python

   accumulator = enc.zeros()
   for hv in hvs:
       accumulator = enc.bundle(accumulator, hv)

Capacity limits
----------------

Bundling is lossy: each bundled vector adds noise to every component. The more
hypervectors you bundle, the harder it is to distinguish individual members via
similarity.

Approximate rule of thumb: to keep :math:`N` bundled vectors distinguishable
among :math:`M` candidate vectors, the dimension needs to be on the order of
:math:`D = O(N \ln M)`. Equivalently, bundling more than roughly
:math:`O(D / \ln M)` vectors into a single hypervector of dimension :math:`D`
causes the similarity to each component to drop below a useful threshold.

Encoding-specific notes
------------------------

Different encoding families use different bundling implementations, but the
interface is always the same:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Encoding
     - Bundling behaviour
   * - MAP_C
     - Element-wise addition, then clip to [-1, 1]; ties broken randomly
       (``random_choice_range`` parameter)
   * - MAP_I
     - Element-wise addition (plain sum); near-zero-sum ties broken randomly
       (``random_choice_range``)
   * - MAP_B
     - Element-wise addition, then sign threshold (majority vote) to {-1, +1}
   * - HRR
     - Element-wise addition followed by L2 normalisation
   * - HRR_NoNorm
     - Element-wise addition without normalisation (vectors grow in magnitude)
   * - FHRR
     - Sum phasors, extract resultant angle
   * - BSC
     - Majority-vote threshold: each element is 1 if more than half the inputs
       are 1
   * - BSDC family
     - Bitwise OR. BSDC_THIN applies random thinning after OR to maintain
       density
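
To make a few rows of the table concrete, the sketch below reproduces the
arithmetic of four of them in plain NumPy. It is an illustration of the rules
as described in the table, not the library's implementation: the variable
names, the sparse density of 0.01, and the ``N(0, 1/D)`` initialisation for
the HRR-style vectors are all assumptions made for this example.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   D, K = 10_000, 5  # dimension and number of inputs (odd K avoids sign ties)

   # MAP_B-style: add bipolar vectors, then take the sign (majority vote)
   bipolar = rng.choice([-1, 1], size=(D, K))  # one hypervector per column
   map_b = np.sign(bipolar.sum(axis=1))

   # HRR-style: add real-valued vectors, then rescale to unit L2 norm
   real = rng.normal(0.0, 1.0 / np.sqrt(D), size=(D, K))
   summed = real.sum(axis=1)
   hrr = summed / np.linalg.norm(summed)

   # BSC-style: element is 1 if more than half of the binary inputs are 1
   binary = rng.integers(0, 2, size=(D, K))
   bsc = (binary.sum(axis=1) > K / 2).astype(np.uint8)

   # BSDC-style: bitwise OR of sparse binary inputs
   sparse = (rng.random((D, K)) < 0.01).astype(np.uint8)
   bsdc = np.bitwise_or.reduce(sparse, axis=1)

   def cosine(x, y):
       return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

   # The MAP_B-style bundle stays similar to its inputs,
   # but not to an unrelated random vector
   unrelated = rng.choice([-1, 1], size=D)
   print(cosine(map_b, bipolar[:, 0]))  # clearly positive
   print(cosine(map_b, unrelated))      # close to 0

Note that the OR-based bundle can only gain set bits as inputs are added, which
is why the table mentions thinning as a way to maintain density.
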