HDC Theory and Mathematics
===========================

The following sections cover the mathematics behind HDC for readers who want
to understand *why* it works, not just *how* to use the library.

The vector space
-----------------

A hypervector is an element of a vector space of dimension :math:`D`, where
:math:`D` is typically 1,000 to 50,000. PyHDC supports three specific spaces:

* :math:`\mathbb{R}^D` : real-valued (continuous) vectors (MAP_C, HRR, FHRR, VTB, MBAT)
* :math:`\{0, 1\}^D` : binary vectors (BSC, BSDC family, MAP_B)
* :math:`\{-1, +1\}^D` : bipolar vectors (MAP_I, MAP_C after quantisation)

Near-orthogonality and capacity
---------------------------------

The central mathematical fact that makes HDC work is **near-orthogonality**
of random vectors in high-dimensional spaces.

**Theorem:** Let :math:`\mathbf{u}` and :math:`\mathbf{v}` be
drawn uniformly at random from the unit sphere in :math:`\mathbb{R}^D`. Then:

.. math::

   \mathbb{E}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = 0

.. math::

   \text{Var}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = \frac{1}{D}

As :math:`D \to \infty`, the standard deviation shrinks to zero. At
:math:`D = 10{,}000`, :math:`\sigma \approx 0.01`, meaning that two random
unit vectors have cosine similarity within ±0.03 of zero with 99.7%
probability (three sigma).

This near-orthogonality gives a "vocabulary" of essentially distinct
representations whose size grows combinatorially with :math:`D`. The
*capacity* of an item memory (the number of distinct items that can be stored
and correctly retrieved) is approximately:

.. math::

   C \approx O(N \times ln(M))

at a false-positive rate of about 1% (the exact value depends on the encoding
family and similarity threshold).  At :math:`D = 10{,}000`, this gives
capacity of roughly 2,000 items.

Bundling: algebraic requirements
----------------------------------

A bundling operation :math:`\oplus : \mathbb{V}^n \to \mathbb{V}` must
satisfy:

1. **Similarity to inputs**: :math:`\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{a}) \gg 0`
2. **Commutativity**: :math:`\mathbf{a} \oplus \mathbf{b} = \mathbf{b} \oplus \mathbf{a}`
3. **Dissimilarity to non-inputs**: :math:`\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{c}) \approx 0`
   for random :math:`\mathbf{c}` not involved in the bundle

The first property enables membership queries and the third property ensures that the bundle
does not falsely match unrelated items.

In practice, bundling is *lossy*: the similarity to each component decreases
as more components are added. For :math:`n` equal-weight components bundled
with addition, the expected similarity to each component is approximately
:math:`1/\sqrt{n}` (for normalised vectors).

Binding: algebraic requirements
---------------------------------

A binding operation :math:`\otimes : \mathbb{V}^2 \to \mathbb{V}` must satisfy:

1. **Dissimilarity to operands**: :math:`\text{sim}(\mathbf{a} \otimes \mathbf{b},\, \mathbf{a}) \approx 0`
2. **Approximate invertibility**: there exists an unbinding operation
   :math:`\oslash` such that :math:`(\mathbf{a} \otimes \mathbf{b}) \oslash \mathbf{b} \approx \mathbf{a}`
3. **Distributivity over bundling**: :math:`(\mathbf{a} \oplus \mathbf{b}) \otimes \mathbf{c} \approx (\mathbf{a} \otimes \mathbf{c}) \oplus (\mathbf{b} \otimes \mathbf{c})`

The first property ensures that the bound result is "new", it does not resemble
either input directly. The second property is what makes records queryable. Property 3
(approximate) allows factored representations.

Note that property 2 is approximate, in general the recovered vector is
similar but not identical to the original. Some kind of cleanup memory such as
similarity search against a codebook is required to identify the exact match.

Sequence and structure encoding
---------------------------------

**Sequences**

A sequence :math:`[a, b, c]` can be encoded by binding each element to a
position hypervector and bundling the results:

.. math::

   S = (a \otimes p_0) \oplus (b \otimes p_1) \oplus (c \otimes p_2)

where :math:`p_0, p_1, p_2` are distinct, random "position" hypervectors.
Alternatively, the BSDC_S encoding uses a circular shift of one position per
element, so :math:`p_k = \text{shift}^k(\mathbf{1})`.

**Records (role-filler bindings)**

A record :math:`\{role_1: value_1, role_2: value_2\}` is encoded as:

.. math::

   R = (role_1 \otimes value_1) \oplus (role_2 \otimes value_2)

Querying field :math:`role_1` from the record:

.. math::

   R \oslash role_1 \approx value_1

This approximate result is then looked up in the value codebook by nearest
neighbour.

**Trees**

Trees are encoded recursively by binding each subtree's representation with
its parent node's label:

.. math::

   T = label \otimes (left\text{-}subtree \oplus right\text{-}subtree)

Similarity metrics: why they differ
--------------------------------------

Different encoding families use different similarity metrics because the
geometry of their respective spaces is different:

* **Cosine similarity** is appropriate for continuous real-valued vectors
  (MAP, HRR): it measures the angle between vectors in :math:`\mathbb{R}^D`.
* **Hamming distance** (normalised and remapped to [-1,1]) is appropriate for
  dense binary vectors (BSC): it counts mismatching bits.
* **Overlap** (normalised and remapped to [-1,1]) is appropriate for sparse
  binary vectors (BSDC): it measures the intersection of two sparse sets as a
  fraction of the minimum density.
* **Angle distance** is appropriate for phase/angle encodings (FHRR): it
  measures the cosine of element-wise angle differences.

Key references
--------------

The following papers are foundational to HDC and underpin the encoding
families implemented in PyHDC:

* Plate, T. A. (1995). *Holographic reduced representations.*
  IEEE Transactions on Neural Networks, 6(3), 623–641.
  *(Foundation of HRR encodings)*

* Kanerva, P. (2009). *Hyperdimensional computing: An introduction to
  computing in distributed representation with high-dimensional random vectors.*
  Cognitive Computation, 1(2), 139–159.
  *(Overview of MAP and BSC encodings)*

* Rachkovskij, D. A., & Kussul, E. M. (2001). *Binding and normalization
  of binary sparse distributed representations by context-dependent thinning.*
  Neural Computation, 13(2), 411–452.
  *(Foundation of BSDC_CDT)*

* Schlegel, K., Neubert, P., & Protzel, P. (2022). *A comparison of vector
  symbolic architectures.*
  Artificial Intelligence Review, 55(6), 4523–4555.
  *(Comparative survey of VSA families)*