HDC Theory and Mathematics =========================== The following sections cover the mathematics behind HDC for readers who want to understand *why* it works, not just *how* to use the library. The vector space ----------------- A hypervector is an element of a vector space of dimension :math:`D`, where :math:`D` is typically 1,000 to 50,000. PyHDC supports three specific spaces: * :math:`\mathbb{R}^D` : real-valued (continuous) vectors (MAP_C, HRR, FHRR, VTB, MBAT) * :math:`\{0, 1\}^D` : binary vectors (BSC, BSDC family, MAP_B) * :math:`\{-1, +1\}^D` : bipolar vectors (MAP_I, MAP_C after quantisation) Near-orthogonality and capacity --------------------------------- The central mathematical fact that makes HDC work is **near-orthogonality** of random vectors in high-dimensional spaces. **Theorem:** Let :math:`\mathbf{u}` and :math:`\mathbf{v}` be drawn uniformly at random from the unit sphere in :math:`\mathbb{R}^D`. Then: .. math:: \mathbb{E}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = 0 .. math:: \text{Var}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = \frac{1}{D} As :math:`D \to \infty`, the standard deviation shrinks to zero. At :math:`D = 10{,}000`, :math:`\sigma \approx 0.01`, meaning that two random unit vectors have cosine similarity within ±0.03 of zero with 99.7% probability (three sigma). This near-orthogonality gives a "vocabulary" of essentially distinct representations whose size grows combinatorially with :math:`D`. The *capacity* of an item memory (the number of distinct items that can be stored and correctly retrieved) is approximately: .. math:: C \approx O(N \times ln(M)) at a false-positive rate of about 1% (the exact value depends on the encoding family and similarity threshold). At :math:`D = 10{,}000`, this gives capacity of roughly 2,000 items. Bundling: algebraic requirements ---------------------------------- A bundling operation :math:`\oplus : \mathbb{V}^n \to \mathbb{V}` must satisfy: 1. **Similarity to inputs**: :math:`\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{a}) \gg 0` 2. **Commutativity**: :math:`\mathbf{a} \oplus \mathbf{b} = \mathbf{b} \oplus \mathbf{a}` 3. **Dissimilarity to non-inputs**: :math:`\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{c}) \approx 0` for random :math:`\mathbf{c}` not involved in the bundle The first property enables membership queries and the third property ensures that the bundle does not falsely match unrelated items. In practice, bundling is *lossy*: the similarity to each component decreases as more components are added. For :math:`n` equal-weight components bundled with addition, the expected similarity to each component is approximately :math:`1/\sqrt{n}` (for normalised vectors). Binding: algebraic requirements --------------------------------- A binding operation :math:`\otimes : \mathbb{V}^2 \to \mathbb{V}` must satisfy: 1. **Dissimilarity to operands**: :math:`\text{sim}(\mathbf{a} \otimes \mathbf{b},\, \mathbf{a}) \approx 0` 2. **Approximate invertibility**: there exists an unbinding operation :math:`\oslash` such that :math:`(\mathbf{a} \otimes \mathbf{b}) \oslash \mathbf{b} \approx \mathbf{a}` 3. **Distributivity over bundling**: :math:`(\mathbf{a} \oplus \mathbf{b}) \otimes \mathbf{c} \approx (\mathbf{a} \otimes \mathbf{c}) \oplus (\mathbf{b} \otimes \mathbf{c})` The first property ensures that the bound result is "new", it does not resemble either input directly. The second property is what makes records queryable. Property 3 (approximate) allows factored representations. Note that property 2 is approximate, in general the recovered vector is similar but not identical to the original. Some kind of cleanup memory such as similarity search against a codebook is required to identify the exact match. Sequence and structure encoding --------------------------------- **Sequences** A sequence :math:`[a, b, c]` can be encoded by binding each element to a position hypervector and bundling the results: .. math:: S = (a \otimes p_0) \oplus (b \otimes p_1) \oplus (c \otimes p_2) where :math:`p_0, p_1, p_2` are distinct, random "position" hypervectors. Alternatively, the BSDC_S encoding uses a circular shift of one position per element, so :math:`p_k = \text{shift}^k(\mathbf{1})`. **Records (role-filler bindings)** A record :math:`\{role_1: value_1, role_2: value_2\}` is encoded as: .. math:: R = (role_1 \otimes value_1) \oplus (role_2 \otimes value_2) Querying field :math:`role_1` from the record: .. math:: R \oslash role_1 \approx value_1 This approximate result is then looked up in the value codebook by nearest neighbour. **Trees** Trees are encoded recursively by binding each subtree's representation with its parent node's label: .. math:: T = label \otimes (left\text{-}subtree \oplus right\text{-}subtree) Similarity metrics: why they differ -------------------------------------- Different encoding families use different similarity metrics because the geometry of their respective spaces is different: * **Cosine similarity** is appropriate for continuous real-valued vectors (MAP, HRR): it measures the angle between vectors in :math:`\mathbb{R}^D`. * **Hamming distance** (normalised and remapped to [-1,1]) is appropriate for dense binary vectors (BSC): it counts mismatching bits. * **Overlap** (normalised and remapped to [-1,1]) is appropriate for sparse binary vectors (BSDC): it measures the intersection of two sparse sets as a fraction of the minimum density. * **Angle distance** is appropriate for phase/angle encodings (FHRR): it measures the cosine of element-wise angle differences. Key references -------------- The following papers are foundational to HDC and underpin the encoding families implemented in PyHDC: * Plate, T. A. (1995). *Holographic reduced representations.* IEEE Transactions on Neural Networks, 6(3), 623–641. *(Foundation of HRR encodings)* * Kanerva, P. (2009). *Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors.* Cognitive Computation, 1(2), 139–159. *(Overview of MAP and BSC encodings)* * Rachkovskij, D. A., & Kussul, E. M. (2001). *Binding and normalization of binary sparse distributed representations by context-dependent thinning.* Neural Computation, 13(2), 411–452. *(Foundation of BSDC_CDT)* * Schlegel, K., Neubert, P., & Protzel, P. (2022). *A comparison of vector symbolic architectures.* Artificial Intelligence Review, 55(6), 4523–4555. *(Comparative survey of VSA families)*