HDC Theory and Mathematics

The following sections cover the mathematics behind HDC for readers who want to understand why it works, not just how to use the library.

The vector space

A hypervector is an element of a vector space of dimension \(D\), where \(D\) is typically 1,000 to 50,000. PyHDC supports three specific spaces:

  • \(\mathbb{R}^D\) : real-valued (continuous) vectors (MAP_C, HRR, FHRR, VTB, MBAT)

  • \(\{0, 1\}^D\) : binary vectors (BSC, BSDC family, MAP_B)

  • \(\{-1, +1\}^D\) : bipolar vectors (MAP_I, MAP_C after quantisation)

Near-orthogonality and capacity

The central mathematical fact that makes HDC work is near-orthogonality of random vectors in high-dimensional spaces.

Theorem: Let \(\mathbf{u}\) and \(\mathbf{v}\) be drawn uniformly at random from the unit sphere in \(\mathbb{R}^D\). Then:

\[\mathbb{E}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = 0\]
\[\text{Var}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = \frac{1}{D}\]

As \(D \to \infty\), the standard deviation shrinks to zero. At \(D = 10{,}000\), \(\sigma \approx 0.01\), meaning that two random unit vectors have cosine similarity within ±0.03 of zero with 99.7% probability (three sigma).

This near-orthogonality gives a “vocabulary” of essentially distinct representations whose size grows combinatorially with \(D\). The capacity of an item memory (the number of distinct items that can be stored and correctly retrieved) is approximately:

\[C \approx O(N \times ln(M))\]

at a false-positive rate of about 1% (the exact value depends on the encoding family and similarity threshold). At \(D = 10{,}000\), this gives capacity of roughly 2,000 items.

Bundling: algebraic requirements

A bundling operation \(\oplus : \mathbb{V}^n \to \mathbb{V}\) must satisfy:

  1. Similarity to inputs: \(\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{a}) \gg 0\)

  2. Commutativity: \(\mathbf{a} \oplus \mathbf{b} = \mathbf{b} \oplus \mathbf{a}\)

  3. Dissimilarity to non-inputs: \(\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{c}) \approx 0\) for random \(\mathbf{c}\) not involved in the bundle

The first property enables membership queries and the third property ensures that the bundle does not falsely match unrelated items.

In practice, bundling is lossy: the similarity to each component decreases as more components are added. For \(n\) equal-weight components bundled with addition, the expected similarity to each component is approximately \(1/\sqrt{n}\) (for normalised vectors).

Binding: algebraic requirements

A binding operation \(\otimes : \mathbb{V}^2 \to \mathbb{V}\) must satisfy:

  1. Dissimilarity to operands: \(\text{sim}(\mathbf{a} \otimes \mathbf{b},\, \mathbf{a}) \approx 0\)

  2. Approximate invertibility: there exists an unbinding operation \(\oslash\) such that \((\mathbf{a} \otimes \mathbf{b}) \oslash \mathbf{b} \approx \mathbf{a}\)

  3. Distributivity over bundling: \((\mathbf{a} \oplus \mathbf{b}) \otimes \mathbf{c} \approx (\mathbf{a} \otimes \mathbf{c}) \oplus (\mathbf{b} \otimes \mathbf{c})\)

The first property ensures that the bound result is “new”, it does not resemble either input directly. The second property is what makes records queryable. Property 3 (approximate) allows factored representations.

Note that property 2 is approximate, in general the recovered vector is similar but not identical to the original. Some kind of cleanup memory such as similarity search against a codebook is required to identify the exact match.

Sequence and structure encoding

Sequences

A sequence \([a, b, c]\) can be encoded by binding each element to a position hypervector and bundling the results:

\[S = (a \otimes p_0) \oplus (b \otimes p_1) \oplus (c \otimes p_2)\]

where \(p_0, p_1, p_2\) are distinct, random “position” hypervectors. Alternatively, the BSDC_S encoding uses a circular shift of one position per element, so \(p_k = \text{shift}^k(\mathbf{1})\).

Records (role-filler bindings)

A record \(\{role_1: value_1, role_2: value_2\}\) is encoded as:

\[R = (role_1 \otimes value_1) \oplus (role_2 \otimes value_2)\]

Querying field \(role_1\) from the record:

\[R \oslash role_1 \approx value_1\]

This approximate result is then looked up in the value codebook by nearest neighbour.

Trees

Trees are encoded recursively by binding each subtree’s representation with its parent node’s label:

\[T = label \otimes (left\text{-}subtree \oplus right\text{-}subtree)\]

Similarity metrics: why they differ

Different encoding families use different similarity metrics because the geometry of their respective spaces is different:

  • Cosine similarity is appropriate for continuous real-valued vectors (MAP, HRR): it measures the angle between vectors in \(\mathbb{R}^D\).

  • Hamming distance (normalised and remapped to [-1,1]) is appropriate for dense binary vectors (BSC): it counts mismatching bits.

  • Overlap (normalised and remapped to [-1,1]) is appropriate for sparse binary vectors (BSDC): it measures the intersection of two sparse sets as a fraction of the minimum density.

  • Angle distance is appropriate for phase/angle encodings (FHRR): it measures the cosine of element-wise angle differences.

Key references

The following papers are foundational to HDC and underpin the encoding families implemented in PyHDC:

  • Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3), 623–641. (Foundation of HRR encodings)

  • Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2), 139–159. (Overview of MAP and BSC encodings)

  • Rachkovskij, D. A., & Kussul, E. M. (2001). Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Computation, 13(2), 411–452. (Foundation of BSDC_CDT)

  • Schlegel, K., Neubert, P., & Protzel, P. (2022). A comparison of vector symbolic architectures. Artificial Intelligence Review, 55(6), 4523–4555. (Comparative survey of VSA families)