HDC Theory and Mathematics

The following sections cover the mathematics behind HDC for readers who want to understand why it works, not just how to use the library.

The vector space

A hypervector is an element of a vector space of dimension \(D\), where \(D\) is typically 1,000 to 50,000. PyHDC supports three specific spaces:

\(\mathbb{R}^D\) : real-valued (continuous) vectors (MAP_C, HRR, FHRR, VTB, MBAT)
\(\{0, 1\}^D\) : binary vectors (BSC, BSDC family, MAP_B)
\(\{-1, +1\}^D\) : bipolar vectors (MAP_I, MAP_C after quantisation)

Near-orthogonality and capacity

The central mathematical fact that makes HDC work is near-orthogonality of random vectors in high-dimensional spaces.

Theorem: Let \(\mathbf{u}\) and \(\mathbf{v}\) be drawn uniformly at random from the unit sphere in \(\mathbb{R}^D\). Then:

\[\mathbb{E}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = 0\]

\[\text{Var}\!\left[\cos\theta(\mathbf{u},\mathbf{v})\right] = \frac{1}{D}\]

As \(D \to \infty\), the standard deviation shrinks to zero. At \(D = 10{,}000\), \(\sigma \approx 0.01\), meaning that two random unit vectors have cosine similarity within ±0.03 of zero with 99.7% probability (three sigma).

This near-orthogonality gives a “vocabulary” of essentially distinct representations whose size grows combinatorially with \(D\). The capacity of an item memory (the number of distinct items that can be stored and correctly retrieved) is approximately:

\[C \approx O(N \times ln(M))\]

at a false-positive rate of about 1% (the exact value depends on the encoding family and similarity threshold). At \(D = 10{,}000\), this gives capacity of roughly 2,000 items.

Bundling: algebraic requirements

A bundling operation \(\oplus : \mathbb{V}^n \to \mathbb{V}\) must satisfy:

Similarity to inputs: \(\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{a}) \gg 0\)
Commutativity: \(\mathbf{a} \oplus \mathbf{b} = \mathbf{b} \oplus \mathbf{a}\)
Dissimilarity to non-inputs: \(\text{sim}(\mathbf{a} \oplus \mathbf{b},\, \mathbf{c}) \approx 0\) for random \(\mathbf{c}\) not involved in the bundle

The first property enables membership queries and the third property ensures that the bundle does not falsely match unrelated items.

In practice, bundling is lossy: the similarity to each component decreases as more components are added. For \(n\) equal-weight components bundled with addition, the expected similarity to each component is approximately \(1/\sqrt{n}\) (for normalised vectors).

Binding: algebraic requirements

A binding operation \(\otimes : \mathbb{V}^2 \to \mathbb{V}\) must satisfy:

Dissimilarity to operands: \(\text{sim}(\mathbf{a} \otimes \mathbf{b},\, \mathbf{a}) \approx 0\)
Approximate invertibility: there exists an unbinding operation \(\oslash\) such that \((\mathbf{a} \otimes \mathbf{b}) \oslash \mathbf{b} \approx \mathbf{a}\)
Distributivity over bundling: \((\mathbf{a} \oplus \mathbf{b}) \otimes \mathbf{c} \approx (\mathbf{a} \otimes \mathbf{c}) \oplus (\mathbf{b} \otimes \mathbf{c})\)

The first property ensures that the bound result is “new”, it does not resemble either input directly. The second property is what makes records queryable. Property 3 (approximate) allows factored representations.

Note that property 2 is approximate, in general the recovered vector is similar but not identical to the original. Some kind of cleanup memory such as similarity search against a codebook is required to identify the exact match.

Sequence and structure encoding

Sequences

A sequence \([a, b, c]\) can be encoded by binding each element to a position hypervector and bundling the results:

\[S = (a \otimes p_0) \oplus (b \otimes p_1) \oplus (c \otimes p_2)\]

where \(p_0, p_1, p_2\) are distinct, random “position” hypervectors. Alternatively, the BSDC_S encoding uses a circular shift of one position per element, so \(p_k = \text{shift}^k(\mathbf{1})\).

Records (role-filler bindings)

A record \(\{role_1: value_1, role_2: value_2\}\) is encoded as:

\[R = (role_1 \otimes value_1) \oplus (role_2 \otimes value_2)\]

Querying field \(role_1\) from the record:

\[R \oslash role_1 \approx value_1\]

This approximate result is then looked up in the value codebook by nearest neighbour.

Trees

Trees are encoded recursively by binding each subtree’s representation with its parent node’s label:

\[T = label \otimes (left\text{-}subtree \oplus right\text{-}subtree)\]

Similarity metrics: why they differ

Different encoding families use different similarity metrics because the geometry of their respective spaces is different:

Cosine similarity is appropriate for continuous real-valued vectors (MAP, HRR): it measures the angle between vectors in \(\mathbb{R}^D\).
Hamming distance (normalised and remapped to [-1,1]) is appropriate for dense binary vectors (BSC): it counts mismatching bits.
Overlap (normalised and remapped to [-1,1]) is appropriate for sparse binary vectors (BSDC): it measures the intersection of two sparse sets as a fraction of the minimum density.
Angle distance is appropriate for phase/angle encodings (FHRR): it measures the cosine of element-wise angle differences.

Key references

The following papers are foundational to HDC and underpin the encoding families implemented in PyHDC:

Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3), 623–641. (Foundation of HRR encodings)
Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2), 139–159. (Overview of MAP and BSC encodings)
Rachkovskij, D. A., & Kussul, E. M. (2001). Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Computation, 13(2), 411–452. (Foundation of BSDC_CDT)
Schlegel, K., Neubert, P., & Protzel, P. (2022). A comparison of vector symbolic architectures. Artificial Intelligence Review, 55(6), 4523–4555. (Comparative survey of VSA families)