Tutorial 2: Associative Memory with Key-Value Binding

This tutorial builds a hyperdimensional dictionary: a data structure that stores key-value pairs as a single hypervector and retrieves values by querying with a (possibly noisy) key. The tutorial covers the approximate bind-unbind inverse, multi-field prototypes, similarity-based field queries, and the capacity limits that bound how many pairs a single hypervector can hold.

Prerequisites: Tutorial 1: Encoding Text for Classification

The bind-unbind inverse property

Binding key.bind(value) produces a hypervector dissimilar to both inputs. But if you unbind the result with key, you recover a noisy approximation of value:

import pyhdc

enc   = pyhdc.MAP_C(dimension=10_000)
key   = enc.generate()
value = enc.generate()

prototype    = key.bind(value)
recovered = prototype.unbind(key)

print(recovered.similarity(value))   # ~= 1.0: value recovered
print(recovered.similarity(key))     # ~= 0.0: not key

The recovery is approximate, recovered is not identical to value, but it is similar enough that similarity search against a codebook finds the right answer.

# Codebook of 50 candidate values
candidates = [enc.generate() for _ in range(50)]
candidates[7] = value   # value is at index 7

scores = [recovered.similarity(c) for c in candidates]
best   = scores.index(max(scores))
print(best)   # 7: correct

Building a multi-field prototype

A prototype stores multiple key-value pairs in a single hypervector by bundling the individual bindings:

# Symbolic codebooks
roles   = {r: enc.generate() for r in ['colour', 'shape', 'size']}
colours = {c: enc.generate() for c in ['red', 'green', 'blue']}
shapes  = {s: enc.generate() for s in ['circle', 'square', 'triangle']}
sizes   = {s: enc.generate() for s in ['small', 'medium', 'large']}

# Encode an object: small red circle
obj = pyhdc.bundle(
    roles['colour'].bind(colours['red']),
    roles['shape'].bind(shapes['circle']),
    roles['size'].bind(sizes['small']),
)

The variable obj is a single 10,000-D vector that encodes all three attributes simultaneously.

Querying a field

To retrieve the value of a specific field, unbind with the role hypervector and then find the nearest candidate in the appropriate codebook:

def query_field(prototype, role_hv, codebook):
    """Return the label whose hypervector best matches the unbound result."""
    result = prototype.unbind(role_hv)
    return max(codebook, key=lambda label: result.similarity(codebook[label]))

print(query_field(obj, roles['colour'], colours))   # red
print(query_field(obj, roles['shape'],  shapes))    # circle
print(query_field(obj, roles['size'],   sizes))     # small

Noise tolerance

HDC representations tolerate substantial key corruption. The experiment below corrupts the key before querying to measure how accuracy degrades.

We simulate noise by randomly flipping a fraction of the elements in the key hypervector before unbinding:

import numpy as np

def add_noise(hv, noise_level):
    """Flip noise_level fraction of elements (MAP_B: multiply by -1)."""
    arr   = hv.data.copy()
    n_flip = int(noise_level * len(arr))
    idx    = np.random.choice(len(arr), size=n_flip, replace=False)
    arr[idx] *= -1
    return enc.from_array(arr)

rng = np.random.default_rng(42)

for noise in [0.0, 0.05, 0.10, 0.20, 0.30, 0.40]:
    noisy_role = add_noise(roles['colour'], noise)
    result     = obj.unbind(noisy_role)
    best       = max(colours, key=lambda c: result.similarity(colours[c]))
    score      = result.similarity(colours['red'])
    print(f"noise={noise:.0%}  best={best:6s}  similarity={score:.3f}")

Expected output:

noise=0%   best=red     similarity=0.525
noise=5%   best=red     similarity=0.470
noise=10%  best=red     similarity=0.427
noise=20%  best=red     similarity=0.313
noise=30%  best=red     similarity=0.206
noise=40%  best=red     similarity=0.100

HDC is remarkably noise-tolerant. Even with 40% of the key corrupted, the correct value still has the highest similarity.

Capacity: how many pairs can a prototype hold?

A bundled prototype acts as a superposition of all its bindings. As you add more bindings, the noise floor rises and eventually correct retrieval fails.

A rough rule of thumb: a hypervector of dimension \(D\) can reliably hold approximately \(O(N \times ln(M))\) distinct bindings before accuracy drops.

D = 10_000

# Generate n random key-value pairs and bundle them into one prototype
def build_prototype(n):
    keys   = [enc.generate() for _ in range(n)]
    values = [enc.generate() for _ in range(n)]
    prototype = pyhdc.bundle(*[k.bind(v) for k, v in zip(keys, values)])
    return prototype, keys, values

def check_accuracy(prototype, keys, values):
    correct = 0
    for k, v in zip(keys, values):
        recovered = prototype.unbind(k)
        # Is v the most similar among all values?
        best = max(range(len(values)), key=lambda i: recovered.similarity(values[i]))
        if values[best].similarity(v) > 0.9:
            correct += 1
    return correct / len(keys)

for n in [10, 50, 100, 500, 1000, 2000]:
    prototype, keys, values = build_prototype(n)
    acc = check_accuracy(prototype, keys, values)
    print(f"n={n:5d}  accuracy={acc:.0%}")

Expected pattern:

n=   10  accuracy=100%
n=   50  accuracy=100%
n=  100  accuracy=100%
n=  500  accuracy= 73%
n= 1000  accuracy= 25%
n= 2000  accuracy=  5%

Accuracy starts to degrade noticeably around \(n = 500\) and falls below chance around \(n = 750\). Increasing dimension moves this threshold proportionally.

Which encodings support unbinding?

Not all encoding families support the .unbind() operation:

Encoding family	Unbind support	Notes
MAP_C, MAP_I, MAP_I_Bits, MAP_B	Yes	Element-wise multiplication is self-inverse
HRR, HRR_NoNorm, HRR_ConstNorm	Yes	Circular correlation inverts circular convolution
FHRR	Yes	Angle subtraction inverts angle addition
VTB	Yes	Transpose of the transformation matrix
MBAT	Yes (with metadata)	Requires the matrices stored in `get_metadata()`
BSC	Yes	XOR is self-inverse: `bind(bind(a,b), b) == a` exactly
BSDC_CDT	No	Context-dependent thinning is not invertible
BSDC_S, BSDC_SEG, BSDC_THIN	Yes	Circular shift is inverted by shifting in the opposite direction

Summary

In this tutorial you:

Verified the approximate bind-unbind inverse property numerically
Constructed a multi-field prototype from role-value bindings
Queried specific fields via unbind + codebook similarity search
Measured noise tolerance (correct retrieval up to ≈ 40% key corruption)
Explored capacity limits as a function of dimension

What’s next

Tutorial 3: GPU-Accelerated HDC with PyTorch : run the same ideas on GPU
How to Bind and Unbind Key-Value Pairs : quick reference for all bind/unbind patterns
Binding Operations : deep dive on all binding implementations