Tutorial 2: Associative Memory with Key-Value Binding
This tutorial builds a hyperdimensional dictionary: a data structure that stores key-value pairs as a single hypervector and retrieves values by querying with a (possibly noisy) key. The tutorial covers the approximate bind-unbind inverse, multi-field prototypes, similarity-based field queries, and the capacity limits that bound how many pairs a single hypervector can hold.
Prerequisites: Tutorial 1: Encoding Text for Classification
The bind-unbind inverse property
Binding key.bind(value) produces a hypervector dissimilar to both
inputs. But if you unbind the result with key, you recover a noisy
approximation of value:
import pyhdc
enc = pyhdc.MAP_C(dimension=10_000)
key = enc.generate()
value = enc.generate()
prototype = key.bind(value)
recovered = prototype.unbind(key)
print(recovered.similarity(value)) # ~= 1.0: value recovered
print(recovered.similarity(key)) # ~= 0.0: not key
The recovery is approximate, recovered is not identical to value,
but it is similar enough that similarity search against a codebook finds the
right answer.
# Codebook of 50 candidate values
candidates = [enc.generate() for _ in range(50)]
candidates[7] = value # value is at index 7
scores = [recovered.similarity(c) for c in candidates]
best = scores.index(max(scores))
print(best) # 7: correct
Building a multi-field prototype
A prototype stores multiple key-value pairs in a single hypervector by bundling the individual bindings:
# Symbolic codebooks
roles = {r: enc.generate() for r in ['colour', 'shape', 'size']}
colours = {c: enc.generate() for c in ['red', 'green', 'blue']}
shapes = {s: enc.generate() for s in ['circle', 'square', 'triangle']}
sizes = {s: enc.generate() for s in ['small', 'medium', 'large']}
# Encode an object: small red circle
obj = pyhdc.bundle(
roles['colour'].bind(colours['red']),
roles['shape'].bind(shapes['circle']),
roles['size'].bind(sizes['small']),
)
The variable obj is a single 10,000-D vector that encodes all three
attributes simultaneously.
Querying a field
To retrieve the value of a specific field, unbind with the role hypervector and then find the nearest candidate in the appropriate codebook:
def query_field(prototype, role_hv, codebook):
"""Return the label whose hypervector best matches the unbound result."""
result = prototype.unbind(role_hv)
return max(codebook, key=lambda label: result.similarity(codebook[label]))
print(query_field(obj, roles['colour'], colours)) # red
print(query_field(obj, roles['shape'], shapes)) # circle
print(query_field(obj, roles['size'], sizes)) # small
Noise tolerance
HDC representations tolerate substantial key corruption. The experiment below corrupts the key before querying to measure how accuracy degrades.
We simulate noise by randomly flipping a fraction of the elements in the key hypervector before unbinding:
import numpy as np
def add_noise(hv, noise_level):
"""Flip noise_level fraction of elements (MAP_B: multiply by -1)."""
arr = hv.data.copy()
n_flip = int(noise_level * len(arr))
idx = np.random.choice(len(arr), size=n_flip, replace=False)
arr[idx] *= -1
return enc.from_array(arr)
rng = np.random.default_rng(42)
for noise in [0.0, 0.05, 0.10, 0.20, 0.30, 0.40]:
noisy_role = add_noise(roles['colour'], noise)
result = obj.unbind(noisy_role)
best = max(colours, key=lambda c: result.similarity(colours[c]))
score = result.similarity(colours['red'])
print(f"noise={noise:.0%} best={best:6s} similarity={score:.3f}")
Expected output:
noise=0% best=red similarity=0.525
noise=5% best=red similarity=0.470
noise=10% best=red similarity=0.427
noise=20% best=red similarity=0.313
noise=30% best=red similarity=0.206
noise=40% best=red similarity=0.100
HDC is remarkably noise-tolerant. Even with 40% of the key corrupted, the correct value still has the highest similarity.
Capacity: how many pairs can a prototype hold?
A bundled prototype acts as a superposition of all its bindings. As you add more bindings, the noise floor rises and eventually correct retrieval fails.
A rough rule of thumb: a hypervector of dimension \(D\) can reliably hold approximately \(O(N \times ln(M))\) distinct bindings before accuracy drops.
D = 10_000
# Generate n random key-value pairs and bundle them into one prototype
def build_prototype(n):
keys = [enc.generate() for _ in range(n)]
values = [enc.generate() for _ in range(n)]
prototype = pyhdc.bundle(*[k.bind(v) for k, v in zip(keys, values)])
return prototype, keys, values
def check_accuracy(prototype, keys, values):
correct = 0
for k, v in zip(keys, values):
recovered = prototype.unbind(k)
# Is v the most similar among all values?
best = max(range(len(values)), key=lambda i: recovered.similarity(values[i]))
if values[best].similarity(v) > 0.9:
correct += 1
return correct / len(keys)
for n in [10, 50, 100, 500, 1000, 2000]:
prototype, keys, values = build_prototype(n)
acc = check_accuracy(prototype, keys, values)
print(f"n={n:5d} accuracy={acc:.0%}")
Expected pattern:
n= 10 accuracy=100%
n= 50 accuracy=100%
n= 100 accuracy=100%
n= 500 accuracy= 73%
n= 1000 accuracy= 25%
n= 2000 accuracy= 5%
Accuracy starts to degrade noticeably around \(n = 500\) and falls below chance around \(n = 750\). Increasing dimension moves this threshold proportionally.
Which encodings support unbinding?
Not all encoding families support the .unbind() operation:
Encoding family |
Unbind support |
Notes |
|---|---|---|
MAP_C, MAP_I, MAP_I_Bits, MAP_B |
Yes |
Element-wise multiplication is self-inverse |
HRR, HRR_NoNorm, HRR_ConstNorm |
Yes |
Circular correlation inverts circular convolution |
FHRR |
Yes |
Angle subtraction inverts angle addition |
VTB |
Yes |
Transpose of the transformation matrix |
MBAT |
Yes (with metadata) |
Requires the matrices stored in |
BSC |
Yes |
XOR is self-inverse: |
BSDC_CDT |
No |
Context-dependent thinning is not invertible |
BSDC_S, BSDC_SEG, BSDC_THIN |
Yes |
Circular shift is inverted by shifting in the opposite direction |
Summary
In this tutorial you:
Verified the approximate bind-unbind inverse property numerically
Constructed a multi-field prototype from role-value bindings
Queried specific fields via unbind + codebook similarity search
Measured noise tolerance (correct retrieval up to ≈ 40% key corruption)
Explored capacity limits as a function of dimension
What’s next
Tutorial 3: GPU-Accelerated HDC with PyTorch : run the same ideas on GPU
How to Bind and Unbind Key-Value Pairs : quick reference for all bind/unbind patterns
Binding Operations : deep dive on all binding implementations