Tutorial 2: Associative Memory with Key-Value Binding

This tutorial builds a hyperdimensional dictionary: a data structure that stores key-value pairs as a single hypervector and retrieves values by querying with a (possibly noisy) key. The tutorial covers the approximate bind-unbind inverse, multi-field prototypes, similarity-based field queries, and the capacity limits that bound how many pairs a single hypervector can hold.

Prerequisites: Tutorial 1: Encoding Text for Classification


The bind-unbind inverse property

Binding key.bind(value) produces a hypervector dissimilar to both inputs. But if you unbind the result with key, you recover a noisy approximation of value:

import pyhdc

enc   = pyhdc.MAP_C(dimension=10_000)
key   = enc.generate()
value = enc.generate()

prototype    = key.bind(value)
recovered = prototype.unbind(key)

print(recovered.similarity(value))   # ~= 1.0: value recovered
print(recovered.similarity(key))     # ~= 0.0: not key

The recovery is approximate, recovered is not identical to value, but it is similar enough that similarity search against a codebook finds the right answer.

# Codebook of 50 candidate values
candidates = [enc.generate() for _ in range(50)]
candidates[7] = value   # value is at index 7

scores = [recovered.similarity(c) for c in candidates]
best   = scores.index(max(scores))
print(best)   # 7: correct

Building a multi-field prototype

A prototype stores multiple key-value pairs in a single hypervector by bundling the individual bindings:

# Symbolic codebooks
roles   = {r: enc.generate() for r in ['colour', 'shape', 'size']}
colours = {c: enc.generate() for c in ['red', 'green', 'blue']}
shapes  = {s: enc.generate() for s in ['circle', 'square', 'triangle']}
sizes   = {s: enc.generate() for s in ['small', 'medium', 'large']}

# Encode an object: small red circle
obj = pyhdc.bundle(
    roles['colour'].bind(colours['red']),
    roles['shape'].bind(shapes['circle']),
    roles['size'].bind(sizes['small']),
)

The variable obj is a single 10,000-D vector that encodes all three attributes simultaneously.


Querying a field

To retrieve the value of a specific field, unbind with the role hypervector and then find the nearest candidate in the appropriate codebook:

def query_field(prototype, role_hv, codebook):
    """Return the label whose hypervector best matches the unbound result."""
    result = prototype.unbind(role_hv)
    return max(codebook, key=lambda label: result.similarity(codebook[label]))

print(query_field(obj, roles['colour'], colours))   # red
print(query_field(obj, roles['shape'],  shapes))    # circle
print(query_field(obj, roles['size'],   sizes))     # small

Noise tolerance

HDC representations tolerate substantial key corruption. The experiment below corrupts the key before querying to measure how accuracy degrades.

We simulate noise by randomly flipping a fraction of the elements in the key hypervector before unbinding:

import numpy as np

def add_noise(hv, noise_level):
    """Flip noise_level fraction of elements (MAP_B: multiply by -1)."""
    arr   = hv.data.copy()
    n_flip = int(noise_level * len(arr))
    idx    = np.random.choice(len(arr), size=n_flip, replace=False)
    arr[idx] *= -1
    return enc.from_array(arr)

rng = np.random.default_rng(42)

for noise in [0.0, 0.05, 0.10, 0.20, 0.30, 0.40]:
    noisy_role = add_noise(roles['colour'], noise)
    result     = obj.unbind(noisy_role)
    best       = max(colours, key=lambda c: result.similarity(colours[c]))
    score      = result.similarity(colours['red'])
    print(f"noise={noise:.0%}  best={best:6s}  similarity={score:.3f}")

Expected output:

noise=0%   best=red     similarity=0.525
noise=5%   best=red     similarity=0.470
noise=10%  best=red     similarity=0.427
noise=20%  best=red     similarity=0.313
noise=30%  best=red     similarity=0.206
noise=40%  best=red     similarity=0.100

HDC is remarkably noise-tolerant. Even with 40% of the key corrupted, the correct value still has the highest similarity.


Capacity: how many pairs can a prototype hold?

A bundled prototype acts as a superposition of all its bindings. As you add more bindings, the noise floor rises and eventually correct retrieval fails.

A rough rule of thumb: a hypervector of dimension \(D\) can reliably hold approximately \(O(N \times ln(M))\) distinct bindings before accuracy drops.

D = 10_000

# Generate n random key-value pairs and bundle them into one prototype
def build_prototype(n):
    keys   = [enc.generate() for _ in range(n)]
    values = [enc.generate() for _ in range(n)]
    prototype = pyhdc.bundle(*[k.bind(v) for k, v in zip(keys, values)])
    return prototype, keys, values

def check_accuracy(prototype, keys, values):
    correct = 0
    for k, v in zip(keys, values):
        recovered = prototype.unbind(k)
        # Is v the most similar among all values?
        best = max(range(len(values)), key=lambda i: recovered.similarity(values[i]))
        if values[best].similarity(v) > 0.9:
            correct += 1
    return correct / len(keys)

for n in [10, 50, 100, 500, 1000, 2000]:
    prototype, keys, values = build_prototype(n)
    acc = check_accuracy(prototype, keys, values)
    print(f"n={n:5d}  accuracy={acc:.0%}")

Expected pattern:

n=   10  accuracy=100%
n=   50  accuracy=100%
n=  100  accuracy=100%
n=  500  accuracy= 73%
n= 1000  accuracy= 25%
n= 2000  accuracy=  5%

Accuracy starts to degrade noticeably around \(n = 500\) and falls below chance around \(n = 750\). Increasing dimension moves this threshold proportionally.


Which encodings support unbinding?

Not all encoding families support the .unbind() operation:

Encoding family

Unbind support

Notes

MAP_C, MAP_I, MAP_I_Bits, MAP_B

Yes

Element-wise multiplication is self-inverse

HRR, HRR_NoNorm, HRR_ConstNorm

Yes

Circular correlation inverts circular convolution

FHRR

Yes

Angle subtraction inverts angle addition

VTB

Yes

Transpose of the transformation matrix

MBAT

Yes (with metadata)

Requires the matrices stored in get_metadata()

BSC

Yes

XOR is self-inverse: bind(bind(a,b), b) == a exactly

BSDC_CDT

No

Context-dependent thinning is not invertible

BSDC_S, BSDC_SEG, BSDC_THIN

Yes

Circular shift is inverted by shifting in the opposite direction


Summary

In this tutorial you:

  • Verified the approximate bind-unbind inverse property numerically

  • Constructed a multi-field prototype from role-value bindings

  • Queried specific fields via unbind + codebook similarity search

  • Measured noise tolerance (correct retrieval up to ≈ 40% key corruption)

  • Explored capacity limits as a function of dimension


What’s next