Tutorial 2: Associative Memory with Key-Value Binding ====================================================== This tutorial builds a **hyperdimensional dictionary**: a data structure that stores key-value pairs as a single hypervector and retrieves values by querying with a (possibly noisy) key. The tutorial covers the approximate bind-unbind inverse, multi-field prototypes, similarity-based field queries, and the capacity limits that bound how many pairs a single hypervector can hold. **Prerequisites**: :doc:`tutorial_1_text_classification` ---- The bind-unbind inverse property ---------------------------------- Binding ``key.bind(value)`` produces a hypervector dissimilar to both inputs. But if you unbind the result with ``key``, you recover a noisy approximation of ``value``: .. code-block:: python import pyhdc enc = pyhdc.MAP_C(dimension=10_000) key = enc.generate() value = enc.generate() prototype = key.bind(value) recovered = prototype.unbind(key) print(recovered.similarity(value)) # ~= 1.0: value recovered print(recovered.similarity(key)) # ~= 0.0: not key The recovery is approximate, ``recovered`` is not identical to ``value``, but it is similar enough that similarity search against a codebook finds the right answer. .. code-block:: python # Codebook of 50 candidate values candidates = [enc.generate() for _ in range(50)] candidates[7] = value # value is at index 7 scores = [recovered.similarity(c) for c in candidates] best = scores.index(max(scores)) print(best) # 7: correct ---- Building a multi-field prototype -------------------------------- A *prototype* stores multiple key-value pairs in a single hypervector by bundling the individual bindings: .. code-block:: python # Symbolic codebooks roles = {r: enc.generate() for r in ['colour', 'shape', 'size']} colours = {c: enc.generate() for c in ['red', 'green', 'blue']} shapes = {s: enc.generate() for s in ['circle', 'square', 'triangle']} sizes = {s: enc.generate() for s in ['small', 'medium', 'large']} # Encode an object: small red circle obj = pyhdc.bundle( roles['colour'].bind(colours['red']), roles['shape'].bind(shapes['circle']), roles['size'].bind(sizes['small']), ) The variable ``obj`` is a single 10,000-D vector that encodes all three attributes simultaneously. ---- Querying a field ----------------- To retrieve the value of a specific field, unbind with the role hypervector and then find the nearest candidate in the appropriate codebook: .. code-block:: python def query_field(prototype, role_hv, codebook): """Return the label whose hypervector best matches the unbound result.""" result = prototype.unbind(role_hv) return max(codebook, key=lambda label: result.similarity(codebook[label])) print(query_field(obj, roles['colour'], colours)) # red print(query_field(obj, roles['shape'], shapes)) # circle print(query_field(obj, roles['size'], sizes)) # small ---- Noise tolerance ---------------- HDC representations tolerate substantial key corruption. The experiment below corrupts the key before querying to measure how accuracy degrades. We simulate noise by randomly flipping a fraction of the elements in the key hypervector before unbinding: .. code-block:: python import numpy as np def add_noise(hv, noise_level): """Flip noise_level fraction of elements (MAP_B: multiply by -1).""" arr = hv.data.copy() n_flip = int(noise_level * len(arr)) idx = np.random.choice(len(arr), size=n_flip, replace=False) arr[idx] *= -1 return enc.from_array(arr) rng = np.random.default_rng(42) for noise in [0.0, 0.05, 0.10, 0.20, 0.30, 0.40]: noisy_role = add_noise(roles['colour'], noise) result = obj.unbind(noisy_role) best = max(colours, key=lambda c: result.similarity(colours[c])) score = result.similarity(colours['red']) print(f"noise={noise:.0%} best={best:6s} similarity={score:.3f}") Expected output: .. code-block:: text noise=0% best=red similarity=0.525 noise=5% best=red similarity=0.470 noise=10% best=red similarity=0.427 noise=20% best=red similarity=0.313 noise=30% best=red similarity=0.206 noise=40% best=red similarity=0.100 HDC is remarkably noise-tolerant. Even with 40% of the key corrupted, the correct value still has the highest similarity. ---- Capacity: how many pairs can a prototype hold? ---------------------------------------------- A bundled prototype acts as a superposition of all its bindings. As you add more bindings, the noise floor rises and eventually correct retrieval fails. A rough rule of thumb: a hypervector of dimension :math:`D` can reliably hold approximately :math:`O(N \times ln(M))` distinct bindings before accuracy drops. .. code-block:: python D = 10_000 # Generate n random key-value pairs and bundle them into one prototype def build_prototype(n): keys = [enc.generate() for _ in range(n)] values = [enc.generate() for _ in range(n)] prototype = pyhdc.bundle(*[k.bind(v) for k, v in zip(keys, values)]) return prototype, keys, values def check_accuracy(prototype, keys, values): correct = 0 for k, v in zip(keys, values): recovered = prototype.unbind(k) # Is v the most similar among all values? best = max(range(len(values)), key=lambda i: recovered.similarity(values[i])) if values[best].similarity(v) > 0.9: correct += 1 return correct / len(keys) for n in [10, 50, 100, 500, 1000, 2000]: prototype, keys, values = build_prototype(n) acc = check_accuracy(prototype, keys, values) print(f"n={n:5d} accuracy={acc:.0%}") Expected pattern: .. code-block:: text n= 10 accuracy=100% n= 50 accuracy=100% n= 100 accuracy=100% n= 500 accuracy= 73% n= 1000 accuracy= 25% n= 2000 accuracy= 5% Accuracy starts to degrade noticeably around :math:`n = 500` and falls below chance around :math:`n = 750`. Increasing dimension moves this threshold proportionally. ---- Which encodings support unbinding? ------------------------------------ Not all encoding families support the ``.unbind()`` operation: .. list-table:: :header-rows: 1 :widths: 25 15 60 * - Encoding family - Unbind support - Notes * - MAP_C, MAP_I, MAP_I_Bits, MAP_B - Yes - Element-wise multiplication is self-inverse * - HRR, HRR_NoNorm, HRR_ConstNorm - Yes - Circular correlation inverts circular convolution * - FHRR - Yes - Angle subtraction inverts angle addition * - VTB - Yes - Transpose of the transformation matrix * - MBAT - Yes (with metadata) - Requires the matrices stored in ``get_metadata()`` * - BSC - Yes - XOR is self-inverse: ``bind(bind(a,b), b) == a`` exactly * - BSDC_CDT - No - Context-dependent thinning is not invertible * - BSDC_S, BSDC_SEG, BSDC_THIN - Yes - Circular shift is inverted by shifting in the opposite direction ---- Summary ------- In this tutorial you: * Verified the approximate bind-unbind inverse property numerically * Constructed a multi-field prototype from role-value bindings * Queried specific fields via unbind + codebook similarity search * Measured noise tolerance (correct retrieval up to ≈ 40% key corruption) * Explored capacity limits as a function of dimension ---- What's next ----------- * :doc:`tutorial_3_pytorch_gpu` : run the same ideas on GPU * :doc:`../how_to/bind_unbind` : quick reference for all bind/unbind patterns * :doc:`../user_manual/binding_operations` : deep dive on all binding implementations