Five-Minute Quickstart
The five steps below cover the core PyHDC workflow: generating hypervectors, applying all three primitive operations, and building a tiny item memory, all in under 30 lines of code.
Step 1: Pick an encoding
An encoding defines how hypervectors are generated and how bundle, bind, and
similarity are implemented. PyHDC ships 15 encoding classes. MAP_B is the
recommended starting point: it uses bipolar values in {-1, 1} and supports all
operations, including exact unbinding.
import pyhdc
enc = pyhdc.MAP_B(dimension=10_000)
The only required parameter is dimension, the length of every hypervector
produced by this encoding. 10,000 is a common default. Lower values (1,000)
are faster but noisier; higher values (50,000) are more accurate but use more
memory.
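To make the speed-versus-noise trade-off concrete, here is a dependency-free sketch in plain Python. It is not PyHDC's implementation, and the helper names (`random_bipolar`, `cosine`, `similarity_spread`) are made up for illustration. It draws pairs of unrelated random bipolar vectors and measures how far their cosine similarity strays from 0. For independent bipolar vectors the spread shrinks roughly as 1/sqrt(D).

```python
import math
import random
import statistics

def random_bipolar(d, rng):
    """Draw a random vector with entries in {-1, +1}."""
    return rng.choices((-1, 1), k=d)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

def similarity_spread(d, trials, rng):
    """Standard deviation of the similarity between unrelated random vectors."""
    sims = [
        cosine(random_bipolar(d, rng), random_bipolar(d, rng))
        for _ in range(trials)
    ]
    return statistics.pstdev(sims)

rng = random.Random(0)  # fixed seed so the run is repeatable
spread_small = similarity_spread(1_000, trials=100, rng=rng)
spread_large = similarity_spread(10_000, trials=100, rng=rng)

print(f"D=1,000:  spread ~ {spread_small:.3f}")  # theory: 1/sqrt(1000), about 0.032
print(f"D=10,000: spread ~ {spread_large:.3f}")  # theory: 1/sqrt(10000) = 0.010
```

A smaller spread means "unrelated" scores stay closer to 0, so it is easier to tell them apart from genuine matches.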
Step 2: Generate hypervectors
v = enc.generate()
print(v) # Hypervector(shape=(10000,), backend=numpy, encoding=MAP_B)
print(v.shape) # (10000,)
print(v.dtype) # int8
print(v.backend) # numpy
Each call to .generate() draws a fresh random hypervector. Two
independently generated hypervectors are nearly orthogonal by design.
You can generate a batch of hypervectors in one call. Batches are
dimension-first: a batch of N vectors has shape (D, N), one hypervector
per column.
batch = enc.generate(size=(10_000, 100))
print(batch.shape) # (10000, 100)
Step 3: The three operations
Similarity
Returns a scalar in [-1, 1]. Use it to measure how related two hypervectors are (0 ~= unrelated, 1 = identical).
a = enc.generate()
b = enc.generate()
c = a # same object
print(a.similarity(b)) # ~= 0.0 # unrelated
print(a.similarity(c)) # 1.0 # identical
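If you want an intuition for the [-1, 1] range, the sketch below assumes a cosine-style similarity, which this page does not confirm for MAP_B. For bipolar vectors, cosine similarity reduces to 1 minus twice the fraction of positions where the two vectors disagree. The `similarity` function here is a plain-Python stand-in, not the PyHDC method.

```python
import random

def similarity(u, v):
    """Cosine similarity of two bipolar vectors, via the share of differing positions."""
    disagreements = sum(1 for x, y in zip(u, v) if x != y)
    return 1.0 - 2.0 * disagreements / len(u)

rng = random.Random(1)
d = 10_000
a = rng.choices((-1, 1), k=d)

identical = list(a)
opposite = [-x for x in a]
# Flip the first quarter of the entries and keep the rest unchanged.
quarter_flipped = [-x for x in a[: d // 4]] + a[d // 4 :]

print(similarity(a, identical))        # 1.0
print(similarity(a, opposite))         # -1.0
print(similarity(a, quarter_flipped))  # 0.5
```

Two random vectors disagree in about half their positions, which is why unrelated hypervectors score close to 0.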
Bundling
Produces a hypervector that is similar to all inputs. Think of it as a fuzzy set union.
x = enc.generate()
y = enc.generate()
z = enc.generate()
bundle = x.bundle(y, z)
print(bundle.similarity(x)) # ~= 0.6
print(bundle.similarity(y)) # ~= 0.6
print(bundle.similarity(z)) # ~= 0.6
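One way to see where a value near 0.6 can come from is to bundle by hand. The sketch below assumes bundling is an unnormalised element-wise sum and similarity is cosine. PyHDC's MAP_B may implement either differently, and the `bundle` and `cosine` functions are illustrative plain-Python stand-ins. Under those assumptions, a bundle of three random bipolar vectors has similarity close to 1/sqrt(3) with each input.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

def bundle(*vectors):
    """Element-wise sum of the inputs (assumed bundling rule, left unnormalised)."""
    return [sum(column) for column in zip(*vectors)]

rng = random.Random(2)
d = 10_000
x, y, z, outsider = (rng.choices((-1, 1), k=d) for _ in range(4))

bundled = bundle(x, y, z)
for name, vec in [("x", x), ("y", y), ("z", z), ("outsider", outsider)]:
    # Inputs land near 1/sqrt(3); the vector that was never bundled lands near 0.
    print(f"{name:9s}{cosine(bundled, vec):+.3f}")

print(f"1/sqrt(3) = {1 / math.sqrt(3):.3f}")  # 0.577
```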
Binding and unbinding
Binding produces a hypervector that is dissimilar to both inputs, but from which either input can be recovered if you have the other (unbinding).
key = enc.generate()
value = enc.generate()
record = key.bind(value)
print(record.similarity(key)) # ~= 0.0: dissimilar to both
print(record.similarity(value)) # ~= 0.0
recovered = record.unbind(key)
print(recovered.similarity(value)) # ~= 1.0: value recovered
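Exact unbinding is easy to reproduce by hand if you assume binding is element-wise multiplication, the usual choice for bipolar MAP-style encodings. Whether PyHDC's MAP_B does exactly this is an assumption here, and `bind` and `similarity` below are plain-Python stand-ins. Because every entry is -1 or 1, multiplying by the key twice cancels out, so binding is its own inverse.

```python
import random

def bind(u, v):
    """Element-wise product; for bipolar vectors this is its own inverse."""
    return [x * y for x, y in zip(u, v)]

def similarity(u, v):
    """Normalised dot product, which equals cosine similarity for bipolar vectors."""
    return sum(x * y for x, y in zip(u, v)) / len(u)

rng = random.Random(3)
d = 10_000
key = rng.choices((-1, 1), k=d)
value = rng.choices((-1, 1), k=d)

record = bind(key, value)
# Unbinding: key * key is all ones, so only value is left.
recovered = bind(record, key)

print(recovered == value)                   # True
print(f"{similarity(record, key):+.3f}")    # close to 0: record looks unrelated to key
print(f"{similarity(record, value):+.3f}")  # close to 0: and unrelated to value
```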
Step 4: Build a tiny item memory
An item memory (or codebook) is a dictionary mapping labels to hypervectors. Here we encode five colours, bundle three of them into a “palette”, and then query which colours are in it.
colour_names = ['red', 'green', 'blue', 'yellow', 'purple']
codebook = {name: enc.generate() for name in colour_names}
# Bundle three colours into a palette
palette = pyhdc.bundle(codebook['red'], codebook['green'], codebook['blue'])
# Query: which colours are in the palette?
for name, hv in codebook.items():
    sim = palette.similarity(hv)
    print(f"{name:8s}: {sim:.3f}")
# Output:
# red     : 0.573
# green   : 0.568
# blue    : 0.561
# yellow  : 0.012   <- not in palette
# purple  : -0.003  <- not in palette
Items in the bundle score far higher (about 0.57 here) than items that were not bundled (about 0.0). This gap is the fundamental query mechanism of HDC.
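A common next step is to turn the similarity gap into a yes/no membership test. The `members` helper below is a hypothetical convenience function, not part of PyHDC. It relies only on a `.similarity()` method, and the 0.3 threshold is an illustrative midpoint between the scores seen above. So that the example runs without PyHDC installed, it uses a small stand-in class, `ToyHV`, that assumes cosine similarity and sum-based bundling.

```python
import math
import random

class ToyHV:
    """Stand-in for a hypervector, with only what this example needs."""

    def __init__(self, values):
        self.values = list(values)

    def similarity(self, other):
        dot = sum(x * y for x, y in zip(self.values, other.values))
        norms = math.sqrt(
            sum(x * x for x in self.values) * sum(y * y for y in other.values)
        )
        return dot / norms

def members(bundle_hv, codebook, threshold=0.3):
    """Labels whose similarity to the bundle exceeds the threshold, best match first."""
    scored = [(bundle_hv.similarity(hv), name) for name, hv in codebook.items()]
    return [name for sim, name in sorted(scored, reverse=True) if sim > threshold]

rng = random.Random(4)
d = 10_000
names = ["red", "green", "blue", "yellow", "purple"]
codebook = {name: ToyHV(rng.choices((-1, 1), k=d)) for name in names}

# Bundle red, green and blue by element-wise sum (assumed rule for this toy class).
palette = ToyHV(
    sum(column)
    for column in zip(*(codebook[n].values for n in ("red", "green", "blue")))
)

print(sorted(members(palette, codebook)))  # ['blue', 'green', 'red']
```

The right threshold depends on the dimension and on how many items are bundled, since each extra item lowers the similarity of every member, so treat 0.3 as a starting point to tune.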
Step 5: Switch to PyTorch
The API is identical regardless of backend. Just pass backend="torch"
when creating the encoding:
if pyhdc.TORCH_AVAILABLE:
    enc_torch = pyhdc.MAP_B(dimension=10_000, backend="torch")
    v = enc_torch.generate()
    print(v.backend) # torch
    # GPU: requires CUDA
    enc_gpu = pyhdc.MAP_B(dimension=10_000, backend="torch", device="cuda")
Or set a process-wide default so every new encoding uses it:
pyhdc.prefer_torch() # or pyhdc.prefer_cuda()
enc_default = pyhdc.MAP_B(dimension=10_000) # inherits the torch backend
pyhdc.prefer_numpy() # reset to numpy
You can also move an existing hypervector between backends:
v_numpy = enc.generate()
v_torch = v_numpy.to_torch()
v_back = v_torch.to_numpy()
Putting it all together
Here is the complete quickstart script as a single block:
import pyhdc
enc = pyhdc.MAP_B(dimension=10_000)
# Three primitives
a, b = enc.generate(), enc.generate()
print(a.similarity(b)) # ~= 0.0
print(a.bundle(b).similarity(a)) # ~= 0.6
record = a.bind(b)
recovered = record.unbind(b)
print(recovered.similarity(a)) # ~= 1.0
# Item memory
colours = {c: enc.generate() for c in ['red','green','blue','yellow','purple']}
palette = pyhdc.bundle(colours['red'], colours['green'], colours['blue'])
rankings = sorted(colours, key=lambda c: palette.similarity(colours[c]), reverse=True)
print(rankings[:3]) # ['red', 'green', 'blue'] (order may vary)
Tensors and operators
Batches generalize past two axes. Axis 0 is always the dimension D;
every trailing axis is batch. A (D, N, M) tensor holds N * M
hypervectors, one per trailing-axis slice.
cube = enc.generate(size=(10_000, 8, 4)) # (10000, 8, 4): 32 hypervectors
PyHDC also supports operator syntax. Each operator routes to an HDC method:
+ bundles, * binds, / unbinds, >> permutes (cyclic shift along
axis 0), and ~ inverts where the encoding defines an inverse.
a = enc.generate()
b = enc.generate()
bundled = a + b # same as a.bundle(b)
bound = a * b # same as a.bind(b)
unbound = bound / b # same as bound.unbind(b)
shifted = a >> 1 # same as a.permute(shift=1)
inv = ~a # same as a.inverse(); not every encoding defines it, MAP_B does
~ raises NotImplementedError on families without an inverse (for
example MAP_C, VTB, BSDC_S); MAP_B defines one. See
How to Use Operator Syntax for the full operator table and
Array Layout: Dimension-First (D, N, M) for the dimension-first axis rules.
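If you are curious how operators can route to methods, the sketch below shows the general Python mechanism using special methods such as `__add__` and `__truediv__`. It is a toy class written for this page, not PyHDC's source. `ToyHV`, its sum-based bundling, and its shift direction (the last entries wrap to the front) are all assumptions made for illustration.

```python
class ToyHV:
    """Minimal bipolar hypervector showing how operators can forward to methods."""

    def __init__(self, values):
        self.values = tuple(values)

    def bundle(self, other):
        return ToyHV(x + y for x, y in zip(self.values, other.values))

    def bind(self, other):
        return ToyHV(x * y for x, y in zip(self.values, other.values))

    def inverse(self):
        # Under element-wise multiplication a bipolar vector is its own inverse.
        return ToyHV(self.values)

    def unbind(self, other):
        return self.bind(other.inverse())

    def permute(self, shift=1):
        # Cyclic shift: the last `shift` entries wrap around to the front.
        k = shift % len(self.values)
        return ToyHV(self.values[-k:] + self.values[:-k])

    # Operator hooks: each one simply delegates to the named method.
    __add__ = bundle
    __mul__ = bind
    __truediv__ = unbind
    __invert__ = inverse

    def __rshift__(self, shift):
        return self.permute(shift=shift)

    def __eq__(self, other):
        return isinstance(other, ToyHV) and self.values == other.values

a = ToyHV([1, -1, -1, 1])
b = ToyHV([-1, -1, 1, 1])

print((a + b).values)      # (0, -2, 0, 2)
print((a * b).values)      # (-1, 1, -1, 1)
print(((a * b) / b) == a)  # True
print((a >> 1).values)     # (1, 1, -1, -1)
print((~a) == a)           # True
```

An encoding without an inverse could make its `__invert__` raise NotImplementedError, which matches the behaviour described above for families such as MAP_C.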
Continue here
Tutorials: five end-to-end tutorials, starting with Tutorial 1: Encoding Text for Classification
How to Choose the Right Encoding: how to pick the right encoding for your use case
Encodings Overview: a full comparison of all 15 encoding families