Five-Minute Quickstart

The five steps below cover the core PyHDC workflow: generating hypervectors, applying all three primitive operations, and building a tiny item memory, all in under 30 lines of code.

Step 1: Pick an encoding

An encoding defines how hypervectors are generated and how bundle, bind, and similarity are implemented. PyHDC ships 15 encoding classes. MAP_B is the recommended starting point: it uses bipolar values in {-1, 1} and supports all operations, including exact unbinding.

import pyhdc

enc = pyhdc.MAP_B(dimension=10_000)

The only required parameter is dimension, the length of every hypervector produced by this encoding. 10,000 is a common default. Lower values (for example 1,000) are faster but noisier; higher values (for example 50,000) are more accurate but use more memory.
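
To make the speed-versus-noise trade-off concrete, the sketch below estimates how widely the similarity between unrelated random bipolar vectors scatters at several dimensions. It uses only the Python standard library, not PyHDC, and the helper names (random_bipolar, cosine, similarity_spread) are illustrative assumptions rather than library functions.

```python
import math
import random

def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, 1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))

def similarity_spread(dim, trials, rng):
    """Standard deviation of the similarity between independent random vectors."""
    sims = [cosine(random_bipolar(dim, rng), random_bipolar(dim, rng))
            for _ in range(trials)]
    mean = sum(sims) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in sims) / trials)

rng = random.Random(0)
spreads = {dim: similarity_spread(dim, trials=60, rng=rng)
           for dim in (100, 1_000, 10_000)}

for dim, spread in spreads.items():
    # Theory predicts a spread of roughly 1 / sqrt(dim).
    print(f"D={dim:>6}: spread ~= {spread:.4f}   1/sqrt(D) = {1 / math.sqrt(dim):.4f}")
```

The spread shrinks by about a factor of three for every tenfold increase in dimension, which is why larger hypervectors separate related from unrelated items more reliably.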

Step 2: Generate hypervectors

v = enc.generate()

print(v)            # Hypervector(shape=(10000,), backend=numpy, encoding=MAP_B)
print(v.shape)      # (10000,)
print(v.dtype)      # int8
print(v.backend)    # numpy

Each call to .generate() draws a fresh random hypervector. Two independently generated hypervectors are nearly orthogonal by design.

You can generate a batch of hypervectors in one call. Batches are dimension-first: a batch of N vectors has shape (D, N), one hypervector per column.

batch = enc.generate(size=(10_000, 100))
print(batch.shape)   # (10000, 100)
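
The dimension-first layout can be pictured with nested lists: the outer index walks the D components and the inner index selects the hypervector. This is a plain-Python sketch of the layout only. How PyHDC stores batches internally is not shown here, and the helper name column is a hypothetical one introduced for illustration.

```python
import random

D, N = 1_000, 4
rng = random.Random(0)

# Dimension-first batch: D rows, each row holding component d of all N hypervectors.
batch = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(D)]

def column(batch, n):
    """Extract hypervector n, i.e. column n of the (D, N) layout."""
    return [row[n] for row in batch]

shape = (len(batch), len(batch[0]))
first = column(batch, 0)

print(shape)        # (1000, 4)
print(len(first))   # 1000
```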

Step 3: The three operations

Similarity

Returns a scalar in [-1, 1]. Use it to measure how related two hypervectors are (0 ~= unrelated, 1 = identical).

a = enc.generate()
b = enc.generate()
c = a   # same object

print(a.similarity(b))   # ~= 0.0  # unrelated
print(a.similarity(c))   # 1.0     # identical
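
The text above states only that similarity lies in [-1, 1]. A minimal standard-library sketch, assuming a cosine-style normalized dot product (the exact measure PyHDC uses for MAP_B is an assumption here), shows where the endpoints of that range come from. The helper names are illustrative, not PyHDC functions.

```python
import random

def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, 1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def similarity(u, v):
    """Normalized dot product; equals cosine similarity for bipolar vectors."""
    return sum(p * q for p, q in zip(u, v)) / len(u)

rng = random.Random(0)
a = random_bipolar(10_000, rng)
b = random_bipolar(10_000, rng)
negated = [-p for p in a]

print(similarity(a, a))              # 1.0
print(similarity(a, negated))        # -1.0
print(abs(similarity(a, b)) < 0.1)   # True: independent vectors are nearly orthogonal
```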

Bundling

Produces a hypervector that is similar to all inputs. Think of it as a fuzzy set union.

x = enc.generate()
y = enc.generate()
z = enc.generate()

bundle = x.bundle(y, z)

print(bundle.similarity(x))   # ~= 0.6
print(bundle.similarity(y))   # ~= 0.6
print(bundle.similarity(z))   # ~= 0.6
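
The figure of roughly 0.6 can be reproduced with a small standard-library sketch that treats bundling as an element-wise sum and similarity as cosine. Both choices are assumptions made for illustration; whether PyHDC clips or normalizes the sum is not covered here. Under these assumptions each of three inputs lands near 1/sqrt(3), about 0.577.

```python
import math
import random

def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, 1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))

def bundle(*vectors):
    """Element-wise sum; clipping and normalization are left out of this sketch."""
    return [sum(components) for components in zip(*vectors)]

rng = random.Random(0)
x, y, z, outsider = (random_bipolar(10_000, rng) for _ in range(4))
s = bundle(x, y, z)

sims = {name: cosine(s, v)
        for name, v in [("x", x), ("y", y), ("z", z), ("outsider", outsider)]}
for name, sim in sims.items():
    print(f"{name:8s}: {sim:.3f}")
# Bundled inputs land near 1/sqrt(3) ~= 0.577; the outsider stays near 0.
```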

Binding and unbinding

Binding produces a hypervector that is dissimilar to both inputs, but from which either input can be recovered if you have the other (unbinding).

key   = enc.generate()
value = enc.generate()

record = key.bind(value)

print(record.similarity(key))    # ~= 0.0: dissimilar to both
print(record.similarity(value))  # ~= 0.0

recovered = record.unbind(key)
print(recovered.similarity(value))  # ~= 1.0: value recovered
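
Exact unbinding is easy to see in a standard-library sketch, assuming binding is element-wise multiplication as in MAP-style architectures (the PyHDC internals are not shown, and the helper names are illustrative). Because every component of a bipolar key squares to 1, multiplying by the key a second time cancels it.

```python
import random

def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, 1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def similarity(u, v):
    """Normalized dot product; equals cosine similarity for bipolar vectors."""
    return sum(p * q for p, q in zip(u, v)) / len(u)

def bind(u, v):
    """Element-wise multiplication; its own inverse for bipolar vectors."""
    return [p * q for p, q in zip(u, v)]

rng = random.Random(0)
key = random_bipolar(10_000, rng)
value = random_bipolar(10_000, rng)

record = bind(key, value)
recovered = bind(record, key)   # key * key = 1 in every component, so the key cancels

print(abs(similarity(record, key)) < 0.1)     # True: record is dissimilar to the key
print(abs(similarity(record, value)) < 0.1)   # True: and to the value
print(recovered == value)                     # True: recovery is exact
```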

Step 4: Build a tiny item memory

An item memory (or codebook) is a dictionary mapping labels to hypervectors. Here we encode five colours, bundle three of them into a “palette”, and then query which colours are in it.

colour_names = ['red', 'green', 'blue', 'yellow', 'purple']
codebook = {name: enc.generate() for name in colour_names}

# Bundle three colours into a palette
palette = pyhdc.bundle(codebook['red'], codebook['green'], codebook['blue'])

# Query: which colours are in the palette?
for name, hv in codebook.items():
    sim = palette.similarity(hv)
    print(f"{name:8s}: {sim:.3f}")

# Output:
# red     :  0.573
# green   :  0.568
# blue    :  0.561
# yellow  :  0.012   <- not in palette
# purple  : -0.003   <- not in palette

Items in the bundle have noticeably higher similarity than items that were not bundled. This is the fundamental query mechanism of hyperdimensional computing (HDC).
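
One way to turn those similarity scores into a yes/no membership answer is a fixed threshold between the two clusters. The sketch below uses only the standard library, with bundling as an element-wise sum and cosine similarity. The members helper and the threshold of 0.3 are hypothetical choices for illustration, not PyHDC features.

```python
import math
import random

def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, 1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))

def members(bundle_vec, codebook, threshold):
    """Return the labels whose hypervector is similar enough to the bundle."""
    return [name for name, hv in codebook.items()
            if cosine(bundle_vec, hv) > threshold]

rng = random.Random(0)
names = ["red", "green", "blue", "yellow", "purple"]
codebook = {name: random_bipolar(10_000, rng) for name in names}

# Bundle three colours by element-wise sum (an assumption of this sketch).
palette = [sum(c) for c in zip(codebook["red"], codebook["green"], codebook["blue"])]

found = members(palette, codebook, threshold=0.3)
print(found)   # ['red', 'green', 'blue']
```

The threshold has to drop as more items are bundled, because the similarity of each member falls roughly as one over the square root of the number of inputs.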

Step 5: Switch to PyTorch

The API is identical regardless of backend. Just pass backend="torch" when creating the encoding:

if pyhdc.TORCH_AVAILABLE:
    enc_torch = pyhdc.MAP_B(dimension=10_000, backend="torch")
    v = enc_torch.generate()
    print(v.backend)   # torch

    # GPU: requires a CUDA-capable device and a CUDA build of PyTorch
    import torch
    if torch.cuda.is_available():
        enc_gpu = pyhdc.MAP_B(dimension=10_000, backend="torch", device="cuda")

Or set a process-wide default so every new encoding uses it:

pyhdc.prefer_torch()                          # or pyhdc.prefer_cuda()
enc_default = pyhdc.MAP_B(dimension=10_000)   # inherits the torch backend
pyhdc.prefer_numpy()                          # reset to numpy

You can also move an existing hypervector between backends:

v_numpy = enc.generate()
v_torch = v_numpy.to_torch()
v_back  = v_torch.to_numpy()

Putting it all together

Here is the complete quickstart script as a single block:

import pyhdc

enc = pyhdc.MAP_B(dimension=10_000)

# Three primitives
a, b = enc.generate(), enc.generate()
print(a.similarity(b))              # ~= 0.0
print(a.bundle(b).similarity(a))    # ~= 0.7 (two inputs; three inputs give ~= 0.6)
record = a.bind(b)
recovered = record.unbind(b)
print(recovered.similarity(a))      # ~= 1.0

# Item memory
colours  = {c: enc.generate() for c in ['red','green','blue','yellow','purple']}
palette  = pyhdc.bundle(colours['red'], colours['green'], colours['blue'])
rankings = sorted(colours, key=lambda c: palette.similarity(colours[c]), reverse=True)
print(rankings[:3])   # ['red', 'green', 'blue'] (order may vary)

Tensors and operators

Batches generalize past two axes. Axis 0 is always the dimension D; every trailing axis is batch. A (D, N, M) tensor holds N * M hypervectors, one per trailing-axis slice.

cube = enc.generate(size=(10_000, 8, 4))   # (10000, 8, 4): 32 hypervectors

PyHDC also supports operator syntax. Each operator routes to an HDC method: + bundles, * binds, / unbinds, >> permutes (cyclic shift along axis 0), and ~ inverts where the encoding defines an inverse.

a = enc.generate()
b = enc.generate()

bundled = a + b          # same as a.bundle(b)
bound   = a * b          # same as a.bind(b)
unbound = bound / b      # same as bound.unbind(b)
shifted = a >> 1         # same as a.permute(shift=1)
inv     = ~a             # same as a.inverse(); not every encoding defines it, but MAP_B does

~ raises NotImplementedError on families without an inverse (for example MAP_C, VTB, BSDC_S); MAP_B defines one. See How to Use Operator Syntax for the full operator table and Array Layout: Dimension-First (D, N, M) for the dimension-first axis rules.
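
For readers curious how operators can route to named methods, here is a minimal standard-library sketch using Python's special methods. TinyHV is a hypothetical stand-in class, not the PyHDC Hypervector. It assumes bipolar values, element-wise sum for bundling, element-wise multiplication for binding, and a shift toward higher indices for permutation.

```python
import random

class TinyHV:
    """Hypothetical bipolar hypervector used only to illustrate operator routing."""

    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def random(cls, dim, rng):
        return cls(rng.choice((-1, 1)) for _ in range(dim))

    def bundle(self, other):
        return TinyHV(p + q for p, q in zip(self.values, other.values))

    def bind(self, other):
        return TinyHV(p * q for p, q in zip(self.values, other.values))

    def unbind(self, other):
        # Bipolar binding is its own inverse, so unbinding multiplies again.
        return self.bind(other)

    def permute(self, shift=1):
        # Cyclic shift by `shift` positions; negative values shift the other way.
        k = shift % len(self.values)
        return TinyHV(self.values[-k:] + self.values[:-k])

    def inverse(self):
        # Each bipolar component is its own multiplicative inverse.
        return TinyHV(self.values)

    # Operators forward to the named methods.
    __add__ = bundle
    __mul__ = bind
    __truediv__ = unbind
    __rshift__ = permute
    __invert__ = inverse

rng = random.Random(0)
a = TinyHV.random(1_000, rng)
b = TinyHV.random(1_000, rng)

print(((a * b) / b).values == a.values)         # True: unbinding undoes binding
print(((a >> 3) >> -3).values == a.values)      # True: shifting back restores a
print(all(v == 1 for v in (a * ~a).values))     # True: a bound with its inverse is all ones
```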

Continue here

Tutorials: five end-to-end tutorials, starting with Tutorial 1: Encoding Text for Classification
How to Choose the Right Encoding: how to pick the right encoding for your use case
Encodings Overview: a full comparison of all 15 encoding families