Tutorial 3: GPU-Accelerated HDC with PyTorch

PyHDC has a PyTorch backend. Every encoding can produce torch.Tensor-backed hypervectors, and all operations (bundle, bind, similarity) work identically on CPU tensors or CUDA tensors. This tutorial covers the backend model, creating and moving GPU hypervectors, batched generation and similarity, and how to measure the speedup.

Prerequisites: Tutorial 1: Encoding Text for Classification (its classifier is ported to GPU later in this tutorial)

The backend model

A Hypervector wraps either a numpy.ndarray or a torch.Tensor. All HDC operations dispatch to the correct backend automatically, so you never need to call NumPy or PyTorch functions directly. The underlying array remains accessible through the .data property, however, and can be extracted at any time for use with native NumPy or PyTorch functions.

import pyhdc

# NumPy backend (default)
enc_np  = pyhdc.MAP_B(dimension=10_000)
hv_np   = enc_np.generate()
print(type(hv_np.data))   # <class 'numpy.ndarray'>

if pyhdc.TORCH_AVAILABLE:
    import torch

    # PyTorch CPU backend
    enc_cpu = pyhdc.MAP_B(dimension=10_000, backend="torch")
    hv_cpu  = enc_cpu.generate()
    print(type(hv_cpu.data))   # <class 'torch.Tensor'>
    print(hv_cpu.device)        # cpu

    # PyTorch GPU backend
    if torch.cuda.is_available():
        enc_gpu = pyhdc.MAP_B(dimension=10_000, backend="torch", device="cuda")
        hv_gpu  = enc_gpu.generate()
        print(hv_gpu.device)    # cuda:0

Creating a GPU encoding (with CPU fallback)

It is good practice to fall back to the CPU when CUDA is not available, and to the NumPy backend when PyTorch is not installed:

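import pyhdc

# Import torch only when PyHDC reports it is installed; an unconditional
# import would raise ImportError before the NumPy fallback could run.
if pyhdc.TORCH_AVAILABLE:
    import torch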

def make_enc(dimension=10_000):
    if pyhdc.TORCH_AVAILABLE and torch.cuda.is_available():
        device = "cuda"
    elif pyhdc.TORCH_AVAILABLE:
        device = "cpu"
    else:
        return pyhdc.MAP_B(dimension=dimension)   # NumPy
    return pyhdc.MAP_B(dimension=dimension, backend="torch", device=device)

enc = make_enc()
print(enc.backend, enc.device)

Moving hypervectors between backends and devices

You can convert an existing hypervector without recreating the encoding:

hv_numpy = pyhdc.MAP_B(dimension=10_000).generate()

# NumPy -> PyTorch CPU
hv_cpu = hv_numpy.to_torch()

# CPU -> GPU
hv_gpu = hv_cpu.cuda()         # or .to("cuda") or .to("cuda:0")

# GPU -> CPU
hv_back_cpu = hv_gpu.cpu()

# PyTorch -> NumPy
hv_back_np = hv_back_cpu.to_numpy()

These conversions copy the data, so the original hypervector is unchanged.

Batched generation

Instead of generating one hypervector at a time, generate a whole batch at once. On GPU this is substantially faster because the operation is fully vectorised across the entire batch.

enc = make_enc()

# Generate 10,000 hypervectors of dimension 10,000 in one call.
# Hypervectors are dimension-first: each column is one hypervector.
batch = enc.generate(size=(10_000, 10_000))
print(batch.shape)    # (10000, 10000)  # (D, N)
print(batch.backend)  # torch
print(batch.device)   # cuda:0  (if CUDA available)

# Index a column to get a single hypervector
hv0 = batch[:, 0]
print(hv0.shape)      # (10000,)
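
To make the dimension-first layout concrete, here is a small dependency-free sketch that uses nested Python lists in place of PyHDC. It assumes bipolar (+1/-1) elements, which is the usual convention for MAP-B style encodings but is not stated in this tutorial, and `random_batch` and `column` are hypothetical helper names.

```python
import random

def random_batch(dimension, count, seed=0):
    """Return a (D, N) batch as a list of D rows, each holding N bipolar values."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(count)] for _ in range(dimension)]

def column(batch, index):
    """Extract hypervector `index` (one column) from a dimension-first batch."""
    return [row[index] for row in batch]

D, N = 8, 3                       # tiny sizes so the layout is easy to inspect
batch = random_batch(D, N)
hv0 = column(batch, 0)            # analogous to batch[:, 0]

print(len(batch), len(batch[0]))  # rows = D, columns = N
print(len(hv0))                   # a single hypervector has D elements
```

Each row holds one dimension across all hypervectors, so reading a hypervector means walking down a column.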

Batched similarity: three calling conventions

The Encoding.similarity() and Hypervector.similarity() methods support three calling conventions. Hypervectors are dimension-first, so a batch is (D, N) and comparisons run column-wise over axis 0.

1. Two 1-D hypervectors -> scalar

a = enc.generate()   # shape (10000,)
b = enc.generate()   # shape (10000,)

sim = a.similarity(b)   # scalar (a 0-d tensor on GPU, see "Common pitfalls")

2. Two 2-D batches -> 1-D array (per-column pairs)

batch_a = enc.generate(size=(10_000, 100))   # shape (10000, 100)  # (D, N)
batch_b = enc.generate(size=(10_000, 100))   # shape (10000, 100)  # (D, N)

sims = enc.similarity(batch_a, batch_b)   # shape (100,)
# sims[i] = similarity(batch_a[:, i], batch_b[:, i])

3. Single 2-D batch -> 1-D array (first column vs. rest)

batch = enc.generate(size=(10_000, 101))   # shape (10000, 101)  # (D, N)

sims = enc.similarity(batch)     # shape (100,)
# sims[i] = similarity(batch[:, 0], batch[:, i+1])

Convention 3 is useful for nearest-neighbour search: put the query in column 0 and the codebook in columns 1+.

query = enc.generate()                       # shape (10000,)
codebook = enc.generate(size=(10_000, 50))   # shape (10000, 50)  # (D, N)

batch = pyhdc.stack([query, codebook])    # shape (10000, 51): query is column 0
sims = enc.similarity(batch)              # shape (50,)
best_idx = sims.argmax().item()           # index of closest match in codebook
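
The three conventions can be summarised by a small reference implementation. This is a sketch of the semantics described above, not PyHDC's code: it works on plain Python lists, assumes cosine similarity (the tutorial does not say which measure MAP_B uses), and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length 1-D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def columns(batch):
    """Split a dimension-first (D, N) batch (list of rows) into N column vectors."""
    return [list(col) for col in zip(*batch)]

def similarity(a, b=None):
    """Mimic the three calling conventions on list-based hypervectors."""
    a_is_batch = isinstance(a[0], list)
    if b is None:                          # convention 3: first column vs. rest
        first, *rest = columns(a)
        return [cosine(first, col) for col in rest]
    if a_is_batch:                         # convention 2: per-column pairs
        return [cosine(x, y) for x, y in zip(columns(a), columns(b))]
    return cosine(a, b)                    # convention 1: two 1-D vectors

# Tiny (D=4, N=3) batch: column 0 is the query, columns 1-2 are the codebook.
batch = [[1, 1, -1],
         [1, 1, -1],
         [-1, -1, 1],
         [1, -1, -1]]
sims = similarity(batch)                   # one score per codebook column
best_idx = max(range(len(sims)), key=sims.__getitem__)
print(sims, best_idx)                      # [0.5, -1.0] 0
```

Note that `best_idx` indexes the codebook, not the stacked batch: score `i` belongs to column `i + 1` of the batch.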

Porting Tutorial 1 to GPU

The Tutorial 1 text classifier requires only three changes to run on GPU:

import pyhdc, string, torch

# Change 1: GPU encoding
enc = pyhdc.MAP_B(dimension=10_000, backend="torch",
                  device="cuda" if torch.cuda.is_available() else "cpu")

alphabet = string.ascii_lowercase + string.digits + '_'
# Change 2: codebook generation, same API
char_hv  = {ch: enc.generate() for ch in alphabet}

def encode_word(word, enc, char_hv, n=3):
    word = word.lower().ljust(n, '_')
    trigram_hvs = []
    for i in range(len(word) - n + 1):
        t = word[i:i+n]
        hv = char_hv[t[0]].bind(char_hv[t[1]]).bind(char_hv[t[2]])
        trigram_hvs.append(hv)
    return pyhdc.bundle(*trigram_hvs)

python_keywords = ['false', 'none', 'true', 'and', 'for', 'if', 'import',
                    'class', 'return', 'while', 'yield', 'lambda', 'def']
english_nouns   = ['cat', 'dog', 'house', 'river', 'cloud', 'tree', 'book',
                    'chair', 'stone', 'light', 'water', 'music', 'road']

kw_proto   = pyhdc.bundle(*[encode_word(w, enc, char_hv) for w in python_keywords])
noun_proto = pyhdc.bundle(*[encode_word(w, enc, char_hv) for w in english_nouns])

def classify(word):
    hv = encode_word(word, enc, char_hv)
    # Change 3: wrap each similarity in float() before comparing or printing
    kw_sim   = float(hv.similarity(kw_proto))
    noun_sim = float(hv.similarity(noun_proto))
    return 'keyword' if kw_sim > noun_sim else 'noun'

for w in ['import', 'lamp', 'yield', 'stone']:
    print(f"{w:10s} -> {classify(w)}")

Of the three changes, the only one that alters behaviour is backend="torch", device="cuda" on the encoding constructor. All operations (bind, bundle, similarity) work identically on GPU.
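
For readers who want to see what the classifier computes independently of any backend, the sketch below re-implements the encoding step with plain Python lists. It rests on assumptions not confirmed by this tutorial: bipolar (+1/-1) hypervectors, binding as an element-wise product, bundling as an element-wise sum, and cosine similarity. The helper names mirror the code above but are not PyHDC functions.

```python
import math
import random

D = 1_000                                   # smaller than 10,000 to keep pure Python fast
rng = random.Random(42)
alphabet = "abcdefghijklmnopqrstuvwxyz_"
char_hv = {ch: [rng.choice((-1, 1)) for _ in range(D)] for ch in alphabet}

def bind(*hvs):
    """Element-wise product of bipolar vectors (assumed binding rule)."""
    return [math.prod(vals) for vals in zip(*hvs)]

def bundle(*hvs):
    """Element-wise sum (assumed bundling rule, left unthresholded)."""
    return [sum(vals) for vals in zip(*hvs)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

def encode_word(word, n=3):
    """Bundle the bound character n-grams of `word`, as in the classifier above."""
    word = word.lower().ljust(n, "_")
    grams = [word[i:i + n] for i in range(len(word) - n + 1)]
    return bundle(*[bind(*(char_hv[c] for c in g)) for g in grams])

near = cosine(encode_word("import"), encode_word("imports"))  # four shared trigrams
far = cosine(encode_word("import"), encode_word("stone"))     # no shared trigrams
print(f"import~imports: {near:.2f}   import~stone: {far:.2f}")
```

Words that share trigrams end up with a high similarity, while unrelated words stay close to zero, which is the property the prototype comparison in classify() relies on.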

Benchmarking

GPU becomes worthwhile for large batches and high dimensions. Here is a simple timing comparison:

import time, pyhdc, torch

D = 10_000
N = 50_000

enc_np  = pyhdc.MAP_B(dimension=D)
enc_gpu = pyhdc.MAP_B(dimension=D, backend="torch",
                       device="cuda" if torch.cuda.is_available() else "cpu")

# NumPy baseline
t0 = time.perf_counter()
batch_np = enc_np.generate(size=(D, N))
t1 = time.perf_counter()
print(f"NumPy  generate {D}x{N}: {t1-t0:.3f}s")

# PyTorch (CPU or GPU)
t0 = time.perf_counter()
batch_gpu = enc_gpu.generate(size=(D, N))
if torch.cuda.is_available():
    torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Torch  generate {D}x{N}: {t1-t0:.3f}s")

# Batched similarity
q = enc_np.generate()

t0 = time.perf_counter()
_ = enc_np.similarity(q, batch_np)
t1 = time.perf_counter()
print(f"NumPy  similarity 1x{N}: {t1-t0:.3f}s")

q_gpu = q.to_torch(enc_gpu.device)
t0 = time.perf_counter()
_ = enc_gpu.similarity(q_gpu, batch_gpu)
if torch.cuda.is_available():
    torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Torch  similarity 1x{N}: {t1-t0:.3f}s")

Typical observations:

For small batches (< 1,000 vectors), NumPy and PyTorch CPU are comparable; the GPU may even be slower because of kernel launch overhead.
For large batches (> 10,000 vectors), GPU similarity search is 10-100x faster depending on hardware.
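
Single-shot timings are sensitive to one-off costs such as CUDA context creation and memory allocation on the first call. A more robust pattern, sketched here with a hypothetical `bench` helper that is not part of PyHDC, runs a few untimed warm-up calls and then reports the best of several timed repeats. The `sync` argument is where `torch.cuda.synchronize` would be passed when timing GPU work.

```python
import time

def bench(fn, *, warmup=1, repeats=5, sync=None):
    """Return the best wall-clock time in seconds over `repeats` calls to `fn`.

    `sync`, if given, is called before starting and before stopping each
    timer so that asynchronous (e.g. CUDA) work is included in the timing.
    """
    for _ in range(warmup):          # untimed calls absorb one-off setup costs
        fn()
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()                   # wait for previously queued work
        t0 = time.perf_counter()
        fn()
        if sync is not None:
            sync()                   # wait for this call's work to finish
        best = min(best, time.perf_counter() - t0)
    return best

# CPU-only demonstration with a stand-in workload.
elapsed = bench(lambda: sum(i * i for i in range(10_000)))
print(f"best of 5: {elapsed:.6f}s")
```

With the variables from the benchmark above, a GPU measurement might look like bench(lambda: enc_gpu.similarity(q_gpu, batch_gpu), sync=torch.cuda.synchronize). Taking the minimum rather than the mean reduces the influence of unrelated system activity.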

Common pitfalls

Backend mismatch

Mixing a NumPy hypervector with a PyTorch hypervector raises ValueError:

hv_np    = pyhdc.MAP_B(dimension=10_000).generate()
hv_torch = pyhdc.MAP_B(dimension=10_000, backend="torch").generate()

hv_np.similarity(hv_torch)   # ValueError: backend mismatch

Fix: convert one of them first with .to_torch() or .to_numpy().
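
One way to apply that fix systematically is a small helper that converts one operand to match the other. The sketch below is an illustration, not a PyHDC function: `align` is a hypothetical name, it relies only on the attributes and methods used in this tutorial (.backend, .device, .to_torch(), .to(), .cpu(), .to_numpy()), and `StubHV` is a stand-in class that exists only so the example runs without PyHDC installed. The "numpy" backend string in the stub is a placeholder.

```python
class StubHV:
    """Stand-in for a PyHDC hypervector; it tracks backend and device only."""
    def __init__(self, backend, device="cpu"):
        self.backend, self.device = backend, device
    def to_torch(self):
        return StubHV("torch", "cpu")
    def to_numpy(self):
        return StubHV("numpy", "cpu")
    def to(self, device):
        return StubHV("torch", device)
    def cpu(self):
        return StubHV("torch", "cpu")

def align(reference, other):
    """Return `other` converted to the backend and device of `reference`."""
    if reference.backend == "torch":
        if other.backend != "torch":
            other = other.to_torch()        # NumPy -> PyTorch CPU
        return other.to(reference.device)   # then move to the reference device
    if other.backend == "torch":
        other = other.cpu().to_numpy()      # GPU/CPU tensor -> NumPy
    return other

reference = StubHV("torch", "cuda:0")
other = StubHV("numpy")
aligned = align(reference, other)
print(aligned.backend, aligned.device)      # torch cuda:0
```

With real hypervectors the call would read hv_torch.similarity(align(hv_torch, hv_np)).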

Extracting scalars for Python arithmetic

Similarity on a GPU tensor returns a tensor, not a Python float. Wrap with float() when you need a Python number:

# hv_gpu and hv_gpu2: two hypervectors generated by a CUDA-backed encoding
sim = hv_gpu.similarity(hv_gpu2)   # torch.Tensor, shape ()
if float(sim) > 0.8:               # convert explicitly
    ...

Summary

In this tutorial you:

Created GPU encodings with a CPU fallback guard
Moved hypervectors between backends and devices
Used all three batched similarity calling conventions
Ported Tutorial 1 to GPU with three lines changed
Timed the GPU speedup for large batch operations

What’s next

Tutorial 4: (Sparse) Binary Encodings (BSC and BSDC): binary and sparse encodings
How to Switch Between Backends and Devices: quick reference for all backend/device conversions
Dual Backend Architecture: in-depth explanation of the dual backend architecture