Tutorial 3: GPU-Accelerated HDC with PyTorch
PyHDC has a PyTorch backend. Every encoding can produce
torch.Tensor-backed hypervectors, and all operations (bundle, bind,
similarity) work identically on CPU tensors or CUDA tensors. This tutorial
covers the backend model, creating and moving GPU hypervectors, batched
generation and similarity, and how to measure the speedup.
Prerequisites: Tutorial 1: Encoding Text for Classification (the code will be ported to GPU in this tutorial)
The backend model
A Hypervector wraps either a numpy.ndarray or a torch.Tensor.
All HDC operations dispatch to the correct backend automatically, you never call
NumPy or PyTorch functions directly. Although, the underlying array is accessible
via the .data property, so can be extracted at any time for use with native
numpy or torch functions.
import pyhdc
# NumPy backend (default)
enc_np = pyhdc.MAP_B(dimension=10_000)
hv_np = enc_np.generate()
print(type(hv_np.data)) # <class 'numpy.ndarray'>
if pyhdc.TORCH_AVAILABLE:
import torch
# PyTorch CPU backend
enc_cpu = pyhdc.MAP_B(dimension=10_000, backend="torch")
hv_cpu = enc_cpu.generate()
print(type(hv_cpu.data)) # <class 'torch.Tensor'>
print(hv_cpu.device) # cpu
# PyTorch GPU backend
if torch.cuda.is_available():
enc_gpu = pyhdc.MAP_B(dimension=10_000, backend="torch", device="cuda")
hv_gpu = enc_gpu.generate()
print(hv_gpu.device) # cuda:0
Creating a GPU encoding (with CPU fallback)
It is good practice to fall back to CPU if CUDA is not available:
import pyhdc
import torch
def make_enc(dimension=10_000):
if pyhdc.TORCH_AVAILABLE and torch.cuda.is_available():
device = "cuda"
elif pyhdc.TORCH_AVAILABLE:
device = "cpu"
else:
return pyhdc.MAP_B(dimension=dimension) # NumPy
return pyhdc.MAP_B(dimension=dimension, backend="torch", device=device)
enc = make_enc()
print(enc.backend, enc.device)
Moving hypervectors between backends and devices
You can convert an existing hypervector without recreating the encoding:
hv_numpy = pyhdc.MAP_B(dimension=10_000).generate()
# NumPy -> PyTorch CPU
hv_cpu = hv_numpy.to_torch()
# CPU -> GPU
hv_gpu = hv_cpu.cuda() # or .to("cuda") or .to("cuda:0")
# GPU -> CPU
hv_back_cpu = hv_gpu.cpu()
# PyTorch -> NumPy
hv_back_np = hv_back_cpu.to_numpy()
These conversions copy the data, so the original hypervector is unchanged.
Batched generation
Instead of generating one hypervector at a time, generate a whole batch at once. On GPU this is substantially faster because the operation is fully vectorised across the entire batch.
enc = make_enc()
# Generate 10,000 hypervectors of dimension 10,000 in one call.
# Hypervectors are dimension-first: each column is one hypervector.
batch = enc.generate(size=(10_000, 10_000))
print(batch.shape) # (10000, 10000) # (D, N)
print(batch.backend) # torch
print(batch.device) # cuda:0 (if CUDA available)
# Index a column to get a single hypervector
hv0 = batch[:, 0]
print(hv0.shape) # (10000,)
Batched similarity: three calling conventions
The Encoding.similarity() and Hypervector.similarity() methods
support three calling conventions. Hypervectors are dimension-first, so a
batch is (D, N) and comparisons run column-wise over axis 0.
1. Two 1-D hypervectors -> scalar
a = enc.generate() # shape (10000,)
b = enc.generate() # shape (10000,)
sim = a.similarity(b) # float
2. Two 2-D batches -> 1-D array (per-column pairs)
batch_a = enc.generate(size=(10_000, 100)) # shape (10000, 100) # (D, N)
batch_b = enc.generate(size=(10_000, 100)) # shape (10000, 100) # (D, N)
sims = enc.similarity(batch_a, batch_b) # shape (100,)
# sims[i] = similarity(batch_a[:, i], batch_b[:, i])
3. Single 2-D batch -> 1-D array (first column vs. rest)
batch = enc.generate(size=(10_000, 101)) # shape (10000, 101) # (D, N)
sims = enc.similarity(batch) # shape (100,)
# sims[i] = similarity(batch[:, 0], batch[:, i+1])
Convention 3 is useful for nearest-neighbour search: put the query in column 0 and the codebook in columns 1+.
query = enc.generate() # shape (10000,)
codebook = enc.generate(size=(10_000, 50)) # shape (10000, 50) # (D, N)
batch = pyhdc.stack([query, codebook]) # shape (10000, 51): query is column 0
sims = enc.similarity(batch) # shape (50,)
best_idx = sims.argmax().item() # index of closest match in codebook
Porting Tutorial 1 to GPU
The Tutorial 1 text classifier requires only three changes to run on GPU:
import pyhdc, string, torch
# Change 1: GPU encoding
enc = pyhdc.MAP_B(dimension=10_000, backend="torch",
device="cuda" if torch.cuda.is_available() else "cpu")
alphabet = string.ascii_lowercase + string.digits + '_'
# Change 2: codebook generation, same API
char_hv = {ch: enc.generate() for ch in alphabet}
def encode_word(word, enc, char_hv, n=3):
word = word.lower().ljust(n, '_')
trigram_hvs = []
for i in range(len(word) - n + 1):
t = word[i:i+n]
hv = char_hv[t[0]].bind(char_hv[t[1]]).bind(char_hv[t[2]])
trigram_hvs.append(hv)
return pyhdc.bundle(*trigram_hvs)
python_keywords = ['false', 'none', 'true', 'and', 'for', 'if', 'import',
'class', 'return', 'while', 'yield', 'lambda', 'def']
english_nouns = ['cat', 'dog', 'house', 'river', 'cloud', 'tree', 'book',
'chair', 'stone', 'light', 'water', 'music', 'road']
kw_proto = pyhdc.bundle(*[encode_word(w, enc, char_hv) for w in python_keywords])
noun_proto = pyhdc.bundle(*[encode_word(w, enc, char_hv) for w in english_nouns])
def classify(word):
hv = encode_word(word, enc, char_hv)
# Change 3: .to_numpy() before using sklearn / printing
kw_sim = float(hv.similarity(kw_proto))
noun_sim = float(hv.similarity(noun_proto))
return 'keyword' if kw_sim > noun_sim else 'noun'
for w in ['import', 'lamp', 'yield', 'stone']:
print(f"{w:10s} -> {classify(w)}")
The only meaningful change is backend="torch", device="cuda" on the
encoding constructor. All operations (.bind(), .bundle(),
.similarity()) work identically on GPU.
Benchmarking
GPU becomes worthwhile for large batches and high dimensions. Here is a simple timing comparison:
import time, pyhdc, torch
D = 10_000
N = 50_000
enc_np = pyhdc.MAP_B(dimension=D)
enc_gpu = pyhdc.MAP_B(dimension=D, backend="torch",
device="cuda" if torch.cuda.is_available() else "cpu")
# NumPy baseline
t0 = time.perf_counter()
batch_np = enc_np.generate(size=(D, N))
t1 = time.perf_counter()
print(f"NumPy generate {D}x{N}: {t1-t0:.3f}s")
# PyTorch (CPU or GPU)
t0 = time.perf_counter()
batch_gpu = enc_gpu.generate(size=(D, N))
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Torch generate {N}x{D}: {t1-t0:.3f}s")
# Batched similarity
q = enc_np.generate()
t0 = time.perf_counter()
_ = enc_np.similarity(q, batch_np)
t1 = time.perf_counter()
print(f"NumPy similarity 1x{N}: {t1-t0:.3f}s")
q_gpu = q.to_torch(enc_gpu.device)
t0 = time.perf_counter()
_ = enc_gpu.similarity(q_gpu, batch_gpu)
if torch.cuda.is_available():
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"Torch similarity 1x{N}: {t1-t0:.3f}s")
Typical observations:
For small batches (< 1,000 vectors), NumPy and PyTorch CPU are comparable, the GPU may even be slower due to launch overhead.
For large batches (> 10,000 vectors), GPU similarity search is 10-100x faster depending on hardware.
Common pitfalls
Backend mismatch
Mixing a NumPy hypervector with a PyTorch hypervector raises ValueError:
hv_np = pyhdc.MAP_B(dimension=10_000).generate()
hv_torch = pyhdc.MAP_B(dimension=10_000, backend="torch").generate()
hv_np.similarity(hv_torch) # ValueError: backend mismatch
Fix: convert one of them first with .to_torch() or .to_numpy().
Extracting scalars for Python arithmetic
Similarity on a GPU tensor returns a tensor, not a Python float.
Wrap with float() when you need a Python number:
sim = hv_gpu.similarity(hv_gpu2) # torch.Tensor, shape ()
if float(sim) > 0.8: # convert explicitly
...
Summary
In this tutorial you:
Created GPU encodings with a CPU fallback guard
Moved hypervectors between backends and devices
Used all three batched similarity calling conventions
Ported Tutorial 1 to GPU with three lines changed
Timed the GPU speedup for large batch operations
What’s next
Tutorial 4: (Sparse) Binary Encodings (BSC and BSDC) : binary and sparse encodings
How to Switch Between Backends and Devices : quick reference for all backend/device conversions
Dual Backend Architecture : in-depth explanation of the dual backend architecture