Dual Backend Architecture

PyHDC supports two backends for its array operations: NumPy (the default) and PyTorch. Every encoding, operation, and hypervector works with either backend, and the API surface is identical.

Design rationale

NumPy is the universal default because it has no install dependencies beyond Python and works on every platform. PyTorch is supported for three reasons:

GPU acceleration: CUDA tensors enable large-scale HDC (50,000+ dimensions, millions of codebook entries) to run in seconds instead of minutes.
Integration with deep learning pipelines: if you are using HDC as a layer in a neural network or alongside PyTorch models, keeping everything in the same tensor ecosystem avoids data copies.
Vectorised batch operations: PyTorch’s broadcasting and GPU-native matmul make batched bundling and similarity faster for large batches; GPU generation of 10,000 × 10,000 codebooks takes under a second versus several minutes on CPU.

The `TORCH_AVAILABLE` flag

At import time, PyHDC tries to import torch. If the import succeeds, pyhdc.TORCH_AVAILABLE is True. If it fails, the flag is False and every backend="torch" request raises ImportError.

import pyhdc

print(pyhdc.TORCH_AVAILABLE)   # True or False

The `BackendManager`

The internal BackendManager class is a static utility that dispatches array operations to the correct backend. Advanced users who are extending PyHDC (writing custom encodings or components) may use it directly:

BackendManager.to_numpy(array) : convert to NumPy
BackendManager.to_torch(array, device) : convert to PyTorch tensor
BackendManager.get_device(hypervector) : return the device string

Normal users never need to call BackendManager: the encoding and Hypervector methods handle dispatch automatically.

Backend selection

Set the backend at encoding construction time:

enc_np  = pyhdc.MAP_C(dimension=10_000)                           # numpy
enc_cpu = pyhdc.MAP_C(dimension=10_000, backend="torch")          # torch CPU
enc_gpu = pyhdc.MAP_C(dimension=10_000, backend="torch",
                       device="cuda")                              # torch GPU

All operations on hypervectors generated by enc_gpu run on the GPU.
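
If a script has to run on machines with and without a GPU, the constructor keywords can be chosen at run time. The helper below is a sketch under stated assumptions: `encoding_kwargs` is a hypothetical name, the only PyHDC-facing parts are the `dimension`, `backend`, and `device` keywords shown above, and `torch.cuda.is_available()` is the standard PyTorch check for a usable CUDA device.

```python
def encoding_kwargs(dimension: int = 10_000) -> dict:
    """Pick backend/device keywords from what is installed (hypothetical helper)."""
    kwargs = {"dimension": dimension}
    try:
        import torch
    except ImportError:
        # NumPy is the default backend, so no extra keywords are needed.
        return kwargs
    kwargs["backend"] = "torch"
    if torch.cuda.is_available():
        kwargs["device"] = "cuda"
    return kwargs


# Intended use (requires PyHDC):
#     enc = pyhdc.MAP_C(**encoding_kwargs())
```

The helper only ever produces the three keyword combinations listed above, so it does not rely on any constructor behaviour beyond what this page documents.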

What changes between backends

From the user’s perspective, almost nothing changes. The only observable differences are:

hv.data returns numpy.ndarray (NumPy) or torch.Tensor (PyTorch)
hv.device returns None (NumPy) or a device string (PyTorch)
hv.backend returns "numpy" or "torch"
PyTorch similarity on a GPU tensor returns a torch.Tensor, not a Python float. Wrap a single score with float() when a plain number is needed.
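
Code that must accept results from either backend can normalise them at the boundary. The helper below is a sketch: `to_python` is a hypothetical name, and it relies only on `tolist()`, which both NumPy arrays and PyTorch tensors provide (for a zero-dimensional value it returns a plain Python number).

```python
def to_python(score):
    """Convert a similarity result to a float or a (nested) list of floats."""
    tolist = getattr(score, "tolist", None)
    if callable(tolist):
        # NumPy arrays/scalars and torch tensors (CPU or GPU) provide tolist().
        return tolist()
    # Plain Python numbers have no tolist(); float() is enough.
    return float(score)
```

This also covers batched results, where float() would fail because the tensor holds more than one element.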

PyTorch batching

PyTorch enables larger and faster batched operations:

Batched generation

enc   = pyhdc.MAP_C(dimension=10_000, backend="torch", device="cuda")
batch = enc.generate(size=1000)   # shape (1000, 10000) tensor

Batched similarity (2-D)

a = enc.generate(size=500)   # shape (500, 10000)
b = enc.generate(size=500)   # shape (500, 10000)
sims = enc.similarity(a, b)  # shape (500,): one score per row pair
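
The output shape follows from pairing rows: row i of a is compared only with row i of b, not with every row of b. The plain-Python sketch below shows that pairing on tiny inputs. It assumes cosine similarity, which this page does not confirm as the measure MAP_C uses, and `rowwise_cosine` is a hypothetical name.

```python
import math


def rowwise_cosine(a, b):
    """Cosine similarity between a[i] and b[i] for each row i.

    Rows must be non-zero vectors; zero vectors are not handled here.
    """
    if len(a) != len(b):
        raise ValueError("a and b must contain the same number of rows")
    scores = []
    for row_a, row_b in zip(a, b):
        dot = sum(x * y for x, y in zip(row_a, row_b))
        norm_a = math.sqrt(sum(x * x for x in row_a))
        norm_b = math.sqrt(sum(y * y for y in row_b))
        scores.append(dot / (norm_a * norm_b))
    return scores


a = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
b = [[1.0, 0.0], [1.0, -1.0], [0.0, -2.0]]
scores = rowwise_cosine(a, b)   # three rows in, three scores out
```

Lists are used so the sketch runs without either backend installed; the real operation works on whole arrays at once.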

Batched bundling

batch = enc.generate(size=(10, 3))   # 10 groups of 3 hypervectors;
                                      # shape (10, 3, 10000)
result = enc.bundle(batch, batch_dim=1)
print(result.shape)   # (10, 10000): 10 bundled results
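
Here batch_dim=1 names the axis that is collapsed: the three members of each group are combined, and the group axis survives. The plain-Python sketch below mimics only that shape behaviour on small lists. It assumes bundling is an elementwise sum and omits any normalisation or clipping PyHDC may apply; `bundle_groups` is a hypothetical name.

```python
def bundle_groups(batch):
    """Sum the members of each group elementwise.

    `batch` is a nested list shaped (groups, members, dimension);
    the result is shaped (groups, dimension).
    """
    return [[sum(column) for column in zip(*group)] for group in batch]


batch = [
    [[1, -1, 1, 1], [1, 1, -1, 1], [-1, 1, 1, 1]],       # group 0
    [[-1, -1, 1, -1], [-1, 1, 1, -1], [-1, -1, -1, 1]],  # group 1
]
result = bundle_groups(batch)
print(len(result), len(result[0]))   # 2 4
```

Two groups of three 4-dimensional vectors go in and two 4-dimensional vectors come out, mirroring the (10, 3, 10000) to (10, 10000) reduction above.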

Memory layout and data movement

PyHDC generators always produce CPU data first: they are Python-level sequences that get converted to arrays. When a GPU encoding generates hypervectors, the values are first assembled into a NumPy array and then transferred to the GPU, so generation is never purely on-GPU.

The data-movement methods on Hypervector make transfers explicit:

.to_torch(device=None) : NumPy → PyTorch CPU (or specified device)
.to_numpy() : PyTorch → NumPy
.cuda(device=None) : move to CUDA (shortcut for .to("cuda"))
.cpu() : move to CPU
.to(device) : move to any device string

Each of these methods copies the data; the original hypervector is unchanged.

Limitations

No automatic gradient tracking : HDC operations do not participate in PyTorch autograd. Gradients do not flow through bind, bundle, or similarity.
Mixed backends raise ValueError : you cannot mix NumPy and PyTorch hypervectors in the same operation; convert explicitly first.
Generator output is always CPU-first : even for GPU encodings, generation goes through NumPy before being sent to the GPU device.
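
The mixed-backend restriction can be handled with a small conversion helper. The sketch below uses only the attributes and methods named on this page (backend, device, to_numpy, to_torch, cpu, to); `align_to` itself is a hypothetical name, and comparing device strings by equality is an assumption (for example, "cuda" and "cuda:0" would be treated as different and trigger an extra copy).

```python
def align_to(hv, reference):
    """Return `hv` on the same backend and device as `reference`.

    Returns `hv` itself when it already matches, otherwise a converted copy.
    """
    if reference.backend == "numpy":
        if hv.backend == "numpy":
            return hv
        # Go via the CPU in case hv currently lives on a GPU.
        return hv.cpu().to_numpy()
    # reference is torch-backed from here on
    if hv.backend == "numpy":
        return hv.to_torch(device=reference.device)
    if hv.device == reference.device:
        return hv
    return hv.to(reference.device)
```

Calling align_to(a, b) before an operation on a and b makes the conversion explicit and leaves the original a untouched.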